Uncertainty of Measurement (MU) in Point-of-Care Testing: A Practical Guide

For POCT managers, biomedical scientists, and clinical leads – demystifying MU, comparing vendor vs. local approaches, and ensuring fitness for purpose under ISO 15189:2022 and ISO/TS 20914:2019.

1. What MU really is (and isn’t) in POCT

Defining MU: Measurement uncertainty (MU) is “the doubt about the true value of the measurand that remains after making a measurement”. In plain terms, any quantitative result we report (glucose, blood gas, etc.) is best understood not as a single exact value, but as a value ± a margin reflecting the inherent uncertainty. This is not just the instrument’s imprecision (random error); MU is a comprehensive parameter that combines multiple error sources and tells us the range within which the true value is likely to lie (expressed as a standard uncertainty, or as an expanded interval with a stated level of confidence, such as 95%).

MU vs. imprecision vs. total error: It’s important to distinguish MU from imprecision alone and from “Total Error” (TE). Imprecision (often given as a coefficient of variation, CV) captures the random scatter in results when repeating the same test, but MU encompasses more than just that. MU typically combines the random error (imprecision) and any systematic error (bias) in the measurement process into one uncertainty figure. By contrast, TE or allowable error is usually a single-number performance goal (e.g. a glucose result must be within ±10% of the true value) and often accounts for bias and imprecision in a simplified way. MU provides a probabilistic range (with a stated coverage probability, like 95%) rather than a hard pass/fail limit. The two concepts are related – for example, if MU is large relative to the clinically allowable error, the method may not be fit for purpose – but MU is about quantifying confidence in results, whereas TE is about meeting predefined error criteria.

MU is often underestimated in POCT: In many point-of-care settings, formal MU analysis has historically been minimal. Laboratories routinely calculate imprecision from quality control, but comprehensive MU determination “outside the chemistry laboratory” (e.g. in hematology, immunology, and especially POCT contexts) has not been common. POCT devices are used by a wide variety of operators in less controlled environments, with intermittent QC and external quality assessment (EQA), so the true uncertainty of results can be higher than one might assume from the analyzer’s specs. Often, only the analyzer’s repeatability is considered, underestimating the effects of operator technique, sample quality, and other pre-analytical factors. As a result, POCT programs may be unaware of the full MU impacting their patient results.

ISO 15189:2022 expectations: The latest ISO 15189 standard makes MU a clear requirement for medical laboratories (including POCT services). Labs must plan for MU and determine and maintain an uncertainty estimate for each measurement procedure. You need to show that each device or method can achieve the required measurement accuracy or MU for its intended clinical use. The lab should define performance requirements for MU (often based on clinical needs or biological variation) and regularly review whether the measurement process meets those requirements. Importantly, ISO 15189:2022 explicitly states that the laboratory shall make MU estimates available to users upon request.

MU in POCT vs. the lab: A big challenge is that POCT often involves low testing volumes or qualitative/semi-quantitative tests where traditional MU evaluation (which requires lots of data) is difficult. ISO 15189 allows for this: if it’s not feasible or relevant to rigorously quantify MU for a particular test, you need a documented rationale for excluding it. For example, if a POCT is used very infrequently or yields only a positive/negative result, you might instead verify its accuracy by comparison to a lab method or reference, and perform a risk assessment showing that the test is still under control. In practice, POCT services should attempt to estimate MU for all quantitative tests, but if an estimate is not possible, they must assure result quality in another way (e.g. periodic method comparison studies) and have this approved with a sign-off explaining why formal MU calculation isn’t done.

Top-down approach encouraged: Unlike the classical engineering approach of budgeting every error source (“bottom-up” per the GUM/VIM methodology), medical laboratories are guided to use a pragmatic “top-down” approach. ISO/TS 20914:2019 (a technical guideline for labs on MU) emphasizes using routine data – particularly long-term internal QC and EQA – to evaluate uncertainty. This approach assumes that the routine QC results over time inherently capture most sources of variation affecting patient samples. It’s simpler and more feasible than exhaustively analyzing each potential contributor to uncertainty. Studies have found that top-down estimates (using QC/EQA data) are essentially equivalent to the more labor-intensive bottom-up estimates in practice, giving labs confidence that a well-designed QC/EQA-based MU can stand up to scrutiny. The ISO guide and the 2022 standard both underscore that major contributors to uncertainty should be accounted for, but you don’t need to include esoteric or tiny effects that don’t materially impact fitness for purpose.

2. Two main MU pathways in POCT

When implementing MU in a POCT program, there are two primary routes you might take (and they aren’t mutually exclusive). One is to leverage your device vendor’s or middleware’s built-in uncertainty reporting (if available). The other is to derive MU yourself from your local quality control and EQA data. We’ll call these (A) the middleware/vendor approach and (B) the DIY local approach. Each has strengths and limitations, and comparing them can be enlightening.

A. Middleware-reported MU (e.g. Radiometer AQURE)

Some modern POCT connectivity solutions automatically generate an MU estimate for results. For example, Radiometer’s AQURE point-of-care IT system (used with ABL blood gas analyzers) can report a “measurement uncertainty” for each analyte on each instrument. While the exact proprietary algorithms aren’t usually published, these systems typically draw on the instrument’s ongoing performance data. They may incorporate:

  • QC and calibration history: The middleware knows the outcomes of each calibration and quality control (QC) check the device performs. It can model the analyzer’s drift and precision over time. For instance, if the blood gas analyzer’s QC for pH has a long-term SD of 0.02 and it needed a significant calibration adjustment last week, the system factors that into the current MU.
  • Peer group data: In some cases, if the middleware is connected to many analyzers (e.g. across a hospital network or country), it might compare your instrument’s performance to its peers. For example, Radiometer’s Peer QC module can benchmark an analyzer’s QC against a peer group. Consistent peer agreement could tighten the MU, whereas outlying behavior could widen it.
  • Proprietary modeling: The vendor might use an internal model that weights recent performance more heavily (to reflect current state) or applies rules (e.g. excluding a QC outlier that was flagged and corrected). Some systems produce an expanded uncertainty at a specific level (say 95% confidence) for each analyte concentration. On an ABL90 printout or screen, you might see something like “Lactate uncertainty ±0.2 mmol/L” alongside the result.

Strengths: The middleware approach is automated and real-time. It requires little manual calculation by the POCT team. Every time a result is reported, the system can attach an MU based on up-to-date analyzer status. This is consistent and standardized – useful for comparing across devices. It also frees you from doing complex stats; the heavy lifting is done by the software. During audits or accreditation, you can demonstrate that the system continuously monitors uncertainty.

Limitations: The downside is opacity and possible incompleteness. These MU estimates are only as good as the model. They tend to be device-centric, meaning they consider the analyzer’s analytic performance but often do not (and cannot) include all the “use-case” factors like patient sample type or operator variability. For example, AQURE’s MU for glucose might account for strip lot calibration and instrument drift, but it won’t magically include the fact that night-shift operators often underfill capillary tubes. Also, proprietary algorithms may exclude certain data (like QC failures that were remedied) in ways you might not anticipate or agree with. Finally, if your process has unique quirks (e.g. using heparinized syringes from multiple manufacturers affecting blood gas results), the vendor’s model won’t know that. So, the middleware MU is convenient but might paint an overly optimistic picture by focusing on the analyzer in isolation.

Interpreting an AQURE MU: If you use Radiometer ABL analyzers, you may have seen an “Uncertainty” field in reports. Typically, this is an expanded uncertainty (k≈2) at a specified level (often a normal range or a decision level). For instance, for an ABL90 lactate of 2.0 mmol/L, AQURE might report uncertainty ±0.2 mmol/L. That suggests a 95% chance the true value lies between ~1.8 and 2.2 mmol/L. It’s crucial to know what’s included: usually, this encompasses analytical imprecision and maybe calibration bias. It likely does not include, say, any patient sample matrix bias between arterial and capillary blood. So if you rely on this number, understand its scope. It can be very useful for flagging when an instrument’s performance deteriorates (if the MU starts creeping up, something may be wrong), but you may need to add other components for a full picture (see Section 3).

B. “DIY” MU from local QC/EQA data

The second approach is to derive MU yourself using your laboratory’s own QC and EQA data, following guidance like ISO/TS 20914:2019, the Nordtest approach, or similar top-down methods. In practice, this often means:

  • Use long-term IQC data for imprecision: Gather a sizable set of internal quality control results at relevant concentration(s). Calculate the standard deviation (SD) or coefficient of variation (CV). This long-term SD_IQC should encompass day-to-day instrument variation, different reagent lots, calibrations, etc. It represents the random uncertainty component (u_imprecision). If QC is run at multiple levels, you may use the level closest to a medical decision point of interest (e.g. QC around 8 mmol/L for glucose thresholds).
  • Use EQA or reference comparisons for bias: Determine if there’s a systematic bias in your POCT results. For example, look at External Quality Assessment (proficiency testing) data: what’s the difference between your instrument’s result and the target value or peer group mean? Alternatively, compare against a reference method or the main lab analyzer using patient sample split comparisons. Quantify the average bias (Δ) at the relevant concentration. That contributes a bias uncertainty u_bias (the magnitude of the bias, or the bias divided by √3 or another factor if treating it as a rectangular distribution).
  • Combine them: Assuming bias and imprecision are independent, calculate the combined standard uncertainty:
    u_c = √(u_imprecision² + u_bias²).
    This gives one standard deviation’s worth of uncertainty. Finally, decide on a coverage factor k (typically ~2 for ~95% confidence) to get the expanded uncertainty: U = k·u_c. You might say, for example, glucose = 8.0 ± 0.5 mmol/L (k=2); a minimal calculation sketch follows this list.
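
Putting those steps into code makes them easy to reuse and audit. A minimal sketch, assuming the imprecision SD and bias are already in the same units (the input figures are illustrative, chosen to reproduce the glucose example above):

```python
from math import sqrt

def expanded_mu(u_imprecision: float, u_bias: float, k: float = 2.0) -> float:
    """Combine independent imprecision and bias components (same units),
    then expand with coverage factor k (~2 for ~95% confidence)."""
    u_c = sqrt(u_imprecision**2 + u_bias**2)  # combined standard uncertainty
    return k * u_c                            # expanded uncertainty U

# Illustrative: long-term QC SD = 0.23 mmol/L near 8 mmol/L, EQA bias = 0.10 mmol/L
print(round(expanded_mu(0.23, 0.10), 2))  # -> 0.5, i.e. glucose 8.0 ± 0.5 mmol/L (k=2)
```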

This top-down MU approach is recommended by standards and literature for routine medical labs. It leverages the data you already have from quality management. Notably, ISO/TS 20914 and related references suggest that as long as your QC dataset covers the major sources of variability (different operators, reagent lots, etc.), this combined uncertainty is fit for purpose. Pioneering studies (Padoan et al., 2017; others) applied exactly this: using ≥6 months of QC data and EQA results to calculate MU, and showed it meets accreditation needs.

Why might middleware MU and DIY MU differ? It’s common to find that the vendor-reported MU (approach A) is smaller than your own calculated MU (approach B). For example, your Radiometer AQURE might claim ±2% for sodium, but your long-term QC+EQA analysis shows ±4%. Reasons include:

  • Data windows and weighting: The middleware might use a rolling short window (last few weeks of QC) whereas you used a year’s data including older lots and more drift.
  • Exclusions: Your manual analysis might include real-world “blips” (like a month where QC was erratic due to a bad reagent lot), while the automated model could have flagged those and adjusted the instrument or excluded those points. In essence, approach B can reflect more “warts and all.”
  • Bias inclusion: Many middleware systems don’t explicitly incorporate bias vs. an external reference – they assume calibration adjusts bias to zero. But your EQA revealed a consistent positive bias, which you included in MU. That alone will make your U larger than the vendor’s purely precision-based U.
  • Environmental and use factors: The DIY calculation might indirectly capture things like lot changes, manual QC mistakes, etc., if they affected QC/EQA results. The vendor’s algorithm might smooth or ignore those, focusing on ideal performance.

Neither approach is “wrong” – they simply answer slightly different questions. The key is transparency: if the two estimates diverge, investigate why. You might report both or use the larger one for safety. Many labs use the local QC/EQA-derived MU for their official documentation (since it’s more conservative and under their control) and keep an eye on the instrument’s internal MU as an ongoing quality indicator.

3. Beyond the instrument: an inclusive MU for POCT

Thus far, we’ve considered MU mainly from the standpoint of analytical performance of the instrument (the analyzer, its reagents, calibration and so on). But POCT happens in a holistic “system-of-use” context that can introduce additional uncertainty. Two patients both truly at 8 mmol/L glucose could get different POC results if one sample was capillary and clotted and another was venous and well-mixed, even on the same device. To capture this, it’s useful to conceptualize MU in two tiers:

  • Instrument MU (Analytical MU): This is the uncertainty inherent to the measurement procedure under controlled conditions. It’s what the instrument + reagents + calibration can achieve in the hands of trained operators following the SOP. If you run ideal samples and everything is done correctly, how variable and biased are the results? This largely comes from the factors we’ve already discussed: analytic imprecision and any calibration or method bias. For example, an ABL90 blood gas analyzer might have an instrumental MU on pH of ±0.02.
  • Use-context MU (System MU): This layer adds the real-world factors of using that instrument at the point of care. It includes pre-analytical and operator contributors that are not strictly instrument malfunctions but can affect results in routine use. Essentially, it’s the uncertainty of the whole process from patient to result, beyond just the analyzer’s technical performance.

In POCT, the use-context MU can be significant. Consider some examples of contributors:

  • Operator technique: Does the person doing the test collect the sample properly? E.g., for capillary blood gases or lactate, if they squeeze the finger or don’t discard the first drop, tissue fluid contamination can dilute the sample. In glucose testing, insufficient wiping of the first drop or timing errors in applying blood to a strip can introduce variability. Different operators may yield systematically different results due to technique.
  • Sample type and handling: Arterial vs. venous vs. capillary can cause systematic differences (capillary glucose tends to read lower in poor perfusion, etc.). Delays to analysis are critical for some analytes – a blood gas sample left sitting 20 minutes in a warm room will show a different pO2 and pCO2 than one run immediately. Even the type of collection tube or syringe (and its anticoagulant) can affect results (e.g. dilution by heparin volume). These pre-analytical variations contribute to overall uncertainty.
  • User and environmental factors: The ward setting might have temperature and humidity swings that slightly affect devices (even if within stated ranges). Busy ICU settings might have more frequent sample mixing issues or delays than a lab environment. If operators sometimes override lockouts or run tests when QC is overdue, those episodes might have higher error rates. Likewise, if a meter’s strips are sometimes stored improperly on a ward (too hot/cold), results might drift – variability not seen when everything is ideal.

In short, the uncertainty seen in daily POCT practice can be larger than the analyzer’s spec sheet uncertainty. Our goal should be to estimate that total uncertainty in a pragmatic way, so we ensure the results are still “fit for purpose” clinically.

Quantifying use-context MU: The most practical method is to perform paired measurements or audits to gauge the extra variability. Some approaches include:

  • Operator comparison studies: Have multiple operators (representing the range of skill in your facility) run the same control material or patient samples in duplicate. Calculate the extra SD due to operator differences. For example, if the analyzer’s own precision is 3% CV but operator-to-operator variability adds another 2% CV, then u_operator ≈ 2%. Add this as an independent component to your MU budget.
  • Sample type studies: Compare results on arterial vs. venous vs. capillary samples for the same patient (where feasible). The systematic differences and scatter give insight into pre-analytical uncertainty. For instance, if capillary glucose averages 5% lower than arterial with an extra scatter of 4%, you have both a bias and an uncertainty component from sample type. Include these in your MU for capillary sampling.
  • Split-sample audit: Periodically send duplicate POCT samples to the central lab for comparison. The difference distribution (POCT minus lab) reflects the combined effects of your entire POCT process vs. the reference method. If this includes operator variability, sample handling time differences, etc., it’s a “real-world” MU that encompasses analytical + use-context uncertainty.

The math is straightforward: if you have an instrument MU (u_instrument) and you’ve estimated a use-context component (u_use), combine them as independent sources: u_total = √(u_instrument² + u_use²). For example, if an ABL90’s analytical MU for lactate is ±0.25 mmol/L, but ward sampling/handling adds ±0.15 mmol/L of scatter, the total becomes u_total = √(0.25² + 0.15²) = √(0.0625 + 0.0225) = √0.085 ≈ 0.29 mmol/L. With k=2, that’s ±0.6 mmol/L expanded uncertainty – noticeably larger than the ±0.5 you might have assumed from instrument specs alone.
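
One pragmatic way to put a number on u_use is variance subtraction from a split-sample audit: any scatter in the POCT-vs-lab differences beyond what the analyzer’s own imprecision explains is attributed to the use context. A minimal sketch under that assumption (the paired differences and the analytical component are illustrative, and the lab method’s own uncertainty is ignored for simplicity):

```python
from math import sqrt
from statistics import mean, stdev

# Paired differences (POCT minus lab), mmol/L, for ward lactate samples
diffs = [0.2, -0.1, 0.3, 0.0, -0.2, 0.4, 0.1, -0.3, 0.2, 0.1]

sd_diff = stdev(diffs)    # total real-world scatter of the differences
u_analytical = 0.10       # analyzer-only standard uncertainty (from QC SD)

# A non-zero mean difference is bias; handle it as a separate component
print(f"mean difference (bias): {mean(diffs):.2f} mmol/L")

# Attribute any excess variance to the use context
u_use = sqrt(max(sd_diff**2 - u_analytical**2, 0.0))
u_total = sqrt(u_analytical**2 + u_use**2)
print(f"u_use ≈ {u_use:.2f}, expanded total MU (k=2) ≈ {2 * u_total:.2f} mmol/L")
```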

Why this matters: A glucose reading of 3.0 mmol/L with “instrument-only” MU ±0.2 suggests the true value is 2.8–3.2. If the real-world MU is ±0.4 (including user factors), the range becomes 2.6–3.4. That’s a significant difference for hypoglycemia management – it might influence whether a nurse gives immediate glucose or waits to confirm. Being honest about total uncertainty prevents over-confidence in borderline results.

To be pragmatic, not every POCT program needs to exhaustively quantify use-context MU for every analyte. Focus on the high-impact tests (glucose in diabetes, lactate in sepsis, troponin in chest pain, etc.) and the scenarios where operator or sample variability is likely significant (ward-based testing, multiple shifts, etc.). For low-risk tests or highly controlled environments, the instrument MU may suffice.

4. Building MU from QC/EQA logs: robust data cleaning

When you set out to calculate MU from your internal quality control (IQC) and external quality assessment (EQA) data, one of the biggest challenges is deciding what data to include and exclude. The goal is to ensure your MU reflects the routine, controlled performance of your POCT system – not aberrant events that don’t represent normal operation.

Systematic approach to data filtering: Here’s a step-by-step method to clean your data responsibly:

  1. Start with a comprehensive dataset: Gather at least 6–12 months of QC results (more if you have low testing volume). The goal is to capture different reagent lots, seasons, operators, etc., so your MU reflects realistic long-term variation. For EQA, include all relevant samples in that same period. Download the raw data with dates, operators, reagent lot numbers, and any QC flags or comments.
  2. Review and categorize anomalous results: Look for obvious outliers or QC failures. But don’t automatically exclude them – instead, categorize them:
    • Equipment malfunction that was corrected: E.g., QC failed because a sensor needed replacing; after replacement, QC came back in range. You might exclude the pre-fix QC but include the post-fix data.
    • Operator error that was identified and corrected: E.g., wrong QC lot used, or sample was contaminated. If it was caught and corrected immediately, exclude those specific results.
    • Reagent quality issue with a bad lot: E.g., control ampoule was expired or compromised. Exclude data from that bad lot if it was recalled or replaced.
    • Obvious outlier that coincided with an external event: E.g., the fridge storing QC material was found off overnight, compromising the control’s stability.
    We exclude these because they don’t represent the routine, managed process – they’re exceptions that should be corrected via quality management rather than included in MU. Think of this as trimming “blunders” so your MU reflects the process when it’s under proper control.
  3. Include normal variation sources: Don’t over-clean the data. It’s important to include the routine shifts and lot-to-lot changes that occur. For instance, if a new test strip lot in March had a slightly different mean and the QC shifted upward, that is a genuine component of uncertainty (systematic difference) that patients experienced until you perhaps adjusted targets. Your MU analysis should capture that kind of variability. So, do not remove data just because it spans different reagent lots or calibration intervals – those should stay in.
  4. Calculate imprecision from QC: With the cleaned dataset, calculate the mean and SD (or CV) of the QC. Consider if the QC mean is at a different concentration than your clinical decision level of interest; you might need to extrapolate the CV% to that level (assuming CV is relatively stable in that range) or perform a brief study at that level (for example, running patient samples or standards in replicates around that concentration to estimate imprecision there). In many cases, you’ll use the QC CV% directly as the imprecision component.
  5. Determine bias from EQA or comparison: Gather external assessment data. For each EQA sample over the period, look at your instrument’s result vs. the target or peer group mean. If available, use commutable materials (samples that behave like patient blood). Calculate the average bias. If there’s a clear bias trend depending on concentration, you might quantify bias at the medical decision level of interest (e.g., perhaps your glucose meters run 5% high at low glucose, but are unbiased at high glucose). In absence of commutable EQA, another approach is split-sample comparisons with the central lab: e.g., for 20 patient samples covering the range, find the average difference between POCT and the lab (after any necessary unit or reference adjustments). That difference, if significant, is your bias. For MU, we treat the bias magnitude as something we’ll include (because a consistent bias means uncertainty about true value – the true value might be lower or higher by that amount).
  6. Calculate combined uncertainty: Now combine as discussed: u_c = √(σ_IQC² + u_bias²). Use the same units for both (if bias is a percentage and CV is a percentage, combine in percentage terms; if bias is in absolute units, convert CV to absolute SD at that level first). Expand with k≈2 for a 95% interval. This gives you an MU for that analyte at the concentration in question.
  7. Assess “routine” vs “worst-case” MU: It can be informative to compute two versions:
    • Routine MU (cleaned): This is the one you just calculated with bad data excluded. It reflects the measurement uncertainty when the process is in control. Use this for claims about assay performance.
    • Observed MU (raw/unfiltered): You can also compute what the MU would be if you included all the data points, even the outliers and mistakes. Often, this will be larger. This number isn’t for official use, but it’s an “eyes wide open” metric of what a patient might experience if all those mishaps weren’t caught. If the observed MU is dramatically larger than the routine MU, it flags that there are a lot of operational issues (operator errors, etc.) that need addressing via training or process changes (a short sketch of this comparison follows this list).
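
Here is that sketch, assuming a hypothetical QC export where documented exclusion reasons are blank for routine results (the file name and column names are illustrative):

```python
import pandas as pd

# Hypothetical export: one row per QC result, with columns
# date, result, exclude_reason (blank unless a documented exclusion applies)
qc = pd.read_csv("qc_level2.csv")

raw = qc["result"]                                       # warts and all
cleaned = qc.loc[qc["exclude_reason"].isna(), "result"]  # routine, in-control data

for label, series in [("Observed (raw)", raw), ("Routine (cleaned)", cleaned)]:
    cv = 100 * series.std() / series.mean()              # SD uses n-1 by default
    print(f"{label}: n={len(series)}, mean={series.mean():.2f}, "
          f"SD={series.std():.3f}, CV={cv:.1f}%")
```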

By following a documented filtering method, you ensure your MU estimates are reliable and reproducible. During audits, you should be able to justify any exclusions as non-representative events. It’s also wise to store the raw data and analysis so you can update the MU later with new data or if a reagent change occurs.

Handling missing decision-level data: If you find that your QC materials don’t cover an important clinical decision threshold, you have a gap. For example, maybe your glucose QC levels are 3 mmol/L and 15 mmol/L, but you want MU at 4 mmol/L (hypoglycemia cutoff). You have a few options: (a) assume the CV% at 3 is similar at 4 and use that; (b) perform a quick validation study – e.g., take a patient sample around 4, run it 20 times on different meters to directly estimate SD at 4; or (c) use manufacturer precision data at that level if trustworthy. The key is to not leave a decision limit without an MU estimate.

5. MU, TE, and APS: judging “good enough” performance

Having an MU number is only half the battle; you need to interpret it. Is an uncertainty of ±0.5 mmol/L for lactate acceptable? To answer that, we use Analytical Performance Specifications (APS) – criteria often based on either clinical requirements, biological variation, or regulatory limits.

MU vs Total Error allowable (TEa): Many performance standards (like CLIA regulations, or lab-developed specs) are given in terms of allowable total error – the maximum deviation from truth that’s clinically tolerable. For instance, an allowable total error for glucose might be ±10% or ±0.3 mmol/L (whichever is larger) in the hypoglycemic range. MU, on the other hand, gives a range of likely error. One common approach is to ensure that the expanded MU is less than some fraction of the allowable error. For example, a rule of thumb: 95% MU should be < 50% of the allowable error, so that almost all results fall within TEa comfortably. If your MU is too large relative to TEa, the method may not be fit.
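
That rule of thumb is easy to encode as a check. A sketch (the 50% fraction is one convention; some labs simply require U ≤ TEa, so treat the fraction as a local policy choice to document in your APS):

```python
def mu_fit_for_purpose(expanded_mu: float, tea: float, fraction: float = 0.5) -> bool:
    """True if the expanded (~95%) MU sits within the chosen fraction of the
    allowable total error (TEa). fraction=0.5 mirrors the rule of thumb above;
    set fraction=1.0 for a simple U <= TEa check."""
    return expanded_mu <= fraction * tea

# Lactate with TEa = ±0.5 mmol/L:
print(mu_fit_for_purpose(0.20, 0.5))                 # True: comfortably fit
print(mu_fit_for_purpose(0.28, 0.5))                 # False at the strict 50% criterion,
print(mu_fit_for_purpose(0.28, 0.5, fraction=1.0))   # though True against TEa itself
```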

Biological Variation-based APS: The European Federation of Clinical Chemistry (EFLM) provides a comprehensive Biological Variation (BV) database, and from BV data, one can derive performance goals. These include desirable imprecision, bias, and even an “allowable uncertainty.” Typically:

  • Desirable imprecision (CV): less than ~0.5 times the within-subject biological variation (CV_I).
  • Desirable bias: less than ~0.25 times the combined biological variation, √(CV_I² + CV_G²).
  • Desirable MU: there are formulas for allowable MU as well – effectively combining those bias and imprecision criteria. In practice, if both bias and imprecision meet desirable BV goals, the MU will be within a desirable range too.

In the EFLM model, there are tiers: “optimum,” “desirable,” and “minimum” performance. For example, for an analyte with CV_I = 5%, desirable imprecision would be <2.5% CV, while minimum might be <3.75% CV. These can serve as yardsticks for your POCT MU. If your expanded (95%) MU is, say, ±8% and the desirable allowable uncertainty is ±5%, you know your method is off the pace and either risk acceptance or improvement is needed.
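
A small sketch deriving those tiers from BV data (the 0.25/0.5/0.75 multipliers follow the conventional pattern described above; the CV_G value in the example is illustrative, not from the EFLM database):

```python
from math import sqrt

def bv_goals(cv_i: float, cv_g: float) -> dict:
    """Performance goals (as % CV) derived from biological variation:
    optimum/desirable/minimum imprecision = 0.25/0.5/0.75 * CV_I,
    desirable bias = 0.25 * sqrt(CV_I^2 + CV_G^2)."""
    return {
        "cv_optimum":     0.25 * cv_i,
        "cv_desirable":   0.50 * cv_i,
        "cv_minimum":     0.75 * cv_i,
        "bias_desirable": 0.25 * sqrt(cv_i**2 + cv_g**2),
    }

print(bv_goals(cv_i=5.0, cv_g=7.0))
# cv_desirable = 2.5 and cv_minimum = 3.75, matching the worked figures above
```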

State-of-the-art or clinical outcome specifications: In some cases, especially for new tests, one sets APS by what top performers do (state-of-art) or by modeling clinical impact. For example, for troponin in acute MI, one might say we need a CV <10% at the 99th percentile (to reliably distinguish small increases) – that's a clinical outcome-derived goal. For blood glucose in ICU, one might model insulin dosing errors and conclude an uncertainty above ±15% leads to too many hypoglycemia incidents, so they set ±15% as max allowable error. Whenever possible, tie your MU to such meaningful criteria.

Comparing MU to APS: Once you have your MU (ideally at relevant decision levels), compare it to the spec:

  • If MU (95% interval) is comfortably smaller than the allowable error or the clinical significant change, you’re in good shape. E.g., if allowable error for lactate is ±0.5 mmol/L at 2.0 and your MU is ±0.2 mmol/L, then uncertainty isn’t likely to impede clinical decisions – results are plenty reliable.
  • If MU is close to or exceeds the spec, that’s a red flag. E.g., say for pO2 in neonatal care you need ±5 mmHg accuracy, but your POCT blood gas MU is ±8 mmHg at the low end – that means a reading of 50 could truly be 42 to 58, which might be clinically unacceptable. You either need to improve it or ensure clinicians understand the limitation (maybe they’ll send critical samples to the lab analyzer instead).
  • Look at bias and imprecision components relative to specs too. If most of the MU is coming from bias, you might adjust calibration or apply a correction factor to improve things. If it’s mostly imprecision, more calibration or better maintenance might not help and you might need a different method or accept the risk.

Ultimately, MU should feed into a risk assessment: is the test still fit for purpose? ISO 15189 now demands that labs assess and ensure the measurement uncertainty is suitable for patient care – meaning you have to decide based on APS whether the uncertainty is acceptable. If not, document the risk and what you’re doing about it (which could range from “we decided it’s fine for now given lack of alternatives” to “we will switch to a new device that meets the spec”).

Why it matters (examples): Consider a lactate result of 3.8 mmol/L in a sepsis patient. If MU is ±0.4, the true value could be 3.4 (below the 4.0 cutoff for the sepsis bundle) or 4.2 (above it). Clinicians might hesitate: repeat the test or act? Knowing the MU can guide them – for instance, if result is very close to 4 and patient is on the line, a repeat or lab confirmation might be in order if time permits. For glucose, if a meter reads 2.8 mmol/L (hypoglycemic) with MU ±0.3, the true value could be ~3.1 (perhaps not truly hypoglycemic). In insulin dosing protocols, such uncertainty might lead to a safety step like “if a reading is below 3 but above 2.5, and patient is asymptomatic, confirm with lab test before giving dextrose.” Clearly communicating MU-driven retest rules can prevent patient harm from over-treatment. In blood gases, if an arterial pCO2 comes as 6.0 kPa ±0.3 (5% MU) one can be fairly confident a small change to 6.5 is real; but if MU were ±0.6 (10%), that same change could be noise – one wouldn’t rush to change ventilator settings unless the trend continues. Thus, MU tied to APS ensures we maintain appropriate trust and actions on POCT results.

6. Worked examples

Let’s bring this together with two practical examples, applying both the middleware MU and DIY calculation, and showing how to document and interpret the results. We’ll use an ABL90 blood gas analyzer (for pH, pO2, pCO2, lactate) and a Nova StatStrip glucose meter.

A. ABL90 Blood Gases and Lactate (Radiometer AQURE + DIY)

Scenario: You manage several Radiometer ABL90 FLEX blood gas analyzers in an ICU and OR setting. These analyzers are connected to Radiometer’s AQURE middleware which provides an MU estimate. You want to compare that to your own QC/EQA-based MU and ensure it meets clinical needs. Let’s focus on lactate (since it’s critical in sepsis management) and pO2 (critical for ventilation management).

Vendor MU (AQURE): Suppose the AQURE system reports the following for lactate on one analyzer:

  • Lactate current MU (k=2) = ±0.25 mmol/L at 2.0 mmol/L

This likely came from the instrument’s calibration and QC stats. It means if the analyzer shows 2.0, the middleware thinks the true value is 95% likely to be between 1.75 and 2.25. For pO2, AQURE might show:

  • pO2 current MU (k=2) = ±3 mmHg at 80 mmHg

(This is an example, not actual Radiometer spec.) As a manager, note these values and understand they include analytic factors but exclude patient/sample effects.

DIY MU from QC/EQA: Now, you extract 6 months of lactate QC data from the ABL90. The level-2 control (~2.5 mmol/L target) has 180 data points over 6 months, mean = 2.45, SD = 0.10 mmol/L. That’s CV ≈ 4%. Over the same period, EQA for lactate (whole blood material) had an average bias of +0.1 mmol/L (your analyzers read slightly higher than the target on average). You calculate:

  • u_imprecision = 0.10 mmol/L (the SD from IQC at ~2.5 mmol/L).
  • u_bias = 0.10 mmol/L (the absolute bias magnitude, treating the target as truth).
  • Combined standard uncertainty: u_c = √(0.10² + 0.10²) = 0.14 mmol/L.
  • Expanded uncertainty U (95%): U = 2 × 0.14 = 0.28 mmol/L.

Rounded, you’d report: “Lactate ~2.5 mmol/L, U ≈ ±0.3 mmol/L (95% confidence).” If you do the same for pO2, using QC data (say SD = 2 mmHg at 80 mmHg, bias negligible after calibration), you might get U ≈ ±4 mmHg.
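
These figures drop straight out of the combination rule (assuming the expanded_mu helper from the Section 2 sketch is in scope):

```python
print(round(expanded_mu(0.10, 0.10), 2))  # -> 0.28 mmol/L for lactate
print(round(expanded_mu(2.0, 0.0), 1))    # -> 4.0 mmHg for pO2 (bias ~0)
```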

Comparison and interpretation: Your DIY MU for lactate (±0.3) is slightly larger than AQURE’s ±0.25. Investigating why: you included bias from EQA (the device runs a bit high) whereas the device’s own calculation probably assumed it’s perfectly calibrated. Also, your SD was over 6 months including some older sensor cartridge data; AQURE might be using a shorter window. Both numbers are in the same ballpark (~10-12% of the value). For pO2, maybe both came out similar (~3 vs 4 mmHg). If your MU had been much larger than AQURE’s, it could indicate extra variability (perhaps one operator was not following protocol, affecting QC). Conversely, if the vendor’s MU was higher than yours, that might signal the instrument has detected instability that you haven’t noticed yet.

MU statement for clinicians: From the above, you might distill a simple statement to put in the ICU POCT SOP or user manual: “Lactate results around 2–4 mmol/L have an uncertainty of approximately ±0.3 mmol/L (95% confidence). For example, a lactate of 2.0 could be between ~1.7 and 2.3. pO2 results around 80 mmHg have uncertainty ~±4 mmHg.” Such statements help clinicians understand the possible variation.

Why it matters (sepsis example): Surviving Sepsis guidelines often use lactate ≥4.0 mmol/L as a trigger for aggressive resuscitation. If your device reads 3.8, it’s near that cut-off. Knowing the MU is ±0.3, the true lactate might actually be >4 – so one might err on side of treating, or at least rechecking quickly. Conversely, if a lactate of 4.2 is obtained, it might not be truly above 4.0. Communicating this nuance (perhaps “4.2 ±0.3”) can prompt a repeat or careful clinical correlation rather than automatic entry into a protocol. This can personalize patient care and avoid over-treatment or under-treatment at threshold values.

In our example, we would document both the instrument MU and the use-case MU. Perhaps our data showed that samples from the OR, which are often arterial and analyzed immediately, have little extra variance, but samples from the general ward (tourniquet, delayed transport on ice) add ±5% uncertainty. We could then set u_use accordingly for ward lactates. If overall lactate MU becomes, say, ±0.4 mmol/L after adding that, we include that in the risk assessment.

B. Nova StatStrip Glucose (ward glucose meter)

Scenario: The hospital uses Nova StatStrip glucose meters on multiple wards for bedside glucose, including critical care and general wards. You want to estimate MU for glucose around the hypoglycemia range (~3–4 mmol/L) and around an important decision range (say ~10 mmol/L for insulin sliding scale adjustments). Also, you want to account for the fact that sometimes nurses use fingerstick capillary samples and other times arterial line samples (in ICU) which may have slight differences.

Gather data: You pull 3 months of QC data for the StatStrip. Let’s say Level 1 control (approx 2.5 mmol/L) mean = 2.5, SD = 0.15 (CV = 6% – glucose meters often have higher CV at low end). Level 2 control (10 mmol/L) mean = 10.2, SD = 0.3 (CV = 3%). EQA results: at ~2–4 mmol/L, your meters on average read 5% low against the reference (perhaps due to matrix differences of the EQA material), and at ~10 mmol/L nearly unbiased. You also did a quick comparison of 20 patient samples between StatStrip and the central lab hexokinase method: on average, StatStrip was 0.2 mmol/L higher than lab at ~10 mmol/L, and 0.1 mmol/L lower at ~3 mmol/L (small biases).

Calculate MU: For the low end (~3 mmol/L):

  • u_imprecision = 6% of 3 mmol/L = 0.18 mmol/L.
  • u_bias = perhaps we take the EQA bias magnitude, 5% of 3 = 0.15 mmol/L (ignoring sign, just magnitude for uncertainty).
  • u_c = √(0.18² + 0.15²) = √(0.0324 + 0.0225) = √0.0549 = 0.234 mmol/L.
  • U = 2 × 0.234 ≈ 0.47 mmol/L (95% interval).

So roughly ±0.5 mmol/L at 3 mmol/L. That is about ±16%. For the high end (~10 mmol/L):

  • u_imprecision = 3% of 10 = 0.30 mmol/L.
  • u_bias = maybe ~2% (0.2 mmol/L) as observed vs lab.
  • u_c = √(0.30² + 0.20²) = √(0.09 + 0.04) = √0.13 = 0.36 mmol/L.
  • U = 2 × 0.36 = 0.72 mmol/L (±0.7 mmol/L, ~7%).

These are the analytical MU estimates. Now consider use context: On general wards, all samples are capillary and some may be from patients with poor circulation (edema, etc.), which could add extra scatter. In ICU, many measurements are arterial line draws (which may be more reliable). You look at data of paired glucose readings: in ICU, the arterial vs lab difference SD was small (~3%), but in ward fingersticks vs lab, the difference SD was ~8%. That suggests a higher u_use for the ward setting due to sample type and technique (factors like an incomplete drop, etc.). You might quantify u_use at ~5% for capillary use (somewhat over half of that 8%, on the assumption that part of it is already captured in the QC-derived imprecision).

Use-context adjustment: For the ward capillary case at low glucose, add u_use ≈ 5% of 3 = 0.15 mmol/L. Then:

  • u′_c = √(0.18² + 0.15² + 0.15²) = √(0.0324 + 0.0225 + 0.0225) = √0.0774 = 0.278 mmol/L.
  • U′ ≈ 0.56 mmol/L.

So about ±0.6 mmol/L. At 10 mmol/L, if we say u_use ≈ 5% = 0.5 mmol/L:

  • u′_c = √(0.30² + 0.20² + 0.50²) = √(0.09 + 0.04 + 0.25) = √0.38 = 0.616 mmol/L.
  • U′ ≈ 1.23 mmol/L (~±1.2).

Now our MU on the ward for glucose is roughly ±0.6 at 3 mmol/L and ±1.2 at 10 mmol/L.
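
A quick arithmetic check of all four figures, reusing the root-sum-square combination (the inputs are the illustrative components above):

```python
from math import sqrt

def expanded(components, k=2.0):
    """Root-sum-square of independent standard components, expanded by k."""
    return k * sqrt(sum(c**2 for c in components))

print(round(expanded([0.18, 0.15]), 2))        # 0.47: analytical, 3 mmol/L
print(round(expanded([0.30, 0.20]), 2))        # 0.72: analytical, 10 mmol/L
print(round(expanded([0.18, 0.15, 0.15]), 2))  # 0.56: ward capillary, 3 mmol/L
print(round(expanded([0.30, 0.20, 0.50]), 2))  # 1.23: ward capillary, 10 mmol/L
```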

Interpretation: Compare these with goals: For glucose, an oft-cited performance goal (ISO 15197, FDA, etc.) is within ±0.83 mmol/L below ~5.5 mmol/L and ±15% at higher levels. Our MU of ±0.6 at 3 is within ±0.83, so that seems acceptable. At 10, ±1.2 is 12%, within 15%. So our POCT meets those minimal accuracy requirements. However, if we strive for tighter control (e.g. some hospitals want glucometers within 10%), our ±12% might be slightly above optimum. This might be flagged for potential improvement or at least awareness. We also see that ICU arterial use has lower uncertainty than general ward capillary use – perhaps an argument for sending critical samples (neonatal, shock patients) to the lab, or for repeating the measurement when a capillary result looks anomalous.

Communicating to staff: We could summarize in the POCT policy: “At 3 mmol/L, meter uncertainty ~±0.5 mmol/L (95% CI); at 10 mmol/L, ~±1 mmol/L. Capillary sampling technique can add variability – if a glucose result is unexpected or borderline (e.g., 3–4 mmol/L in a patient without symptoms), consider repeating or confirming with a lab sample.” This empowers nurses to make informed decisions rather than overreacting to single numbers.

Why it matters (insulin dosing): In sliding-scale or intensive insulin protocols, decisions are made on thresholds (e.g., if glucose <4, hold insulin; if >10, increase dose). If the MU is large, a single reading of 10.2 might actually be below 10 and not truly call for more insulin, or a 3.9 might be above 4. With an understanding of MU, protocols can be written with a buffer (some hospitals already do this intuitively – e.g., they might not escalate insulin unless two consecutive readings are high, implicitly acknowledging uncertainty). Explicit MU knowledge can refine those rules: for instance, “If glucose is within ±0.5 of a decision threshold, recheck to confirm before action.” This prevents patient harm from over-correcting based on a spurious reading.

These examples illustrate how MU is calculated and then translated into practice. Always document your workings (QC data used, bias sources, etc.) so you can justify the numbers. If challenged by an assessor or clinician (“how did you get ±0.5?”), you can show the data and math behind it.

7. Documenting MU (auditable, ISO-aligned)

A thorough documentation of MU for each POCT method is not only good practice but also expected by ISO 15189:2022. Here we outline a template for an “MU Dossier” or record that you can maintain for each device/analyte combination. This can be a section in your method SOP or a standalone uncertainty file. Key elements to include:

  • Method and scope: Identify the device and assay (e.g., “Radiometer ABL90 FLEX – Lactate in heparinized arterial whole blood”). State the measurement interval it covers and the clinical areas using it (ICU, ED, etc.). Mention important clinical decision limits (e.g., lactate 2.0 and 4.0 mmol/L) that the MU will be evaluated against.
  • Data sources and period: Describe what data you used to determine MU. “Calculated from internal QC data (Level 2, target ~2.5 mmol/L) from Jan–Jun 2025 (n=180 data points, 3 lots of QC), and EQA results from WEQAS scheme (4 samples in same period).” If you used any validation or verification study data, note that too. This helps anyone reviewing to know the basis of the estimate.
  • Inclusion/Exclusion criteria: Document how you cleaned the data. For example, “Excluded one QC result on 3-Mar (operator error, used wrong lot). Excluded EQA sample in April due to commutability issue. All other data included, spanning 2 cartridge lot numbers.” This level of detail shows that you deliberately considered anomalies. It’s good to version-control this: e.g., “MU version 1.3 updated after reagent lot change in Oct 2025.”
  • MU calculation and contributors: Show the formula and values. Example: “Imprecision: QC SD = 0.10 mmol/L at 2.5 mmol/L (CV = 4%). Bias: +0.1 mmol/L vs EQA target. Combined u_c = 0.14 mmol/L; Expanded U (k=2) = 0.28 mmol/L.” If you included an operator/use uncertainty, state how: “Added u_use = 0.1 mmol/L based on duplicate sample study, for capillary sampling variability.” It’s helpful to include the equations (as done above) so that an assessor or another staff member can follow the logic. Keep the math as simple and clear as possible; avoid burying it in prose.
  • Coverage factor and confidence level: State what k you used and roughly what confidence that corresponds to. Usually: “Expanded uncertainty U is quoted at approximately 95% confidence (coverage factor k=2).” If you used a different coverage or if the distribution is not normal, clarify that (but for most practical purposes k=2 is standard).
  • Instrument vs. use-context MU: If you have distinguished analytic MU from total MU, document both. For example, “Instrument-only MU (excluding user/sample factors): ±0.25 mmol/L. Including user variability (based on ward data): ±0.35 mmol/L.” This tells the story that under ideal conditions it’s tighter, but in real life it’s a bit wider. You might report both or just the latter depending on audience, but recording both helps internal analysis.
  • Acceptability criteria and fitness assessment: Here you tie MU to APS. Write something like, “Allowable total error for lactate = ±0.5 mmol/L (per clinical requirement of sepsis protocol). The observed MU (±0.28) is within this limit, meeting the performance specification.” Or if not, “MU exceeds the desirable goal of ±5% (achieved ~±12% vs goal 5%, see Biological Variation criteria). This is a concern – see risk assessment below.” Essentially, state whether the MU is judged acceptable or not and by what benchmark. If not acceptable, note what’s being done: “Instrument still under evaluation – if bias not reduced with next calibration, will consider applying correction factor or using lab bench method for critical lactates.” This fulfills the ISO requirement of ensuring equipment can achieve required uncertainty or documenting the plan if not. It shows you have a plan for MU beyond just calculating it.

Updating and version control: Set a review schedule for your MU estimates – perhaps every 6–12 months or when there are significant changes (new reagent supplier, device upgrade, etc.). Keep a log of changes: “Version 1.0 (Jan 2025): Initial calculation. Version 1.1 (Apr 2025): Updated bias component after new EQA data. Version 1.2 (Aug 2025): Added use-context uncertainty for ward capillary samples.” This complements the usual IQC and EQA records.

Storage and accessibility: Keep MU records in a place where staff can find them easily, and where they’ll be available during inspections. Many labs create an “MU folder” in their quality system, cross-referenced from method SOPs. You might also create a simple summary table (device, analyte, MU estimate, last reviewed) that can be posted in the POCT area for quick reference by users. Remember, ISO 15189:2022 says labs must make MU available to users on request – so it shouldn’t be buried in a file cabinet.
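
That quick-reference table is easy to generate from your worksheets. A minimal sketch (rows and field names are illustrative):

```python
import csv

# Illustrative entries; in practice, pull these from your MU worksheets
rows = [
    {"device": "ABL90 FLEX #1", "analyte": "Lactate",
     "expanded_mu": "±0.28 mmol/L (k=2) at 2.5 mmol/L", "last_reviewed": "2025-06"},
    {"device": "StatStrip ward meters", "analyte": "Glucose",
     "expanded_mu": "±0.5 mmol/L (k=2) at 3 mmol/L", "last_reviewed": "2025-06"},
]

with open("mu_summary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```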

This documentation approach turns MU from a vague requirement into a concrete, manageable task that satisfies both regulatory expectations and clinical utility.

8. Review and continual improvement of MU

MU is not a “calculate once and forget” parameter. Like other quality indicators, it needs periodic review and improvement. Changes in equipment, reagents, operators, or clinical requirements can all affect your uncertainty estimates.

Regular review schedule: Set up a systematic review, perhaps every 6–12 months. The frequency depends on your test volume, stability, and risk level. High-volume, critical tests (like glucose or blood gases) might warrant more frequent review (e.g., quarterly), while less critical or stable tests might be annual. Align the review with other quality activities like method validations or proficiency testing assessments.

Triggers for immediate review: Some events should prompt an immediate MU reassessment:

  • Equipment change or major maintenance (new analyzer, software update)
  • Reagent supplier change or significant lot-to-lot variations observed
  • Training of new operators or change in testing personnel
  • Change in sample types or testing locations (e.g., expanding POCT to new wards)
  • Persistent QC issues or EQA failures that required corrective action
  • Clinical feedback suggesting results are unreliable or inconsistent

Improvement strategies: If your MU is larger than desired, consider these approaches:

  • Address bias: If bias is the main contributor, look at calibration practices, reagent storage, or systematic differences in your process. Sometimes a simple calibration adjustment or correction factor can significantly improve MU.
  • Reduce imprecision: If random error dominates, examine maintenance schedules, operator training, sample handling, or environmental factors. More frequent QC, better temperature control, or standardized procedures might help.
  • Operator training: If use-context variability is large, focused training on technique, sample collection, or result interpretation can reduce the uncertainty.
  • Equipment upgrade: In some cases, the current technology may have reached its limits. A newer analyzer or method might achieve better performance.

Trending and monitoring: Create simple trend charts of your MU estimates over time. If MU is gradually increasing, it might indicate drift, aging equipment, or process degradation. If it suddenly jumps, investigate for specific causes. This trending complements your regular QC charts and gives a higher-level view of system performance.
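
A minimal trending sketch with matplotlib (the review dates, MU values, and limit line are illustrative):

```python
import matplotlib.pyplot as plt

reviews = ["2024-Q3", "2024-Q4", "2025-Q1", "2025-Q2"]
expanded_mu = [0.26, 0.27, 0.28, 0.33]  # lactate U (k=2), mmol/L, per review
tea = 0.5                                # allowable total error, for context

plt.plot(reviews, expanded_mu, marker="o", label="Expanded MU (k=2)")
plt.axhline(tea, linestyle="--", color="red", label="Allowable error")
plt.ylabel("mmol/L")
plt.title("Lactate MU trend – ABL90 #1")
plt.legend()
plt.savefig("mu_trend.png")
```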

Benchmarking: Where possible, compare your MU to published data from similar labs or manufacturers’ specifications. If your uncertainty is much higher than peers, there might be room for improvement. Conversely, if you’re performing well, that supports confidence in your results.

Remember, the goal isn’t to achieve the smallest possible MU, but to ensure it’s appropriate for clinical use. Sometimes “good enough” MU with robust, simple processes is preferable to marginal improvements that require complex maintenance.

9. Governance artifacts to include

For a complete POCT MU program that satisfies regulatory requirements and supports good clinical practice, consider including these documents and records in your quality system:

Policy and procedure documents:

  • POCT MU Policy: High-level statement of your organization’s approach to MU, responsibility assignments, and review schedule
  • MU Calculation Procedure: Step-by-step method for determining MU from QC/EQA data, including data inclusion/exclusion criteria
  • MU Communication Guidelines: How to present MU information to clinicians, when to include uncertainty in reports, and escalation procedures

Technical records:

  • MU Calculations Worksheets: The detailed calculations for each device/analyte, with raw data and formulas
  • MU Summary Tables: Quick reference showing current MU estimates for all POCT methods
  • Comparison Studies: Any split-sample or operator variability studies used to determine use-context uncertainty

Quality management records:

  • MU Review Meeting Minutes: Documentation of periodic MU assessments, decisions made, and actions taken
  • Change Control Records: When MU estimates are updated due to equipment changes, process improvements, etc.
  • Risk Assessments: For any tests where MU exceeds desired specifications, document the clinical risk and mitigation strategies

Training and competency records:

  • MU Training Materials: Education for POCT operators on uncertainty concepts and practical implications
  • Competency Assessments: Verification that staff understand how to interpret and communicate MU appropriately

These documents demonstrate a systematic, professional approach to MU that will satisfy accreditation bodies and support quality patient care.

10. Caveats, boundaries, and special cases

While this guide provides a practical framework for MU in POCT, there are some important limitations and special situations to consider:

Low-volume testing: If you perform a test very infrequently, accumulating enough QC data for reliable MU calculation can take years. In these cases, you might need to rely more heavily on manufacturer data, method comparison studies, or accept a higher degree of uncertainty in your MU estimate.

Qualitative and semi-quantitative tests: Traditional MU calculation doesn’t apply to yes/no results (like pregnancy tests) or semi-quantitative results (like urine dipsticks). For these, focus on analytical sensitivity, specificity, and comparison to reference methods rather than numerical uncertainty.

New or innovative tests: For newly introduced POCT methods, you might not have enough historical data for robust MU calculation. Consider starting with manufacturer estimates and updating as you accumulate experience.

Matrix effects: Some patient samples (hemolyzed, lipemic, or from patients with unusual conditions) might behave differently than standard QC materials. While routine MU calculation captures typical performance, be aware that special patient populations might have different uncertainty characteristics.

Regulatory differences: Different countries and accreditation bodies might have varying expectations for MU documentation and reporting. Ensure your approach aligns with local requirements.

Resource limitations: Comprehensive MU analysis requires time, expertise, and data management resources that might not be available in all settings. Focus on the highest-risk, highest-volume tests first, and build your MU program gradually.

Clinical acceptance: Even technically sound MU estimates are only valuable if clinicians understand and use them appropriately. Invest in education and communication to ensure your MU work translates into better patient care.

Remember, MU is a tool to support clinical decision-making, not an end in itself. The most sophisticated uncertainty calculation is worthless if it doesn’t help deliver safer, more effective patient care.


This guide provides a practical framework for implementing measurement uncertainty in POCT programs. For specific technical questions or guidance on complex situations, consider consulting with clinical laboratory scientists, metrology experts, or accreditation body guidance documents.
