Decisional Analysis and Statistics
Contents:
- Performance Measures: Incidence Rate, Prevalence, Sensitivity, Specificity, False-Negative Rate, False-Positive Rate, Positive Predictive Value, Negative Predictive Value, Overall Accuracy/Diagnostic Efficiency, Deriving Missing Performance Measures, Youden's Index
- Bayes's Theorem and Modifications: Odds and Likelihood Ratios, Odds and Likelihood Ratios for Sequential Testing, Risk Sensitivity and Risk Specificity
- Statistics for Use in Quality Control: Mean, Standard Deviation (SD), Coefficient of Variation (CV), Standard Deviation Interval (SDI), Coefficient of Variation Interval (CVI), Total Allowable Error, Chi Square Outcome Comparison of Two Groups, Comparison of Two Observers, Receiver Operating Characteristic Plots, Z-score, Westgard Control Rules, Series of Control Rules
- Evaluating the Medical Literature: Assessing the Methodologic Quality of Clinical Studies, Measures of the Consequence of Treatment, Number Needed to Treat, Correcting the Risk Ratio and Estimating Relative Risk
- Confidence Intervals: Confidence Interval for a Single Mean, Confidence Interval for the Difference Between Two Means, Confidence Interval for a Single Proportion, Confidence Interval When Observations Are 0 or 1, Confidence Interval for an Odds Ratio, Confidence Interval for a Difference Between Odds and Percentages, etc.
- Benefit, Risk and Threshold for an Action: Testing and Test-Treatment Thresholds

Performance Measures Overview: The usefulness
of a test is often judged by how well it identifies the presence or absence of a disease. A person with the disease who has a "positive" test is termed a true positive, whereas a person with the disease but a "negative" test result is termed a false negative. A person without the disease who has a "positive" result is termed a false positive, while a person without the disease who has a "negative" result is termed a true negative. In practice things are not always clear-cut: the distinction between a positive and a negative test result is sometimes artificial, and it is not always possible to say whether a person does or does not have the disease.
                     Positive for Disease (+)   Negative for Disease (-)
Result Positive (+)  a = true positive          b = false positive
Result Negative (-)  c = false negative         d = true negative

TOP References Braunwald E, Isselbacher KJ, et al (editors).
Harrison's Principles of Internal Medicine, 11th edition. McGraw-Hill Book
Publishers. 1987. page 7 Goldman L. Chapter 10: Quantitative aspects of
clinical reasoning. pages 43-48. IN: Isselbacher KJ, Braunwald E, et al.
Harrison's Principles of Internal Medicine, Thirteenth Edition. McGraw-Hill.
1994. Panzer RJ, Black ER, Griner PF. Interpretation of diagnostic tests and
strategies for their use in quantitative decision making. pages 17-28. IN:
Panzer RJ, Black ER, et al. Diagnostic Strategies for Common Medical
Problems. American College of Physicians. 1991. Speicher C, Smith JW Jr.
Choosing Effective Laboratory Tests. WB Saunders. 1983. pages 50-51, and 210
Incidence Rate Overview: The incidence rate is the number of new cases of a
disease in the total population per unit time. incidence rate = ((A) / (a + b + c + d)) / T where: A = number of new cases of the disease for a given time period, which is a subset of all people with the disease (a + c); (a + b + c +
d) = sum of (true positives, false positives, false negatives, true
negatives) = total population T = unit of time TOP References: Goldman L.
Chapter 10: Quantitative aspects of clinical reasoning. pages 43-48. IN:
Isselbacher KJ, Braunwald E, et al. Harrison's Principles of Internal
Medicine, Thirteenth Edition. McGraw-Hill. 1994. Speicher C, Smith JW Jr..
Choosing Effective Laboratory Tests. WB Saunders. 1983., page 51 Prevalence
Overview: Prevalence is all patients with disease divided by all patients
tested. This is also termed the "prior probability." prevalence = (a + c) /
(a + b + c + d) where: a + c = true positives + false negatives = all people
with disease (a + b + c + d) = sum of (true positives, false positives,
false negatives, true negatives) = total population TOP References Braunwald
E, Isselbacher KJ, et al (editors). Harrison's Principles of Internal
Medicine, 11th edition. McGraw-Hill Book Publishers. 1987. page 7 Goldman L.
Chapter 10: Quantitative aspects of clinical reasoning. pages 43-48. IN:
Isselbacher KJ, Braunwald E, et al. Harrison's Principles of Internal
Medicine, Thirteenth Edition. McGraw-Hill. 1994. Speicher C, Smith JW Jr.
Choosing Effective Laboratory Tests. WB Saunders. 1983. pages 50-51, and 210
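The two rates above can be sketched numerically; the counts below are hypothetical, chosen only for illustration:

```python
# Hypothetical one-year follow-up of a screened population of 10,000 people.
a, b, c, d = 80, 10, 20, 9890      # TP, FP, FN, TN cell counts
total = a + b + c + d              # total population (a + b + c + d)
new_cases = 25                     # A: new cases during the period (a subset of a + c)
T = 1.0                            # length of the period in years

incidence_rate = (new_cases / total) / T   # new cases per person per year
prevalence = (a + c) / total               # all with disease / all tested (prior probability)
```

Note the distinction the formulas encode: incidence counts only new cases per unit time, while prevalence counts everyone with the disease at testing.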
Sensitivity Overview: Sensitivity is the true-positive test results divided
by all patients with the disease. sensitivity = (a / (a + c)) where: a =
true positives a + c = true positives + false negatives = all people with
disease Comments The better the seNsitivity of the test, the fewer the false
Negatives. TOP References Braunwald E, Isselbacher KJ, et al (editors).
Harrison's Principles of Internal Medicine, 11th edition. McGraw-Hill Book
Publishers. 1987. page 7 Goldman L. Chapter 10: Quantitative aspects of
clinical reasoning. pages 43-48. IN: Isselbacher KJ, Braunwald E, et al.
Harrison's Principles of Internal Medicine, Thirteenth Edition. McGraw-Hill.
1994. Panzer RJ, Black ER, Griner PF. Interpretation of diagnostic tests and
strategies for their use in quantitative decision making. pages 17-28. IN:
Panzer RJ, Black ER, et al. Diagnostic Strategies for Common Medical
Problems. American College of Physicians. 1991. Speicher C, Smith JW Jr.
Choosing Effective Laboratory Tests. WB Saunders. 1983. pages 50-51, and 210
Specificity Overview: The specificity of a test is the true-negative test
results divided by all patients without the disease. specificity = (d / (b +
d)) where: d = true negatives (b + d) = sum of ( false positives, true
negatives) = all people without disease Comments The better the sPecificity
of the test, the fewer the false Positives. TOP References Braunwald E,
Isselbacher KJ, et al (editors). Harrison's Principles of Internal Medicine,
11th edition. McGraw-Hill Book Publishers. 1987. page 7 Goldman L. Chapter
10: Quantitative aspects of clinical reasoning. pages 43-48. IN: Isselbacher
KJ, Braunwald E, et al. Harrison's Principles of Internal Medicine,
Thirteenth Edition. McGraw-Hill. 1994. Panzer RJ, Black ER, Griner PF.
Interpretation of diagnostic tests and strategies for their use in
quantitative decision making. pages 17-28. IN: Panzer RJ, Black ER, et al.
Diagnostic Strategies for Common Medical Problems. American College of
Physicians. 1991. Speicher C, Smith JW Jr. Choosing Effective Laboratory
Tests. WB Saunders. 1983. pages 50-51, and 210 False-Negative Rate Overview:
The false negative rate for a test is the false-negative test results
divided by all patients with the disease. false-negative rate = (c / (a +
c)) where: c = false negatives (a + c ) = sum of (true positives, false
negatives) = all people with disease TOP References Braunwald E, Isselbacher
KJ, et al (editors). Harrison's Principles of Internal Medicine, 11th
edition. McGraw-Hill Book Publishers. 1987. page 7 Goldman L. Chapter 10:
Quantitative aspects of clinical reasoning. pages 43-48. IN: Isselbacher KJ,
Braunwald E, et al. Harrison's Principles of Internal Medicine, Thirteenth
Edition. McGraw-Hill. 1994. Panzer RJ, Black ER, Griner PF. Interpretation
of diagnostic tests and strategies for their use in quantitative decision
making. pages 17-28. IN: Panzer RJ, Black ER, et al. Diagnostic Strategies
for Common Medical Problems. American College of Physicians. 1991. Speicher
C, Smith JW Jr. Choosing Effective Laboratory Tests. WB Saunders. 1983.
pages 50-51, and 210 False-Positive Rate Overview: The false positive rate
for a test is the false-positive test results divided by all patients
without the disease. false-positive rate = (b / (b + d)) where: b = false
positives (b + d) = sum of (false positives, true negatives) = all people
without disease TOP References Braunwald E, Isselbacher KJ, et al (editors).
Harrison's Principles of Internal Medicine, 11th edition. McGraw-Hill Book
Publishers. 1987. page 7 Goldman L. Chapter 10: Quantitative aspects of
clinical reasoning. pages 43-48. IN: Isselbacher KJ, Braunwald E, et al.
Harrison's Principles of Internal Medicine, Thirteenth Edition. McGraw-Hill.
1994. Panzer RJ, Black ER, Griner PF. Interpretation of diagnostic tests and
strategies for their use in quantitative decision making. pages 17-28. IN:
Panzer RJ, Black ER, et al. Diagnostic Strategies for Common Medical
Problems. American College of Physicians. 1991. Speicher C, Smith JW Jr.
Choosing Effective Laboratory Tests. WB Saunders. 1983. pages 50-51, and 210
Positive Predictive Value Overview: The positive predictive value is
true-positive test results divided by all positive test results. This is
also referred to as the predictive value of a positive test. This is
equivalent to Bayes's formula for post-test probability given a positive
result. positive predictive value = (a / (a + b)) where: a = true positives
(a + b ) = sum of (true positives, false positives) = all positive test
results TOP References Braunwald E, Isselbacher KJ, et al (editors).
Harrison's Principles of Internal Medicine, 11th edition. McGraw-Hill Book
Publishers. 1987. page 7 Goldman L. Chapter 10: Quantitative aspects of
clinical reasoning. pages 43-48. IN: Isselbacher KJ, Braunwald E, et al.
Harrison's Principles of Internal Medicine, Thirteenth Edition. McGraw-Hill.
1994. Panzer RJ, Black ER, Griner PF. Interpretation of diagnostic tests and
strategies for their use in quantitative decision making. pages 17-28. IN:
Panzer RJ, Black ER, et al. Diagnostic Strategies for Common Medical
Problems. American College of Physicians. 1991. Speicher C, Smith JW Jr.
Choosing Effective Laboratory Tests. WB Saunders. 1983. pages 50-51, and 210
Negative Predictive Value Overview: The negative predictive value is the
true-negative test results divided by all patients with negative results.
This is also referred to as the predictive value of a negative test. This is
equivalent to Bayes's formula for post-test probability given a negative
result. negative predictive value = (d / (c + d)) where: d = true negatives
(c + d) = sum of ( false negatives, true negatives) = all negative test
results TOP References Braunwald E, Isselbacher KJ, et al (editors).
Harrison's Principles of Internal Medicine, 11th edition. McGraw-Hill Book
Publishers. 1987. page 7 Goldman L. Chapter 10: Quantitative aspects of
clinical reasoning. pages 43-48. IN: Isselbacher KJ, Braunwald E, et al.
Harrison's Principles of Internal Medicine, Thirteenth Edition. McGraw-Hill.
1994. Panzer RJ, Black ER, Griner PF. Interpretation of diagnostic tests and
strategies for their use in quantitative decision making. pages 17-28. IN:
Panzer RJ, Black ER, et al. Diagnostic Strategies for Common Medical
Problems. American College of Physicians. 1991. Speicher C, Smith JW Jr.
Choosing Effective Laboratory Tests. WB Saunders. 1983. pages 50-51, and 210
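The measures in the preceding sections all derive from the same four 2x2 cell counts; a minimal Python sketch, using hypothetical counts:

```python
def performance_measures(a, b, c, d):
    """Derive the performance measures above from 2x2 cell counts:
    a = true positives, b = false positives, c = false negatives, d = true negatives."""
    return {
        "prevalence": (a + c) / (a + b + c + d),
        "sensitivity": a / (a + c),
        "false_negative_rate": c / (a + c),
        "specificity": d / (b + d),
        "false_positive_rate": b / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }

m = performance_measures(a=80, b=10, c=20, d=90)  # hypothetical counts
```

As the definitions above imply, sensitivity + false-negative rate = 1 and specificity + false-positive rate = 1.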
Overall Accuracy, or Diagnostic Efficiency Overview: The overall accuracy of
a test is the measure of "true" findings (true-positive + true-negative
results) divided by all test results. This is also termed "the efficiency"
of the test. overall accuracy = ((a+d) / (a + b + c + d)) where: a + d =
true positives + true negatives = all people correctly classified by testing
(a + b + c + d) = sum of (true positives, false positives, false negatives,
true negatives) = total population TOP References Braunwald E, Isselbacher
KJ, et al (editors). Harrison's Principles of Internal Medicine, 11th
edition. McGraw-Hill Book Publishers. 1987. page 7 Goldman L. Chapter 10:
Quantitative aspects of clinical reasoning. pages 43-48. IN: Isselbacher KJ,
Braunwald E, et al. Harrison's Principles of Internal Medicine, Thirteenth
Edition. McGraw-Hill. 1994. Clave P, Guillaumes S, Blanco I, et al. Amylase,
Lipase, Pancreatic Isoamylase, and Phospholipase A in Diagnosis of Acute
Pancreatitis. Clin Chem. 1995; 41:1129-1134.
Speicher C, Smith JW Jr. Choosing Effective Laboratory Tests.
WB Saunders. 1983. pages 50-51, and 210 Deriving Missing Performance
Measures When Only Some Are Known Overview: If some performance measures for
a test are known but others are not, it is often possible to calculate the
missing values from those that are known. Key for equations SE = sensitivity
SP = specificity PPV = positive predictive value NPV = negative predictive
value ACC = accuracy (1) sensitivity = (( 1 + (((NPV^(-1)) - 1) * (((SP^(-1))
- 1)^(-1)) * ((PPV^(-1)) - 1)))^(-1)) (2) specificity = (( 1 + (((PPV^(-1))
- 1) * (((SE^(-1)) - 1)^(-1)) * ((NPV^(-1)) - 1)))^(-1)) (3) positive
predictive value = (( 1 + (((SP^(-1)) - 1) * (((NPV^(-1)) - 1)^(-1)) *
((SE^(-1)) - 1)))^(-1)) (4) negative predictive value = (( 1 + (((SE^(-1)) -
1) * (((PPV^(-1)) - 1)^(-1)) * ((SP^(-1)) - 1)))^(-1)) (5) accuracy = ((1 +
(((((PPV ^ (-1)) - 1) ^ (-1) ) + (((SP ^ (-1)) - 1) ^ (-1)))^(-1)) + (((((SE
^ (-1)) - 1) ^ (-1) ) + (((NPV ^ (-1) ) - 1) ^ (-1)))^(-1))) ^ (-1)) (6)
positive predictive value = ((1 + (((SE ^ (-1)) - (ACC ^ (-1))) * (((((ACC ^
(-1)) - 1) * (((SP ^ (-1)) - 1) ^ (-1))) - 1) ^(-1)))) ^ (-1)) where: The
equation does not apply if SE = SP = ACC (7) sensitivity = ((1 + (((PPV ^
(-1)) - (ACC ^ (-1))) * (((((ACC ^ (-1)) - 1) * (((NPV ^ (-1)) - 1) ^ (-1)))
- 1) ^(-1)))) ^ (-1)) where: The equation does not apply if PPV = NPV = ACC
(8) specificity = (( 1 + ((((SE ^ (-1)) + (PPV ^ (-1)) - (ACC ^ (-1)) - 1) ^
(-1)) * ((ACC ^ (-1)) - 1) * ((PPV ^ (-1)) -1))) ^ (-1)) (9) positive
predictive value = (( 1 + ((((SP ^ (-1)) + (NPV ^ (-1)) - (ACC ^ (-1)) - 1)
^ (-1)) * ((ACC ^ (-1)) - 1) * ((SP ^ (-1)) -1))) ^ (-1)) (10) specificity =
((((((ACC ^ (-1)) - 1) * (((SE ^ (-1)) - 1) ^ (-1)) * ((NPV ^ (-1)) - 1)) +
(ACC ^ (-1)) - (NPV ^ (-1)) + 1)) ^ (-1)) (11) sensitivity = ((((((ACC ^
(-1)) - 1) * (((SP ^ (-1)) - 1) ^ (-1)) * ((PPV ^ (-1)) - 1)) + (ACC ^ (-1))
- (PPV ^ (-1)) + 1)) ^ (-1))

Parameter Known?                      Equations to Apply
SE    SP    PPV   NPV   ACC
Y     Y     Y     N     N             4 (NPV), 5 (ACC)
Y     Y     N     Y     N             3 (PPV), 5 (ACC)
Y     Y     N     N     Y             6 (PPV), 4 (NPV)
Y     N     Y     Y     N             2 (SP), 5 (ACC)
Y     N     Y     N     Y             8 (SP), 4 (NPV)
Y     N     N     Y     Y             10 (SP), 3 (PPV)
N     Y     Y     Y     N             1 (SE), 5 (ACC)
N     Y     Y     N     Y             11 (SE), 4 (NPV)
N     Y     N     Y     Y             9 (PPV), 1 (SE)
N     N     Y     Y     Y             7 (SE), 2 (SP)

Implementation Notes: Some of the equation numbers differ from those in Einstein et al (1997). Equations with similar structure are grouped together. Substituting variables for some of the more complex subexpressions makes implementing the equations somewhat easier. TOP
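As a numeric check of the algebra, a short Python sketch derives NPV from SE, SP, and PPV via equation (4) and compares it with the direct computation; the 2x2 counts are hypothetical:

```python
# Hypothetical 2x2 counts (a=TP, b=FP, c=FN, d=TN) chosen for illustration.
a, b, c, d = 80, 10, 20, 90

SE = a / (a + c)     # sensitivity
SP = d / (b + d)     # specificity
PPV = a / (a + b)    # positive predictive value

# Equation (4): negative predictive value from SE, PPV, and SP.
NPV = 1 / (1 + (1 / SE - 1) * (1 / (1 / PPV - 1)) * (1 / SP - 1))

# Agrees with the direct definition d / (c + d).
assert abs(NPV - d / (c + d)) < 1e-12
```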
References: Einstein AJ, Bodian CA, Gil J. The relationship among
performance measures in the selection of diagnostic tests. Arch Pathol Lab
Med. 1997; 121: 110-117. Youden's Index Overview: Youden's index is one way to summarize the accuracy of a test in a single numeric value. Youden's index = 1 - ((false positive rate) + (false negative rate)) = 1 - ((1 - (sensitivity)) + (1 - (specificity))) = (sensitivity) + (specificity) - 1 Using the 2x2 cell counts defined above (a = true positives, b = false positives, c = false negatives, d = true negatives), it may also be expressed as: Youden's index = (a / (a + c)) + (d / (b + d)) - 1 = ((a * d) - (b * c)) / ((a + c) * (b + d)) Interpretation: minimum index = -1; maximum index = +1; a perfect test would have a Youden index of +1. Limitation: the index by itself does not indicate whether a shortfall lies in sensitivity or in specificity. TOP
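A quick numeric check that the two forms of the index agree, using hypothetical 2x2 counts (a = true positives, b = false positives, c = false negatives, d = true negatives):

```python
# Hypothetical 2x2 cell counts.
a, b, c, d = 80, 10, 20, 90

sensitivity = a / (a + c)
specificity = d / (b + d)
youden = sensitivity + specificity - 1

# Equivalent determinant-style form in the cell counts.
youden_alt = (a * d - b * c) / ((a + c) * (b + d))
```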
References: Hausen H. Caries prediction - state of the art. Community
Dentistry and Oral Epidemiology. 1997; 25: 87-96. Hilden J, Glasziou P.
Regret graphs, diagnostic uncertainty and Youden's index. Statistics in
Medicine. 1996; 15: 969-986. Youden WJ. Index for rating diagnostic tests.
Cancer. 1950; 3: 32-35. Bayes's Theorem and Modifications Bayes's Theorem
Overview: Bayes's theorem gives the probability of disease in a patient being tested, based on the disease prevalence and the test performance. post-test probability that disease is present given a positive test result = ((pretest probability that disease present) * (probability test positive if disease present)) / (((pretest probability that disease present) * (probability test positive if disease present)) + ((pretest probability that disease absent) * (probability test positive if disease absent))) post-test probability that disease is present given a negative test result = ((pretest probability that disease present) * (probability test negative if disease present)) / (((pretest probability that disease present) * (probability test negative if disease present)) + ((pretest probability that disease absent) * (probability test negative if disease absent)))

Variable                                        Alternative Statement
pretest probability that disease present        prevalence
probability test positive if disease present    sensitivity
pretest probability that disease absent         (1 - (prevalence))
probability test positive if disease absent     false positive rate = (1 - (specificity))
probability test negative if disease present    false negative rate = (1 - (sensitivity))
probability test negative if disease absent     specificity

Bayes's formula can also be expressed in terms of the positive and negative predictive values: post-test probability of disease given a positive result = positive predictive value = (true positives) / (all positives) = (true positives) / ((true positives) + (false positives)) post-test probability of disease given a negative result = 1 - (negative predictive value) = (false negatives) / (all negatives) = (false negatives) / ((true negatives) + (false negatives)) Limitations of Bayes's theorem: Bayes's theorem assumes test independence, which may not hold if multiple tests are used for diagnosis TOP References Einstein AJ, Bodian CA, Gil J. The
relationship among performance measures in the selection of diagnostic
tests. Arch Pathol Lab Med. 1997; 121: 110-117. Nicoll D, Detmer WM. Chapter
1: Basic principles and diagnostic test use and interpretation. pages 1 -
16. IN: Nicoll D, McPhee SJ, et al. Pocket Guide to Diagnostic Tests, Second
Edition. Appleton & Lange. 1997. Noe DA. Chapter 3: Diagnostic
Classification. pages 27-43. IN: Noe DA, Rock RC (Editors). Laboratory
Medicine. Williams and Wilkins. 1994. Schultz EK. Chapter 14: Analytical
goals and clinical interpretation of laboratory procedures, pages 485-507.
IN: Burtis C, Ashwood E. Tietz Textbook of Clinical Chemistry, Second
edition. W.B. Saunders Company. 1994. Scott TE. Chapter 2: Decision making
in pediatric trauma. pages 20-40. IN: Ford EG, Andrassy RJ. Pediatric Trauma
- Initial Assessment and Management. W.B. Saunders. 1994 Suchman AL, Dolan
JG. Odds and likelihood ratios. pages 29-34. IN: Panzer RJ, Black ER, et al.
Diagnostic Strategies for Common Medical Problems. American College of
Physicians. 1991. Weissler AM. Chapter 11: Assessment and use of
cardiovascular tests in clinical prediction. pages 400-421. IN: Giuliani ER,
Gersh BJ, et al. Mayo Clinic Practice of Cardiology, Third Edition. Mosby.
1996 Odds and Likelihood Ratios Overview: One form of Bayes's theorem is to
calculate the post-test odds for a disorder from the pre-test odds and
performance characteristics for the test. odds ratio = (probability of
disease) / (1 - (probability of disease)) likelihood ratio = (probability of
a test result in a person with the disease) / (probability of a test result
in a person without the disease) post-test odds = (pre-test odds) *
(likelihood ratio) where: the disease prevalence in the population can be used as the pretest probability (and converted to pretest odds); likelihood ratios can be expressed in terms of the sensitivity and specificity of the test for the diagnosis; the positive likelihood ratio is the likelihood ratio for a positive test result, equal to the true-positive rate divided by the false-positive rate, or (sensitivity) / (1 - (specificity)); the negative likelihood ratio is the likelihood ratio for a negative test result, equal to the false-negative rate divided by the true-negative rate, or (1 - (sensitivity)) / (specificity) post-test odds
that the person has the disease if there is a positive test result =
(pre-test odds) * (positive likelihood ratio) post-test odds that the person
has the disease if there is a negative test result = (pre-test odds) *
(negative likelihood ratio) TOP Calculating Post-Test Odds Step 1: Calculate the positive and negative likelihood ratios for the test: positive likelihood ratio = (sensitivity) / (1 - (specificity)); negative likelihood ratio = (1 - (sensitivity)) / (specificity). Step 2: Convert the prior probability to prior odds (both terms are multiplied by 10 only to make the ratio easier to read): ((prior probability) * 10) : ((1 - (prior probability)) * 10) Step 3: Multiply the prior odds by the likelihood ratio to obtain the post-test odds: ((positive likelihood ratio) * (prior probability) * 10) : ((1 - (prior probability)) * 10) or ((negative likelihood ratio) * (prior probability) * 10) : ((1 - (prior probability)) * 10) Step 4: Convert the post-test odds to post-test probabilities: positive post-test probability = ((positive likelihood ratio) * (prior probability) * 10) / (((positive likelihood ratio) * (prior probability) * 10) + ((1 - (prior probability)) * 10)) negative post-test probability = ((negative likelihood ratio) * (prior probability) * 10) / (((negative likelihood ratio) * (prior probability) * 10) + ((1 - (prior probability)) * 10)) TOP
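The four steps above can be sketched as a small Python function; the prevalence, sensitivity, and specificity in the example are hypothetical values:

```python
def post_test_probability(prior, sensitivity, specificity, positive_result):
    """Convert the prior probability to odds, apply the appropriate
    likelihood ratio, and convert the post-test odds back to a probability."""
    if positive_result:
        lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity   # negative likelihood ratio
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * lr
    return post_odds / (1 + post_odds)

# Hypothetical example: prevalence 10%, sensitivity 90%, specificity 95%.
p_pos = post_test_probability(0.10, 0.90, 0.95, positive_result=True)
```

For a positive result this reproduces the positive predictive value computed directly from Bayes's formula, which is a useful consistency check when implementing either form.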
References: Einstein AJ, Bodian CA, Gil J. The relationship among
performance measures in the selection of diagnostic tests. Arch Pathol Lab
Med. 1997; 121: 110-117. Noe DA. Chapter 3: Diagnostic Classification. pages
27-43. IN: Noe DA, Rock RC (Editors). Laboratory Medicine. Williams and
Wilkins. 1994. Scott TE. Chapter 2: Decision making in pediatric trauma.
pages 20-40. IN: Ford EG, Andrassy RJ. Pediatric Trauma - Initial Assessment
and Management. W.B. Saunders. 1994 Suchman AL, Dolan JG. Odds and
likelihood ratios. pages 29-34. IN: Panzer RJ, Black ER, et al. Diagnostic
Strategies for Common Medical Problems. American College of Physicians.
1991. Weissler AM. Chapter 11: Assessment and use of cardiovascular tests in
clinical prediction. pages 400-421. IN: Giuliani ER, Gersh BJ, et al. Mayo
Clinic Practice of Cardiology, Third Edition. Mosby. 1996 Odds and
Likelihood Ratios for Sequential Testing Overview: If more than one test or
finding is used for diagnosis, the final post-test probability can be
calculated by combining the likelihood ratio for each test. post-test odds =
(pre-test odds) * (likelihood ratio for test 1) * (likelihood ratio for test
2) * ... * (likelihood ratio for test n) Limitations: For valid results the tests must be conditionally independent of each other, meaning that the results of the tests are not associated with one another. If conditionally dependent tests are combined in this way, the calculated post-test probability will be overestimated. TOP References: Nicoll D,
Detmer WM. Chapter 1: Basic principles and diagnostic test use and
interpretation. pages 1 - 16. IN: Nicoll D, McPhee SJ, et al. Pocket Guide
to Diagnostic Tests, Second Edition. Appleton & Lange. 1997. Schultz EK.
Chapter 14: Analytical goals and clinical interpretation of laboratory
procedures, pages 485-507. IN: Burtis C, Ashwood E. Tietz Textbook of
Clinical Chemistry, Second edition. W.B. Saunders Company. 1994. Suchman AL,
Dolan JG. Odds and likelihood ratios. pages 29-34. IN: Panzer RJ, Black ER,
et al. Diagnostic Strategies for Common Medical Problems. American College
of Physicians. 1991. Weissler AM. Chapter 11: Assessment and use of
cardiovascular tests in clinical prediction. pages 400-421. IN: Giuliani ER,
Gersh BJ, et al. Mayo Clinic Practice of Cardiology, Third Edition. Mosby.
1996 Risk Sensitivity and Risk Specificity Overview: Risk sensitivity and
specificity can be used to evaluate how good a risk factor is for predicting
mortality in the population. Risk sensitivity is the proportion of people who die during the follow-up period who were identified as high risk. Risk specificity is the proportion of people who survive the follow-up period who were identified as low risk. Patient subgroups: the high risk fraction is those with the risk factor; the low risk fraction is those without the risk factor. risk sensitivity in percent = (mortality for high risk subgroup in
percent) * (percent of population identified as high risk) / (cumulative
mortality in percent for the whole population) risk specificity in percent =
(survival for low risk subgroup in percent) * (percent of population
identified as low risk) / (cumulative survival in percent for the whole
population) percent of population in high risk group = 100 - (percent of
population in low risk group) cumulative survival of high risk group = 100 -
(cumulative mortality of high risk group) cumulative survival of low risk
group = 100 - (cumulative mortality of low risk group) cumulative survival
of population = 100 - (cumulative mortality of population) TOP References:
Weissler AM. Chapter 11: Assessment and use of cardiovascular tests in
clinical prediction. pages 400-421. IN: Giuliani ER, Gersh BJ, et al. Mayo
Clinic Practice of Cardiology, Third Edition. Mosby. 1996 Statistics for the
Normal Distribution and Use in Quality Control Mean of Values in a Normal
Distribution Overview: If data follow a Gaussian (normal) distribution, the distribution can be summarized by its mean and standard deviation. mean of values = (sum of all values)
/ (number of values) TOP References: Woo J, Henry JB. Chapter 6: Quality
management. pages 125-136 (128). IN: Henry JB (editor-in-chief). Clinical
Diagnosis and Management by Laboratory Methods, 19th edition. WB
Saunders. 1996. Standard Deviation (SD) Overview: The standard deviation is a
measure of the dispersion of data about the mean. standard deviation = square root of the variance where: variance = ((sum of (((each value) - (mean of values))^2)) / ((number of values) - 1)) TOP
References: Barnett RN. Clinical Laboratory Statistics, Second Edition.
Little, Brown and Company. 1979. page 4 Woo J, Henry JB. Chapter 6: Quality
management. pages 125-136 (128). IN: Henry JB (editor-in-chief). Clinical
Diagnosis and Management by Laboratory Methods, 19th edition. WB
Saunders. 1996. Coefficient of Variation (CV) Overview: The coefficient of variation (CV) expresses the standard deviation as a percentage of the mean, allowing variability to be compared across different measurements and methods. CV (expressed as a
percent) = ((standard deviation) * 100 / (mean)) TOP References: Dharan,
Murali. Total Quality Control in the Clinical Laboratory. C.V. Mosby Co.
1977. page 22 Standard Deviation Interval (SDI) Overview: The Standard
Deviation Interval gives information about how a given laboratory's mean
differs from the mean of a group of comparable laboratories, taking into
account the variation among the laboratories. This is also called the
Standard Deviation Index. This is a measure of accuracy. SDI = (((mean) -
(average of all means)) / (standard deviation of all means)) where: average of all means = ((sum of all means) / (number of means)) Interpretation: Values > +2.0 or < -2.0 need to be investigated. TOP
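The quality-control statistics above (mean, SD, CV, SDI) can be sketched together; the peer-group survey values below are hypothetical:

```python
import statistics

# Hypothetical peer-group survey: each laboratory reports its mean for a control.
peer_means = [101.2, 99.8, 100.5, 98.9, 100.1, 99.5, 100.8, 99.2]
lab_mean = 102.0  # this laboratory's mean (hypothetical)

average_of_means = statistics.mean(peer_means)
sd_of_means = statistics.stdev(peer_means)          # (n - 1) denominator, as above
cv_percent = sd_of_means * 100 / average_of_means   # coefficient of variation

sdi = (lab_mean - average_of_means) / sd_of_means
needs_investigation = abs(sdi) > 2.0  # values beyond +/- 2.0 need to be investigated
```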
References: College of American Pathologists QAS Program Coefficient of
Variation Interval (CVI) Overview: The CVI is a measure of precision, although a standard definition is difficult to find. It may also be called the Coefficient of Variation Index, or the CVR. CVI = (CV for laboratory) /
(pool CV) or (CV for laboratory for time period) / (peer group CV for time
period) Interpretation: Values > +2.0 or < -2.0 need to be investigated.
TOP References: The Interlaboratory Quality Assurance Program. Coulter
Diagnostics. 1988. Total Allowable Error Overview: Analysis of the total
allowable error (TEa) can help a laboratory meet its goals for precision
performance. Variables laboratory mean = mean at laboratory for period of
stability in reagents & controls "true" mean = mean for all methods &
laboratories laboratory standard deviation = standard deviation noted at
laboratory method standard deviation = standard deviation reported by vendor
CLIA limit: given as a range, using either a percent or an absolute value; if both are specified, use whichever is greater. Calculations calculated bias = laboratory's deviation (based on site and method) from the mean of all sites = ((laboratory mean) - (true mean)) laboratory imprecision = (factor) * (laboratory standard deviation) where the factor is 1.96 for 95% confidence and 2.50 for 99% confidence Total allowable error = TEa = ((CLIA limit) * (true mean)) bias
as percent of CLIA limit = ((calculated bias) / ((CLIA limit) * (true
mean))) = (((laboratory mean) - (true mean)) / (total allowable error)) = (
((laboratory mean) - (true mean)) / ((CLIA limit) * (true mean))) total
error = calculated bias + imprecision = (((laboratory mean) - (true mean)) + (laboratory imprecision)) = (((laboratory mean) - (true mean)) + ((factor) * (laboratory standard deviation))) assessment of performance = (total error) / (total allowable error) * 100 = (((laboratory mean) - (true mean)) + (laboratory imprecision)) / (((CLIA limit) * (true mean))) * 100 = (((laboratory mean) - (true mean)) + (1.96 * (laboratory standard
deviation))) / (((CLIA limit) * (true mean))) * 100 systematic error (critical) = SEc = ((((total allowable error) - (calculated bias)) / (laboratory standard deviation)) - 1.65) = (((((CLIA limit) * (true mean)) - ((laboratory mean) - (true mean))) / (laboratory standard deviation)) - 1.65) Use the critical systematic error when selecting the QC control rules to use: standard deviation to use = ((laboratory standard deviation) * ((denominator of the primary rule) / 2)) Example: If using the 1:3s rule, where the denominator = 3, standard deviation to use = (laboratory standard deviation) * (3 / 2) = 1.5 * (laboratory standard deviation) TOP References Blanchard
J-M, O'Grady M. Application of the Westgard quality control selection grids (QCSG) to the Kodak Ektachem 700 analyzer. Abstract presented at the 43rd AACC
National Meeting, Washington, DC. July 30-August 1, 1991. Westgard JO, Bawa
N, et al. Laboratory precision performance. Arch Pathol Lab Med. 1996; 120:
621-625. Westgard JO. Error budgets for quality management: Practical tools
for planning and assuring the analytical quality of laboratory testing
processes. Clinical Laboratory Management Review. July/August, 1996. pages
377-403. Westgard JO. Chapter 150: Planning statistical quality control
procedures. pages 1191-1200. IN: Rose NR, de Macario EC, et al (editors).
Manual of Clinical Laboratory Immunology, Fifth Edition. ASM Press. 1997.
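With hypothetical numbers, the error-budget calculations above can be sketched as follows (the true mean, CLIA limit, and laboratory statistics are all illustrative assumptions):

```python
# Hypothetical values: true (all-method) mean 100.0, CLIA limit 10%,
# laboratory mean 102.0, laboratory SD 2.0, 95% confidence factor.
true_mean, clia_limit = 100.0, 0.10
lab_mean, lab_sd, factor = 102.0, 2.0, 1.96

tea = clia_limit * true_mean               # total allowable error (TEa)
bias = lab_mean - true_mean                # calculated bias
imprecision = factor * lab_sd              # laboratory imprecision at 95%
total_error = bias + imprecision
performance_pct = total_error / tea * 100  # assessment of performance
sec = (tea - bias) / lab_sd - 1.65         # critical systematic error (SEc)
```

Here the total error consumes about 59% of the allowable error, and the SEc value would then guide the choice of Westgard control rules as described above.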
Chi Square Outcome Comparison of Two Groups Overview: When 2 different
groups receive different treatment, the number of each group improved and
not improved can be compared as follows:

              Group A                 Group B                 Total
Improved      group A improved        group B improved        total improved
Not improved  group A not improved    group B not improved    total not improved
Total         total group A           total group B           total patients

This table has 1 degree of freedom. chi square value using Yates correction for 1 degree of freedom =
((total number) * ((ABS(((number of group A improved) * (number group B not
improved)) - ((number of group A not improved) * (number of group B
improved))) - ((total number) / 2))^2)) / (((number of group A improved) +
(number of group B improved)) * ((number of group A not improved) + (number
of group B not improved)) * (total number of group A) * (total number of
group B)) From the chi-square value, the probability that the difference is due to chance can be calculated. The Excel function CHIDIST gives this probability for a given chi-square value and degrees of freedom.
The probability that the difference is not due to chance is then (1 -
(probability due to chance)). TOP References: Barnett RN. Clinical
Laboratory Statistics, 2nd edition. Little, Brown and Company. 1979. pages
26-29 Beyer WH. CRC Standard Mathematical Tables, 25th edition. CRC Press.
1978. page 537 Keeping ES. Introduction to Statistical Inference. Dover
Publications. 1995 printing of 1962 work. pages 314-322 Comparison of Two
Observers Overview: When 2 observers tally data from the same material, it
is useful to see whether the differences in their tabulations are due to
chance or due to observer variation. For more than 2 observations: chi
square = (summation from 1 to number of observations ( (((observer A value)
- (observer B value)) ^ 2) / ((observer A value) + (observer B value)) ) )
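A minimal Python sketch of this summation (the category counts in the example are illustrative):

```python
def observer_chi_square(observer_a, observer_b):
    """Chi-square for tallies of the same categories by two observers.

    observer_a, observer_b: counts per category, in the same order.
    """
    return sum((a - b) ** 2 / (a + b)
               for a, b in zip(observer_a, observer_b)
               if a + b > 0)  # skip categories neither observer recorded
```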
From this value, the probability that the differences between the 2
observers are due to chance can be calculated. The equation can be simplified
from an integral depending on whether there is an even or odd degree of
freedom. Even Degrees of Freedom For even degrees of freedom, this is relatively simple.
probability due to chance (chisquare, degrees of freedom) = (e ^ ((-1) * (chisquare) / 2)) * (summation of i from 0 to I of ((((chisquare) / 2) ^ i) / (factorial (i)))) where I = (1/2) * ((degrees of freedom) - 2)
2 degrees of freedom: probability = e ^ ((-1) * (chisquare) / 2)
4 degrees of freedom: probability = (e ^ ((-1) * (chisquare) / 2)) * (1 + ((chisquare) / 2))
6 degrees of freedom: probability = (e ^ ((-1) * (chisquare) / 2)) * (1 + ((chisquare) / 2) + (((chisquare) ^ 2) / 8))
8 degrees of freedom: probability = (e ^ ((-1) * (chisquare) / 2)) * (1 + ((chisquare) / 2) + (((chisquare) ^ 2) / 8) + (((chisquare) ^ 3) / 48))
10 degrees of freedom: probability = (e ^ ((-1) * (chisquare) / 2)) * (1 + ((chisquare) / 2) + (((chisquare) ^ 2) / 8) + (((chisquare) ^ 3) / 48) + (((chisquare) ^ 4) / 384)) Odd Degrees of
Freedom For odd degrees of freedom, this is quite complex, and it is easier
to use the Excel function CHIDIST. probability due to chance (chisquare, degrees of freedom) = 1 - ((1 / (gamma function (I + 1))) * (summation of i from 0 to infinity of ((((-1) ^ i) * (((chisquare) / 2) ^ (I + i + 1))) / ((factorial (i)) * (I + i + 1))))) where I = (1/2) * ((degrees of freedom) - 2)
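The even-degrees-of-freedom series can be sketched in Python (for odd degrees of freedom, a library routine such as CHIDIST remains the practical choice):

```python
from math import exp, factorial

def chi_square_p_even_df(chisquare, degrees_of_freedom):
    """Probability due to chance for a chi-square value with even df.

    Implements p = e^(-chisquare/2) * sum over i = 0..I of
    ((chisquare/2)^i) / i!, with I = (df - 2) / 2.
    """
    if degrees_of_freedom < 2 or degrees_of_freedom % 2 != 0:
        raise ValueError("this closed form applies to even df >= 2 only")
    big_i = (degrees_of_freedom - 2) // 2
    x = chisquare / 2
    return exp(-x) * sum(x ** i / factorial(i) for i in range(big_i + 1))
```

For example, chi_square_p_even_df(4.0, 4) gives 3 * e^-2, about 0.406, agreeing with CHIDIST(4, 4).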
References: Barnett RN. Clinical Laboratory Statistics, 2nd edition. Little,
Brown and Company. 1979. pages 26-29 Beyer WH. CRC Standard Mathematical
Tables, 25th edition. CRC Press. 1978. page 537 Keeping ES. Introduction to
Statistical Inference. Dover Publications. 1995 printing of 1962 work. pages
314-322 Test Comparison Using Receiver Operating Characteristics (ROC) Plots
Overview: The Receiver Operating Curve (ROC) originated during World War II
with the use of radar in signal detection. This was extended to the use of
diagnostic tests for identifying disease states, using plots of sensitivity
versus specificity for different test results. The area under a ROC curve
serves as a measure of the diagnostic accuracy (discrimination performance)
for a test. Receiver Operating Curve To generate a receiver operating curve
it is first necessary to determine the sensitivity and specificity for each
test result in the diagnosis of the disorder in question. The x axis ranges from 0 to 1 (0% to 100%) and can be either the false positive rate (1 - (specificity)) or the true negative rate (specificity); the false positive rate is the one typically used. The y axis is the true positive rate (sensitivity), which also ranges from 0 to 1 (0% to 100%). When
the x-axis is the false positive rate (1 - (specificity)), the curve starts
at (0,0) and increases towards (1,1). When the x-axis is the true negative
rate (specificity), the curve starts at (0, 1) and drops towards (1, 0). The curve runs between these endpoints. Area under Curve One way
of measuring the area under a curve is by measuring subcomponent trapezoids.
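This trapezoidal summation can be sketched in Python (the point list in the example is illustrative):

```python
def roc_area_trapezoidal(points):
    """Approximate the area under a ROC curve by summing trapezoids.

    points: (false positive rate, true positive rate) pairs, including
    the end points (0, 0) and (1, 1).
    """
    pts = sorted(points)
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        area += (x2 - x1) * (y1 + y2) / 2  # trapezoid under this segment
    return area
```

An ideal test, with points (0, 0), (0, 1) and (1, 1), yields an area of 1.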
Data points can be connected by straight lines defined by: y = ((slope) * x)
+ intercept The area under each line can be determined by integration of (y
* dx) over the interval of x1 to x2: area = (((slope) / 2) * ((x2 ^ 2) - (x1
^2))) + ((intercept) * (x2 - x1)) By summing the areas under each segment, an approximation of the area under the entire curve can be reached. However,
the trapezoidal method tends to underestimate areas (Hanley, 1983), so that
other techniques for measuring area should be used if greater accuracy is
required. The maximum area under a ROC curve is 1 and is seen with the ideal
test. The closer the area under the ROC curve is to 1, the better (more
accurate) the test. Comparison of Two Methods Two methods can be compared by
the area under their respective ROC curves. The method with the larger area
under the ROC curve is preferable to one with a smaller area, allowing for variability, as being more accurate. References: Bamber D. The area
above the ordinal dominance graph and the area below the receiver operating
characteristic graph. J Math Psych. 1975; 12: 387-415. Beck JR, Shultz EK.
The use of relative operating characteristic (ROC) curves in test
performance evaluation. Arch Pathol Lab Med. 1986; 110: 13-20. Dorfman DD.
Maximum-likelihood estimation of parameters of signal-detection theory and
determination of confidence intervals - rating method data. J Math Psychol.
1969; 6: 487-496. Hanley JA, McNeil BJ. The meaning and use of the area
under a receiver operating characteristic (ROC) curve. Radiology. 1982; 143:
29-36. Hanley JA, McNeil BJ. A method of comparing the areas under receiver
operating characteristics curves derived from the same cases. Radiology.
1983; 148: 839-843. Henderson AR. Assessing test accuracy and its clinical
consequences: a primer for receiver operating characteristic curve analysis.
Ann Clin Biochem. 1993; 30: 521-539. Henderson AR, Bhayana V. A modest
proposal for the consistent presentation of ROC plots in Clinical Chemistry
(Letter to the Editor). Clin Chem. 1995; 41: 1205-1206. Lett RR, Hanley JA,
Smith JS. The comparison of injury severity instrument performance using
likelihood ratio and ROC curve analysis. J Trauma. 1995; 38: 142-148. Pellar
TG, Leung FY, Henderson AR. A computer program for rapid generation of
receiver operating characteristic curves and likelihood ratios in the
evaluation of diagnostic tests. Ann Clin Biochem. 1988; 25: 411-416.
Pritchard ML, Woosley JT. Comparison of two prognostic models predicting
survival in patients with malignant melanoma. Hum Pathol. 1995; 26:
1028-1031. Raab SS, Thomas PA, et al. Pathology and probability: Likelihood
ratios and receiver operating characteristic curves in the interpretation of
bronchial brush specimens. Am J Clin Pathol. 1995; 103: 588-593. Schoonjans
F, Depuydt C, Comhaire F. Presentation of receiver-operating characteristic
(ROC) plots (Letter to the Editor). Clin Chem. 1996; 42: 986-987. Shultz EK.
Multivariate receiver-operating characteristic curve analysis: Prostate
cancer screening as an example. Clin Chem. 1995; 41: 1248-1255. Vida S. A
computer program for non-parametric receiver operating characteristic
analysis. Comput Meth Prog Biomed. 1993; 40: 95-101. Zweig MH. Evaluation of
the clinical accuracy of laboratory tests. Arch Pathol Lab Med. 1988; 112:
383-386. Zweig MH, Campbell G. Receiver-operating characteristic (ROC)
plots: A fundamental evaluation tool in clinical medicine. Clin Chem. 1993;
39: 561-577. Zweig MH, Ashwood ER, et al. Assessment of the clinical
accuracy of laboratory tests using receiver operating characteristics (ROC)
plots: Approved guideline. NCCLS. 1995; 15 (19). Z-score Overview: The
Z-score can be used to put a patient result in perspective with reference
values from a control population. It basically gives the number of standard
deviations that a given value is from the reference population mean. Z-score
= ((patient value) - (mean for reference population)) / (standard deviation
for reference population) This shares features with the Standard Deviation Interval (SDI). References: Withold W, Schulte U,
Reinauer H. Methods for determination of bone alkaline phosphatase activity:
analytical performance and clinical usefulness in patients with metabolic
and malignant bone diseases. Clin Chem. 1996; 42: 210-217. Westgard Rules
and the Multirule Shewhart Procedure Westgard Control Rules Overview:
Westgard et al have proposed a series of multiple rules (multirules) for
interpreting quality control data. The rules are sensitive to random and
systematic errors, and they are selected to keep the probability of false
rejection low. Procedure 1. Starting with a stable testing system and stable
control material, a control material is analyzed for at least 20 different
days. This data is used to calculate a mean and standard deviation for the
control material. 2. Usually 2 control materials are analyzed (one with a
low value, one with a higher value in the analytical range). Sometimes 3 or
more control materials may be used, and rarely only 1. 3. The controls are
included with each analytical run of the test system. 4. A Levey-Jennings
control chart is prepared to graphically represent the data for each control
relative to the mean and multiples of the standard deviation. 5. With each
analytical run, the pattern of the current and previous control results is
analyzed using all of the selected Westgard control rules. 6. If none of the
rules fail, then the run is accepted. If one or more rules fail, then
different responses may occur. This may include rejecting the run, adjusting
the stated mean, and/or recalibrating the test.

Westgard Control Rule - Definition
1:2S - control result is outside +/- 2 standard deviations of the mean
1:3S - control result is outside +/- 3 standard deviations of the mean
2:2S - 2 consecutive control results are more than 2 standard deviations from the mean
R:4S - either (a) one control is more than 2 SD above the mean and the other is more than 2 SD below the mean; or (b) the range between 2 controls exceeds 4 SD
4:1S - the last 4 consecutive control results are all either 1 SD above or below the mean
10:X - the last 10 consecutive control results all lie on the same side of the mean

Rule Failure - Error Detected
1:3S - systematic and random error
2:2S - systematic error
R:4S - random error
4:1S - systematic error
10:X - systematic error

References: Lott JA. Chapter
18: Process control and method evaluation. pages 293-325 (Figure 18-4, page
302). IN: Snyder JR, Wilkinson DS. Management in Laboratory Medicine, Third
Edition. Lippincott. 1998. Westgard JO. Chapter 150: Planning statistical
quality control procedures. pages 1191-1200. IN: Rose NR, de Macario EC, et
al (editors). Manual of Clinical Laboratory Immunology, Fifth Edition. ASM
Press. 1997. Westgard JO, Klee GG. Chapter 17: Quality management. pages
384-418. IN: Burtis CA, Ashwood ER. Tietz Textbook of Clinical Chemistry,
Third Edition. WB Saunders Company. 1999 (1998). Using a Series of Control
Rules in the Multirule Shewhart Procedure Overview: Westgard et al have
developed a series of rules for evaluating controls which can be used to
judge if the data from an analysis is acceptable. The results of the rule
analysis can be employed sequentially in a multirule Shewhart procedure to
determine whether to accept or reject an analytic run.

Westgard Rule - If Failed (Yes) - If Passed (No)
1:2S - go to next rule - in control, accept run
1:3S - out of control, reject run - go to next rule
2:2S - out of control, reject run - go to next rule
R:4S - out of control, reject run - go to next rule
4:1S - out of control, reject run - go to next rule
10:X - out of control, reject run - in control, accept run
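A minimal Python sketch of this sequential logic, assuming control results are supplied as z-scores (SDI values) with the newest result last:

```python
def westgard_multirule(z):
    """Sequential multirule Shewhart evaluation of control results.

    z: control results expressed as standard deviations from the mean
    (SDI values), oldest first, newest last.
    Returns (in_control, failed_rule).
    """
    latest = z[-1]
    if abs(latest) <= 2:              # 1:2S warning rule passes
        return True, None
    if abs(latest) > 3:               # 1:3S
        return False, "1:3S"
    if len(z) >= 2:
        if abs(z[-2]) > 2 and z[-2] * latest > 0:
            return False, "2:2S"      # 2 consecutive beyond 2 SD, same side
        if abs(latest - z[-2]) > 4:
            return False, "R:4S"      # range between 2 controls exceeds 4 SD
    if len(z) >= 4 and (all(v > 1 for v in z[-4:]) or all(v < -1 for v in z[-4:])):
        return False, "4:1S"          # last 4 all beyond 1 SD on one side
    if len(z) >= 10 and (all(v > 0 for v in z[-10:]) or all(v < 0 for v in z[-10:])):
        return False, "10:X"          # last 10 on the same side of the mean
    return True, None                 # 1:2S alone is only a warning
```

For example, consecutive controls at 2.3 and 2.4 SD fail 2:2S, while a lone result just beyond 2 SD with an otherwise unremarkable history is only a 1:2S warning and the run is accepted.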
References: Lott JA. Chapter 18: Process control and method evaluation.
pages 293-325 (Figure 18-4, page 302). IN: Snyder JR, Wilkinson DS.
Management in Laboratory Medicine, Third Edition. Lippincott. 1998. Westgard
JO, Barry PL, Hunt MR. A multi-rule Shewhart chart for quality control in
clinical chemistry. Clin Chem. 1981; 27: 493-501. Evaluating Reports in the
Medical Literature Criteria for Assessing the Methodologic Quality of
Clinical Studies Overview: The methodologic quality of a clinical study or
trial can be evaluated by examining its design and implementation. A score
based on the key parameters can be used to evaluate the study and to compare
it with other similar studies. Parameters (1) randomization (2) blinding (3)
analysis (4) patient selection (5) comparability of groups at baseline (6)
extent of follow-up (7) description of treatment protocol (8)
cointerventions (9) description of outcomes

Parameter - Finding - Points
randomization - not concealed or not sure - 1; concealed randomization - 2
blinding - not blinded - 0; adjudicators blinded - 2
analysis - other - 0; intention to treat - 2
patient selection - selected patients or unable to tell - 0; consecutive eligible patients - 1
comparability of groups at baseline - no or not sure - 0; yes - 1
extent of follow-up - < 100% - 0; 100% - 1
treatment protocol - poorly described - 0; reproducibly described - 1
cointerventions (extent to which interventions applied equally across groups) - not described - 0; described but not equal or not sure - 1; well described and all equal - 2
outcomes - not described - 0; partially described - 1; objectively defined - 2

Interpretation: minimum score 0; maximum score 14. The higher the score, the higher the quality of the study design and implementation. References: Heyland DK, Cook D, et al.
Maximizing oxygen delivery in critically ill patients: a methodologic
appraisal of the evidence. Crit Care Med. 1996; 24: 517-524. Heyland DK,
MacDonald S, et al. Total parenteral nutrition in the critically ill
patient. JAMA. 1998; 280: 2013-2019. Measures of the Consequences of
Treatment Number Needed to Treat Overview: The number needed to treat is a
simple method of looking at the benefit of a treatment intervention to
prevent a condition or complication. It is the inverse of the absolute risk
reduction for the treated versus untreated control populations. It can be
used to extrapolate findings in the literature to a given patient at an
arbitrary specified baseline risk when the relative risk reduction
associated with treatment is constant for all levels of risk. Variables:
- number of people in control group
- number of people in control group who develop the condition of interest during the time interval
- number of people in active treatment group
- number of people in active treatment group who develop the condition of interest during the time interval

event rate in control
group = (number of people in control group with condition) / (number of
people in control group) event rate in active treatment group = (number of
people in active treatment group with condition) / (number of people in
active treatment group) relative risk reduction = ((event rate in control
group) - (event rate in active treatment group)) / (event rate in control
group) absolute risk reduction = (event rate in control group) - (event rate
in active treatment group) number needed to treat = 1 / (absolute risk
reduction) = 1 / ((event rate in control group) - (event rate in active
treatment group)) Interpretation:
- The number needed to treat indicates the number of patients who need to be treated to prevent one occurrence of the condition of interest during the time interval.
- The smaller the number needed to treat, the greater the benefit of the treatment to prevent the condition.
- The number needed to treat should be considered together with other factors such as the seriousness of the condition to be prevented and the risk of adverse side effects from the treatment.

References: Altman DG. Confidence
intervals for the number needed to treat. BMJ. 1998; 317: 1309-1312. Cook RJ,
Sackett DL. The number needed to treat: a clinically useful measure of
treatment effect. BMJ. 1995; 310: 452-454. Laupacis A, Sackett DL, Roberts
RS. An assessment of clinically useful measures of the consequences of
treatment. N Engl J Med. 1988; 318: 1728-1733. The Corrected Risk Ratio and
Estimating Relative Risk Overview: The corrected risk ratio can be used to
derive an estimate of an association or treatment effect that better
represents the true relative risk. Odds ratio and relative risk (see Figure
on page 1690 of Zhang and Yu)
- If the incidence of an outcome in the study population is < 10%, then the odds ratio is close to the risk ratio.
- As the incidence of the outcome increases, the odds ratio overestimates the relative risk if it is more than 1, or underestimates the relative risk if it is less than 1.
Situations when it is desirable to perform the correction:
- if the incidence of the outcome in the nonexposed population is more than 10%, AND
- if the odds ratio is > 2.5 or < 0.5

incidence of outcome in nonexposed
group = N = (number with outcome in nonexposed group) / (number in
nonexposed group) incidence of outcome in exposed group = E = (number with
outcome in exposed group) / (number in exposed group) risk ratio = E / N
odds ratio = (E / (1 - E)) / (N / (1 - N)) E / N = (odds ratio) / [(1 - N) +
(N * (odds ratio))] corrected risk ratio = (odds ratio) / [(1 - N) + (N *
(odds ratio))] This equation can be used to correct the adjusted odds ratio
obtained from logistic regression. References: Wacholder S. Binomial
regression in GLIM: Estimating risk ratios and risk differences. Am J
Epidemiol. 1986; 123: 174-184. Zhang J, Yu KF. What's the relative risk? A
method for correcting the odds ratio in cohort studies of common outcomes.
JAMA. 1998; 280: 1690-1691. Confidence Intervals Confidence Interval for a
Single Mean Overview: The confidence interval for a series of findings can
be calculated from the number of values, the mean, standard deviation and
standard statistical tables. Data assumptions: single mean, symmetrical
distribution
confidence interval = (mean) +/- ((one-tailed value of Student's t distribution) * (standard deviation) / ((number of values) ^ (0.5))) where:
- for a 95% confidence interval, the one-tailed value is for 2.5% (F 0.975, t 0.025)
- degrees of freedom = (number of values) - 1
- as the number of values increases, the one-tailed value for t = 0.025 approaches 1.96; at 120 degrees of freedom it is 1.98

References: Beyer
WH. CRC Standard Mathematical Tables, 25th Edition. CRC Press. 1978.
Section: Probability and Statistics. Percentage points, Student's
t-distribution. page 536. Young KD. Lewis RJ. What is confidence? Part 2:
Detailed definition and determination of confidence intervals. Ann Emerg
Med. 1997; 30: 311-318. Confidence Interval for the Difference Between Two
Means Overview: The confidence interval for the observed difference in the
means for two sets of data can be calculated from standard statistical
tables and data characteristics (number of values, mean, standard deviation)
for the two data sets. Data assumptions: 2 sets of data with symmetrical
distribution confidence interval for the difference in the means between 2
sets of data = ABS((mean first group) - (mean second group)) +/- (factor)
factor = (one-sided value of Student's t-distribution) * (pooled standard
deviation) * (((1 / (number in first set)) + (1 / (number in second set))) ^
(0.5)) degrees of freedom = (number in first set) + (number in second set) -
2 pooled standard deviation = ((A + B) / (degrees of freedom)) ^ (0.5) A =
((number in first set) - 1) * ((standard deviation of first set) ^ 2) B =
((number in second set) - 1) * ((standard deviation of second set) ^ 2)
where:
- for a 95% confidence interval, the one-tailed value is for 2.5% (F 0.975, t 0.025)
- as the number of values increases, the one-tailed value for t = 0.025 approaches 1.96; at 120 degrees of freedom it is 1.98

References: Beyer WH. CRC Standard Mathematical Tables, 25th
Edition. CRC Press. 1978. Section: Probability and Statistics. Percentage
points, Student's t-distribution. page 536. Young KD. Lewis RJ. What is
confidence? Part 2: Detailed definition and determination of confidence
intervals. Ann Emerg Med. 1997; 30: 311-318. Confidence Interval for a
Single Proportion Overview: When a certain event occurs several times in a series of observations, then its proportion and confidence interval can be calculated. Variables:
- N observations
- X events of interest
Distribution used:
- F distribution, with F = 0.975 for the 95% confidence interval
- uses m and n as degrees of freedom

proportion of events = X / N
lower limit for the 95% confidence interval = X / (X + ((N - X + 1) * (F distribution for m and n))) where m = 2 * (N - X + 1) and n = 2 * X
upper limit for the 95% confidence interval = ((X + 1) * (F distribution for m and n)) / (N - X + ((X + 1) * (F distribution for m and n))) where m = 2 * (X + 1) = n + 2 and n = 2 * (N - X) = m - 2

References: Beyer WH. CRC Standard Mathematical
Tables, 25th Edition. CRC Press. 1978. Section: Probability and Statistics.
F-distribution. page 540. Young KD. Lewis RJ. What is confidence? Part 2:
Detailed definition and determination of confidence intervals. Ann Emerg
Med. 1997; 30: 311-318. Confidence Interval When the Proportion in N
Observations is 0 or 1 Overview: If either 0 or n events occur in n
observations, then the limits of the confidence interval can be calculated
based on the confidence interval and the number of observations. X = 1 - ((confidence interval in percent) / 100)
If 0 events occur in n observations:
- lower limit for the confidence interval: 0
- upper limit for the confidence interval: 1 - ((X/2) ^ (1/n))
If n events occur in n observations:
- lower limit for the confidence interval: (X/2) ^ (1/n)
- upper limit for the confidence interval: 1 (100%)

References: Young KD. Lewis RJ.
What is confidence? Part 2: Detailed definition and determination of
confidence intervals. Ann Emerg Med. 1997; 30: 311-318. Confidence Interval
for the Difference Between Two Proportions Based on the Odds Ratio Overview:
When comparing two populations for an event, the odds ratio and 95%
confidence intervals can be calculated from looking at the number in each
group positive and negative for the event.

          Group 1   Group 2
negative  A         B
positive  C         D

(Table page 316, Young 1997)
odds for the event in group 1 = C / A
odds for the event in group 2 = D / B
odds ratio for group 2 relative to
group 1 = (odds group 2) / (odds group 1) = (A * D) / (B * C) confidence
interval for 95% = EXP( X +/- Y) X = LN ((A * D) / (B * C)) Y = 1.96 * SQRT((1/A)
+ (1/B) + (1/C) + (1/D)) where 1.96 is the value for Z from the standard normal distribution with F(Z) = 0.975. If the odds ratio is 1.0, then there
is no difference between the two groups. If the 2 groups are comparing an
intervention, then this is equivalent to a null hypothesis of no
intervention difference. Small Sample Sizes If sample sizes are small (less
than 10 or 20), then 0.5 is added to each of the factors. odds ratio = (odds
group 2) / (odds group 1) = ((A+0.5) * (D+0.5)) / ((B+0.5) * (C+0.5))
confidence interval for 95% = EXP( X +/- Y) X = LN (((A+0.5) * (D+0.5)) /
((B+0.5) * (C+0.5))) Y = 1.96 * SQRT((1/ (A+0.5)) + (1/ (B+0.5)) + (1/
(C+0.5)) + (1/ (D+0.5))) NOTE: I am using sample size as (A + B + C + D).
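A Python sketch of this calculation; following the note above, the 0.5 correction is applied when the sample size A + B + C + D is less than 20 (the cell counts in the example are illustrative):

```python
from math import exp, log, sqrt

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio for group 2 vs group 1 with a 95% confidence interval.

    a, b: negatives in groups 1 and 2; c, d: positives in groups 1 and 2.
    """
    if a + b + c + d < 20:  # small sample: add 0.5 to each cell
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    y = z * sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    x = log(odds_ratio)
    return odds_ratio, exp(x - y), exp(x + y)
```

If the resulting interval includes 1.0, the two groups do not differ significantly.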
References: Beyer WH. CRC Standard Mathematical Tables, 25th Edition. CRC
Press. 1978. Section: Probability and Statistics. F-distribution. page 524.
Young KD. Lewis RJ. What is confidence? Part 2: Detailed definition and
determination of confidence intervals. Ann Emerg Med. 1997; 30: 311-318.
Confidence Interval for the Difference Between Two Proportions Using the
Normal Approximation Overview: If the events for two proportions are
normally distributed, then the confidence interval for the difference
between the two proportions can be calculated using the normal
approximation. Requirements: (1) events occur with a normal distribution (2)
populations and events are sufficiently large (3) the proportions for the 2
populations are not too close to 0 or 1

                          Population 1   Population 2
total number              N1             N2
number showing response   R1             R2

proportion responding in population 1 = P1 = (R1) / (N1)
proportion responding in population 2 = P2 =
(R2) / (N2) confidence interval = P1 - P2 +/- ((one tailed value of the
standard normal distribution) * (SQRT (((P1 * (1 - P1)) / N1) + ((P2 * (1 -
P2)) / N2))) where the one-tailed values for the standard normal distribution with two-tailed confidence intervals, assuming infinite degrees of freedom, are:

Confidence Interval - one-tailed value
80% - 1.282
90% - 1.645
95% - 1.960
98% - 2.326
99% - 2.576
99.8% - 3.090

Interpretation: If the confidence interval includes 0, then the data shows no statistically significant difference between the 2 proportions.

References: Beyer WH. CRC Standard
Mathematical Tables, 25th Edition. CRC Press. 1978. page 524. Young KD.
Lewis RJ. What is confidence? Part 2: Detailed definition and determination
of confidence intervals. Ann Emerg Med. 1997; 30: 311-318. Odds and
Percentages Overview: The rate of occurrence for a condition can be
expressed as the odds or the percentage of a population involved. total
population = (number affected) + (number unaffected) odds denominator =
(total population) / (number affected) = 1 + ((number unaffected) / (number
affected))
odds = 1 in (odds denominator), or (number affected) to (number unaffected)
percent affected = (number affected) / (total population) * 100% = (1 / (odds denominator)) * 100%

References: Harper PS. Practical Genetic
Counselling, Fifth Edition. Butterworth Heinemann. 1999. Table 1.1, page 10.
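The conversions in this section can be sketched in Python (the function name is illustrative):

```python
def odds_and_percentage(affected, unaffected):
    """Express an occurrence rate as '1 in N' odds and as a percentage."""
    total = affected + unaffected
    odds_denominator = total / affected   # rate is 1 in odds_denominator
    percent_affected = 100 * affected / total
    return odds_denominator, percent_affected
```

For example, odds_and_percentage(1, 3) returns (4.0, 25.0): odds of 1 in 4, or 25% affected.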
Benefit, Risk and Threshold for an Action Benefit-to-Risk Ratio and
Treatment Threshold for Using a Treatment Strategy Overview: Each treatment
strategy has potential risks and benefits. The treatment threshold uses a
treatment's benefit and risk when used for a given condition to help decide
if and when to treat. benefit for treatment = (risk of adverse outcome from
the disease in those untreated) - (risk of adverse outcome from the disease
with treatment) risk of treatment = (risk of significant adverse
complication due to treatment) benefit-to-risk ratio = (benefit for
treatment) / (risk for treatment) treatment threshold = 1 /
((benefit-to-risk ratio) + 1) = (risk) / ((benefit) + (risk)) Interpretation:
- Treatment should be given when the risk of having the condition exceeds the treatment threshold.
- Treatment should be withheld if the risk of having the condition is less than the treatment threshold.

References:
Beers MH, Berkow R, et al (editors). The Merck Manual of Diagnosis and
Therapy, Seventeenth Edition. Merck Research Laboratories. 1999. Chapter
295. Clinical Decision Making. page 2523. Testing and Test Treatment
Thresholds Overview: If a test is performed to determine whether a treatment
strategy is used, then the testing and test treatment thresholds can help
decide if the test should be done. Test features:
- performance characteristics (sensitivity and specificity) for the condition are known
- assume that the test has no direct adverse risk to the patient

benefit for
treatment = (risk of adverse outcome from the disease in those untreated) -
(risk of adverse outcome from the disease with treatment) risk of treatment
= (risk of significant adverse complication due to treatment) testing
threshold = ((1 - (specificity of test)) * (risk of treatment)) / (((1 -
(specificity of test)) * (risk of treatment)) + ((sensitivity of test) *
(benefit of treatment))) test treatment threshold = ((specificity of test) * (risk of treatment)) / (((specificity of test) * (risk of treatment)) + ((1 - (sensitivity of test)) * (benefit of treatment))) Interpretation:
- If the probability of disease is equal to or more than the testing threshold and equal to or less than the test treatment threshold, then the test should be done.
- If the probability of disease is below the testing threshold or above the test treatment threshold, then the test should not be done.