GCR Statistics

Design your study

Follow the outlined steps and start writing down your methodology. When you are finished, you have the basis for your study protocol. Furthermore, you will be able to claim that you have followed the Standards for Reporting Diagnostic Accuracy (STARD 2015) as well as STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guidelines.

Why perform a diagnostic study?

The aim of diagnostic research is to evaluate how well a diagnostic test can confirm or rule out a certain disease.
Every patient has their own set of characteristics (age, gender, comorbidity etc.), behaviours, signs (physical examination) and symptoms (history taking). The suspicion of a certain disease is raised on the basis of this information. The probability that the disease is present before any further testing is called the prior risk. In reality, a differential diagnosis (a set of possible diseases ordered on the basis of likelihood and/or severity) is established.
A subsequent diagnostic test either increases or decreases the prior risk, so that the physician can feel confident about the presence or absence of a specific disease and act accordingly (either treat the disease or repeat the same process for another disease in the differential diagnosis).
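
How a test result shifts the prior risk can be sketched with Bayes' theorem. The example below is a minimal illustration, not a method prescribed by this guide: the function name and the numbers (a prior risk of 20%, sensitivity 90%, specificity 80%) are hypothetical.

```python
def post_test_probability(prior, sensitivity, specificity, test_positive):
    """Update the prior risk of disease with a single test result (Bayes' theorem)."""
    if test_positive:
        true_pos = prior * sensitivity                # diseased and test positive
        false_pos = (1 - prior) * (1 - specificity)   # not diseased but test positive
        return true_pos / (true_pos + false_pos)
    false_neg = prior * (1 - sensitivity)             # diseased but test negative
    true_neg = (1 - prior) * specificity              # not diseased and test negative
    return false_neg / (false_neg + true_neg)


# Hypothetical values, chosen only for illustration.
prior = 0.20
after_positive = post_test_probability(prior, 0.90, 0.80, test_positive=True)
after_negative = post_test_probability(prior, 0.90, 0.80, test_positive=False)

print(f"Prior risk:            {prior:.1%}")
print(f"After a positive test: {after_positive:.1%}")
print(f"After a negative test: {after_negative:.1%}")
```

With these assumed numbers a positive result raises the probability of disease and a negative result lowers it, which is exactly the information a physician needs to decide whether to treat or to move on to the next disease in the differential diagnosis.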

What types of diagnostic studies are there?

Diagnostic studies can be split into three types of research:

1. Test research. This is the most frequently published research type. Test research evaluates the isolated ability of a diagnostic test to demonstrate or rule out a disease. Classically, a first proof of principle needs to be established for a newly developed test. The most efficient way to do this is to select a group of patients with a certain disease, take a random sample of patients without the disease from the same source (case-control study), and subsequently run the test in all patients. The proportion of patients with a positive test result in the diseased group gives us the test sensitivity (aka true positive rate). The proportion of patients with a negative test result in the non-diseased control group gives us the test specificity (aka true negative rate). Test sensitivity and specificity are often called test characteristics, as if they were fixed properties of the test, but in practice they can vary with the disease prevalence and case mix of the study population. Test research is also very applicable to settings in which diagnosis is performed on the basis of just one test, for example screening.

2. Diagnostic clinical prediction research. This type of research evaluates the value a diagnostic test adds to the estimation of disease probability, on top of all other relevant information. This approach is much more sensible for clinical application, since the set of information that has led to the disease suspicion in the first place (patient characteristics, signs and symptoms etc.) cannot be ignored. Furthermore, multiple tests are almost always ordered (laboratory investigations, radiological imaging, ECGs etc.). A cross-sectional study design can be used to enrol patients suspected of the disease. It is critical to collect relevant patient information, the outcome of the test under study and a reference standard in all patients. With this information, the positive predictive value (proportion of patients with the disease in those with a positive test outcome) and negative predictive value (proportion of patients without the disease in those with a negative test outcome) can be determined. Subsequently, a multivariable logistic regression model can be built 1) without the test under study and 2) with the test under study. For both models, the AUC (area under the curve) of the ROC (receiver operating characteristic) curve is calculated. The AUC is the probability that a randomly chosen patient with the disease receives a higher predicted probability than a randomly chosen patient without the disease, so it summarises how well the model discriminates between the two groups. Now, by comparing the AUC values (and 95% confidence intervals) of both models, it can be determined whether the diagnostic test is helpful. Somewhat more elaborate methods such as reclassification or decision curve analyses can be used to get an idea about the impact of a test on clinical decision making.

3. Clinical utility trials. This type of study is rarely performed, but is highly relevant to clinical practice. The goal of such a trial is to determine the downstream effect of a diagnostic test on relevant health outcomes. This can range from patient-specific outcomes such as 5-year survival rate to institutional or societal outcomes such as healthcare quality, efficiency and cost-effectiveness. An example is a randomized controlled trial that assesses the survival impact of screening for colorectal cancer with fecal occult blood testing in the general population aged 55 to 75 years. In this example, diagnostic and therapeutic research are combined.
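
The AUC comparison described for diagnostic clinical prediction research (type 2) can be illustrated with a small sketch. It assumes that the two logistic regression models have already been fitted elsewhere and only computes the AUC from their predicted probabilities, as the proportion of diseased/non-diseased patient pairs in which the diseased patient has the higher prediction. All predicted probabilities are hypothetical, and confidence intervals are left out for brevity.

```python
def auc(predictions_diseased, predictions_healthy):
    """AUC as the proportion of (diseased, non-diseased) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    concordant = 0.0
    pairs = 0
    for p_diseased in predictions_diseased:
        for p_healthy in predictions_healthy:
            pairs += 1
            if p_diseased > p_healthy:
                concordant += 1.0
            elif p_diseased == p_healthy:
                concordant += 0.5
    return concordant / pairs


# Hypothetical predicted probabilities from a model WITHOUT the index test ...
without_test_diseased = [0.60, 0.40, 0.70, 0.30]
without_test_healthy = [0.50, 0.20, 0.35, 0.10]

# ... and from a model WITH the index test, for the same patients.
with_test_diseased = [0.80, 0.60, 0.90, 0.45]
with_test_healthy = [0.50, 0.10, 0.30, 0.05]

auc_without = auc(without_test_diseased, without_test_healthy)
auc_with = auc(with_test_diseased, with_test_healthy)

print(f"AUC without index test: {auc_without:.4f}")
print(f"AUC with index test:    {auc_with:.4f}")
```

In a real study the difference between the two AUC values would be reported with 95% confidence intervals before concluding that the test adds diagnostic value.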

The most important results of a diagnostic study are presented in a 2 x 2 contingency table:

	Disease +	Disease -
Test +	A	B
Test -	C	D

Sensitivity or True Positive Rate = A/(A+C)
Specificity or True Negative Rate = D/(B+D)
Positive Predictive Value = A/(A+B)
Negative Predictive Value = D/(C+D)

As you can see above, sensitivity, specificity, and the positive and negative predictive values can all be calculated from the same simple 2 x 2 table.
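
As a minimal sketch, the four formulas above translate directly into code. The cell labels A to D follow the table; the counts in the example are hypothetical.

```python
def diagnostic_accuracy(a, b, c, d):
    """Accuracy measures from a 2 x 2 table.

    a = test +, disease +    b = test +, disease -
    c = test -, disease +    d = test -, disease -
    """
    return {
        "sensitivity": a / (a + c),  # true positive rate
        "specificity": d / (b + d),  # true negative rate
        "ppv": a / (a + b),          # positive predictive value
        "npv": d / (c + d),          # negative predictive value
    }


# Hypothetical study: 100 patients with and 200 without the disease.
result = diagnostic_accuracy(a=90, b=40, c=10, d=160)

for measure, value in result.items():
    print(f"{measure}: {value:.3f}")
```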

Which of the following questions makes the most sense to you?

Sensitivity: Doctor, I know I have the disease, but what is the chance that I will have a positive test outcome?

Specificity: Doctor, I know I don’t have the disease, but what is the chance that I will have a negative test outcome?

Positive Predictive Value: Doctor, I have a positive test outcome, now what is the chance that I really do have the disease?

Negative Predictive Value: Doctor, I have a negative test outcome, but what is the chance that I really don’t have the disease?

Unfortunately, sensitivity and specificity rates still form the cornerstone of diagnostic research publications.
In my opinion, their value is limited to proof-of-concept studies on new tests. Predictive values are more informative for clinicians!

What are the biggest challenges in diagnostic research?

- Making sure that you have complete and accurate data on patient characteristics and test outcomes. Retrospective study designs are prone to missing data. Prospective study designs are quite feasible, since follow-up is often not necessary.
- Choosing the right reference test/standard. In diagnostic research, we need to decide whether the patient actually does or doesn’t have the disease. Thus, we need a reference standard against which the new diagnostic test (index test) is evaluated. Every patient needs to receive both the index test and the reference test. Selectively performing only one of the tests may lead to biased study results. If a gold standard (best possible test, for example pathological examination) has already been defined, then this may be the way to go. It becomes slightly more challenging if no other reference test exists or if the goal is to demonstrate that the new test outperforms the gold standard. Using follow-up (diseases tend to get worse if untreated) or a test treatment (if the treatment works this is indirect proof of the disease) may provide solutions to these problems.
- Ensuring objective test scores. Hopefully the experimental test is relatively objective (good inter-observer reliability/agreement) and replicable (good intra-observer reliability/agreement). If the test assessment is operator-dependent (as in radiological imaging, for example), it is recommended that the assessor is blinded to the reference test outcome.
- Finding the right threshold for test outcomes. Test outcomes are not always dichotomous (positive vs. negative). They can also be categorical (low, intermediate, high risk) or continuous (serum CRP value). For continuous measures, a test-positivity cut-off needs to be defined. Receiver operating characteristic (ROC) curve analysis can provide pairs of sensitivity and specificity for various thresholds. The optimal threshold can subsequently be chosen depending on the requirements for the test (good at ruling out, ruling in or a combination). A word of caution: selecting the optimal threshold on the basis of maximum accuracy often results in overly optimistic diagnostic accuracy estimates, and therefore these estimates should be externally validated in an independent sample.
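
The threshold search from the last point can be sketched as follows. The CRP values, disease labels and the 95% requirement are hypothetical, and the selection rules (highest cut-off that keeps sensitivity at or above the requirement for ruling out, lowest cut-off that keeps specificity at or above it for ruling in) are just one possible way to formalise "depending on the requirements for the test".

```python
def roc_points(values, diseased):
    """Sensitivity and specificity for every candidate cut-off.

    A result counts as test positive when value >= cut-off.
    Returns a list of (cut_off, sensitivity, specificity) tuples.
    """
    n_diseased = sum(diseased)
    n_healthy = len(diseased) - n_diseased
    points = []
    for cut_off in sorted(set(values)):
        true_pos = sum(1 for v, d in zip(values, diseased) if d and v >= cut_off)
        true_neg = sum(1 for v, d in zip(values, diseased) if not d and v < cut_off)
        points.append((cut_off, true_pos / n_diseased, true_neg / n_healthy))
    return points


# Hypothetical serum CRP values with the reference standard result per patient.
crp = [5, 8, 12, 20, 35, 50, 60, 80, 110, 150]
diseased = [False, False, False, True, False, True, False, True, True, True]

points = roc_points(crp, diseased)
for cut_off, sens, spec in points:
    print(f"CRP >= {cut_off:>3}: sensitivity {sens:.2f}, specificity {spec:.2f}")

# Ruling out: highest cut-off that still detects at least 95% of diseased patients.
rule_out = max((p for p in points if p[1] >= 0.95), key=lambda p: p[0])
# Ruling in: lowest cut-off that still labels at least 95% of healthy patients negative.
rule_in = min((p for p in points if p[2] >= 0.95), key=lambda p: p[0])

print("Rule-out cut-off:", rule_out[0])
print("Rule-in cut-off:", rule_in[0])
```

Because the cut-offs are chosen on the same data that are used to report sensitivity and specificity, the caution above applies: the resulting estimates are likely optimistic until validated in an independent sample.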

How to get started?

Hopefully, you have already defined your research question. You know your domain, determinant(s) and outcome of interest. Now, write down the background of the clinical problem, findings of previous studies and rationale of your study.
Include the intended use of the index test. Is it meant for diagnosis, screening, staging, monitoring, surveillance, prognosis or treatment selection? In which setting and type of patients can it be used? What is the clinical role of the index test relative to other tests? Is it primarily used to rule out (exclude) or rule in (confirm) disease in low/high risk patients, or does it need to do both? Why? What is the significance of potential test errors (false positives and false negatives)?
In diagnostic accuracy studies, it is also important to prespecify testable hypotheses, such as minimum levels of sensitivity/specificity, positive/negative predictive value or AUC value, or superiority or non-inferiority to another index test.
The next step is to meticulously define your study methodology. Your methodology must be described clearly enough ahead of time that other researchers could easily replicate the study.

We can divide study design into two parts:

Data collection
Data analysis

Designing the data collection

Establish your study design

In the early phase of test evaluation use a case-control study design. Select patients with confirmed disease and take a random sample of healthy controls (preferably from the same setting). Perform the index test in all patients and record sensitivity, specificity, and false positive and false negative rates from a 2 x 2 contingency table. This is also the time to evaluate reliability measures and search for a positive-test threshold for continuous measures. This early phase will give you an idea about the index test’s diagnostic potential. Keep in mind that the ‘real-world’ diagnostic accuracy should be evaluated in subsequent prospective cohort studies.
In subsequent phases of diagnostic accuracy evaluation use a cohort study design. This will allow you to evaluate the index test performance in relation to other relevant information and tests in a larger population with a broad spectrum of pathology, mirroring the clinical setting in which it is meant to be used. Use a prospective study design unless this is not feasible. The difference between prospective and retrospective designs lies in the data collection. The analysis is always retrospective. A prospective design allows you to standardise the data collection and limit missing data. If you are going to use a retrospective design, try to maximise your number of patients and complement your data by collaborating with other centers/institutions and patient registries.

Patient accrual

Describe:
- The study setting (primary care, secondary/tertiary hospital, ICU, ED etc.)
- The dates and period of recruitment, exposure time, and time of follow-up
- Eligibility criteria (inclusion and exclusion criteria). These criteria should be related to the nature and stage of the target disease and future use of the index test. Often this includes the signs, symptoms and previous test results that have generated the disease suspicion. Only exclude patients with a condition or treatment that adversely affects the way the index test works if this will likely happen in clinical practice as well.
- By whom and how eligible candidates are identified. For example: all consecutive patients with suspected acute appendicitis were identified by the study nurse in the emergency department.

Variables of interest

Describe:
- The index test under study.
- The reference test/standard.
- Explain why these tests were chosen and give detailed methods of assessment and measurement (so that these methods are replicable).
- Describe by whom these tests are assessed and what their level of training/experience is.
- Define positive, negative (and if applicable indeterminate) test outcomes. Define and give the rationale for the positivity cut-off in continuous test measurements. If the aim of the study is to explore the ideal threshold, state so.
- Describe whether the investigators assessing the index test were blinded for patient/clinical information and the outcome of the reference test and vice versa.

Outcome(s)

Describe:
- The collection of data on severity of disease in those with the target condition and distribution of alternative diagnoses in those without

Define primary and secondary outcomes such as:
- Optimal positive-test cut-off
- Added value of the test in relation to clinical information and previous tests
- Adverse events resulting from the index test (if applicable)

Bias prevention:

Describe how you prevent potential sources of bias (for example selection bias, information bias and confounding)

Designing the data analysis

Describe:
- How summary data are analysed (mean and SD; median and range), including 95% confidence intervals
- Sample size determination based on primary endpoint (use the Sample Size Calculator)
- Which analyses will be performed (Use our Test Wizard). For normally distributed data use a parametric test, non-normally distributed data require a non-parametric test. For paired/clustered data use an appropriate test
- In the case of model building: method of model assumption evaluation (graphical, numerical), method for variable selection (forward/backward/enter), way of testing (p-value, Likelihood Ratio Test, AIC value), calibration (plot), discrimination (C-index/AUC) and validation (bootstrapping)
- Transformation of data (if applicable)
- Handling of extreme outliers
- Corrections of statistical significance (Bonferroni etc.)
- Statistical ways of handling missing data: complete case analysis (not recommended), multiple imputation, or reclassification (best or worst-case scenario)
- Statistical package used for analyses
- The assumed threshold for statistical significance (alpha level)
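
Two of the items above, the 95% confidence interval and the sample size, can be illustrated with a short sketch. It uses the Wilson score interval for a proportion and a common normal-approximation formula for the number of diseased patients needed to estimate sensitivity with a given precision, scaled up by the expected prevalence. Both are assumptions made for this example and are not necessarily the methods behind the Sample Size Calculator; all input values are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, n, z=Z_95):
    """Wilson score confidence interval for a proportion such as sensitivity."""
    p = successes / n
    denominator = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denominator
    return centre - half_width, centre + half_width


def sample_size_for_sensitivity(expected_sensitivity, precision, prevalence, z=Z_95):
    """Patients needed so that the 95% CI of sensitivity is about +/- precision wide."""
    n_diseased = math.ceil(
        z**2 * expected_sensitivity * (1 - expected_sensitivity) / precision**2
    )
    n_total = math.ceil(n_diseased / prevalence)  # only a fraction has the disease
    return n_diseased, n_total


# Hypothetical result: 90 of 100 diseased patients test positive.
lower, upper = wilson_interval(90, 100)
print(f"Sensitivity 0.90, 95% CI {lower:.3f} to {upper:.3f}")

# Hypothetical planning values: expected sensitivity 0.90, precision 0.05, prevalence 0.20.
n_diseased, n_total = sample_size_for_sensitivity(0.90, 0.05, 0.20)
print(f"Diseased patients needed: {n_diseased}; total patients to enrol: {n_total}")
```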

NB: if you use relative measures as your endpoint, such as an Odds Ratio (from logistic regression analyses) or a Hazard Ratio (from survival analyses), try to calculate absolute measures as well. A large relative risk increase on a small a priori risk is still a small risk!
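
The point about relative versus absolute measures can be made concrete with a small sketch that converts an odds ratio into an absolute risk, given the a priori (baseline) risk. The function name and the numbers are hypothetical.

```python
def absolute_risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Risk in the exposed/positive group implied by an odds ratio and a baseline risk."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical example: a priori risk of 1% and an odds ratio of 3.
baseline_risk = 0.01
new_risk = absolute_risk_from_odds_ratio(baseline_risk, odds_ratio=3.0)
risk_difference = new_risk - baseline_risk

print(f"Baseline risk:            {baseline_risk:.1%}")
print(f"Risk with odds ratio 3:   {new_risk:.1%}")
print(f"Absolute risk difference: {risk_difference:.1%}")
```

Even though the odds are tripled in this example, the absolute risk remains low, which is why reporting both measures gives readers a fairer picture.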
