GCR Statistics

Design your study

Follow the outlined steps and start writing down your methodology. When you are finished, you will have the basis for your study protocol. Furthermore, you will be able to claim that you have followed both the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) and the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.

Why perform a prognostic study?

The aim of prognostic research is to predict a certain outcome over time with a variable or set of variables, the so-called predictors. Many factors can serve as predictors, including genetic information, biomarkers, imaging data, demographics, behaviours, environmental factors and much more. In contrast to etiological research, the ability to predict the outcome does not have to be based on a cause-effect relationship. Instead, we look at associations (direction and magnitude) with the outcome. It follows that confounding does not play an important role in prognostic research. Prognostic information is worth unravelling because it adds tremendous value to our understanding of a disease, helps to inform patients, facilitates clinical decision-making, improves the design and analysis of clinical trials, and is a major driver of treatment optimisation. Prognostic research enables individualised care. Without it we would inform, monitor and treat all patients with a certain disease equally, and so fall short in all of these aspects.

What types of prognostic studies are there?

Prognostic studies can be split into four types of research:

1. Fundamental prognostic research. This is the most basic type of study. In these studies, an outcome frequency is assessed. A well-known example is the annual cancer statistics reports. Knowledge of outcome frequency is essential for understanding diseases, monitoring hospital performance and performing sample size calculations in the design of clinical trials.

2. Prognostic factor research. There is an abundance of prognostic factor research in the literature. In this type of research, studies explore the prognostic value of factors with unknown significance. Follow-up studies should then try to replicate these findings and assess their prognostic value over established prognostic factors.

3. Prognostic clinical prediction research. In clinical practice, decisions are rarely made on the basis of one factor. Thus, in prognostic clinical prediction research a multivariable (containing > 1 variable) prediction model is built and then internally and externally validated. It should enable physicians to reliably estimate the risk of a specific endpoint for individual patients.

4. Stratified medicine research. The ultimate goal of prognostic research is to enable the use of prognostic information to help tailor treatment decisions to an individual or group of individuals with similar characteristics (strata). In stratified medicine research, one assesses the impact of implementing the model on clinical practice.

What are the biggest challenges in prognostic research?

- Enabling generalisability. Lack of generalisability is without a doubt the number one problem in prognostic research. The goal of a prognostic study is to inform physicians about the predictive value of certain factors in a specific type of patient in a specific setting (domain). This is only possible if your study cohort is a good representation (random selection) of this domain. The smaller your cohort, the higher the chance that it is a non-random selection that does not represent the domain. Suppose, for example, that we build a multivariable survival model in a cohort of 100 patients with lung cancer, of whom perhaps only 40 have died at the time of analysis (the rest are either still alive or censored). The model will depend heavily on the characteristics of these 40 patients. There is a good chance that we will find some factors that are significantly associated with survival. In a cohort this small, these factors may predict survival in our cohort, but we cannot generalise the findings to our domain of patients with lung cancer. Even if the factors are truly prognostic in the domain, the estimates in our model will be far off. In statistical terms, the model will be overfitted. This stresses the importance of large cohorts (think in terms of at least hundreds of patients), internal validation (for example bootstrapping to adjust for overfitting) and external validation (are our results generalisable to an independent cohort?).

- Making sure that you have complete and accurate data on genetics, biomarkers, imaging data, patient demographics, behaviours, exposure to environmental factors and so on. Retrospective study designs are prone to missing data, but prospective studies often require a follow-up period of several years and can be very expensive. Behavioural and environmental factors in particular may be (selectively) misreported by patients in questionnaires. If you have missing data, it is better to use techniques such as multiple imputation than to ignore them (as in complete case analyses).

- Creating value. The number of published prognostic studies is overwhelming. Yet very few prediction models are usable, or used, in clinical practice. Try to avoid the “publishing for the sake of publishing” trap. It is not worth the effort, and the joy of a new publication quickly fades. Think ahead about the value that you want to add. This may be as simple as establishing risk factors or outcome frequencies for the design of a randomized controlled trial. Maybe you want to update an existing clinical prediction model, or develop a novel clinical prediction model for an unexplored clinical setting. If you aim for the latter, try to keep the model as simple as possible. Use factors for which you can explain the association; causal factors are an obvious choice. Build and validate the prediction model, and create a rule or web-based user interface that lets clinicians calculate an individual patient’s risk in a simple fashion.
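
To make the overfitting problem in a small cohort concrete, the sketch below simulates 100 patients with 40 events and 10 candidate predictors that are pure noise, so the true AUC of any of them is 0.5. The "model" is deliberately simplistic (it picks the single predictor with the highest apparent AUC) and stands in for real model building; the bootstrap then estimates how optimistic the apparent AUC is. All data, function names and settings here are illustrative assumptions, not part of a prescribed workflow.

```python
import random


def auc(scores, outcomes):
    """Chance that a random event scores higher than a random non-event."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return wins / (len(events) * len(non_events))


def fit(predictors, outcomes):
    """Toy model building: index of the predictor with the highest AUC."""
    return max(range(len(predictors)), key=lambda j: auc(predictors[j], outcomes))


def optimism_corrected_auc(predictors, outcomes, n_boot=100, seed=1):
    """Bootstrap internal validation: apparent AUC minus mean optimism."""
    rng = random.Random(seed)
    n = len(outcomes)
    apparent = auc(predictors[fit(predictors, outcomes)], outcomes)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        boot_y = [outcomes[i] for i in idx]
        if not 0 < sum(boot_y) < n:
            continue  # AUC is undefined without both events and non-events
        boot_x = [[col[i] for i in idx] for col in predictors]
        j = fit(boot_x, boot_y)  # repeat the model building in the resample
        # Optimism: performance in the resample minus performance of the
        # same model when applied back to the original cohort.
        optimism.append(auc(boot_x[j], boot_y) - auc(predictors[j], outcomes))
    mean_optimism = sum(optimism) / len(optimism)
    return apparent, mean_optimism, apparent - mean_optimism


# Simulated cohort: 40 events, 60 non-events, 10 noise predictors.
data_rng = random.Random(42)
outcomes = [1] * 40 + [0] * 60
predictors = [[data_rng.gauss(0, 1) for _ in outcomes] for _ in range(10)]

apparent, optimism, corrected = optimism_corrected_auc(predictors, outcomes)
print(f"apparent AUC {apparent:.2f}, optimism {optimism:.2f}, corrected {corrected:.2f}")
```

Because the selected predictor was chosen for fitting this particular cohort best, its apparent AUC overstates what can be expected in new patients; the corrected value moves back towards 0.5. External validation in an independent cohort is still needed on top of this.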

How to get started?

Hopefully, you have already defined your research question. You know your domain, determinant(s) and outcome of interest. Now, write down the background of the clinical problem, the findings of previous studies and the rationale of your study. The next step is to meticulously define your study methodology. Your methodology must be so clear ahead of time that other researchers could easily replicate the study.

We can divide study design into two parts:

- Data collection
- Data analysis

Designing the data collection

Establish your study design

Use a prospective study design unless this is not feasible. The difference between prospective and retrospective designs lies in the data collection; the analysis is always retrospective. A prospective design allows you to standardise the data collection and to limit missing data. If you are going to use a retrospective design, try to maximise your number of patients and complement your data by collaborating with other centers/institutions and patient registries.
Sometimes, a randomized controlled trial is used as the basis for a prognostic study. Certain adjustments are then required, such as creating separate models for the different treatment arms or including treatment arm as a predictor or interaction term.

Patient accrual

Describe:
- The study setting (primary care, secondary/tertiary hospital, ICU, ED etc.)
- The dates and period of recruitment, exposure time, and time of follow-up
- Eligibility criteria (inclusion and exclusion criteria).

Variables of interest

Describe:
- Which variables of interest and effect-modifiers (previously described in the literature and based on clinical reasoning) are studied
- How continuous variables are handled (categorisation needs to be explained)
- The sources of data.
- Detailed methods of assessment/measurement.

Outcome(s)

Describe:
- Primary and secondary outcomes, along with their exact definition
- Fundamental prognostic information
- Exploration, replication and comparison of a prognostic factor
- Prediction model building (+ calibration/discrimination), validation and update
- The sources of data
- Detailed methods of assessment/measurement

NB: Try to avoid subjective, surrogate and composite endpoints

Bias prevention:

Describe how you prevent potential sources of bias (for example selection bias, information bias and confounding)

Designing the data analysis

Describe:
- How summary data are reported (mean with SD; median with range), including 95% confidence intervals
- Sample size determination based on primary endpoint (use the Sample Size Calculator)
- Which analyses will be performed (use our Test Wizard). For normally distributed data use a parametric test; non-normally distributed data require a non-parametric test. For paired/clustered data use an appropriate test
- In the case of model building: method of model assumption evaluation (graphical, numerical), method for variable selection (forward/backward/enter), way of testing (p-value, Likelihood Ratio Test, AIC value), calibration (plot), discrimination (C-index/AUC) and validation (bootstrapping)
- Transformation of data (if applicable)
- Handling of extreme outliers
- Corrections of statistical significance (Bonferroni etc.)
- Statistical ways of handling missing data: Complete case analysis (not recommended) | Multiple imputation | Reclassification => best or worst-case scenario
- Statistical package used for analyses
- Assumption of statistical significance
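
For the multiple imputation option in the list above, the analysis is run once per imputed dataset and the results are then combined. A minimal sketch of that pooling step, using Rubin's rules, is given below. The coefficients and standard errors are made-up numbers for illustration, and `pool_rubin` is a hypothetical helper name; in practice your statistical package will normally do this for you.

```python
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)  # variance between imputations (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical log hazard ratios for one predictor from 5 imputed datasets.
log_hrs = [0.40, 0.50, 0.45, 0.55, 0.60]
ses = [0.20, 0.20, 0.20, 0.20, 0.20]

pooled_log_hr, pooled_se = pool_rubin(log_hrs, ses)
print(f"pooled log HR {pooled_log_hr:.3f}, pooled SE {pooled_se:.3f}")
```

The pooled standard error is larger than the standard error in any single imputed dataset, because it also carries the uncertainty about the missing values themselves. This is the information that is lost when a single imputed dataset is analysed as if it were complete.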

NB: If you use relative measures as your endpoint, such as an Odds Ratio (from logistic regression analyses) or Hazard Ratio (from survival analyses), try to calculate absolute measures as well. A large relative risk increase on a small a priori risk is still a small risk!
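
The conversion from a relative to an absolute measure can be sketched as follows. The baseline risk of 1% and the ratio of 3 are illustrative assumptions, the function names are hypothetical, and the hazard ratio conversion additionally assumes proportional hazards, so that the survival in the exposed group equals the baseline survival raised to the power of the hazard ratio.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Absolute risk in the exposed group, given baseline risk and odds ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1 + exposed_odds)


def risk_from_hazard_ratio(baseline_risk, hazard_ratio):
    """Absolute risk at a fixed time point, assuming proportional hazards."""
    baseline_survival = 1 - baseline_risk
    return 1 - baseline_survival ** hazard_ratio


baseline = 0.01  # assumed a priori risk of 1%
for label, risk in [
    ("odds ratio 3", risk_from_odds_ratio(baseline, 3)),
    ("hazard ratio 3", risk_from_hazard_ratio(baseline, 3)),
]:
    increase = risk - baseline
    print(f"{label}: absolute risk {risk:.2%}, absolute increase {increase:.2%}")
```

In both cases a threefold relative increase leaves the absolute risk just under 3%, an absolute increase of about 2 percentage points.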

Prognostic Research

Characteristics of Prognostic Research