GCR Statistics

Design your study

Follow the outlined steps and start writing down your methodology. When you are finished, you have the basis for your study protocol. Furthermore, you will be able to claim that you have followed the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) guideline.

Why perform an etiological study?

The aim of etiological research is to explain the causes of disease. Often, genetic factors, behaviours (lifestyle habits) or environmental exposures are suspected risk factors, but a causal relationship has not been established. Once the direction and magnitude of a causal relationship are established, we can act upon it. For example, high-risk patients can be screened more frequently, or we can try to eliminate the risk factor.

What are the biggest challenges in etiological research?

- Making sure that you have complete and accurate data on patient demographics, genetics, behaviours and exposure to environmental factors. Retrospective study designs are prone to missing data, but prospective studies often require a follow-up period of multiple years and can be very expensive. Behavioural and environmental factors in particular may be (selectively) misreported by patients.
- Adjusting for confounding variables. Performing a randomised controlled trial in which we expose part of the study group to a risk factor is often neither feasible nor ethical. Therefore, we must be sure that we are actually measuring the effect of the determinant under study and not that of another risk factor. This means that we need a lot of data on many patients (to build a mathematically stable model). Furthermore, the cohort needs to be representative of our study domain.
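
To illustrate why adjusting for confounding matters, the sketch below compares a crude odds ratio with an odds ratio pooled over the strata of a confounder. It uses Mantel-Haenszel stratification, which is only one of several possible adjustment techniques and is not prescribed by this guide. The counts are made-up numbers chosen for illustration, and the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.

    a = exposed with outcome, b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Made-up counts: one 2x2 table per level of a hypothetical confounder.
strata = [
    (5, 95, 20, 380),   # confounder absent
    (80, 320, 20, 80),  # confounder present
]

# Crude table: ignore the confounder by summing the strata cell by cell.
crude_table = [sum(cell) for cell in zip(*strata)]

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)
print(f"crude OR = {crude_or:.2f}, adjusted OR = {adjusted_or:.2f}")
```

In these illustrative data the determinant has no association with the outcome within either stratum, yet the crude analysis suggests an increased risk. The apparent effect is produced entirely by the confounder.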

How to get started?

Hopefully, you have already defined your research question. You know your domain, determinant(s) and outcome of interest. Now, write down the background of the clinical problem, the findings of previous studies and the rationale of your study. The next step is to meticulously define your study methodology. Your methodology must be described clearly enough in advance that other researchers could easily replicate the study.

We can divide study design into two parts:

- Data collection
- Data analysis

Designing the data collection

Establish your study design

Use a cohort study design unless the variable under study or the disease is very rare. In those cases, a nested case-control study is more efficient. In short, in a nested case-control study you collect data on patients with the outcome of interest (often a disease). Next, you take a random sample of patients without the outcome of interest and collect the same data on these patients. This allows you to explore differences in the variables of interest. Note that this does not give you definitive proof that a certain variable has caused the disease, since you have reversed the order of testing!

Use a prospective study design unless this is not feasible. The difference between prospective and retrospective designs lies in the data collection; the analysis is always retrospective. A prospective design allows you to standardise the data collection and to limit missing data. If you are going to use a retrospective design, try to maximise your number of patients and complement your data by collaborating with other centres/institutions and patient registries.
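
The case-control sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the record fields `outcome` and `exposed`, the function names and the demo cohort are all hypothetical. It follows the simplified description in this section (a plain random sample of patients without the outcome). Nested case-control studies often match controls to cases on follow-up time as well, which is left out here.

```python
import random


def sample_cases_and_controls(cohort, controls_per_case=2, seed=0):
    """Keep every case and draw a random sample of non-cases as controls."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    cases = [p for p in cohort if p["outcome"]]
    non_cases = [p for p in cohort if not p["outcome"]]
    n_controls = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, n_controls)
    return cases, controls


def exposure_odds_ratio(cases, controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    a = sum(p["exposed"] for p in cases)     # exposed cases
    b = len(cases) - a                       # unexposed cases
    c = sum(p["exposed"] for p in controls)  # exposed controls
    d = len(controls) - c                    # unexposed controls
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (a * d) / (b * c)


# Made-up cohort: 10 patients with the outcome, 90 without.
cohort = (
    [{"outcome": True, "exposed": i < 6} for i in range(10)]
    + [{"outcome": False, "exposed": i < 18} for i in range(90)]
)
cases, controls = sample_cases_and_controls(cohort, controls_per_case=2, seed=1)
print(len(cases), "cases and", len(controls), "controls selected")
```

Calling `exposure_odds_ratio(cases, controls)` on the selected patients then compares exposure between the two groups. Only the sampled patients need detailed data collection, which is where the efficiency gain comes from.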

Patient accrual

Describe:
- The study setting (primary care, secondary/tertiary hospital, ICU, ED etc.)
- The dates and period of recruitment, exposure time, and time of follow-up
- Eligibility criteria (inclusion and exclusion criteria)

Variables of interest

Describe:
- Which variables of interest, potential confounding factors (previously described in the literature and based on clinical reasoning) and effect-modifiers are studied
- The sources of data
- Detailed methods of assessment/measurement

Outcome(s)

Describe:
- Primary and secondary outcomes, along with their exact definition
- The sources of data
- Detailed methods of assessment/measurement
NB: Try to avoid subjective, surrogate and composite endpoints

Bias prevention

Describe how you will prevent potential sources of bias (for example selection bias, information bias and confounding).

Designing the data analysis

Describe:
- How summary data are reported (mean with SD; median with range), with the addition of 95% confidence intervals
- Sample size determination based on primary endpoint (use the Sample Size Calculator)
- Which analyses will be performed (use our Test Wizard). For normally distributed data use a parametric test; for non-normally distributed data use a non-parametric test. For paired/clustered data use a test that accounts for the dependence between observations
- In the case of model building: method of model assumption evaluation (graphical, numerical), method for variable selection (forward/backward/enter), way of testing (p-value, Likelihood Ratio Test, AIC value), calibration (plot), discrimination (C-index/AUC) and validation (bootstrapping)
- Transformation of data (if applicable)
- Handling of extreme outliers
- Corrections of statistical significance (Bonferroni etc.)
- Statistical ways of handling missing data: complete case analysis (not recommended), multiple imputation, or reclassification (best- or worst-case scenario)
- Statistical package used for analyses
- The assumed threshold for statistical significance (significance level)
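
As an illustration of the sample size step in the list above, the sketch below uses the standard normal-approximation formula for comparing two proportions, without continuity correction. It is not necessarily the method used by the Sample Size Calculator. The function name, the default values for alpha and power, and the example proportions are assumptions made for this example.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients needed per group to detect p1 versus p2 (two-sided test)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    term_null = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    term_alt = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((term_null + term_alt) ** 2 / (p1 - p2) ** 2)


# Example: outcome expected in 10% of unexposed and 20% of exposed patients.
n_per_group = sample_size_two_proportions(0.10, 0.20)
print(n_per_group, "patients per group")
```

Smaller expected differences between the groups increase the required number of patients sharply, which is why the calculation should be based on the primary endpoint and done before data collection starts.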

NB: if you use relative measures as your endpoint, such as an Odds Ratio (from logistic regression analyses) or a Hazard Ratio (from survival analyses), try to calculate absolute measures as well. A large relative risk increase on a small a priori risk is still a small risk!
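
The point of the note above can be made concrete with a short calculation that converts an odds ratio and a baseline (a priori) risk into an absolute risk. The baseline risk of 0.1% and the odds ratio of 3 are made-up numbers for illustration, and the function name is hypothetical.

```python
def risk_from_odds_ratio(baseline_risk, odds_ratio):
    """Absolute risk in the exposed group, given baseline risk and odds ratio."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = baseline_odds * odds_ratio
    return exposed_odds / (1 + exposed_odds)


baseline = 0.001  # assumed a priori risk of 0.1%
exposed = risk_from_odds_ratio(baseline, 3.0)
risk_difference = exposed - baseline

print(f"risk if exposed:  {exposed:.4%}")
print(f"absolute increase: {risk_difference:.4%}")
```

With these numbers the risk roughly triples, yet the absolute increase is only about 0.2 percentage points. Reporting both measures lets readers judge the clinical relevance.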
