A word from the RNOH statistical consultants
In our capacity as statistical consultants to various members of staff engaging in research at the RNOH, we have observed a recurrence in the type of advice sought. To this end we have produced a statistical checklist, which researchers could consult as a starting point.
The following set of self-ask questions should allow investigators to conduct their research correctly and with greater confidence. Should any questions remain unanswered, advice should be sought from a statistician.
We aim to highlight the standard problems a researcher will encounter by linking the following considerations with specific examples from a published study about a randomised controlled trial (RCT) looking at prevention of falls in the elderly.
Checklist
Is the investigator planning a study or has data already been collected? Ideally the advice of a statistician should be sought right at the beginning of any research undertaken. Contributions from the statistician given towards the design of the study and in the drafting of the proposal could avoid problems at a later stage.
Planning: What question is the investigator trying to answer? A clear aim should be stated as motivation for undertaking the research. The design and conduct of your study should be undertaken with a view to primarily answering your main aim or aims.
In the falls in the elderly example, the whole trial has been conducted with the clear main aim of ascertaining whether a structured assessment of elderly people who have had falls could decrease the rate of further falls. Other outcome measures considered included death, functional status and use of health care.
In determining the best way to address your aims, a number of important considerations should naturally follow:
Population: What is the population of interest? (i.e. to what sort of patients/subjects are the findings of this study to be extrapolated?). How will patients be sampled for the study? Are they representative of the intended population?
In the example, the population of interest is elderly people. Clear inclusion and exclusion criteria are given, for example, all patients aged 65 years and above who attended A&E with a primary diagnosis of a fall were potentially eligible.
Patients with strong cognitive impairment were excluded; could this affect the applicability of the findings?
Intervention or Exposure: If undertaking a trial, is the intervention well defined? Is the intervention confounded with other factors? If undertaking an observational study, is exposure status readily determined?
In the example, patients were randomly assigned to one of two groups: the intervention group underwent an assessment with referral to the relevant services indicated, and the control group received usual care only. Demographics and other potential confounders were similar in the two groups because of the random assignment.
Outcome: Can it be measured accurately/unambiguously? Will measurement be blind?
Follow-up was done by postal questionnaire. Information about subsequent falls was requested. No mention is made of whether assessors were unaware as to which group patients were from (blinding).
Sample size - one of the most commonly asked questions when planning a statistical study is how many observations should be made? Other things being equal, the greater the sample size the more precise your estimates will be. Depending on what type of study is being undertaken, an estimate of sample size can be reached by setting a few assumptions and then referring to special sample size formulae or tables.
So, is your study to be a descriptive study or a comparative one?
Descriptive study
- What is being estimated? What does the researcher expect the estimate to be (approximately)?
- How wide would the confidence interval (CI) be for this estimate? (e.g. if n = 10, 25, 100, 400)
- What width of CI can the investigator tolerate?
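The effect of sample size on CI width can be explored with a short calculation. The sketch below assumes the quantity being estimated is a proportion and uses the simple normal-approximation (Wald) interval, which is only a rough guide for small n. The expected proportion of 0.30 and the function name are illustrative assumptions, not values from the example study.

```python
from math import sqrt
from statistics import NormalDist


def ci_width_for_proportion(p_expected, n, confidence=0.95):
    """Approximate full width of a normal-approximation CI for a proportion."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_expected * (1 - p_expected) / n)
    return 2 * z * standard_error


# Hypothetical expected proportion of 0.30, for the sample sizes listed above.
for n in (10, 25, 100, 400):
    width = ci_width_for_proportion(0.30, n)
    print(f"n={n:>3}: 95% CI width is about {width:.3f}")
```

Under this approximation, quadrupling the sample size halves the width of the interval, which is why large gains in precision can be expensive.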
Comparative studies
What (2) groups are being compared? What sort of outcome?
A) BINARY STUDY
1. Difference in percentages. The difference between the two groups that would be deemed to be of clinical importance needs to be specified, for example:
The proportion of patients who suffered further falls in the intervention group compared to the proportion of patients who suffered further falls in the control group. How many subjects need to be recruited to detect a 30% decrease in falls in the intervention group compared to the controls?
2. Assign power, significance level. The power of a study is defined as the probability of correctly detecting a true difference between two treatments as significant; this is usually set at a high value such as 80% or 90%. The significance level is the probability of a Type 1 error, i.e. the probability of incorrectly rejecting the null hypothesis. It is fixed as part of the study design, usually at 5%, for example:
The power was set at 90% and the significance level at p<0.05. Then, to detect a 30% reduction in the rate of falls between the two groups...
3. 'n' required. A sample size of 352 would be required.
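The three steps for a binary outcome can be sketched as a small calculation. This uses the standard normal-approximation formula for comparing two independent proportions. The control and intervention proportions below (0.50 and 0.35, a 30% relative reduction) are hypothetical and are not taken from the trial, so the output is not expected to reproduce the figure of 352.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p_control, p_intervention, power=0.90, alpha=0.05):
    """Approximate subjects needed per group to compare two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance_sum = (p_control * (1 - p_control)
                    + p_intervention * (1 - p_intervention))
    difference = p_control - p_intervention
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / difference ** 2)


# Hypothetical proportions: 50% of controls fall again versus 35% after assessment.
n = n_per_group_two_proportions(0.50, 0.35)
print(f"About {n} subjects per group, {2 * n} in total")
```

Published sample size tables or specialist software may give slightly different answers, for instance when a continuity correction is applied.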
B) QUANTITATIVE STUDIES
- Specify a difference in means that would be deemed to be clinically important.
- Estimate of standard deviation in each group, perhaps based on previous studies or on a pilot study.
- Assign power, significance level (see point two under BINARY STUDY)
- 'n' required.
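For a quantitative outcome the same steps lead to a similar calculation. The sketch below assumes two groups of equal size with a common standard deviation and uses the normal-approximation formula. The difference of 5 units and standard deviation of 10 units are hypothetical values chosen only to illustrate the arithmetic.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(difference, sd, power=0.90, alpha=0.05):
    """Approximate subjects needed per group to compare two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2)


# Hypothetical: clinically important difference of 5 units, standard deviation 10.
print(n_per_group_two_means(difference=5, sd=10))
```

Because the standard deviation enters the formula squared, a poor estimate of it can change the answer considerably, so it is worth trying a range of plausible values.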
Whenever a sample size is calculated, it has to be remembered that the value will probably be an underestimate due to non-response, patient withdrawals, etc. The calculated sample size should thus be appropriately inflated to allow for this.
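One common way to inflate a calculated sample size is to divide it by the proportion of subjects expected to complete the study. The sketch below assumes this simple approach. The 15% loss figure is hypothetical, and the function name is ours.

```python
from math import ceil


def inflate_for_attrition(n_calculated, expected_loss):
    """Number to recruit so that about n_calculated subjects remain."""
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be a proportion in [0, 1)")
    return ceil(n_calculated / (1 - expected_loss))


# The calculated size of 352 from the example, with a hypothetical 15% loss.
print(inflate_for_attrition(352, 0.15))
```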
Data already collected - choose appropriate statistical analysis:
- What type of outcome variable do you have?
- Is your outcome variable quantitative (for example, measurements of blood pressure) or categorical (for example, has this patient suffered a fall, yes or no)?
Quantitative data
If your data is normally distributed (best determined by a histogram of your data) then use the appropriate parametric test; otherwise, use the non-parametric equivalent:
Guidance table for statistical tests
| Type of quantitative data | Parametric | Non-parametric |
|---|---|---|
| Paired data on same variable (single sample), for example, a comparison of the number of falls suffered in one group of patients before and after occupational therapy assessment. | Paired (one sample) t-test | Wilcoxon signed-rank test |
| Unpaired data on same variable (two independent samples), for example, a comparison of the number of falls suffered between the assessment group and the control group. | Unpaired (two sample) t-test | Mann-Whitney U test |
| Data on same variable in three or more groups, for example, a comparison of the number of falls suffered between the control group, the assessment group and another group that received another form of intervention. | One way analysis of variance (ANOVA) | Kruskal Wallis test |
| Data on two different variables within a sample, for example, answering the question of whether there is a relationship between the number of falls suffered and age in a certain group of patients. | Pearson's correlation, linear regression | Spearman's correlation |
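To illustrate the difference between a parametric method and its non-parametric equivalent, the sketch below computes Pearson's and Spearman's correlation for the relationship between age and number of falls. Spearman's coefficient is simply Pearson's coefficient applied to the ranks of the data. The ages and fall counts are invented for illustration, and in practice a statistical package would also supply a p-value.

```python
from math import sqrt


def pearson_correlation(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson_correlation(average_ranks(x), average_ranks(y))


# Invented data: age in years and number of falls for seven patients.
ages = [66, 70, 73, 78, 81, 85, 90]
falls = [0, 1, 1, 2, 2, 4, 9]
print(f"Pearson:  {pearson_correlation(ages, falls):.2f}")
print(f"Spearman: {spearman_correlation(ages, falls):.2f}")
```

When the relationship is monotonic but not linear, or when there are outlying values, the two coefficients can differ noticeably, which is one reason to inspect a scatter plot before choosing between them.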
Categorical data
As an example - a comparison of the proportion of patients suffering subsequent falls in the assessment group compared to the proportion in the control group.
If comparison between two independent samples is being undertaken then use a Chi-squared test if numbers are large or Fisher's exact test when numbers are small.
If comparison between three or more independent samples is being undertaken then use a Chi-squared test. If numbers are small, then some categories may need to be combined if it makes sense to do so.
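For a comparison of two independent samples with a yes/no outcome, both tests can be computed directly from the four cell counts of the 2x2 table. The sketch below gives the Chi-squared test without continuity correction and a two-sided Fisher's exact test. The cell counts are hypothetical, and statistical packages may differ slightly in how they define the two-sided Fisher p-value.

```python
from math import comb, erfc, sqrt


def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic and p-value (1 degree of freedom, uncorrected).

    Table layout: row 1 = (a, b), row 2 = (c, d). All margins must be non-zero.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the same table layout."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_probability(a)
    possible = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables that are no more probable than the one observed.
    return sum(p for p in map(table_probability, possible)
               if p <= observed * (1 + 1e-9))


# Hypothetical counts: (fell again, did not fall again) in each group.
statistic, p_large = chi_squared_2x2(60, 120, 90, 90)
print(f"Chi-squared = {statistic:.2f}, p = {p_large:.4f}")
print(f"Fisher's exact p = {fisher_exact_2x2(3, 7, 8, 2):.4f}")
```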
Other types of data
Survival data, i.e. when the time for a certain event to occur is of interest. In this situation, a special type of analysis called survival analysis is required. For example, in a group of patients that received assessment, how long did each subject take to suffer a subsequent fall? If no subsequent fall occurred in some subjects, then for how long was each of these subjects followed up?
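One widely used survival analysis technique is the Kaplan-Meier estimate. We name it here as an illustration, not as the method used in the example study. It uses both the time to a subsequent fall and the length of follow-up for subjects who did not fall, who are treated as censored. The follow-up times below are invented.

```python
def kaplan_meier(times, event_observed):
    """Kaplan-Meier estimate of the proportion still event-free over time.

    times: follow-up time for each subject.
    event_observed: True if the event occurred at that time, False if the
    subject was censored (followed up to that time without the event).
    Returns a list of (time, survival_probability) at each event time.
    """
    survival = 1.0
    curve = []
    at_risk = len(times)
    for t in sorted(set(times)):
        events = sum(1 for time, e in zip(times, event_observed) if time == t and e)
        leaving = sum(1 for time in times if time == t)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set afterwards.
        at_risk -= leaving
    return curve


# Invented data: months until a subsequent fall, or until end of follow-up.
months = [2, 3, 3, 5, 8]
fell_again = [True, True, False, True, False]
for month, proportion in kaplan_meier(months, fell_again):
    print(f"month {month}: estimated proportion without a further fall = {proportion:.2f}")
```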
References
1 Close J, Ellis M, Hooper R, et al. Prevention of falls in the elderly trial (PROFET): A randomised controlled trial. Lancet 1999;353:93-7.
2 Campbell MJ & Machin D. Medical Statistics: A commonsense approach, second edition. Wiley, 1993.
3 Bland M. An introduction to Medical Statistics, second edition. Oxford University Press, 1995.
