A p value is the probability of seeing a difference at least as large as the one observed if chance alone were at work (that is, if there were no real difference between the groups being compared). P values are also used as a measure of statistical significance, judged against an arbitrary reference point of 0.05, or 1 in 20. Results with p values less than 0.05 are considered statistically significant, and those with p values greater than 0.05 are not. It is important to recognize that p values can make chance an unlikely explanation, but cannot directly prove effect. A small p value (statistically significant) allows the physician to largely dismiss chance as an explanation when comparing two drugs. A large p value (not statistically significant) does not allow the physician to dismiss any possibilities (chance, actual effect, confounding factors) when comparing two drugs. A large p value can be favorable when the aim is to show a lack of difference, for example in side effects between a drug and placebo.
Factors determining p values
When comparing the effects of two conditions (for example, placebo versus drug or drug versus drug), the p value is determined by the study design and by “the difference itself.” For the same underlying difference, larger patient populations will tend to produce lower p values than smaller patient populations. Variation within a patient group also affects the p value: the more the patients in a group differ from one another, the harder it is to separate a real effect from noise. Clinical studies that are properly “powered” (have the appropriate sample size to see a believable effect) and that minimize intra-group variation through selection criteria are more likely to show statistically significant effects (p < 0.05) when a real difference exists. If the inherent difference between the comparators is large, the p value will be lower, and more likely to be significant, than if the inherent difference is small. A comparison of a drug with placebo is therefore more likely to yield a small p value than a comparison of the same drug with another drug in the same class.
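The dependence of the p value on sample size and on the size of the difference can be illustrated with a short calculation. The following is a minimal sketch, not a method taken from any particular trial: it assumes a pooled two-proportion z-test with a normal approximation, and the function name `two_proportion_p_value` and all event rates (10%, 15%, 25%) are hypothetical values chosen for illustration.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p value from a pooled two-proportion z-test (normal approximation)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Pooled event rate under the assumption of no real difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical event rates: 10% on drug versus 15% on comparator,
# with the same observed rates at three different group sizes.
for n in (100, 500, 2000):
    events_drug = n * 10 // 100
    events_comparator = n * 15 // 100
    p = two_proportion_p_value(events_drug, n, events_comparator, n)
    print(f"{n} patients per group, 10% vs 15%: p = {p:.4f}")

# Same group size (500), larger hypothetical difference: 10% versus 25%.
p_large = two_proportion_p_value(50, 500, 125, 500)
print(f"500 patients per group, 10% vs 25%: p = {p_large:.2e}")
```

Under these assumptions, the same 5-point difference is not significant with 100 patients per group but falls below 0.05 with 500, and widening the difference at a fixed group size lowers the p value further.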
A p value that is not statistically significant does not rule out the existence of a real difference in effects; it could mean that the study is underpowered. When comparing drugs within the same class that exhibit many similar effects, a large patient population – sometimes many thousands – is necessary to avoid what is known as a Type II error.
A Type II error occurs when a real difference exists between two conditions but the study fails to show it. In general, the smaller the difference in effect between two drugs, the larger the sample population required to detect any difference at all. This is why clinical trials comparing drugs that have similar effects (a statin-versus-statin trial, for example) require large numbers of patients to be able to detect differences between the drugs. Head-to-head trials of drugs within the same class, therefore, can involve thousands of patients and hundreds of study centers, and have high clinical trial costs.
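A rough sample size calculation shows how quickly the required number of patients grows as the difference shrinks. This is a sketch under stated assumptions: it uses the standard normal-approximation formula for comparing two proportions, a two-sided significance level of 0.05, and 80% power (a 20% chance of a Type II error). The function name `patients_per_group` and the event rates are hypothetical and not drawn from any trial.

```python
import math
from statistics import NormalDist


def patients_per_group(rate_a, rate_b, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect rate_a versus rate_b.

    Assumes a two-sided test on two proportions with equal group sizes
    and a normal approximation; alpha and power defaults are conventions,
    not requirements.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_power = NormalDist().inv_cdf(power)          # value corresponding to desired power
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    n = (z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2
    return math.ceil(n)


# Hypothetical event rate of 10% on one drug, compared with
# progressively more similar comparators.
for comparator_rate in (0.20, 0.12, 0.11):
    n = patients_per_group(0.10, comparator_rate)
    print(f"10% vs {comparator_rate:.0%}: about {n} patients per group")
```

With these illustrative numbers, a 10-point difference needs a few hundred patients per group, while a 1- or 2-point difference needs thousands, which is the pattern described above for head-to-head trials within a drug class.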
Clinical versus statistical
Clinical significance is not the same as statistical significance. A study with statistically significant results is not necessarily clinically significant, and a result that is not statistically significant may still deserve clinical consideration. Take a look at the two examples in the figure.
Clinical versus statistical

| | Example #1 (5,000 patients) | Example #2 (200 patients) |
|---|---|---|
| Mortality reduction, Drug A | 50% | 50% |
| Mortality reduction, comparator | 45% (Drug B) | 30% (Drug C) |
| p value | 0.005 | 0.5 |
In the first example, the difference in the reduction of mortality between Drug A and Drug B is clearly believable (large patient sample, small p value). However, this information alone may not be clinically significant enough to warrant switching from Drug B to Drug A without considering practical factors such as cost, compliance and side effect profiles. On the other hand, a much smaller study comparing Drug A and Drug C shows a large difference in mortality reduction, albeit one that is not statistically significant. Physicians may view this difference as large enough to warrant clinical consideration, especially if other factors contribute to the clinical decision (including cost, compliance and side effect profiles). Future studies would need larger patient populations to be appropriately powered, so that a difference of this size, if real, would also be statistically significant.
In general, p values are not as clear-cut as they may appear in determining the value of one drug over another. Genetic variations in patients can contribute to differences in results that cannot be credited to drug treatment, and the p value does not distinguish such variations. P values are also of limited use when drug combinations are tested against either placebo or other drug combinations, because a p value cannot show how much each drug contributes to the effect of the combination. Therefore, both statistical and clinical significance should be considered when interpreting data and deciding how the data apply in practice to patient care.