Should Researchers Include a Confidence Index of their Results?
The ongoing reproducibility crisis has drawn attention to the way experiments are reported. Alarmingly, a Nature survey revealed that more than 70 % of researchers had tried and failed to reproduce another scientist's experiments, and more than half had failed to reproduce their own. It is common scientific practice to back your results statistically with a p-value. However, the reproducibility crisis suggests that this is not enough. One way to add clarity to data interpretation may be to use a confidence index that tells you the chances of your results being true.
What is a Confidence Index?
In scientific experiments, you test a treatment on a sample of a population. You ensure that you select a random sample (e.g. clinical trial participants) representative of a population. The confidence index will tell you how certain you can be that your treatment will have the same effect (as occurred in your experiments) on the entire population.
It is a convention among scientists to report their data with a p-value, and although it is similar to a confidence index, it is not the same. The p-value tells you how likely it would be to see data at least as extreme as yours if there were no real effect. A result with p < 0.05 is often misread as being 95 % accurate, but the p-value does not say that. A confidence index, by contrast, would tell you the probability of your result being accurate.
P-value calculations are dependent on:
- Sample size – larger sample sizes will generate more accurate results.
- Response frequency – the greater a particular response, the more accurate your results.
- Population size – this is only important when your population is small.
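The dependence on sample size can be illustrated with a small sketch. It assumes a two-sided z-test with a known standard deviation, which is a simplification chosen for illustration and not a method named in this article. The function name and all numbers are hypothetical.

```python
import math


def two_sided_p_value(mean_diff, sd, n):
    """p-value of a two-sided z-test for an observed mean difference.

    Assumes a known standard deviation `sd` and a sample of size `n`
    (an illustrative simplification, not taken from the article).
    """
    standard_error = sd / math.sqrt(n)
    z = abs(mean_diff) / standard_error
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Same hypothetical effect (difference 2.0, sd 10.0), different sample sizes.
for n in (20, 100, 500):
    print(n, two_sided_p_value(2.0, 10.0, n))
```

With the observed difference held fixed, the p-value shrinks as the sample grows, so the same effect can look non-significant in a small study and significant in a large one.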
A confidence index, which could be based on Bayesian probability, would include considerations such as:
- Random variables – unknown factors resulting from a lack of information.
- Prior probability – takes account of the information already available.
- Hypothesis truth – probability that your hypothesis is true or false.
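How these ingredients might combine can be sketched with Bayes' rule. This is a minimal illustration, not a definition of the confidence index. It assumes that a significant result occurs with probability equal to the study's power when the hypothesis is true, and with probability alpha when it is false. The function name, the power of 0.8, and the prior values are hypothetical.

```python
def probability_hypothesis_true(prior, power=0.8, alpha=0.05):
    """Posterior probability that a hypothesis is true after a significant result.

    prior: probability the hypothesis is true before the experiment.
    power: assumed chance of a significant result if the hypothesis is true.
    alpha: assumed chance of a significant result if the hypothesis is false.
    """
    true_positive = power * prior
    false_positive = alpha * (1.0 - prior)
    # Bayes' rule: share of significant results that come from true hypotheses.
    return true_positive / (true_positive + false_positive)


# The same significant result, starting from different hypothetical priors.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(probability_hypothesis_true(prior), 3))
```

Under these assumed numbers, the same significant result supports a long-shot hypothesis far less strongly than a plausible one, which is the kind of information a p-value alone does not convey.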
A confidence index is crucial when making decisions about future experiments such as treatments for clinical trials. The confidence index should also give an idea of the limitations of the studies.
Justifying Results with P-values
Conventionally, scientists rely on p < 0.05 to deem their results publishable. It is taken as a signal to other researchers that your results are valid. If your p-value is greater than 0.05, you probably will not publish the data. Some argue that this keeps valuable research out of the literature.
Data is typically reported as follows: “Moderator analyses revealed that a higher temperature at bedtime was associated with lower sleep efficiency (SE) (b = − 11.6 pp; p = 0.020).”
The p-value is less than 0.05, but does this mean the statement is true?
In 2016, the American Statistical Association (ASA) published guidelines for using a p-value and cautioned that a p-value by itself “does not provide a good measure of evidence regarding a model or hypothesis”.
This is because the p-value does not tell you whether:
- Your sample was representative of the population.
- The studied hypothesis is true or false.
- The data was produced by random chance.
Include the Confidence Index
Considering the above, the ASA cautions that the p-value should not be a “substitute for scientific reasoning” and that many other factors, such as “good study design and conduct”, are important.
Yes, any issues with the p-value should be addressed by transparency of experimental methods and analysis of the data in context with previous literature. After all, researchers are trained to assess the validity of the results of a study by considering all the relevant factors.
However, Steven Goodman thinks research could be clarified if results were quantified with a confidence index. This could minimize so-called p-hacking (manipulation of data in order to achieve p-values < 0.05) and instead give actual values of probability. Do you think this clarity could help sort out the reproducibility crisis?