This is another blog post illustrating the problems of p-curve analyses. P-curve was developed to correct the results of meta-analyses when studies report too many significant results. To do so, p-curve disregards the evidence from non-significant results and applies a correction to the significant results. If the significant results were obtained without a real effect, the distribution of the significant p-values would be uniform. In contrast, if there is a real effect, the distribution would be 'right-skewed' (i.e., there are more p-values below .01 than p-values between .01 and .05). The distribution of the observed p-values is compared against the uniform distribution using visual inspection and statistical tests.
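To make this logic concrete, here is a minimal simulation sketch (illustration only, not the p-curve app itself): under the null hypothesis, significant two-sided p-values are uniform on (0, .05), so 20% of them fall below .01; with a real effect, far more than 20% do.

```python
# Minimal sketch of the logic behind p-curve (not the p-curve app itself).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_studies = 100_000

def significant_pvalues(true_effect_z):
    """Two-sided p-values of simulated z-tests, keeping only p < .05."""
    z = rng.normal(loc=true_effect_z, scale=1, size=n_studies)
    p = 2 * norm.sf(np.abs(z))
    return p[p < .05]

for label, ncp in [("no effect", 0.0), ("real effect", 2.0)]:
    p = significant_pvalues(ncp)
    print(f"{label}: {np.mean(p < .01):.0%} of significant p-values are below .01")
# no effect  -> about 20% (uniform: .01 / .05)
# real effect -> well above 20% (right-skewed)
```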
The main problem of p-curve is that it assumes bias rather than testing for bias. As a result, non-significant results are discarded even when there is no bias and the non-significant results provide valuable information about the presence of an effect. The main reason p-curve does not test for bias is that the method was developed in research areas, social psychology and behavioral economics, where the use of questionable practices was rampant and non-significant results were rarely published. This is not the case in other research areas like medicine, where non-significant results are reported more often.
The example is from a meta-analysis of the use of hypnotherapy for the treatment of depression (Milling et al., 2019).
Milling, L. S., Valentine, K. E., McCarley, H. S., & LoStimolo, L. M. (2019). A meta-analysis of hypnotic interventions for depression symptoms: High hopes for hypnosis? American Journal of Clinical Hypnosis, 61(3), 227-243. https://doi.org/10.1080/00029157.2018.1489777
The meta-analysis was based on 13 studies and the results are reported in Table 2.
Only 7 of the 13 results are statistically significant (p < .05 in the last column). The z-scores were entered into the p-curve app.
The results show a right-skewed distribution, but the slope is flat and the significance test does not reject the null hypothesis that the deviation from a uniform distribution is just chance, p = .18. Based on these results, the data provide no evidence for the effectiveness of hypnotherapy.
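For readers who want to see what such a test looks like, here is a sketch of one common version of the right-skew test (Stouffer's method applied to pp-values, as used in later versions of the p-curve app). The z-scores below are placeholders, not the actual values from Table 2 of Milling et al. (2019).

```python
# Sketch of a Stouffer-style right-skew test on significant p-values.
import numpy as np
from scipy.stats import norm

z_scores = np.array([2.1, 2.4, 2.0, 2.8, 2.2, 2.6, 2.3])  # hypothetical values
p = 2 * norm.sf(np.abs(z_scores))       # two-sided p-values
pp = p[p < .05] / .05                   # pp-values: uniform on (0, 1) under the null
stouffer_z = norm.ppf(pp).sum() / np.sqrt(len(pp))
p_right_skew = norm.cdf(stouffer_z)     # small if the p-curve is right-skewed
print(f"right-skew test: z = {stouffer_z:.2f}, p = {p_right_skew:.3f}")
```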
Is there Publication Bias?
To justify the use of a selection model, it is necessary to show that publication bias is present. A powerful tool to test for publication bias is the comparison of the percentage of significant results against the mean power of the studies. Without bias, the success rate should match mean power (Brunner & Schimmack, 2021). To test this hypothesis, I entered the data into the R-Index spreadsheet. The spreadsheet converts test statistics into two-sided p-values, z-scores, and observed power. In addition, it records whether a result was significant or not.
In this case, the test statistics are already z-scores, so the conversion to p-values and back to z-scores is not necessary. The key finding is that the success rate is 7 out of 13 (54%), whereas mean observed power is 41%. Thus, there is at best a relatively small amount of bias. It is therefore problematic to discard the information from the non-significant results, which reduces the statistical power to demonstrate an effect.
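A minimal sketch of this bias check follows. Observed power is computed as the probability that a two-sided z-test with alpha = .05 is significant, treating the observed z-score as the noncentrality parameter. The 13 z-scores below are placeholders for the values from Table 2, not the published data.

```python
# Sketch of the success-rate vs. mean observed-power comparison.
import numpy as np
from scipy.stats import norm

z = np.array([0.8, 1.2, 1.5, 1.7, 1.9, 1.3, 2.0,
              2.1, 2.3, 2.5, 2.2, 2.8, 2.4])   # hypothetical z-scores
crit = norm.ppf(1 - .05 / 2)                   # two-sided critical value, ~1.96

success_rate = np.mean(np.abs(z) > crit)
observed_power = norm.sf(crit - z) + norm.cdf(-crit - z)

print(f"success rate: {success_rate:.0%}")
print(f"mean observed power: {observed_power.mean():.0%}")
# Without bias, these two numbers should roughly match; a success rate far
# above mean power signals selection for significance.
```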
Z-Curve
Z-curve was developed as a selection model like p-curve. However, z-curve can also be applied to all observed values, including non-significant results, when there is no selection bias. Using all 13 observations, the data provide sufficient evidence that hypnotherapy can be effective. The null hypothesis implies that the expected discovery rate (EDR) and the expected replication rate (ERR) are 5%, which is the percentage of significant results expected by chance alone. The 95% confidence intervals exclude a value of 5%. Thus, there is evidence of an effect. At the same time, the results show that the individual studies had low power to produce significant results. The reason is that the sample sizes were small, ranging from N = 20 to N = 53. Thus, future studies should use larger samples to avoid false negative (type II) errors.
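Z-curve proper fits a mixture model to the z-scores (see, e.g., the zcurve R package). As a much simpler illustration of the null implication stated above, namely that only 5% of results should be significant by chance alone, a binomial test can compare the observed discovery rate of 7 out of 13 against 5%. This is not the z-curve estimator, only a sketch of the underlying logic.

```python
# Simplified illustration of the null implication, not the z-curve model.
from scipy.stats import binomtest

result = binomtest(k=7, n=13, p=0.05, alternative="greater")
print(f"7/13 significant vs. 5% expected by chance: p = {result.pvalue:.2g}")
# A very small p-value indicates more discoveries than chance alone predicts.
```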
Conclusion
The main conclusion is that researchers should use selection models only after they have demonstrated that selection bias is present. Assuming selection and discarding non-significant results reduces power and can lead to false negative results. A comparison of success rates and observed power using the R-Index spreadsheet provides a simple method to examine publication bias in small sets of studies.
R-Index Spreadsheet