Publication Bias

Scientific studies are not always published. Studies with significant results that support a hypothesis make it into their intended publications, while insignificant or negative results are often discarded. The studies that do get published are only a fraction of those produced and submitted to journals. In short, there is a distorted representation of the data on a given subject, a filtering process to which the public is largely blind.

Publications only have so much space. Insignificant results or inconclusive findings are not of interest; they amount to saying, “We thought X was true, but it really isn’t.” Scientific journals are also reluctant to publish studies that contradict studies they have already published. Why would a journal be interested in publishing a study that negates a hypothesis? And didn’t someone already cover that hypothesis with a published result? Articles have already been written about those published studies, so why revisit them?

There is a problem with this method of decision-making. Not only does the general public get only a small glimpse into the overall field of study, but the studies that do get published can be erroneous or misleading, letting a shaky theory catch hold in the zeitgeist.

The famous paper that highlights this publication bias is “Why Most Published Research Findings Are False” by John Ioannidis (2005). Since then, scientists, the media, and, to a lesser extent, the public have been more careful about how much faith they put into published studies.

Give Me The Details

For a study to be considered significant, it has to have a low p-value. In statistics, the p-value measures how likely it is that results at least as extreme as those observed would show up by chance if there were no real effect. A value close to zero means the result is statistically significant; a value close to one means it is not. From dummies.com:

“All hypothesis tests ultimately use a p-value to weigh the strength of the evidence. The p-value is a number between 0 and 1 and interpreted in the following way:

  • A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so you reject the null hypothesis.
  • A large p-value (> 0.05) indicates weak evidence against the null hypothesis, so you fail to reject the null hypothesis.
  • p-values very close to the cutoff (0.05) are considered to be marginal (could go either way).”
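To make that concrete, here is a minimal sketch in Python (my own illustration, with made-up numbers, not from any study discussed here) of what computing and interpreting a p-value looks like in practice:

```python
# A minimal sketch with made-up data: a two-sample t-test comparing a
# treatment group to a control group and interpreting the resulting p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=100, scale=15, size=50)    # no intervention
treatment = rng.normal(loc=108, scale=15, size=50)  # a real, modest effect

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"p-value = {p_value:.4f}")

if p_value <= 0.05:
    print("Reject the null hypothesis: the difference looks significant.")
else:
    print("Fail to reject the null hypothesis: the evidence is weak.")
```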

Why .05 as the threshold for the p-value? Statistician R.A. Fisher proposed it in the 1920s, and it stuck. Kind of like how the average body temperature was determined to be 98.6 degrees F by a German doctor, Carl Reinhold August Wunderlich, in 1851. It is really closer to 97.5 F, but Wunderlich’s century-and-a-half-old figure is the one we all go by.

Scientific studies are vulnerable to a practice called p-hacking, in which many variables are measured simultaneously in search of a correlation, with any insignificant results downplayed or discarded. To make the results significant, the data is sometimes sifted until some correlation between cause and effect turns up, even if in reality there is none. Through this sifting, the p-value can be pushed downward, turning an insignificant result into a “significant” one through selective analysis.
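Here is a minimal sketch (my own illustration, on pure-noise data, not from any real study) of the multiple-comparisons flavor of p-hacking: measure enough unrelated variables and something will clear the 0.05 bar by chance alone.

```python
# Testing 20 unrelated variables against the same outcome: with a 0.05
# threshold, roughly 1 in 20 pure-noise comparisons will look "significant",
# and a p-hacked paper reports only those.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects = 200
n_variables = 20

outcome = rng.normal(size=n_subjects)  # outcome with no real relationship to anything

p_values = []
for _ in range(n_variables):
    predictor = rng.normal(size=n_subjects)    # pure noise
    _, p = stats.pearsonr(predictor, outcome)  # test for a correlation
    p_values.append(p)

significant = [p for p in p_values if p < 0.05]
print(f"{len(significant)} of {n_variables} unrelated variables came out 'significant'")
```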

P-hacking is a problem in science. Scientists and professors need grant money and tenure. To earn those, they need to establish themselves in their field. To do that, they need to be cited. To be cited, their work needs to be published. As said earlier, the best way to get published is to test a surprising hypothesis and have significant results that support it. Scientists can test provocative theories, but their chances of getting significant findings out of those tests improve with p-hacking.

Here is a good motto for aspiring scientists: test novel and unexpected hypotheses and p-hack your way to publication. Thank you, Veritasium.

Here are some of the ways that skewed studies feed publication bias:

Small Samples

Small sample sizes can lead to errors in the findings, which in turn lead to conclusions that may not be true. To illustrate the effect of small sample sizes on results, let’s make up a study. This study will look at breast cancer. The chance of a woman in the US getting breast cancer in her lifetime is around 1 in 8 (or 13%). By the way, the chance of a woman in the US dying from breast cancer is about 1 in 38. I did not know that breast cancer was so common and so treatable.

Now let’s take a random variable that probably does not affect breast cancer: hair color.

So the pretend study will have 1,000 participants, split by hair color: 300 Blonde, 300 Brunette, 300 Black, and 100 Red. These participants will have already lived full lives and passed away, so their lifetime outcomes can be measured. The result of the study shows that, among the 1,000 medical histories surveyed, the following breast cancer occurrences are found: 35 Blonde, 20 Brunette, 40 Black, 20 Red.

Blonde and Black have results that are not really surprising (roughly 12% and 13%). However, Brunettes in this study show a breast-cancer-occurrence rate of under 7% (20 of 300), while Redheads are reported as having a 20% occurrence rate (20 of 100). These findings, when you look at the data alone, are surprising and potentially noteworthy.

An example finding (or headline) from this study is “Brunettes are less than half as likely to get breast cancer.” Another example would be “Redheads are at high-risk for getting breast cancer.” This second headline is possibly more egregious because only 100 redheads were analyzed, so the results can easily be skewed.
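To see how easily a small group can produce a headline like that by chance, here is a quick simulation (my own, using the made-up numbers above and assuming the true lifetime risk is 13% for every hair color):

```python
# Simulate the pretend study many times under the assumption that hair color
# has no effect at all, and count how often at least one group still produces
# a "headline-worthy" rate (at or below 7%, or at or above 20% -- arbitrary
# thresholds chosen for this illustration).
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.13
group_sizes = {"Blonde": 300, "Brunette": 300, "Black": 300, "Red": 100}

def looks_like_a_headline(rate):
    return rate <= 0.07 or rate >= 0.20

n_trials = 100_000
headline_trials = 0
for _ in range(n_trials):
    rates = (rng.binomial(n, true_rate) / n for n in group_sizes.values())
    if any(looks_like_a_headline(r) for r in rates):
        headline_trials += 1

print(f"At least one 'surprising' group in {headline_trials / n_trials:.1%} of simulated studies")
```

With these thresholds, most of the false alarms come from the 100-person redhead group: the smaller the sample, the easier it is to land on an extreme rate by accident.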

At face value, this hypothesis and study are ridiculous. My guess is that hair color does not impact breast cancer likelihood. With that said, genetics might play a role in breast cancer, and hair color is also hereditary…but the link between breast cancer and hair color is a leap.

The irresponsible result of a study like this is that Brunettes will walk around thinking they are safe and/or Redheads will be more paranoid about getting breast cancer.

If the sample size were larger, say 100,000, then the results of the study would be tighter and more believable. The standard error measures how far a sample’s result is expected to stray from the true average; roughly 95% of samples land within about two standard errors of the truth. With a smaller sample size, the standard error is larger, meaning the results can stray far from reality. Larger sample sizes shrink the standard error, so results cluster around the true average. With a sample of 100,000 women, you would probably find that the rates for all hair colors land close to that 13% occurrence range. With 10,000,000 women, they would be closer still.
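Here is a minimal sketch of that effect, again assuming a true rate of 13%: the standard error of a proportion is sqrt(p(1 − p)/n), so multiplying the sample size by 100 cuts the error by a factor of 10.

```python
# How the typical range of an estimated rate narrows as the sample grows,
# assuming the true rate is 13%.
import math

true_rate = 0.13
for n in (100, 1_000, 100_000, 10_000_000):
    se = math.sqrt(true_rate * (1 - true_rate) / n)
    # Roughly 95% of samples land within about two standard errors of the truth.
    low, high = true_rate - 1.96 * se, true_rate + 1.96 * se
    print(f"n={n:>10,}: estimate typically falls between {low:.1%} and {high:.1%}")
```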

Prejudices

Another way that studies can be misleading is through prejudice, specifically when organizations or industries sponsor the study. One side of an issue wants perception of that issue to go a certain way, and it can skew what gets published so that its side prevails.

A good example is the tobacco industry. Here is a study from 2005 showing that the tobacco industry has habitually skewed research to soften findings that are damaging to smoking.

A large industry, like tobacco or pharmaceuticals, has the resources to a) fund studies in its favor, b) discount other studies through peer review (skewing public perception of its product’s effect on the population), c) market and promote favorable findings, and d) lobby for policies friendly to the industry, with lobbyists hired to help the cause. This effort, by an industry that would like to insulate itself from damage, serves to quell research that highlights the potential dangers of its product. The opponents in this argument have fewer resources with which to battle that industry.

Ways that sponsors’ prejudice can skew the results of their studies include:

  • Posing a hypothesis that is more likely to prove their biased point. 
  • Posing biased survey questions in the study.
  • Pre-selecting data or participants to survey.
  • Eliminating from the data any outliers that refute their point (see the sketch after this list).
  • Keeping outliers that support their point.
  • P-Hacking so certain results seem more significant than they are.
  • Drawing attention to those data results that support their position.
  • Hiding data results that do not support their position.
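
To show how the outlier and p-hacking tactics above can work in practice, here is a minimal sketch (my own, on made-up noise data with no real relationship at all) of selectively removing “outliers” until a result looks significant:

```python
# Start with pure noise, then greedily drop whichever point most improves the
# p-value, and repeat until the correlation clears the 0.05 bar.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=40)
y = rng.normal(size=40)  # no real relationship between x and y

def best_drop(x, y):
    """Return the index whose removal yields the lowest p-value, and that p-value."""
    best_i, best_p = None, 1.0
    for i in range(len(x)):
        _, p = stats.pearsonr(np.delete(x, i), np.delete(y, i))
        if p < best_p:
            best_i, best_p = i, p
    return best_i, best_p

_, p = stats.pearsonr(x, y)
dropped = 0
while p >= 0.05 and len(x) > 10:
    i, p = best_drop(x, y)
    x, y = np.delete(x, i), np.delete(y, i)
    dropped += 1

print(f"Dropped {dropped} 'outliers' to get p = {p:.3f} on data with no real effect")
```

The tell is that the rule for excluding data gets chosen after the results are in, not before.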

It is difficult for public-interest, non-profit advocates to fight against multi-billion-dollar industries in the fields of research and lobbying. They simply can’t devote as much to making their case as their well-funded competitors.

Hot Topic

Research areas that are timely and pique the interest of both the scientific community and the public tend to be more readily accepted by scientific journals. These journals want to draw readers to their publications, and prefer to feature studies that interest not only scientists, but those outside of academia, particularly news outlets catering to public interest.

This article is from May 2020, only a few months into the coronavirus outbreak. It highlights just how many of the papers being written and published refer to the COVID-19 pandemic. One excerpt is as follows:

“As of 14 April, some 80% of the more than 11,000 COVID-19 manuscripts it examined had appeared in refereed journals, some of which originally appeared as preprints.”

This 80% publication ratio is staggering when you consider that the odds of a typical paper being published are around 10% or less.

Coronavirus research accumulated fast, and the papers were written hastily. The COVID-19 topic is pressing, and many people in and out of science want to grasp scientific progress and build on it. However, as this article shows, much of the coronavirus research moved too fast; holes were easily poked in published studies, leading to an increase in retracted papers. For the studies to be more valuable, more time and deeper work are needed, a luxury not easily granted while the race for coronavirus reporting and a cure is on.

Impact on the Zeitgeist

Probably the biggest danger of publication bias is how findings that would normally stay within the scientific community get adopted and championed in the mainstream press, creating a premature validation of those results. Regardless of the inaccuracies later revealed in findings trumpeted throughout pop culture, most laypeople will latch onto a factoid they have heard widely repeated, especially by the media, and it will stick with them for life.

Let’s take the 10,000-hour rule. The original study of expert musicians was written by K. Anders Ericsson and colleagues and published in Psychological Review in 1993. There, a link was made between hours of deliberate practice and how well college-aged musicians played their instruments. The rule says that 10,000 hours of deliberate practice will elevate the person practicing to the status of master performer. The rule is short, utilizes a round number, is provocative, and is aspirational.

If the study had stayed there, the findings would have remained obscure, though probably cited a lot by other scientists. The concept did not take off until Malcolm Gladwell popularized it in his 2008 book, Outliers. Since then, it has become part of our culture, and many people apply the concept to skill acquisition. As a rough heuristic, the 10,000-hour rule is effective at grabbing people’s attention. However, both K. Anders Ericsson and Malcolm Gladwell would poke holes in the rule as commonly stated. The variables it glosses over are all over the place:

  • At what age one starts practicing
  • How talented one is
  • One’s physical attributes
  • The definition of “deliberate practice”
  • Whether a person hits the 10,000-hour mark and all of a sudden becomes a master

This article is one that keeps popping up in my research; it expands upon that questioning.

The biggest problem with this zeitgeist effect is this: scientific studies may subsequently refute the findings. The original scientists/authors who championed publicizing the results may later put fences around the study’s findings. But the headline takeaway has already been assimilated by the public.

The 10,000-hour rule may as well be considered fact, just like the average body temperature is still assumed to be 98.6 degrees when, in reality, the true number may be lower. With these commonly accepted misperceptions nestled somewhat close to reality, there really is no reason to waste a lot of time refuting them. So what if my body temperature should be 97 degrees or 98 degrees? All that really matters is that when my body temperature gets over 100 degrees, I should worry. So what if I only spend 9,000 hours on a skill and never reach the level of master in that discipline? I spent a lot of time on something and got better. Charles Barkley’s golf swing may disagree.

Are White Cars Safe?

We, as non-scientists or non-experts on most subjects, have no reliable way to sort out misinformation from statistically sound results. We rely on news sources, which rely on scientific journals, to vet information that is deemed worthy of our attention. When there is a bias in which research studies get highlighted, the public has no choice but to trust the published results, which amplifies that bias even more.

One final example that I will leave you with concerns car colors and safety.

Some studies show that white is the safest car color, with other colors coming in at an average of 10% more accident-prone than white cars. These studies could be discounted when you realize that white is the baseline, and the headline finding is simply that all other colors, on average, are less safe than white. If you dig into the data, though, orange looks like it could be a better option. So is the study (and resulting headline) misleading? This study, on the other hand, shows silver cars are safer than white. You will notice, though, that the sample size is small, and even smaller for silver cars. Could that study be misleading as well?
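Here is a minimal sketch (with hypothetical counts, not data from either linked study) of why the small silver-car sample matters: the same apparent 20% safety advantage can be meaningless or meaningful depending on how many cars were observed.

```python
# Compare the relative crash risk of "silver" vs. "white" with a 95% confidence
# interval. The counts are hypothetical; only the sample size changes between runs.
import math

def relative_risk_ci(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A vs. group B, with a 95% confidence interval."""
    rr = (events_a / total_a) / (events_b / total_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

# Hypothetical: silver cars crash 20% less often than white in both datasets,
# but one dataset is 100 times larger than the other.
for scale in (1, 100):
    rr, lo, hi = relative_risk_ci(40 * scale, 1_000 * scale, 50 * scale, 1_000 * scale)
    print(f"n={1_000 * scale:>7,} cars per color: RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

With the numbers above, the small-sample interval comfortably includes 1.0 (no difference at all), so the headline outruns the data; the large-sample interval does not.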

Regardless of the actual data, it has been reported for decades that white is the safest option and those car buyers who prioritize safety will select a white car. No amount of studies or additional discourse will change this preference. The theory has already seeped into the public consciousness, and may as well be a fact.

Question: What are truths that you hold that are based on research or publications but that could be stress-tested? What are “facts” that you have accepted because everyone around you believes in them without ever looking into whether they are true or not?