Article · Research · Evidence · September 2016

Probability of Bad Sports Science: Statistically Significant?

Confounders, anecdotal evidence, conflicts of interest, and p-hacking, plus what sports science can do about them. Using a Dumb and Dumber analogy because it fits better than you'd think.

Key Issues in Sports Science Research

Confounders

Complex Variables, Low Internal Validity

Did athletes change eating habits, sleep, warm-up methods, or practice time? Different players from the previous year? Most sports science studies are non-experimental without control groups or multiple measurement points.

Anecdotal Evidence

Weakest Research Design

A single unique event that lacks empirical research, is unverifiable using the scientific method, and has unreliable results with possible confounders. Powerful for memory and communication, but dangerous as the basis for decisions.

Conflicts of Interest

Industry-Funded Research

Studies with financial conflicts of interest were five times more likely to present conclusions favorable to the funder. Technology companies, Big Pharma, and Big Food have all demonstrated selective publication and manipulated results.

P-Hacking

Fishing for Significance

Analyzing data in many different ways to get the result you want. Adding or dropping two subjects can cause a completely opposite result. The p-value of 0.05 does not mean what most people think it means.

Full Article

In our last article, we attempted to define Evidence-led Performance Research after previously providing an Emerging Definition of Sports Science. Here, we discuss confounders, anecdotal evidence, conflicts of interest, and p-hacking, along with ideas for avoiding these issues in sports science research.

Bear with me as I use an analogy from the movie "Dumb and Dumber." The hypothesis is that if Lloyd is "cool" enough, he will get the girl. When Lloyd Christmas asks the girl whether he has a romantic chance with her, she says, "More like one in a million." He replies, "So you're telling me there's a chance!" Like many science experiments: we run the experiment first, then fish through the data afterward, choosing whichever tests make the hypothesis come out true, and maybe do some p-hacking to back it up.

As a physical therapist in a changing healthcare world who wants the best outcomes for patients, I have always considered evidence-based research important. Investigating the state of research for this article, I learned a lot, but I also came away less confident in the scientific method, at a time when reputable sports science is needed and Big Data is finally available. The ability to quantify and qualify movement rather than rely on a coach's "hunch" is undermined by the shortage of rigorous research, and by the fact that mostly we are trying to prove something did not happen: injuries.

In the secretive world of pro sports, the Cowboys are not going to call the Redskins to propose a joint study of whether the Catapult system decreases injuries. Many studies in sports science are non-experimental, defined as rooted in well-established research and based on strong expert opinion, but without a control/comparison group or multiple measurement points, making it difficult to attribute observed changes to the program.

Confounders

There is anecdotal evidence of decreased injuries and improved performance when teams use sports science data from sensors, but actual randomized, double-blind studies are lacking, and the complex interaction of many variables leaves such studies with little internal validity. Did the athletes change their eating habits, warm-up/recovery methods, sleep patterns, practice time, or playing time? Are there different players from the previous year? Were training methods different, irrespective of data gained from sensors?

Jo Clubb states, "If we are trying to establish 'real' changes in test performance and ideally the 'smallest worthwhile change,' then we must try to account for all confounding variables." A great concept, but one that sounds difficult to execute in some settings, such as college sports with high roster turnover. According to Fergus Connolly, "At the most conservative estimate, using a manageable 150 player metric variables and just 6 injury predictors leaves you with over a billion different possibilities."
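Connolly's arithmetic is easy to check. Here is a minimal sketch (the 150 and 6 are from his quote; reading "possibilities" as distinct subsets of metrics is my assumption):

```python
# How many distinct 6-variable predictor sets can be drawn from 150
# player metrics? (Interpreting Connolly's "possibilities" as subsets.)
from math import comb

player_metrics = 150   # "a manageable 150 player metric variables"
injury_predictors = 6  # "just 6 injury predictors"

print(f"{comb(player_metrics, injury_predictors):,}")  # 14,297,000,725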

Anecdotal Evidence

Anecdotal evidence is a single unique event that lacks empirical research or sound theory, is unverifiable using the scientific method, and is the weakest research design because of unreliable results and possible confounders. We all love a good story, and the anecdote helps us remember complex concepts by connecting us to the experiences of others. Raymond Wolfinger's famous quote, "the plural of anecdote is data," leads to the belief that sports science and healthcare can feed anecdotes into complex algorithms for predictive analytics.

Bill Gardner states, "Case studies give only the factual side. We need to know what happened to the athlete who did not receive the training to infer causality. An action is a cause if doing it leads to an outcome, and not doing it would not." Again, we are trying to prove something did not happen: no injuries.

Conflicts of Interest

It is amazing that we have not learned from history, more than 60 years after the tobacco industry spent millions buying off scientists to "prove" that smoking does not cause cancer. Recently, an American professional league hired researchers and funded studies on helmet sensors that detect the magnitude of hits to the head, created "a standard for accuracy that was unattainable," and reportedly cited this research to suspend use of the sensors.

Big Pharma is guilty of selectively publishing studies that support its medications, setting inclusion criteria that select the patients most likely to respond to treatment, and manipulating the doses of both intervention and control drugs. Studies of physical activity funded by the beverage industry reach conclusions that tend to shift the blame for obesity away from bad diets. Systematic reviews with financial conflicts of interest were five times more likely than independent reviews to conclude that there is no positive association between sugar-sweetened beverages and obesity.

P-Hacking

P-hacking is fishing for a statistical difference by analyzing data in many different ways until you get the result you want. Martin Buchheit gives an informative and surprising p-hacking example, illustrated in a Yann Le Meur infographic, of how dropping or adding two subjects can reverse a study's conclusion, because p-values (and the conclusions built on them) are sample-size dependent.
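You can reproduce the effect with toy numbers. A minimal sketch (made-up data of my own, not Buchheit's actual example):

```python
# How dropping two subjects can flip a conclusion across the 0.05 line.
from scipy import stats

control = [72, 75, 71, 78, 74, 73, 76, 77]
treated = [75, 78, 74, 81, 77, 76, 79, 80]

# All 16 subjects: "significant" at the conventional threshold.
print(stats.ttest_ind(treated, control).pvalue)       # ~0.03

# Drop the last two treated subjects: the effect "disappears".
print(stats.ttest_ind(treated[:-2], control).pvalue)  # ~0.10
```

Nothing about the underlying effect changed; only the sample did, yet the conclusion reverses.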

Could you define a p-value? In science, "significance" usually means a p-value of less than 0.05, or 1 in 20, but this does not mean that the difference observed between two groups is functionally important. Based on middle-of-the-road assumptions, you would need a p-value around 0.0027 to achieve a false discovery rate of about 5%. Amid widespread doubts about science, and with one journal banning the use of p-values outright, the American Statistical Association released a statement that no single index should substitute for scientific reasoning and that business or policy decisions should not be based only on whether a p-value passes a specific threshold. Scientists joke that if statistics are needed to find a difference, it was not an important difference.
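To see why p < 0.05 is weaker than it sounds, here is a back-of-the-envelope false discovery calculation (the prior, power, and threshold values are illustrative assumptions of mine, not figures from the article):

```python
# If only some tested hypotheses are true, what fraction of
# "significant" results are false alarms?
prior = 0.10   # fraction of tested effects that are real
power = 0.80   # chance a real effect reaches p < 0.05
alpha = 0.05   # significance threshold

true_hits = prior * power          # real effects declared significant
false_hits = (1 - prior) * alpha   # null effects declared significant

fdr = false_hits / (true_hits + false_hits)
print(f"{fdr:.0%}")  # 36%, far from the 5% many readers assume
```

Under these assumptions, more than a third of "significant" findings are false alarms, which is why a genuinely 5% error rate demands a much stricter p-value threshold.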

Possible Sports Science Research Solutions

Martin Buchheit recommends embracing magnitude-based inference in sports science. Magnitude-based inference (MBI) builds on a principle well established in clinical medicine: practical/clinical significance often takes priority over statistical significance.
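A minimal sketch of reading an effect by magnitude rather than by p-value (my own simplification of the MBI idea, with made-up numbers; real implementations differ in the details):

```python
# Compare an observed effect against the smallest worthwhile change
# (SWC) and report chances of benefit/harm, not a yes/no verdict.
from scipy import stats

effect = 1.2   # observed improvement, e.g. +1.2% in a performance test
se = 0.8       # standard error of that effect
swc = 0.5      # smallest worthwhile change for this test

# Assume the effect's sampling distribution is roughly normal.
p_beneficial = 1 - stats.norm.cdf(swc, loc=effect, scale=se)
p_harmful = stats.norm.cdf(-swc, loc=effect, scale=se)
print(f"beneficial: {p_beneficial:.0%}, harmful: {p_harmful:.0%}")  # 81%, 2%
```

The output is the chance that the true effect is big enough to matter in practice, which maps more directly onto a coaching decision than a bare p-value does.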

Establish a Rapid Learning Sports Science (RLSS) system modeled on the Rapid Learning Health System: an ongoing learning-and-improvement process in which an automated, rapid-cycle system of randomized studies is embedded in private big-data collection, producing athlete outcomes and real-time risk alerts from machine-learning predictive models. In healthcare, a rapid learning system can reportedly cut research costs from millions to six figures and compress twenty years of research into each year.

According to Stephen Smith of Kitman Labs: "Working with a team of 40–50 athletes, you may see 25 injuries a year, maybe less if you're lucky. Of these 25 injuries, maybe 5 of them will be the same. To build a large enough dataset to build predictive models here would require hundreds of years' worth of data from this one team. That's where the power of a large dataset comes into play."
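To make the dataset-size point concrete, here is a minimal sketch of the kind of pooled-data risk model this argument points toward (synthetic data and hypothetical features of my own; the actual Kitman Labs models are not public):

```python
# Injury-risk modeling needs a positive class big enough to learn from;
# pooling athlete-seasons across many teams is what provides it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n = 5000                                   # athlete-seasons across teams
X = rng.normal(size=(n, 3))                # e.g. load, soreness, sleep scores
risk = 1 / (1 + np.exp(-(X[:, 0] - 0.5)))  # risk rises with training load
y = (rng.random(n) < risk * 0.2).astype(int)  # rare injuries (~8%)

model = LogisticRegression().fit(X, y)
print(model.predict_proba(X[:5])[:, 1])    # per-athlete injury probabilities
```

One team's 25 injuries a year would give a model like this almost nothing to learn from; pooled across many teams, the rare positive class becomes large enough to fit and to validate.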

Analytics should inform decision-making, not replace it. There must always be human, domain-specific judgment at the end to review and interpret the data. Because we are at the beginning of the emerging technological age of applied sports science, sports scientists have a chance to change the perception of research, at least in their own field.

Remember: look at the end of every article for conflict-of-interest statements, and be wary if a "funded by" line is not included.

Disclosure: Funding was provided by an American small business owner of a physical therapy and sports performance company, meaning there is no funding: Daniel Chris Cothern is self-employed. There is no conflict of interest.