In the world of greenhouse gas (GHG) emissions research, understanding the role of statistical power and sample sizes can make a big difference to the studies you are planning. In GHG research, we often want to measure things like methane emissions from livestock. Since we cannot measure every single animal, farm or paddock, we collect samples.
Although working through what you need might sound daunting, let’s break it down in a way that makes sense, even if you don’t have a stats background.
The sample size needed depends on a few key inter-related components:
- Statistical power
- Confidence
- The difference you are wanting or expecting to detect
- How variable the thing you are measuring is
1) Statistical Power
In the context of research, statistical power tells us how likely it is that a study will detect an effect if there truly is one.
So in the context of emissions, if a product truly can reduce emissions, power tells us how likely the study is to detect that reduction. The power of a study therefore mostly affects the study funder or developer: it reflects their risk appetite and how much they want to invest.
Why does statistical power matter?
The greater the power, the more likely it is that a company will find the result it wants, provided the effect is real. But study size and budget rise steeply as you increase power, which is why 80% is often the sweet spot and is used for most study situations. The lower the power, the smaller the sample size and budget, but the less likely the study is to identify a true positive effect. A study with low power is basically another name for a ‘pilot study’.
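To put rough numbers on that trade-off, here is a minimal sketch in Python. It assumes a simple two-group trial (treated vs control) analysed with a two-sided comparison of means, and it uses a textbook normal-approximation formula with a small-sample correction rather than any particular statistics package. The function name `animals_per_group` and the inputs (a 40 g/day difference, a 50 g/day standard deviation, 95% confidence) are illustrative assumptions, not values from a real trial.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(delta, sd, power=0.80, confidence=0.95):
    """Approximate animals per group for a two-group, two-sided comparison.

    delta: difference in means to detect (g/day)
    sd:    standard deviation of the outcome (g/day)
    """
    z = NormalDist().inv_cdf
    z_conf = z(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    z_power = z(power)                    # about 0.84 for 80% power
    # Normal approximation plus a small-sample correction (z_conf^2 / 4)
    n = 2 * ((z_conf + z_power) * sd / delta) ** 2 + z_conf ** 2 / 4
    return ceil(n)


# Assumed inputs: 40 g/day difference, 50 g/day SD, 95% confidence
for power in (0.70, 0.80, 0.90, 0.95):
    n = animals_per_group(delta=40, sd=50, power=power)
    print(f"power {power:.0%}: about {n} animals per group")
```

With these assumed inputs, each step up in power costs more animals than the last, which is one way to see why many studies settle on 80%.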
2) Confidence: How Sure Do You Want to Be?
Confidence is about how certain you want to be that your results are not just due to chance. It’s like asking: “If we repeated this study 100 times, how often would we expect to see a similar result?”
Why does confidence matter?
The higher the confidence level you aim for, the more animals (or samples) you’ll need. That’s because you’re being stricter about the evidence required to say, “Yes, this works!”
It’s a trade-off:
- Higher confidence = more credibility, but also more cost and larger sample sizes.
- Lower confidence = smaller study, faster results, but greater risk of false positives (or jumping the gun on a promising product).
Confidence level is primarily a concern for consumers and regulatory bodies; in practice, regulators generally require confidence levels of 90% or higher.
3) The Difference
To work out how many samples (e.g., animals) you need for your study, we need to know the difference you are looking for. For example, if you have a new product that is designed to decrease methane emissions from livestock, what difference (or percentage reduction) are you expecting? Are you after a 40% reduction or a 10% reduction?
Why does knowing the difference you’re looking for matter?
This makes a very big difference to how many animals you need. The smaller the difference you are looking for, the more animals you will need to detect the difference with the confidence you have set.
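As a rough illustration of how strongly the expected difference drives sample size, the sketch below compares a 40%, 20% and 10% reduction. It assumes a two-group trial with a two-sided comparison of means, 80% power and 95% confidence, and uses a normal-approximation formula with a small-sample correction. The baseline of 200 g/day, the standard deviation of 50 g/day and the helper name `animals_per_group` are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(delta, sd, power=0.80, confidence=0.95):
    """Approximate animals per group for a two-group, two-sided comparison."""
    z = NormalDist().inv_cdf
    z_conf = z(1 - (1 - confidence) / 2)
    z_power = z(power)
    # Sample size scales with (sd / delta) squared
    n = 2 * ((z_conf + z_power) * sd / delta) ** 2 + z_conf ** 2 / 4
    return ceil(n)


baseline = 200  # g/day, assumed mean methane output
sd = 50         # g/day, assumed standard deviation

for reduction in (0.40, 0.20, 0.10):
    delta = baseline * reduction  # absolute difference to detect, g/day
    n = animals_per_group(delta, sd)
    print(f"{reduction:.0%} reduction ({delta:.0f} g/day): about {n} per group")
```

Because the required number of animals scales with the inverse square of the difference, halving the reduction you are looking for roughly quadruples the animals you need under these assumptions.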

4) How Variable the Outcome Measure Is
Variation in the outcome measure drives how many animals you need. Variability is the spread in what you are measuring across animals, for example daily methane emissions. That spread can reflect true biological differences, with variation known to differ across ages, breeds and species, but it can also come from the precision of the technology used to measure methane.
Why does outcome variability matter?
Greater variability in methane measurements makes it harder to detect a real treatment effect without increasing sample size (or reducing power or confidence). Therefore, it is preferable to select a study population that is as similar as possible and to use a validated, repeatable measuring technology. This reduces variation, so you can achieve the same statistical power with fewer animals.
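The effect of variability can be sketched the same way. The example below holds the difference to detect at 40 g/day and changes only the standard deviation. It assumes a two-group trial, a two-sided comparison of means, 80% power and 95% confidence; the three standard deviations and the helper name `animals_per_group` are hypothetical values picked to show the pattern.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(delta, sd, power=0.80, confidence=0.95):
    """Approximate animals per group for a two-group, two-sided comparison."""
    z = NormalDist().inv_cdf
    z_conf = z(1 - (1 - confidence) / 2)
    z_power = z(power)
    n = 2 * ((z_conf + z_power) * sd / delta) ** 2 + z_conf ** 2 / 4
    return ceil(n)


delta = 40  # g/day, assumed difference to detect

# Hypothetical spreads: a uniform herd, a typical herd, a mixed herd
for sd in (30, 50, 70):
    n = animals_per_group(delta, sd)
    print(f"SD {sd} g/day: about {n} animals per group")
```

Under these assumptions, a more uniform study population needs only a fraction of the animals that a highly variable one does.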
Example: Sample Size Implications of 95% vs 90% Confidence
Assume a methane-reducing product has the following characteristics:
- Difference expected/effect size: a 20% reduction in methane output
- Mean baseline methane emissions: 200 g/day
- Standard deviation (a measure of variation): 50 g/day
- Power: 80%
- Compare confidence levels: 95% vs 90% (equivalent to significance levels of 0.05 and 0.10)
Decreasing confidence from 95% to 90% in this instance would reduce the total number of animals required by 12 (23%).
| Confidence level | Total number of animals |
| --- | --- |
| 95% | ~52 |
| 90% | ~40 |
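The figures in this example can be approximately reproduced with a short calculation. The sketch below is one plausible reconstruction, not necessarily the method used to produce the table: it assumes two equal groups (treated and control), a two-sided comparison of means, and a normal-approximation formula with a small-sample correction. The helper name `animals_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def animals_per_group(delta, sd, power=0.80, confidence=0.95):
    """Approximate animals per group for a two-group, two-sided comparison."""
    z = NormalDist().inv_cdf
    z_conf = z(1 - (1 - confidence) / 2)
    z_power = z(power)
    n = 2 * ((z_conf + z_power) * sd / delta) ** 2 + z_conf ** 2 / 4
    return ceil(n)


baseline = 200           # g/day, mean baseline methane emissions
delta = 0.20 * baseline  # 20% reduction = 40 g/day
sd = 50                  # g/day

totals = {}
for confidence in (0.95, 0.90):
    # Two groups assumed, so the total is twice the per-group number
    totals[confidence] = 2 * animals_per_group(delta, sd, confidence=confidence)
    print(f"{confidence:.0%} confidence: about {totals[confidence]} animals in total")

saving = totals[0.95] - totals[0.90]
print(f"saving: {saving} animals ({saving / totals[0.95]:.0%})")
```

Dedicated power-analysis software may give slightly different numbers, which is why the table values are shown as approximate.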
So there you have it
When considering the sample size required for a study, the following four factors should be considered:
- Statistical power
- Confidence
- The expected difference
- How variable the outcome measure is
These factors are connected, and considering them together can make a study more efficient and animal-friendly. Improving power doesn’t always mean enrolling more animals; for instance, measuring emissions over a longer period reduces variability, which can achieve the same power and confidence with fewer animals.
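To see why a longer measurement period can help, here is a rough sketch. It assumes the spread in methane output has two parts: differences between animals and day-to-day fluctuation within an animal. Averaging over more days shrinks only the second part. The split used here (30 g/day between animals, 40 g/day day to day, which combine to 50 g/day for a single day) is entirely hypothetical, as are the helper names, and the sample size formula assumes a two-group, two-sided comparison at 80% power and 95% confidence.

```python
from math import ceil, sqrt
from statistics import NormalDist


def animals_per_group(delta, sd, power=0.80, confidence=0.95):
    """Approximate animals per group for a two-group, two-sided comparison."""
    z = NormalDist().inv_cdf
    z_conf = z(1 - (1 - confidence) / 2)
    z_power = z(power)
    n = 2 * ((z_conf + z_power) * sd / delta) ** 2 + z_conf ** 2 / 4
    return ceil(n)


def sd_of_animal_mean(sd_between, sd_within, days):
    """SD of each animal's average when emissions are measured over `days` days."""
    return sqrt(sd_between ** 2 + sd_within ** 2 / days)


delta = 40       # g/day, assumed difference to detect
sd_between = 30  # g/day, hypothetical spread between animals
sd_within = 40   # g/day, hypothetical day-to-day spread within an animal

for days in (1, 4, 7):
    sd = sd_of_animal_mean(sd_between, sd_within, days)
    n = animals_per_group(delta, sd)
    print(f"{days} day(s): SD {sd:.1f} g/day, about {n} animals per group")
```

The gain levels off as days are added, because the between-animal spread is untouched by averaging; how much is gained in practice depends on the real variance split for the animals and technology in question.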