t-test & ANOVA (Analysis of Variance)

What are they? The t-test is a method that determines whether the means of two populations are statistically different from each other, whereas ANOVA determines whether the means of three or more populations are statistically different from each other. Both tests compare the difference in means against the spread of the distributions (i.e., variance) across groups; however, they determine statistical significance in different ways.

When are they used? These tests are performed when 1) the samples are independent of each other and 2) the data are (approximately) normally distributed or the sample size is large (e.g., > 30 per group). More samples are better, but the tests can be performed with as few as 3 samples per condition.

How do they work?

t-test Example

We want to determine whether the serum concentrations of Proteins 1 – 4 are significantly different between healthy and diseased patients. A t-test is performed, which can be explained visually by plotting, for each protein, the protein concentration on the X-axis and the frequency on the Y-axis for the two patient groups on the same graph (Figures 1 – 4).

Proteins 1 & 2 have the same difference in protein concentration means but different group variances. In contrast, Proteins 3 & 4 have similar variances, but Protein 4 has a larger difference in protein concentration means between the patient groups.

A t-test assigns a “t” test statistic value to each biomarker. A good differential biomarker, represented by little to no overlap of the distributions and a large difference in means, would have a high “t” value.

Which is a better biomarker of disease: Protein 1 or Protein 2? Protein 1

Which is a better biomarker of disease: Protein 3 or Protein 4? Protein 4
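The effect of variance on the “t” value can be sketched with a few lines of Python. This is a minimal illustration, not the analysis behind the figures: the concentrations are hypothetical, and the statistic computed is Welch's form of the t statistic (difference in means divided by its standard error), which is an assumption about which t-test variant is used.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

# Hypothetical serum concentrations (arbitrary units).
# Both markers have the same group means (10 vs. 14) but different spreads.
tight_healthy = [9.0, 10.0, 11.0, 10.0]
tight_diseased = [13.0, 14.0, 15.0, 14.0]
wide_healthy = [6.0, 10.0, 14.0, 10.0]
wide_diseased = [10.0, 14.0, 18.0, 14.0]

t_tight = welch_t(tight_diseased, tight_healthy)
t_wide = welch_t(wide_diseased, wide_healthy)

print(f"low-variance marker:  t = {t_tight:.2f}")
print(f"high-variance marker: t = {t_wide:.2f}")
```

With the same difference in means, the marker with the smaller variance gets the larger “t” value, which is why less overlap between the distributions makes for a better differential biomarker.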

What type of statistical value do I get? The t-test and ANOVA produce a test statistic value (“t” or “F”, respectively), which is converted into a “p-value.” A p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming that the null hypothesis (that both, or all, populations are the same) is true. In other words, a lower p-value reflects stronger evidence that the populations differ. By convention, biomarkers with p-values ≤ 0.05 are considered significantly different between sample populations.
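For ANOVA, the “F” value is the ratio of the variation between group means to the variation within groups. The sketch below computes it by hand for three small hypothetical groups; the function name and the data are illustrative only. Converting “F” into a p-value requires the F distribution, which a statistics library such as SciPy (scipy.stats.f_oneway) provides.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements for three groups
f_stat = one_way_anova_f([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0])
print(f"F = {f_stat:.2f}")
```

A large “F” value means the group means differ by more than the within-group noise would explain.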

58 Feedbacks on “t-test & ANOVA (Analysis of Variance)”

  1. I have three populations of cells. Population 1 is my control (untreated cells), population 2 is my sham-treated cells, and population 3 is my treated cells. Can I use Student’s t-test to determine a statistical difference by comparing pop 1 to pop 2, pop 3 to pop 1, and pop 2 to pop 3, or must I use a one-way ANOVA to compare all three? And what is the rationale?

    1. Great question!

      1.) The t-test could be used, but is not recommended. The t-test can determine differences between two groups, but is not recommended for multi-group comparisons because the alpha level (i.e., significance level) must be set lower than the standard 0.05.

      An alpha of 0.05 means that, when the null hypothesis is true, there is a 5% chance that the data would be (falsely) identified as significant (i.e., the null hypothesis would be wrongly rejected). On the flip side, 95% of such comparisons would be correctly assigned (i.e., the level of confidence). If the t-test were to be implemented here, the alpha value would need to be adjusted to maintain a 95% level of confidence across all three comparisons. Using an alpha = 0.05 for each of the three comparisons, the overall level of confidence would actually be 85.7% (0.95^3 = 0.857), assuming the comparisons are independent. Therefore, to get an overall 95% level of confidence (0.983^3 = 0.95), the alpha level for each comparison should be set at 0.017.

      Unfortunately, this low alpha threshold can result in missing statistically significant markers that would otherwise be identified with analyses where the alpha could be set at 0.05.

      2.) ANOVA should be used, but should also be accompanied by a post-hoc test. ANOVA can determine whether there is a difference between the groups, but cannot determine which group contributes to the difference. For a single variable test, ANOVA can be used first. To account for the multiple comparisons, the ANOVA data should be analyzed with another test (e.g., Duncan, Newman-Keuls) where the alpha can be set at 0.05.

      It is also important to mention the sample size. You mentioned that you had three populations of cells. A minimum of 3 biological replicates should be used to conduct initial statistical comparisons to understand the effect size (signal) and variance (noise). This information will determine the sample size that you’ll need to ensure that the power is no less than 0.80. Notably, more accurate information will be obtained with a larger sample set in the pilot study. Sometimes, the signal and noise are known a priori; in this case, a pilot study may not be needed.
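      The alpha arithmetic in point 1 can be reproduced with a short sketch. It assumes the three comparisons are independent, and the function names are illustrative. The adjustment shown is commonly called the Šidák correction; the simpler Bonferroni correction (0.05 / 3 ≈ 0.0167) gives nearly the same threshold.

```python
def familywise_confidence(alpha, n_comparisons):
    """Overall confidence when each of n independent comparisons uses `alpha`."""
    return (1 - alpha) ** n_comparisons

def per_comparison_alpha(overall_alpha, n_comparisons):
    """Per-comparison alpha that keeps the overall error rate at `overall_alpha`."""
    return 1 - (1 - overall_alpha) ** (1 / n_comparisons)

# Three pairwise t-tests, each at alpha = 0.05
print(round(familywise_confidence(0.05, 3), 3))  # 0.857

# Per-comparison alpha needed for an overall 95% level of confidence
print(round(per_comparison_alpha(0.05, 3), 3))   # 0.017
```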

  2. So I have 5 conditions and 10 trials for each condition. I’m comparing the growth of plants under different light spectrum colors. I’m conducting this experiment with 3 different plants, meaning that for every plant I have a set of data of 5 conditions and 10 trials. Is it better to use an ANOVA test? And does the test I should use depend on my error bars (I’ve been hearing that)?

    1. Yes, you should use repeated measures ANOVA under a generalized linear model (https://www.statisticssolutions.com/conduct-interpret-repeated-measures-anova/). If the assumption of normality is not met (https://en.wikipedia.org/wiki/Normality_test), the data should be pre-processed appropriately before inputting the data into the models. For example, you may need to transform your data by log-transforming the values. The type of pre-processing that you should use will be based on the characteristics of the original data. If the data are generated from a high throughput platform like a microarray or next generation sequencing, please refer to the R package limma (https://www.bioconductor.org/packages/release/bioc/html/limma.html).

  3. I want to determine which of the 2 exercises is more effective in decreasing cholesterol levels (baseline vs. post-exercise).

    1. Both a paired t-test and one-way ANOVA with repeated measures could be used if only 1 factor (baseline versus post-treatment) is being compared. If you want to analyze additional factors or groups, such as different treatments, two-way ANOVA with repeated measures would be better. This is because ANOVA considers the variance across ALL treatment groups whereas a t-test analysis would have to be performed within EACH treatment group. You can learn more about ANOVA with repeated measures here: https://www.statisticssolutions.com/conduct-interpret-repeated-measures-anova/.

      Please note that both the t-test and ANOVA require normally distributed data, so appropriate data preprocessing (i.e., transformation) is sometimes necessary.
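      As a rough illustration of the paired t-test mentioned above, the sketch below computes the paired “t” value for one exercise group. The cholesterol values are hypothetical, and each position in the two lists is assumed to be the same participant before and after the exercise.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: mean within-subject change divided by its standard error."""
    diffs = [b - a for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical cholesterol levels (mg/dL) for four participants
baseline = [200.0, 210.0, 190.0, 220.0]
post = [190.0, 195.0, 185.0, 200.0]

t_paired = paired_t(baseline, post)
print(f"paired t = {t_paired:.2f} with {len(baseline) - 1} degrees of freedom")
```

      Because the test works on the per-participant differences, variation between participants does not inflate the noise term, which is the advantage of pairing.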

  4. Hi, I am doing a statistical analysis on my paper on patients who underwent surgery for vitreous hemorrhage. By diagnosis, it came out as follows: subretinal pathology 57, retinovascular diseases 54, trauma 6, retinal detachment 6, and vasoproliferative tumors 3. Can I use ANOVA to compare the pre-operative and post-operative results among the populations with more than 3 items? Thanks

  5. I’m doing my bachelor’s level research in which I have one independent and two dependent variables. I’m seeing the impact of the independent variable on the two dependent variables separately. The two dependent variables have no connection with each other. The study has 60 participants, who are not grouped.
    Which test would be recommended for it?

  6. I did an experiment to look at treatments for an injury. I have 5 groups: sham injury, injury w/o treatment, and then injury + treatments 1, 2, or 3.

    We use sham groups to prove that the injury model did, in fact, cause an injury. In my statistical analysis, should I:
    a) compare sham v. injury w/ treatment with t-test, then injured w/o treatment to injured+treatments with ANOVA?
    b) compare all 5 groups with ANOVA, then post-hoc analysis?

    Having the sham data in the same ANOVA set with the injured and treated animals is adding a second variable. Variance should be equal within groups, but means should be different.

    1. Both of the strategies would work.

      The first method is a two-stage approach with easily interpretable results because it focuses on the primary objective of the study: the effect of treatment following injury.

      The second method is a one-way ANOVA that has statistical “integrity.” In other words, all of the data can be analyzed at one time without having to perform separate tests or reuse the data. From a statistical point-of-view, this approach is better than the first.

      There is also a third way to analyze the data if you have a “no injury but received treatment” group. You can perform a 2-way ANOVA by creating two grouping variables (injury and treatment) for each animal. This method is relevant if you are interested in the individual effect of injury and treatment, as well as their potential interactions. To learn more about 2-way ANOVA, please refer to http://www.sthda.com/english/wiki/two-way-anova-test-in-r.

  7. Hello! I am experimenting on the productivity of students with respect to their time preferences. The first condition is that, in the first 5 days, the student will do his/her homework in the morning (8 AM to 4 PM). The second condition is that, in the next 5 days, the student will do his/her homework in the evening (8 PM to 4 AM). What test can I use to further interpret my results?

    1. A paired t-test would be appropriate. It would be better if the students were randomized into two branches, one branch switching from morning to evening, and the other switching from evening to morning. That way, the effect of any proficiency gained during the 5 days of monitored practice will be balanced.

  8. I am comparing 2 unrelated parameters (expressed in 3 different ways, all in numerical values) between control and disease group. The disease group is further sub-divided based on severity of abnormality. The numbers are small, as it’s a pilot study.
    Which test should be used to assess differences between groups?
    I have used an unpaired t-test initially, but want to be sure it’s the right way.

    1. For a pilot study, one-way ANOVA for each disease state (e.g., disease-free, diseased) would be the appropriate choice. Data should be appropriately pre-processed (i.e., transformed), if necessary. This will enable any follow-up studies to focus on the information pertaining to each disease sub-group.

  9. Hello. I’m running an experiment to determine which agar and treatment I should use for my protocol. I’m testing two different agars. Each agar will undergo 4 different conditions. Each condition has 3 levels. Which test should I use to analyse my data?

    1. Whether you use one-way or two-way ANOVA will depend on the objective of your study. If the objective of this study is to find the “best” condition-level combination, then you should use one-way ANOVA. However, if you’d rather evaluate the effect of the conditions and levels, then two-way ANOVA with interaction analysis would be more appropriate.

  10. My data is not normally distributed even after using log, sqrt or cube root transformations. Should I use a non-parametric test?

    1. Yes. A non-parametric test is a safe choice when normality cannot be achieved after data transformation.

  11. Hi! I am writing an Internal Assessment on the calcium content in six different types of tofu using EDTA. There are 5 trials per tofu, would ANOVA test work for this experiment?

    1. Yes, an ANOVA test would be appropriate. It is a 6-group balanced investigation as there are 6 different types of tofu. Since each “trial” (5 trials per tofu) represents a sample, there will be a total of 30 samples.

  12. Hello sir
    I’m doing research on differences between 8 food samples divided into 2 groups. Is it okay to conduct a one-way ANOVA and post-hoc test?

    1. For two groups, the t-test is a good choice. ANOVA will also give similar results, although a post-hoc test is not required for two-group comparisons. If you want to do multiple comparisons between samples, first perform ANOVA on multiple groups and then perform post-hoc analysis between various pairs of groups.

  13. Hi, I am comparing the retention level of the total phenolic content of spices before and after cooking, and then comparing the retention level for 3 different amounts of spices used after cooking. Should I use ANOVA or a paired t-test?

    1. To compare the retention level of total phenolic content before and after cooking, a paired t-test would be appropriate. To compare the dose-dependent effect of spices on retention level, use ANOVA with repeated measures: https://statistics.laerd.com/spss-tutorials/two-way-repeated-measures-anova-using-spss-statistics.php. This approach enables multiple comparisons within a single analysis. You may also want to create a scatter plot with regression.

  14. I am comparing 2 types of schools and 3 different levels of support for distance learning of those schools. Should I use independent t tests, or ANOVA?

  15. I’m comparing two types of menus in interface design, and I want to find out which is better. I have just one group consisting of 20 people who did both menu tests. Which would be the best way to analyze this? Just one-way ANOVA? Do I need a t-test as well? Or is there a better method? Thanks

  16. Hi, I want to determine the differences in CAD currency between March 2020 and April 2020. What method should I use? Is a t-test okay?

    1. While the t-test could provide a p-value, it may not be appropriate for this comparison. In the case you present, the CAD currency data would be dependent on previous data. This would violate a t-test assumption that the data follow an independent distribution (i.e., the collected data are independent of the data collected previously). For your type of data, a time-series analysis may be a better approach; learn more here: https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm

  17. I want to determine the significance of the difference between hormone concentrations in wastewater and in the receiving river. Which statistical analysis should I use?

    1. You can use either a paired t-test or ANOVA with repeated measurements. ANOVA with repeated measurements would be appropriate if the samples from the wastewater and receiving river were collected simultaneously at various time points. ANOVA may also answer an additional question – whether the hormone concentrations change over time.

  18. I have four types of pollutants, and I analyse the concentrations of those pollutants in both wastewater and river. I want to see the significance of the difference between the concentrations in wastewater and river. Which test is suitable?

    1. If the samples from wastewater and receiving water were collected separately without a one-to-one corresponding relationship, the t-test will be sufficient for the two-group setting.

  19. hi, I have 1 independent variable (promotion framing type) with 2 categories (percentage and dollar) and 1 dependent variable (purchase intention). I want to know which promotion framing results in a higher purchase intention.

    My hypothesis is: Dollar off (vs percent off) discount framing has a stronger positive effect on purchase intention for consumers.

    What is the best test to use for this? My sample size is 200 people.

    Thank you

    1. The type of test that should be used depends on 1) whether the two promotions were placed on the same framing type and 2) if both of the promotions were shown to the same person. If both of these stipulations were followed, you should use a paired t-test. If only one promotion was marketed to each person, then you should use an unpaired (independent two-sample) t-test. Since your hypothesis is directional (dollar-off framing results in a higher purchase intention than percent-off framing), you may use the one-sided probability.

  20. Hi!
    I have 2 populations of different types of Alzheimer’s disease and a control group. Can I use t-test to compare each group with every other, taking into account that they were not subjected to any treatment? I just want to know if they’re naturally different to each other (I understood that ANOVA helps if I want to compare the effect of a treatment in more than two experimental groups, but in this case, the groups are not subjected to any treatment, I want to check for their natural differences).

    Thank you very much!

    1. ANOVA is still the appropriate analysis method. You can consider the groups you are testing as those at different levels of Alzheimer’s disease. For example: control group = level 0; Alzheimer’s disease, type 1 = level 1; and Alzheimer’s disease, type 2 = level 2.

  21. I am comparing 3 schools doing online learning with 13 different factors/observations. I was planning to use a one-sample t-test across the different factors to test which ones are significant, which we can deep dive into in follow-up research.

    Concurrently, I am planning to use ANOVA to compare the factors/observations which didn’t fare well across the 3 schools to analyze the cause-effect.

    Does this seem like a decent plan?

    Thank you so much for seemingly reverting to every single question here!

    1. The following information is based on the assumption that the objective of the study is to determine the impact of “online” learning based on some performance measurement(s). As a pilot study, it would be fair to explore the “online” effect at the level of a school, but the power of a sample with only 3 schools may not be strong enough to detect subtle effects. In other words, a sample size of 3 is low for statistical analysis, such that the mean and standard deviation (SD) cannot be estimated accurately. You may want to consider using units that are smaller than a school for this analysis, such as classes or students.

      Finally, you mentioned using a one-sample t-test, which implies that there will not be a control. This is only workable if some statistics (mean and SD) from the ENTIRE population are historically available. Using a control is advisable.

      Thank you for your kind words! In case you’re interested, we have explained other biostatistical methods as well: https://raybiotech.com/learning-center/common-biostatistical-methods-explained/

  22. I’m conducting research to determine what kinds of microbes are on surfaces, especially doorknobs, in public restrooms. What should I use?

    1. It depends on the objective of the study. If the objective is to determine whether a microbe-of-interest is on the doorknob such that the result is either “yes” or “no,” statistical analysis may not be necessary. If the objective is to determine how the counts of specific microbes on the doorknobs differ in different buildings/locations, I would perform a comparison based on the Poisson distribution.

  23. Hi,
    I have three mouse groups — wild type, mouse model 1, mouse model 2 (mouse model 1 and 2 are two different Alzheimer’s disease models). Each group has 4 different individuals.
    If I am only interested in the comparisons of model 1 with wild type and model 2 with wild type, should I use ANOVA or a t-test?
    If I am interested in comparing all three groups, which test should I use?

    1. For the two-group comparison, the t-test would be a good choice. However, ANOVA with post-hoc analysis will be better. To perform an analysis on all of the groups, ANOVA is necessary.

  24. Thanks for your reply. I have one more question. Regarding the two-group comparison, you answered that both the t-test and ANOVA are fine. Then, in a scientific paper, is it OK to show both ANOVA and t-test results in the same figure instead of choosing one? Or do I have to choose one?

    1. It is uncommon to present results from more than one statistical approach in one paper, except for studies that aim to compare the performance of different statistical methods. Generally, the more a hypothesis is tested with different statistical approaches and the more p-values are acquired, the less confidence we can place in the data (or p-values).

      Therefore, please pick one approach and report the results from that approach.

  25. I have four independent variables that I am using to predict the dependent variable. However, two of the independent variables are interrelated and one of them has some measurement variables. I had used ANOVA to account for the variations in one of the independent variables but am stuck on how to deal with the two interdependent independent variables in a simplified way. My plan is to have a simplified multiple regression equation factoring in all the variables.

  26. Hello, Thank you so much for the article. I will appreciate it if you can take a look at something for me.
    I have several mixes of materials that are used in the same kind of test. Let’s say the sample size of each mix is 12 and there are 7 different mixes. This test measures how much energy is needed to fracture the materials. I also know the behavior of the materials from the load versus displacement curves.

    To compare the average energy between mixes, should I use T-Test or ANOVA? If the result I get from T-Test is a mix of both significantly different and not significantly different (not in a clear pattern), how should I proceed?

    For the load vs displacement curve, I only want to look at post peak data and I divide that section post peak into 10% increment sections (95%-85%, 85%-75% …) and I’m interested in wanting to know whether these sections have any effect on each other. I know that these sections do not describe the same thing, but can I use either T-Test or ANOVA to do my analysis?

    Thank you so much.

    1. ANOVA with post-hoc on 7 groups of samples will be appropriate for comparing the average energy to break different types of material.

      For the load vs displacement curve, comparing the maximum post-peak displacement (peak to breaking) may be worth trying. Another measurement could be post-peak rate per increment section. ANOVA still can be applied here for comparing different types of material.

      For your section-by-section data, it is obvious that the data are auto-correlated. In other words, section1 must have happened before section2. Accordingly, ANOVA may not be appropriate for analyzing the ‘section effect’ due to the need for independent data points. Perhaps you can take a look at Time Series analysis, which is common in analyzing economics data.

  27. Good evening,

    Just like to confirm, my experiment consists of two experimental groups: control and intervention group. We are analysing the baseline results and the ‘after ten weeks’ results using EORTC-CLC30 numerical scores.

    Would a repeated measures one-way ANOVA be the appropriate statistical test for such an experiment? Would a paired t-test be appropriate too?

    Thank you

    1. Yes, repeated measures one-way ANOVA would be appropriate. A paired t-test can be applied to compare the ‘baseline’ and ‘after-treatment’ groups. In case you do perform a t-test, please make a derivative measurement from the ‘baseline’ and ‘after-treatment’ scores (e.g., percentage of change) and then apply the t-test on two groups.

  28. Hello – I am proposing a research design – where I am evaluating the influence of police-body-cameras on police productivity and police brutality. I am measuring police productivity through the number of police actions taken (citations and arrests); I am measuring police brutality through the number of “use of force” instances and through the number of citizen complaints of officer misconduct. The two police department sites would run independent studies. Randomly sampling will be used to select 120 patrol officers from the Duty List at each site; then random sampling will be used to place those 120 into three respective groups: treatment group – given automatic body cameras (N=40), control group A – given traditional body cameras (N=40), and control group B – given no cameras. Data collection would occur for a 9 month period. Participants would not be told of the study, as it may impact their behaviors, knowing they are being closely monitored for specific actions/behaviors.

    Would an ANOVA test be appropriate? And how would I calculate the effect size?

    1. From what it sounds like, this is a “rates comparison” problem. In other words, the outcome is the proportion of brutality (out of the actions taken). ANOVA may not be appropriate for such a comparison.

      I suggest trying a Chi-squared test where the effect size would be the difference in brutality rates across the groups. An even better analysis approach would be the McNemar test (or paired Chi-squared test). This would allow you to perform a cross-over design analysis, which would account for a particular police officer wearing or not wearing a camera for a predefined pattern/period.
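      To show what the Chi-squared statistic measures in a rates comparison, here is a minimal sketch on a hypothetical 2 x 2 table of counts (two groups by two outcomes). The counts and the function name are illustrative and do not come from any study; the same function also accepts a table with three groups.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the outcome rate were the same in every group
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows = groups, columns = (event, no event)
counts = [[10, 30],
          [20, 20]]

chi2 = chi_squared_statistic(counts)
dof = (len(counts) - 1) * (len(counts[0]) - 1)
print(f"chi-squared = {chi2:.2f} with {dof} degree(s) of freedom")
```

      The statistic grows as the observed rates in the groups move away from a shared rate; converting it into a p-value requires the Chi-squared distribution from a statistics library.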
