Chi-Square Test Calculator | χ², p-value, and Critical Value from Observed vs Expected
Compute the goodness-of-fit chi-square statistic, degrees of freedom, upper-tail p-value, and critical value from a list of observed and expected frequencies. The reject / fail-to-reject verdict against your chosen alpha is decided right in the browser.
💡 About this tool
Is a die loaded? Do menu orders match the mix you planned? Are clicks in an A/B test spread evenly across variants? The chi-square goodness-of-fit test puts a number on how far a set of observed counts strays from the counts a hypothesis predicts.
The formula is short: for each category, square the difference between observed and expected, divide by expected, and sum, giving χ² = Σ (O − E)² / E. The hard part is deciding whether that χ² is just noise or a real departure. That call needs the upper-tail probability of the chi-square distribution at the right degrees of freedom (the p-value) plus the critical value for your alpha. Done by hand, this means reaching for a printed table and interpolating whenever your df isn't listed.
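As a quick illustration of the formula, here is a minimal Python sketch. The function name `chi_square_statistic` and the die-roll counts are made up for the example; they are not taken from the tool itself.

```python
def chi_square_statistic(observed, expected):
    """Return the goodness-of-fit statistic: the sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 die rolls, 10 expected per face under a fair-die null.
observed = [8, 12, 9, 11, 6, 14]
expected = [10, 10, 10, 10, 10, 10]

chi2 = chi_square_statistic(observed, expected)  # squared gaps sum to 42, divided by 10
df = len(observed) - 1                           # categories - 1
print(f"chi2 = {chi2:.4f}, df = {df}")
```

The statistic alone says nothing about significance; it still has to be compared against the chi-square distribution at the matching df.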
This calculator evaluates the p-value directly through the incomplete gamma function (a series expansion below the mode and a continued fraction above it, each iterated to a tolerance of 1e-14), so it returns a p-value at any degrees of freedom without a table lookup or interpolation. The critical value comes from a bisection solve of the inverse CDF. Paste your two lists and you get χ², df (categories − 1), p-value, critical value, and the verdict.
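The approach described above can be sketched in Python as follows. This is not the tool's actual source. It assumes the common textbook arrangement in which the series is used when χ²/2 < df/2 + 1 and a modified Lentz continued fraction otherwise, so the switch point may differ from the tool's. The names `upper_tail_p` and `critical_value` are hypothetical.

```python
import math


def upper_tail_p(chi2, df, eps=1e-14, max_iter=10000):
    """Upper-tail chi-square probability, i.e. regularized Q(df/2, chi2/2)."""
    if chi2 <= 0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower regularized gamma P(a, x); the p-value is 1 - P.
        term = 1.0 / a
        total = term
        n = a
        for _ in range(max_iter):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * eps:
                break
        return max(0.0, 1.0 - total * math.exp(log_prefactor))
    # Continued fraction (modified Lentz) for the upper regularized gamma Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(log_prefactor)


def critical_value(alpha, df, tol=1e-12):
    """Solve upper_tail_p(x, df) == alpha for x by bisection."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    lo, hi = 0.0, max(1.0, float(df))
    while upper_tail_p(hi, df) > alpha:  # widen the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if upper_tail_p(mid, df) > alpha:
            lo = mid  # tail still too heavy, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: chi2 = 4.2 at df = 5, tested at alpha = 0.05.
chi2, df, alpha = 4.2, 5, 0.05
p = upper_tail_p(chi2, df)
crit = critical_value(alpha, df)
reject = p < alpha  # same verdict as chi2 > crit
print(f"p = {p:.4f}, critical value = {crit:.4f}, reject = {reject}")
```

Bisection is slower than Newton's method but cannot overshoot, because the upper-tail probability decreases monotonically in χ².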
🧐 Frequently Asked Questions
Q. Should I read the p-value or the critical value to decide? Either one lands on the same conclusion. Reject the null (observed follows expected) when p < α, or equivalently when χ² exceeds the critical value. The tool shows both so you can quote whichever your write-up or grader expects.
Q. Why is df "categories − 1"? Because the observed counts are constrained to sum to the same total as the expected counts, one degree of freedom is spent: once all but one category count is known, the last is fixed. If you estimated parameters from the data to build the expected counts (say, fitting a distribution after estimating its mean), you lose one more df per estimated parameter. This tool treats the expected counts as given, which is the textbook one-way goodness-of-fit case.
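The df bookkeeping can be written down in a couple of lines. The helper `goodness_of_fit_df` below is a hypothetical illustration, not part of the tool; the tool itself corresponds to the default of zero estimated parameters.

```python
def goodness_of_fit_df(n_categories, n_estimated_params=0):
    """df = categories - 1 - (number of parameters estimated from the data)."""
    df = n_categories - 1 - n_estimated_params
    if df < 1:
        raise ValueError("not enough categories for a chi-square test")
    return df


# Six die faces, expected counts given outright (the case this tool handles).
print(goodness_of_fit_df(6))       # 5
# Eight bins, expected counts built from a distribution whose mean was estimated.
print(goodness_of_fit_df(8, 1))    # 6
```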
Q. How do I set the expected counts? For a "uniform" null, use total ÷ number of categories in every cell. For a fixed ratio, use total × each proportion. Remember the expected values are counts, not probabilities or percentages. If many cells have an expected count below 5 the approximation weakens, so merge categories or switch to an exact test instead.
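A small Python sketch of building the expected list from proportions or a ratio. The helper names and the 9:3:3:1 ratio are illustrative assumptions. The helper normalizes the weights, so either proportions summing to 1 or raw ratio parts work.

```python
def expected_counts(total, weights):
    """Allocate `total` across categories in proportion to `weights`."""
    weight_sum = sum(weights)
    if weight_sum <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [total * w / weight_sum for w in weights]


def small_cells(expected, threshold=5):
    """Indices of cells whose expected count falls below the rule-of-thumb threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]


# Uniform null: 60 observations over 6 categories.
print(expected_counts(60, [1] * 6))          # [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
# Fixed ratio 9:3:3:1 with 160 observations.
print(expected_counts(160, [9, 3, 3, 1]))    # [90.0, 30.0, 30.0, 10.0]
# With only 40 observations, the last cell drops below 5.
print(small_cells(expected_counts(40, [9, 3, 3, 1])))  # [3]
```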
Q. Can I run it when observed and expected totals don't match? The arithmetic still runs, but goodness-of-fit assumes the totals agree. When the expected total differs from the observed total, the df interpretation and the p-value lose their meaning. Enter expected values as "the observed total allocated by your theoretical proportions."
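One way to guard against mismatched totals is to rescale the expected list before running the test. The sketch below assumes your expected values are meant as relative proportions, which is the only case where rescaling is justified. `rescale_expected` is a hypothetical helper, not something the tool exposes.

```python
import math


def rescale_expected(observed, expected):
    """Scale `expected` so that it sums to the observed total."""
    observed_total = sum(observed)
    expected_total = sum(expected)
    if expected_total <= 0:
        raise ValueError("expected values must have a positive sum")
    scale = observed_total / expected_total
    return [e * scale for e in expected]


# Hypothetical input: expected entered as proportions instead of counts.
observed = [18, 22, 20]
expected = [0.25, 0.25, 0.5]

if not math.isclose(sum(observed), sum(expected)):
    expected = rescale_expected(observed, expected)
print(expected)  # [15.0, 15.0, 30.0]
```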
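Q. Does it handle a test of independence (contingency table)? This tool is built for the one-dimensional goodness-of-fit test. For a 2 × 2 or r × c table testing independence, df becomes (rows − 1) × (columns − 1) and the expected counts come from the row and column margins. If you flatten the table and its margin-based expected counts into the two lists, the χ² statistic is still correct, but the tool reports df as categories − 1, so its p-value, critical value, and verdict do not apply. Re-evaluate that χ² at df = (rows − 1) × (columns − 1) instead.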
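For reference, the margin-based expected counts and the independence df can be computed as below. This is a minimal sketch with a made-up 2 × 2 table and a hypothetical function name. It produces χ² and the correct df but leaves the p-value lookup to whatever chi-square routine you trust.

```python
def independence_chi_square(table):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for a cell = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


# Hypothetical 2 x 2 table: rows are groups, columns are outcomes.
table = [[30, 10],
         [20, 40]]
chi2, df, expected = independence_chi_square(table)
print(expected)                        # [[20.0, 20.0], [30.0, 30.0]]
print(f"chi2 = {chi2:.4f}, df = {df}")  # df is 1 here, not 4 - 1 = 3
```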
📚 Fun Facts
Karl Pearson introduced the chi-square test in 1900, making it one of the earliest formal significance tests in statistics. The "square" in the name reflects the fact that a chi-square variable with k degrees of freedom is a sum of k squared independent standard-normal variables, and under the null hypothesis Pearson's statistic follows that distribution approximately. As degrees of freedom grow, the chi-square distribution drifts toward a normal shape with mean df, variance 2 × df, and its peak at df − 2. In practice the test covers goodness-of-fit, independence, and homogeneity, and it still shows up in machine learning as a feature-selection score for measuring how strongly a categorical feature relates to the target.