Chi-Square-And-Hypothesis-Testing

Chi Square & F Distribution

By: Dr. Shirl Smith

Big D scenario

Based upon the input from Units 1 and 2, you have just received your next assignment that will contribute to your next decision. For the outdoor sporting goods client, based upon your prior decision as to either expand to the next market or retain your current position, justify your decision further utilizing the Chi-Square Distribution tool.

Big D scenario

One key criterion point: You do not have adequate data to formulate a full Chi-Square for the outdoor sporting goods client. However, you do have sufficient data to initiate this process. You are charged with demonstrating the initial, qualitative steps of a nonparametric test. Using the null and alternative hypotheses, present your justifications for your selection and explain what it means beyond the mere formulas. What is this going to tell the Board of Directors, and how does it contribute to the decision-making process?

Big D scenario

The following assumptions may make the assignment more manageable and the responses more uniform:
Continue to utilize the Big D scenario. Work under the assumption that the sample is based upon two different proposed product lines.
Additionally, work under the assumption that the same demographics are utilized for each product.

A Little History

A bit of Trivia
The chi-squared distribution gets its name from the Greek letter “chi”, Χ, which looks like an “x”. Since it is based on squared variables, the name chosen was Χ², or chi-squared.

Theoretical background

The chi-squared distribution is a mathematical function that is derived from the normal distribution. Recall that a z-score is a standard normal random variable. If you square a z-score you still have a random variable, and its probabilities P(z² < c) can be looked up in the appropriate table. Even a sum of k of these squares can be analyzed. The distribution of a sum of k independent squared z-scores is called the chi-squared distribution. It is indexed by the number of terms in the sum: each independent term adds one degree of freedom. In the tests described here, the degrees of freedom are typically the number of categories minus 1. What this means is that if you add a bunch of squared terms, each of which has been standardized (the mean subtracted and the result divided by the standard deviation), the sum has values that can be compared to values in a probability table. This table is called the chi-squared table.
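The idea of a sum of squared z-scores can be checked with a small simulation. The following is only a sketch: the function name `simulate_chi_square`, the number of draws, and the seed are illustrative choices, not part of the original material. It relies on the standard fact that a chi-squared variable with k degrees of freedom has mean k.

```python
import random

def simulate_chi_square(k, draws=20000, seed=1):
    """Return `draws` simulated values, each a sum of k squared standard normal draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(draws)]

# Illustrative choice: k = 3 terms in each sum.
samples = simulate_chi_square(k=3)
sample_mean = sum(samples) / len(samples)

# A chi-squared variable with k degrees of freedom has mean k,
# so sample_mean should land close to 3.
print(round(sample_mean, 2))
```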

The chi-squared distribution is used if you can put your data in a contingency table. Contingency tables are tabulated data where every point is tallied exactly once. Once tallied, you can use the chi-square statistic (i.e. the formula) to test if one cell in the table has a higher (or lower) than expected tally, or in the case of groups, if one group has higher than expected tallies.

Formula

If you have a table of observed tabulated (count) values and you wish to compare the observed counts to those expected under some theory, you can obtain a chi-square statistic as follows:
Χ² = Σ (Observed − Expected)² ⁄ Expected, summed over every cell in the table.

Basic assumptions

The chi-squared test may be used as long as:

The observations are independent.

Every observation is tallied exactly once.

None of the expected values are less than 5.

Application used

There are two applications that you will be asked to learn. The first tests if two (or more) proportions are all the same. A typical application would be to test if each face of a six-sided die turns up with equal likelihood. Here the expected value for each side is one sixth of the number of rolls, and the degrees of freedom are 5, one less than the number of categories. The test statistic for the general case that has k possible outcomes is:
Χ² = Σ (Obsi − Expi)² ⁄ Expi, summed over i = 1, …, k

Application used Cont…

where the Expi = ( 1⁄k )·Total, and degrees of freedom = k−1
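As a rough sketch of the equal-proportions case, the statistic and its degrees of freedom can be computed directly from the tallies. The function name `equal_proportions_chi_square` and the die tallies below are hypothetical and used only for illustration.

```python
def equal_proportions_chi_square(observed):
    """Return (statistic, degrees_of_freedom) for H0: all k categories are equally likely."""
    k = len(observed)
    total = sum(observed)
    expected = total / k  # Exp_i = (1/k) * Total, the same for every category
    statistic = sum((obs - expected) ** 2 / expected for obs in observed)
    return statistic, k - 1

# Hypothetical tallies from 60 rolls of a six-sided die (not data from the source).
die_counts = [8, 12, 9, 11, 10, 10]
statistic, df = equal_proportions_chi_square(die_counts)
print(statistic, df)
```

Each expected count here is 10, which satisfies the requirement that no expected value fall below 5.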

Example

(Comparing multiple proportions)

A university holds an open house three times in the spring, with attendance as follows: March – 55 visitors, April – 73 visitors, and May – 52 visitors. Test whether each month had about a third of the visitors.
Set up null hypothesis: H0: Each month had 1/3 of the visitors.
Get data and determine the expected number of visitors per month if each month had a third of the total.
Total visitors = 55 + 73 + 52 = 180. Thus the expected number per month = (1/3)·180 = 60 visitors.
Choose test: Decide on the chi-square test with 2 degrees of freedom.
Calculate the test statistic and p-value:
Χ² = (55−60)²⁄60 + (73−60)²⁄60 + (52−60)²⁄60 = 4.3.
p-value = P( Χ² > 4.3) = 0.116
State Conclusion: Because the p-value exceeds the usual 0.05 significance level, the data do not show that one month is preferred over another.
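The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch, not a general-purpose routine: it uses the fact that with exactly 2 degrees of freedom the upper-tail probability has the closed form exp(−x/2); the variable names are illustrative.

```python
import math

visitors = {"March": 55, "April": 73, "May": 52}
total = sum(visitors.values())        # 180 visitors in all
expected = total / len(visitors)      # 60 per month under H0

chi_square = sum((obs - expected) ** 2 / expected for obs in visitors.values())

# With 2 degrees of freedom, P(X2 > x) = exp(-x / 2).
# Other degrees of freedom need a table, a calculator, or software.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 1), round(p_value, 3))
```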

Example Cont…

Note that even though I use the word “proportions”, we actually only use the tallies or count values. A more common application of this statistical test is to determine if two groups have the same set of proportions across the levels of a categorical variable. In other words, this test can be used to determine if one trait (say a person’s gender) has bearing on another trait (how that person voted). The process for this test is similar to the previous one except that the expected values must be scaled to the number of representatives in each group. Thus:

Example Cont….

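Χ² = Σ (Obsi, j − Expi, j)² ⁄ Expi, j, summed over every cell of the table,
where the Expi, j = (Total in group “i”)·(Total in category “j”) ⁄ (Grand Total),
and degrees of freedom = (# of groups − 1)·(# of categories − 1), which equals # of categories − 1 when there are two groups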

Another Example Cont…

(Comparing two distributions)

Is there a difference in how people vote depending on their gender? Here is a typical exit poll result from the 2008 election. Of the 200 male voters, 85 voted Democratic, 100 voted Republican, and the rest for another party. Of the 300 female voters polled, 165 voted Democratic, 130 voted Republican, and the rest for another party. Set up a contingency table and test if the two groups voted differently.
Set up null hypothesis: H0: The two groups voted in a similar distribution.
Get data and determine the expected number of votes if the two groups were identical.

Another Example Cont…

Observed Values
Group	Dem	Rep	Other	Total
Male	85	100	15	200
Female	165	130	5	300
Total	250	230	20	500

Expected Values
Group	Dem	Rep	Other	Total
Male	200×(.5) = 100	200×(.46) = 92	200×(.04) = 8	200
Female	300×(.5) = 150	300×(.46) = 138	300×(.04) = 12	300
Total	250 (50%)	230 (46%)	20 (4%)	500

Another Example Cont…

Choose test: Decide on the chi-square test with 2 degrees of freedom.
Calculate the test statistic and p-value:
Χ² = (85−100)²⁄100 + (100−92)²⁄92 + (15−8)²⁄8
+ (165−150)²⁄150 + (130−138)²⁄138 + (5−12)²⁄12 = 15.1.
p-value = P( Χ² > 15.1) = 0.0005
State Conclusion: the data show that there is a difference between the voting patterns of men and women.
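The voting example can be reproduced from the observed table alone. The sketch below builds each expected count as (row total × column total) ⁄ grand total and sums the six cell contributions; the variable names are illustrative, and the closed-form p-value applies only because this table has 2 degrees of freedom.

```python
import math

# Rows: Male, Female. Columns: Dem, Rep, Other (observed counts from the table).
observed = [[85, 100, 15], [165, 130, 5]]

row_totals = [sum(row) for row in observed]          # [200, 300]
col_totals = [sum(col) for col in zip(*observed)]    # [250, 230, 20]
grand_total = sum(row_totals)                        # 500

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if both groups shared the same voting distribution.
        exp = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(col_totals) - 1)     # (2 - 1) * (3 - 1) = 2
p_value = math.exp(-chi_square / 2)                  # closed form valid only for df = 2

print(round(chi_square, 1), df, round(p_value, 4))
```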

Additional Applications

The chi-square test has other applications. Two common ones are:
A test to analyze the sample variance and to evaluate its confidence interval, and
A test to compare an observed distribution to a theoretical one.

Technology

The chi-square distribution is easily calculated on your calculator or in Excel.

Technology Cont…

*The “Χ²cdf()” command is located in the “DISTR” menu.
For a contingency table, the expected value of row “i”, column “j” is: Expected value = (total of row “i”)×(total of column “j”) / Grand total

F Distribution

The F distribution is the probability distribution associated with the f statistic. In this lesson, we show how to compute an f statistic and how to find probabilities associated with specific f statistic values.
The f Statistic
The f statistic, also known as an f value, is a random variable that has an F distribution. (We discuss the F distribution in the next section.)
Here are the steps required to compute an f statistic:
Select a random sample of size n1 from a normal population, having a standard deviation equal to σ1.
Select an independent random sample of size n2 from a normal population, having a standard deviation equal to σ2.
The f statistic is the ratio of s1²/σ1² and s2²/σ2².
The following equivalent equations are commonly used to compute an f statistic:
f = [ s1²/σ1² ] / [ s2²/σ2² ]
f = [ s1² * σ2² ] / [ s2² * σ1² ]
f = [ Χ1² / v1 ] / [ Χ2² / v2 ]
f = [ Χ1² * v2 ] / [ Χ2² * v1 ]

F Distribution

where σ1 is the standard deviation of population 1, s1 is the standard deviation of the sample drawn from population 1, σ2 is the standard deviation of population 2, s2 is the standard deviation of the sample drawn from population 2, Χ1² is the chi-square statistic for the sample drawn from population 1, v1 is the degrees of freedom for Χ1², Χ2² is the chi-square statistic for the sample drawn from population 2, and v2 is the degrees of freedom for Χ2². Note that degrees of freedom v1 = n1 – 1, and degrees of freedom v2 = n2 – 1.
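The first of these equations can be sketched in code as follows. The sample values and the population standard deviations are hypothetical, chosen only to illustrate the calculation, and `f_statistic` is an illustrative name rather than a library function.

```python
import statistics

def f_statistic(sample1, sigma1, sample2, sigma2):
    """Compute f = (s1^2 / sigma1^2) / (s2^2 / sigma2^2) from two samples."""
    s1_sq = statistics.variance(sample1)  # sample variance, divides by n1 - 1
    s2_sq = statistics.variance(sample2)  # sample variance, divides by n2 - 1
    return (s1_sq / sigma1 ** 2) / (s2_sq / sigma2 ** 2)

# Hypothetical samples and population standard deviations (illustration only).
sample1 = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5]
sample2 = [20.2, 18.9, 21.5, 19.4, 20.8]

f = f_statistic(sample1, sigma1=1.5, sample2=sample2, sigma2=1.0)
v1, v2 = len(sample1) - 1, len(sample2) - 1   # degrees of freedom: 5 and 4

print(round(f, 3), v1, v2)
```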

The F Distribution

The distribution of all possible values of the f statistic is called an F distribution, with v1 = n1 – 1 and v2 = n2 – 1 degrees of freedom.
The curve of the F distribution depends on the degrees of freedom, v1 and v2. When describing an F distribution, the number of degrees of freedom associated with the standard deviation in the numerator of the f statistic is always stated first. Thus, f(5, 9) would refer to an F distribution with v1 = 5 and v2 = 9 degrees of freedom; whereas f(9, 5) would refer to an F distribution with v1 = 9 and v2 = 5 degrees of freedom. Note that the curve represented by f(5, 9) would differ from the curve represented by f(9, 5).
The F distribution has the following properties:
The mean of the distribution is equal to v2 / ( v2 – 2 ) for v2 > 2.
The variance is equal to [ 2 * v2² * ( v1 + v2 – 2 ) ] / [ v1 * ( v2 – 2 )² * ( v2 – 4 ) ] for v2 > 4.
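These two properties are simple enough to write as functions. This is a sketch with illustrative names (`f_mean`, `f_variance`); the variance uses the standard numerator term (v1 + v2 − 2). Evaluating both orderings of the degrees of freedom shows that f(5, 9) and f(9, 5) are different distributions.

```python
def f_mean(v1, v2):
    """Mean of an F distribution; it depends only on v2 and exists for v2 > 2."""
    if v2 <= 2:
        raise ValueError("the mean is defined only for v2 > 2")
    return v2 / (v2 - 2)

def f_variance(v1, v2):
    """Variance of an F distribution; it exists for v2 > 4."""
    if v2 <= 4:
        raise ValueError("the variance is defined only for v2 > 4")
    return (2 * v2 ** 2 * (v1 + v2 - 2)) / (v1 * (v2 - 2) ** 2 * (v2 - 4))

# The order of the degrees of freedom matters.
print(f_mean(5, 9), f_variance(5, 9))
print(f_mean(9, 5), f_variance(9, 5))
```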

Cumulative Probability and the F Distribution

Every f statistic can be associated with a unique cumulative probability. This cumulative probability represents the likelihood that the f statistic is less than or equal to a specified value.

Statisticians use fα to represent the value of an f statistic having a cumulative probability of (1 – α). For example, suppose we were interested in the f statistic having a cumulative probability of 0.95. We would refer to that f statistic as f0.05, since (1 – 0.95) = 0.05.
Of course, to find the value of fα, we would need to know the degrees of freedom, v1 and v2. Notationally, the degrees of freedom appear in parentheses as follows: fα(v1,v2). Thus, f0.05(5, 7) refers to the value of the f statistic having a cumulative probability of 0.95, v1 = 5 degrees of freedom, and v2 = 7 degrees of freedom.
The easiest way to find the value of a particular f statistic is to use the F Distribution Calculator, a free tool provided by Stat Trek. For example, the value of f0.05(5, 7) is 3.97. The use of the F Distribution Calculator is illustrated in the examples below.
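The quoted value f0.05(5, 7) = 3.97 can be sanity-checked by simulation, without any statistical library. The sketch below builds each f value as a ratio of two chi-square sums, each divided by its degrees of freedom, and estimates the share of values at or below 3.97. The function name, the number of draws, and the seed are illustrative assumptions.

```python
import random

def simulate_f(v1, v2, draws=40000, seed=7):
    """Simulate f = (chi-square_1 / v1) / (chi-square_2 / v2) from squared normal draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    def chi_square(df):
        # Sum of df squared standard normal values.
        return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

    return [(chi_square(v1) / v1) / (chi_square(v2) / v2) for _ in range(draws)]

f_values = simulate_f(5, 7)
share_below = sum(f <= 3.97 for f in f_values) / len(f_values)

# If 3.97 is the value with cumulative probability 0.95,
# share_below should come out close to 0.95.
print(round(share_below, 3))
```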

Cumulative Probability and the F Distribution

F Distribution Calculator
The F Distribution Calculator solves common statistics problems, based on the F distribution. The calculator computes cumulative probabilities, based on simple inputs. Clear instructions guide you to an accurate solution, quickly and easily. If anything is unclear, frequently-asked questions and sample problems provide straightforward explanations. The calculator is free. It can be found under the Stat Tables tab, which appears in the header of every Stat Trek web page.

Probability problem	Excel command	TI 83/84 command
P( Χ² > c ) = ?	=CHIDIST( c, df )	Χ²cdf(c, 99999, df)*
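For readers without Excel or a TI calculator, the same upper-tail probability, P( Χ² > c ), which is what CHIDIST( c, df ) and Χ²cdf(c, 99999, df) both return, can be sketched with Python's standard library. The function name `chi2_sf` is illustrative. The sketch starts from a known closed form (exp(−c/2) for 2 degrees of freedom, or the complementary error function for 1 degree of freedom) and steps up with the standard recurrence for the upper incomplete gamma function; it is intended for whole-number degrees of freedom only.

```python
import math

def chi2_sf(c, df):
    """Upper-tail probability P(X2 > c) for a chi-square variable with integer df >= 1."""
    if c <= 0:
        return 1.0
    h = c / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-h)                 # exact tail for df = 2
    else:
        a = 0.5
        q = math.erfc(math.sqrt(h))      # exact tail for df = 1
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1)
        a += 1.0
    return q

print(round(chi2_sf(4.3, 2), 3))     # open house example, 2 degrees of freedom
print(round(chi2_sf(15.1, 2), 4))    # voting example, 2 degrees of freedom
```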
