Lecture 17 - Chi-Square
6/29/05
- Non-Parametric vs. Parametric
- Up until now, all of the statistical tests we’ve been using have been parametric
- DVs have been measured in interval or ratio scales
- score on a scale measuring a construct (e.g., depression)
- directly measured characteristic (age, height, weight)
- calculated descriptive statistics for the sample
- Tested differences of sample statistics from population parameters
- Non-parametric tests
- Use primarily nominal data
- e.g., sex, high/low splits, color, name of favorite restaurant
- tabulate frequencies for each category - how many people in each one?
- # men, # women
- # high, # low
- # who like green, # who like purple, # who like red
- # @ Strawn’s, # @ George’s, # @ Lil’ Joe’s
- Generally don't make inferences about specific population parameters (such as a mean or variance), so no population parameters are used
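Tabulating frequencies for nominal data can be sketched in a few lines of Python. The responses below are made up purely for illustration; the point is only that each person lands in exactly one category and the counts add up to n.

```python
from collections import Counter

# Hypothetical nominal data: each entry is one person's favorite color.
responses = ["green", "purple", "green", "red", "green", "purple", "red", "green"]

# Counter tallies how many people fall into each category.
observed = Counter(responses)

for category, count in observed.items():
    print(f"{category}: {count}")

# Each person contributes exactly one response, so the counts sum to n.
n = sum(observed.values())
print("n =", n)
```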
- Chi-Square Goodness of Fit
- Looking at one variable/factor
- Are the numbers of people in each of the categories of that variable out of the ordinary? (Do they “fit” with what we expect?)
- e.g., are there many more men than women? Do an inordinate number of people pick green over purple, red, and blue? Does one restaurant get more customers than the others?
- Null hypotheses
- No preference - the # of people in each category should be evenly distributed
- No difference from comparison population - or what might be expected for a variety of reasons
- Data
- count up the people in each category
- these counts are the observed frequencies, fo
- make sure each person only gets counted once!
- Expected Frequencies
- derived from the null hypothesis
- how many people should be in each category given the number I counted up and the % I’m expecting in each?
- No Preference null
- Split n into equal parts
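- e.g., with 4 categories, each gets 25% of 185 = 46.25
- This gives us the expected frequency for each cell
- fe = pn, where p = the proportion expected in that category and n = the total number of people counted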
- No difference from comparison null
- Split n into parts determined by expected percentages
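As a rough sketch, both kinds of expected frequencies come from the same rule, fe = pn. The function name `expected_frequencies` is just an illustrative choice, and the comparison percentages are invented (they are not the seating-capacity figures from the lecture example).

```python
def expected_frequencies(n, proportions):
    """Return fe = p * n for each category's expected proportion p."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    return [p * n for p in proportions]


n = 185  # total number of people counted in the lecture example

# No-preference null: four categories, 25% each.
fe_no_preference = expected_frequencies(n, [0.25, 0.25, 0.25, 0.25])
print(fe_no_preference)  # 46.25 in every cell

# No-difference-from-comparison null: HYPOTHETICAL percentages,
# e.g. each restaurant's share of the total seating capacity.
comparison = [0.40, 0.30, 0.20, 0.10]
fe_comparison = expected_frequencies(n, comparison)
print(fe_comparison)
```

Note that expected frequencies do not have to be whole numbers, even though observed frequencies always are.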
- Chi-Square statistic
- For each cell
- subtract expected value from observed value
- square it (so all values end up positive)
- divide by the expected value
- keeps things relative
- a 20 pt difference shouldn’t be a big deal when fe is already 200, but it should be a big deal when fe is only 10
- Add up results for each cell
- Calculated the same way against the no-difference-from-comparison null (only the fe values change)
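The per-cell steps above amount to chi-square = sum of (fo - fe)^2 / fe over all cells. A minimal sketch follows; the four observed counts are hypothetical (chosen only so they sum to 185) and are not the lecture's data.

```python
def chi_square(observed, expected):
    """Sum (fo - fe)^2 / fe over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for fo, fe in zip(observed, expected):
        total += (fo - fe) ** 2 / fe  # difference, squared, scaled by fe
    return total


# Why divide by fe: the same 20-point difference counts for much less
# when fe is 200 than when fe is 10.
print(chi_square([220], [200]))  # 2.0
print(chi_square([30], [10]))    # 40.0

# HYPOTHETICAL observed counts for four categories, n = 185,
# tested against the no-preference null (fe = 46.25 per cell).
hypothetical_fo = [50, 40, 55, 40]
statistic = chi_square(hypothetical_fo, [46.25] * 4)
print(round(statistic, 2))
```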
- Chi-Square distribution and df
- like the F distribution, the chi-square distribution is positively skewed and has no negative values (because the differences are squared)
- like the t distribution, it has a different shape for each df
- gets more “normal” with bigger df
- df = C - 1
- # columns (categories) minus one
- for our example, df = 4 - 1 = 3
- not related to sample size!
- Once n is fixed, all but the last column are free to vary
- look up critical value in table (pg 699)
- df = 3, alpha = .05
- the test is always one-tailed - no such thing as “direction” with nominal data
- critical value = 7.81
- calculated value = 7.34 (less than 7.81, so we fail to reject the null)
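Instead of a printed table, the critical value can be approximated numerically. The sketch below uses only the standard library: it evaluates the chi-square cumulative distribution with a series for the incomplete gamma function and then searches for the cutoff by bisection. The function names are illustrative, and the numerical method is one reasonable choice rather than the only one; if SciPy is installed, `scipy.stats.chi2` provides the same quantities.

```python
import math


def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via a power series."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Smallest value whose upper-tail area is alpha (found by bisection)."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < 1 - alpha:  # widen until the cutoff is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


critical = chi2_critical(0.05, 3)
print(round(critical, 2))  # 7.81

# Upper-tail p-value for the 7.34 reported in these notes (df = 3).
p_value = 1 - chi2_cdf(7.34, 3)
print(p_value > 0.05)  # True: 7.34 falls short of the critical value
```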
- Reporting the result
- The number of people at each establishment is not out of the ordinary given their seating capacities, chi-square (3, n = 185) = 7.34, p > .05.
- Notice that we need to report n here - because it’s not reflected in the df as it had been before.
- Chi-Square Test for Independence
- Used with 2 variables (factors)
- do the factors depend on each other?
- Does being in a particular category of one factor make someone more likely to be in a particular category of the other factor?
- e.g., are people more likely to eat at one place or another depending on whether they are students or faculty?
- Is there a status X location interaction?
- Hypotheses
- Null
- The variables are independent
- Being a student or professor will not influence where you go to eat
- The proportions across the categories of one variable will be the same at every level of the other variable
- Notice that the null does not ask about the main effects of each variable
- We don’t care if people in general are more likely to go to one place or another. We want to know if the proportions are the same for students and professors.
- Alternate
- The variables are related
- Being a student or professor will influence where you go to eat
- The proportions of who goes where will be different for students and professors
- Observed and Expected Frequencies
- fo = who gets counted up where
- fe calculations
- obtain all of the row and column totals
- given the number of profs and students, and the general popularity of each place, how many profs and how many students should be at each place?
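- for each cell, fe = (fc × fr)/n, where fc = the cell's column total, fr = the cell's row total, and n = the total number of people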
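The row-total times column-total rule can be sketched as follows. The 2 x 4 table of counts is hypothetical (rows are students and professors, columns are four eating places, and the counts were chosen only so that n = 220); it is not the data behind the lecture's result.

```python
def expected_table(observed):
    """Return fe = (row total * column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


# HYPOTHETICAL observed counts, n = 220.
observed = [
    [40, 35, 30, 25],  # students   (row total 130)
    [15, 20, 25, 30],  # professors (row total 90)
]

expected = expected_table(observed)
for row in expected:
    print(row)
# Every place draws 55 people overall, so under the null each place should
# get the same share of students (32.5) and of professors (22.5).
```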
- Calculate chi-square (same formula as for goodness of fit) and df = (R - 1)(C - 1), where R = # rows and C = # columns
- Critical value
- Critical chi-square = 7.81 (df = 3, alpha = .05)
- calculated chi-square = 19.83 (greater than 7.81, so we reject the null)
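Putting the pieces together, a test for independence might be sketched like this. The function name and the table of counts are illustrative assumptions, so the statistic it produces will not match the 19.83 from the lecture example; df is taken as (rows - 1)(columns - 1), and 7.81 is the tabled critical value for df = 3 at alpha = .05.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected under independence
            stat += (fo - fe) ** 2 / fe

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# HYPOTHETICAL counts: rows = students, professors; columns = four places.
table = [
    [40, 35, 30, 25],
    [15, 20, 25, 30],
]

stat, df = chi_square_independence(table)
CRITICAL_05_DF3 = 7.81  # tabled value for df = 3, alpha = .05

print(f"chi-square({df}, n = {sum(map(sum, table))}) = {stat:.2f}")
print("reject null" if stat > CRITICAL_05_DF3 else "fail to reject null")
```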
- Report result
- Being a professor or student influenced where people ate, chi-square (3, n = 220) = 19.83, p < .05.
- Assumptions and Restrictions
- Independence of observations
- make sure no one is counted more than once or appears in more than one category
- Size of expected frequencies
- none of the fe cells should be less than 5
- fe values below 5 increase the chance of a Type I error
- SPSS gets grumpy with you
- Effect Size
- Cramér’s V
- Not influenced by size of n (large ns give large chi-square values, so dividing by n cancels that out)
- Different definitions for what is “large” for each df*, where df* = the smaller of (R - 1) and (C - 1)
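One common form of Cramér's V is V = sqrt(chi-square / (n × df*)), with df* the smaller of (R - 1) and (C - 1). The sketch below plugs in the reported chi-square of 19.83 with n = 220 and assumes a 2 x 4 table (student/professor by four places), which is consistent with the reported df of 3 but is not stated outright in the notes.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi-square / (n * df*)), df* = min(R - 1, C - 1)."""
    df_star = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * df_star))


# Values from the reported result; the 2 x 4 table shape is an assumption.
v = cramers_v(19.83, 220, n_rows=2, n_cols=4)
print(round(v, 2))  # 0.3
```

Because chi-square is divided by n before the square root is taken, doubling every count in the table doubles chi-square but leaves V unchanged.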
Last updated: 6/28/05