Thursday, March 12, 2015

Significance Testing


Part 1:

1.

2. 
Asian Longhorned Beetle
     The null hypothesis states that the average number of Asian Longhorned Beetles found in Bucks County is the same as the average for the entire state of Pennsylvania. The alternative hypothesis states that there is a difference between the average number of beetles found in Bucks County and the average for the entire state. Since we are looking at 50 fields, we use a z-test. For a two-tailed test at the 95% confidence level, the critical value is 1.96.
   Using the z-test equation

$$ z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} $$

where the sample mean \bar{x} is 3.2, the hypothesized mean \mu is 4, the standard deviation \sigma is 0.73, and the number of observations n is 50, the z-statistic is approximately -7.75. This is far beyond the critical value of -1.96, which allows us to reject the null hypothesis that there is no difference between the number of beetles found in Bucks County and the entire state of Pennsylvania. Because the z-statistic is negative, there are fewer Asian Longhorned Beetles in Bucks County than you would expect to see.

Emerald Ash Borer Beetle
    The null hypothesis states that the number of Emerald Ash Borer Beetles found in Bucks County is the same as the statewide average per county. The alternative hypothesis states that the number of beetles found in Bucks County differs from the statewide average per county. For a two-tailed test at 95% confidence, the critical value is 1.96. Using the same equation as above, the sample mean is 11.7, the hypothesized mean is 10, the standard deviation is 1.3, and the number of observations is 50. This gives a z-statistic of approximately 9.25, which causes us to reject the null hypothesis. Therefore, there is a difference between the number of Emerald Ash Borer Beetles found in Bucks County and the statewide average per county. Since the z-statistic is positive and greater than the critical value, we can conclude that there are more Emerald Ash Borer Beetles in Bucks County.

Golden Nematode
     The null hypothesis states that the number of Golden Nematodes found in Bucks County is no different from the average number found per county in Pennsylvania. The alternative hypothesis states that the number of Golden Nematodes found in Bucks County is different from the average number found per county in Pennsylvania. For a two-tailed test at 95% confidence, the critical value is 1.96. Using the z-test equation above, the sample mean is 77, the hypothesized mean is 75, the standard deviation is 5.71, and the number of observations is 50. This gives a z-statistic of approximately 2.48, which means we can reject the null hypothesis. There is a difference between the number of Golden Nematodes found in Bucks County and the average per county in Pennsylvania. Since the z-statistic is positive and greater than the critical value, we can conclude that there are more Golden Nematodes in Bucks County.
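
The three z-tests above can be checked with a few lines of Python. This is only a minimal sketch: it assumes the reported standard deviations can be used directly as sigma in the one-sample z formula, and the function name `z_statistic` and the `pests` dictionary are illustrative names that are not part of the original exercise.

```python
import math

def z_statistic(sample_mean, hypothesized_mean, std_dev, n):
    """One-sample z statistic: (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = std_dev / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

CRITICAL_Z = 1.96  # two-tailed critical value at the 95% confidence level

# (sample mean, hypothesized mean, standard deviation, n) as given in the text
pests = {
    "Asian Longhorned Beetle": (3.2, 4, 0.73, 50),
    "Emerald Ash Borer Beetle": (11.7, 10, 1.3, 50),
    "Golden Nematode": (77, 75, 5.71, 50),
}

results = {}
for name, (x_bar, mu, sigma, n) in pests.items():
    z = z_statistic(x_bar, mu, sigma, n)
    reject = abs(z) > CRITICAL_Z  # two-tailed decision rule
    results[name] = (z, reject)
    print(f"{name}: z = {z:.2f}, reject null = {reject}")
```

All three statistics fall outside the range of -1.96 to 1.96, so the decision to reject the null hypothesis holds for each species regardless of how the intermediate values are rounded. The sign of z gives the direction of the difference.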




3.
     The null hypothesis states that there is no difference between the number of people per party in the 1965 survey and the number of people per party in the sample taken in 1985. Because we are using a one-tailed test, the alternative hypothesis states that the number of people per party in the 1985 sample is greater than in the 1965 survey. Since the number of observations is less than 30, we use a t-test. For a one-tailed test at 95% confidence with 24 degrees of freedom (n - 1), the critical value is 1.71. Based on the t-test equation:

$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$

where the sample mean \bar{x} is 3.4, the hypothesized mean \mu is 2.1, the standard deviation s is 1.32, and the number of observations n is 25, the t-statistic equals 4.92, which is much higher than the critical value of 1.71. This allows us to reject the null hypothesis and conclude that the number of people per party in the 1985 sample is greater than in the 1965 survey.
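
The t-statistic can be reproduced in the same way. The sketch below is an illustration rather than the method used in the exercise. It hard-codes the one-tailed critical value for 24 degrees of freedom from a standard t table (about 1.711, slightly stricter than the 1.64 that applies to the normal distribution), and the function name `t_statistic` is an assumed name.

```python
import math

def t_statistic(sample_mean, hypothesized_mean, sample_std, n):
    """One-sample t statistic: (x_bar - mu) / (s / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sample_std / math.sqrt(n))

n = 25
t = t_statistic(3.4, 2.1, 1.32, n)
degrees_of_freedom = n - 1

# Assumption: one-tailed critical value for 24 degrees of freedom at the
# 0.05 level, taken from a standard t table (about 1.711).
CRITICAL_T = 1.711

reject_null = t > CRITICAL_T
print(f"t = {t:.2f}, df = {degrees_of_freedom}, reject null = {reject_null}")
```

Because the statistic is so far above either critical value, the conclusion does not depend on whether the t table or the normal table is used.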





Part 2:

Introduction

     There is a common debate among Wisconsin residents about what makes "Up North" different from areas in southern WI. Northern Wisconsin is commonly defined as the area north of Highway 29, which spans from Elk Mound all the way to Green Bay (Figure 1). It is known for its large forests, small population, and fun outdoor recreation. In this exercise we will be exploring different variables to determine whether they are good indicators of what makes up the great Wisconsin Northwoods. In order to compare different variables we will be using SPSS to run a Chi-square analysis on three variables that are commonly thought to differentiate northern WI from southern WI. The three variables I will be using are total population, number of hotel beds, and miles of funded snowmobile trails.



Figure 1  Northern WI counties and southern WI counties determined by their location compared to Highway 29 that runs east/west across the state.

Methods

As previously stated, the main goal of this exercise is to determine whether certain variables differentiate northern WI and southern WI. We start by assigning a value to each county based on whether it is located in northern WI (1) or southern WI (2). We then join an Excel table to the counties shapefile in ArcMap. Attribute fields are then added to give each county a rank. The ranks are based on equal interval classifications for the range of each variable. The ranks for total population are as follows: 4709-236873 is given a 1, 236874-457278 is given a 2, 457279-701211 is given a 3, and 701212-933380 is given a 4. The population rank and the location value (1 or 2) are then input into the Crosstabs function in SPSS. SPSS then creates two tables that are useful in determining whether there is a difference between northern and southern counties. This process is then repeated for the number of hotel beds and the total length of snowmobile trails per county.
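
The ranking step can be sketched in Python to show how an equal interval scheme assigns classes. This is a hedged illustration, not the ArcMap procedure itself: the functions `equal_interval_breaks` and `rank` are hypothetical helpers, and the breaks they compute from the minimum and maximum population are close to, but not identical to, the class limits listed above.

```python
def equal_interval_breaks(minimum, maximum, classes=4):
    """Upper bound of each class when the range is split into equal widths."""
    width = (maximum - minimum) / classes
    return [minimum + width * i for i in range(1, classes + 1)]

def rank(value, breaks):
    """Return the 1-based class whose upper bound first contains the value."""
    for class_number, upper in enumerate(breaks, start=1):
        if value <= upper:
            return class_number
    raise ValueError(f"{value} is above the highest break")

# Minimum and maximum county populations quoted in the text
breaks = equal_interval_breaks(4709, 933380)

# Round population figures mentioned elsewhere in this post
small_county_rank = rank(14000, breaks)
mid_county_rank = rank(70000, breaks)
largest_county_rank = rank(933380, breaks)

print(breaks)
print(small_county_rank, mid_county_rank, largest_county_rank)
```

Because each class is more than 230,000 people wide, a county of 14,000 and a county of 70,000 land in the same class, which is why most counties end up with a rank of 1.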

Results


     Based on the equal interval classification scheme for total population, only four counties in the entire state had a population above the first class break of 236,873 people. These are Milwaukee, Waukesha, Dane, and Brown Counties, all of which are located in southern WI (Figure 2). After the population variable was entered into SPSS, two tables were produced. Table 1 shows the expected and observed counts for each population class by location. Table 2 shows the Chi-squared value associated with total population. Since the Chi-squared value of 2.541 is below the critical value of 5.99 for 2 degrees of freedom, we fail to reject the null hypothesis that there is no difference between the population of northern WI and southern WI. The asymptotic significance tells the same story: at .281, it is much larger than the .05 required to reject the null hypothesis at 95% confidence.




Figure 2  Total population of counties based on an equal interval classification scheme. Only four of the 72 counties in WI contain larger populations than the first classification break.





nvs * tot_poop Crosstabulation

                                tot_poop
nvs                         1       2       4       Total
1       Count               27      0       0       27
        Expected Count      25.5    1.1     .4      27.0
2       Count               41      3       1       45
        Expected Count      42.5    1.9     .6      45.0
Total   Count               68      3       1       72
        Expected Count      68.0    3.0     1.0     72.0

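Table 1  Expected and observed counts of the population classifications for counties in northern WI (nvs = 1) vs counties in southern WI (nvs = 2). No county fell into class 3, so only three classes appear. The table's dimensions (2 rows by 3 columns) give the 2 degrees of freedom used in Table 2.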




Chi-Square Tests

                                Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square              2.541a    2     .281
Likelihood Ratio                3.900     2     .142
Linear-by-Linear Association    1.852     1     .174
N of Valid Cases                72

a. 4 cells (66.7%) have expected count less than 5. The minimum expected count is .38.

Table 2  Chi-squared test used to determine whether the null hypothesis should be rejected. In this case, the Chi-squared value of 2.541 is lower than the critical value of 5.99 for 2 degrees of freedom, which means we fail to reject the null hypothesis.
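
The Pearson Chi-square value that SPSS reports can be recomputed from the observed counts in Table 1. The sketch below is a plain Python illustration of the standard formula (expected count = row total x column total / grand total), not a reproduction of SPSS internals. The function names are assumptions, and the critical value of 5.991 is taken from a standard Chi-square table.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab."""
    expected = expected_counts(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Observed counts from Table 1: rows are north (1) and south (2),
# columns are population classes 1, 2, and 4.
population = [[27, 0, 0], [41, 3, 1]]
chi2, df = pearson_chi_square(population)

CRITICAL_CHI2_DF2 = 5.991  # 0.05 level, 2 degrees of freedom (table value)
reject_null = chi2 > CRITICAL_CHI2_DF2
print(f"chi-square = {chi2:.3f}, df = {df}, reject null = {reject_null}")
```

The same two functions work for the hotel bed and snowmobile trail crosstabs by swapping in their observed counts.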


     Although all of the major population centers are located in southern WI, both regions also contain many counties with very few people. Since the counties with small populations greatly outnumber the 4 counties with large populations, they have a bigger impact on the Chi-squared value.


     As with total population per county, there is no significant difference between the number of hotel beds in northern WI and southern WI. According to the Chi-squared chart, the critical value at 95% confidence with 3 degrees of freedom is 7.81. The Pearson's Chi-squared value of 3.871 is less than that, so we fail to reject the null hypothesis. We can also use the asymptotic significance value to determine whether there is a difference in hotel beds between northern and southern WI. Since we are using a 95% confidence level, the asymptotic significance value must be below .05 to say that there is a difference between the two regions. In this case the asymptotic significance value of .276 is much larger than .05, meaning we fail to reject the null hypothesis.
    


Figure 3  The number of hotel beds per county based on equal interval classification.





nvs * htl_bds Crosstabulation

                                    htl_bds
nvs                         1       2       3       4       Total
1       Count               23      3       1       0       27
        Expected Count      23.3    2.3     .4      1.1     27.0
2       Count               39      3       0       3       45
        Expected Count      38.8    3.8     .6      1.9     45.0
Total   Count               62      6       1       3       72
        Expected Count      62.0    6.0     1.0     3.0     72.0

Table 3  Expected and observed counts of counties that fall into each hotel bed classification for northern WI (nvs = 1) vs southern WI (nvs = 2).






Chi-Square Tests

                                Value     df    Asymp. Sig. (2-sided)
Pearson Chi-Square              3.871a    3     .276
Likelihood Ratio                5.173     3     .160
Linear-by-Linear Association    .241      1     .623
N of Valid Cases                72

a. 6 cells (75.0%) have expected count less than 5. The minimum expected count is .38.


Table 4  The Chi-square value of 3.871 is much lower than the critical value of 7.81 required to reject the null hypothesis for 3 degrees of freedom.
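
The asymptotic significance in Table 4 is the upper-tail probability of the Chi-square distribution at the observed statistic. For the special case of 3 degrees of freedom there is a closed form that needs only the standard library. The sketch below uses it as an illustration. The function name `chi_square_p_value_df3` is an assumption, and the formula does not apply to other degrees of freedom.

```python
import math

def chi_square_p_value_df3(chi2):
    """Upper-tail probability of a chi-square statistic with 3 degrees of freedom.

    Uses the closed form erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2),
    which is valid only for df = 3.
    """
    half = chi2 / 2
    tail = math.erfc(math.sqrt(half))
    correction = math.sqrt(2 * chi2 / math.pi) * math.exp(-half)
    return tail + correction

p_hotel_beds = chi_square_p_value_df3(3.871)  # Pearson value from Table 4
ALPHA = 0.05  # threshold for 95% confidence
reject_null = p_hotel_beds < ALPHA
print(f"p = {p_hotel_beds:.3f}, reject null = {reject_null}")
```

Comparing the p-value to .05 and comparing the statistic to the critical value of 7.81 are two views of the same decision, so they always agree.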

    
    

Finally, many Wisconsin residents consider northern WI a snowmobiler's paradise. With 3 degrees of freedom at 95% confidence, the Pearson's Chi-squared value must be at least 7.81 to reject the null hypothesis. As you can see from Table 6, the Chi-square value for snowmobile trails is 24.424, meaning that there is a significant difference in snowmobile trail length between northern WI and southern WI.




Figure 4  Length of funded snowmobile trails per county based on equal interval classification.




nvs * snow_trail Crosstabulation

                                    snow_trail
nvs                         1       2       3       4       Total
1       Count               2       13      9       3       27
        Expected Count      8.3     13.9    3.8     1.1     27.0
2       Count               20      24      1       0       45
        Expected Count      13.8    23.1    6.3     1.9     45.0
Total   Count               22      37      10      3       72
        Expected Count      22.0    37.0    10.0    3.0     72.0

Table 5  Observed and expected counts for snowmobile trail length per county. These values are used to calculate Pearson's Chi-square value.




Chi-Square Tests

                                Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square              24.424a    3     .000
Likelihood Ratio                27.387     3     .000
Linear-by-Linear Association    22.494     1     .000
N of Valid Cases                72

a. 3 cells (37.5%) have expected count less than 5. The minimum expected count is 1.13.


Table 6  Pearson's Chi-square value for snowmobile trails in northern vs southern WI. As you can see, the Pearson's Chi-square value of 24.424 is much higher than the critical value of 7.81. Therefore, there is a difference between snowmobile trail length in northern and southern WI.
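
Footnote a under each Chi-square table reports how many cells have an expected count below 5, which matters because the Chi-square approximation is generally considered less reliable when many expected counts are small. The sketch below recomputes that footnote for the snowmobile crosstab from the observed counts in Table 5. It is an illustration only, and `expected_counts` is an assumed helper name.

```python
def expected_counts(observed):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Observed counts from Table 5: rows are north (1) and south (2),
# columns are snowmobile trail classes 1 through 4.
snow_trails = [[2, 13, 9, 3], [20, 24, 1, 0]]

cells = [e for row in expected_counts(snow_trails) for e in row]
small_cells = [e for e in cells if e < 5]  # cells flagged by the footnote
share_small = len(small_cells) / len(cells)
min_expected = min(cells)

print(f"{len(small_cells)} cells ({share_small:.1%}) have expected count < 5")
print(f"minimum expected count = {min_expected}")
```

The population and hotel bed crosstabs have an even larger share of small expected counts (66.7% and 75.0% in their footnotes), which is worth keeping in mind when reading their results.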
 


Conclusion

Based on the Chi-square values for each of the variables selected, the only one that actually differentiates northern and southern WI is the total length of snowmobile trails per county. I had thought that population would also be a determining factor, but because of the way the classifications were set up, the only counties that differed in population class were Milwaukee, Waukesha, Dane, and Brown Counties. I feel that if the classifications had been set up differently, total population and number of hotel beds would have been significant in differentiating between the two regions. Even though Eau Claire contains over 70,000 people, it still showed up in the lowest population class, which gave it the same overall weight as Rusk County, which has only 14,000 residents.