Thursday, April 2, 2015

Correlation and Spatial Autocorrelation

Part 1: Correlation

1.

In this portion of the exercise we look at the correlation between distance from a sound source and sound level. From the given table (Figure 1) we create a scatterplot to visualize the relationship between distance and sound level (Figure 2). We then import the table into SPSS and create a Pearson's correlation matrix (Figure 3).



Figure 1  Excel table with the distance compared to sound level.


Figure 2  Scatterplot showing the relationship between distance and sound level.






Correlations
                                     distance ft     sound level dB
distance ft     Pearson Correlation  1               -.896**
                Sig. (2-tailed)                      .000
                N                    10              10
sound level dB  Pearson Correlation  -.896**         1
                Sig. (2-tailed)      .000
                N                    10              10
**. Correlation is significant at the 0.01 level (2-tailed).

 
Figure 3  Pearson's correlation matrix showing the correlation between distance and sound level. The Pearson correlation value of -.896 indicates a strong negative correlation.


 
In this example, the null hypothesis states that there is no linear relationship between distance and sound level. The alternative hypothesis states that there is a linear relationship between distance and sound level.
As the correlation matrix shows, there is a strong negative correlation of -.896 between distance and sound level, meaning that the farther one is from the source, the lower the sound level. This r value of -.896 is significant at the 0.01 level (two-tailed). The null hypothesis is therefore rejected: there is a statistically significant negative linear relationship between distance and sound level.
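The test behind SPSS's "Sig. (2-tailed)" value can be sketched in a few lines of Python. This is a minimal stdlib-only illustration, not SPSS's own code: the hypothetical helper `pearson_r` computes r from paired raw values, and `t_statistic` converts r into the usual t statistic with n - 2 degrees of freedom. The critical value 3.355 is an assumption taken from a standard t table (two-tailed, alpha = 0.01, df = 8), since the stdlib has no t distribution.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient r for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations and the two sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no linear relationship (rho = 0), df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values reported in Figure 3: r = -.896 with N = 10, so df = 8
t = t_statistic(-0.896, 10)
# Assumed two-tailed critical t for alpha = 0.01, df = 8 (from a t table)
T_CRITICAL = 3.355
print(round(t, 2), abs(t) > T_CRITICAL)  # prints -5.71 True
```

Because |t| is well above the critical value, the p-value is below .01, which is consistent with SPSS rounding it to .000.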



2.

In this section we are given census tract data for Milwaukee County, WI. This data includes percent white, percent black, percent Hispanic, percent with no high school diploma, percent with a bachelor's degree, percent below the poverty line, and percent that walk to work (the fields PerWhite, PerBlack, PerHis, NO_HS, BS, BELOW_POVE, and Walk in Figure 4).



Correlations
                                 PerWhite   PerBlack   PerHis     NO_HS      BS         BELOW_POVE Walk
PerWhite    Pearson Correlation  1          -.887**    -.218**    -.532**    .650**     -.767**    .028
            Sig. (2-tailed)                 .000       .000       .000       .000       .000       .630
            N                    307        307        307        307        307        307        306
PerBlack    Pearson Correlation  -.887**    1          -.246**    .171**     -.503**    .668**     -.050
            Sig. (2-tailed)      .000                  .000       .003       .000       .000       .386
            N                    307        307        307        307        307        307        306
PerHis      Pearson Correlation  -.218**    -.246**    1          .759**     -.320**    .182**     .029
            Sig. (2-tailed)      .000       .000                  .000       .000       .001       .616
            N                    307        307        307        307        307        307        306
NO_HS       Pearson Correlation  -.532**    .171**     .759**     1          -.559**    .501**     .050
            Sig. (2-tailed)      .000       .003       .000                  .000       .000       .384
            N                    307        307        307        307        307        307        306
BS          Pearson Correlation  .650**     -.503**    -.320**    -.559**    1          -.521**    .081
            Sig. (2-tailed)      .000       .000       .000       .000                  .000       .157
            N                    307        307        307        307        307        307        306
BELOW_POVE  Pearson Correlation  -.767**    .668**     .182**     .501**     -.521**    1          .354**
            Sig. (2-tailed)      .000       .000       .001       .000       .000                  .000
            N                    307        307        307        307        307        307        306
Walk        Pearson Correlation  .028       -.050      .029       .050       .081       .354**     1
            Sig. (2-tailed)      .630       .386       .616       .384       .157       .000
            N                    306        306        306        306        306        306        306
**. Correlation is significant at the 0.01 level (2-tailed).


Figure 4  Correlation matrix created based on the Milwaukee County data.

 

As you can see from the correlation matrix above, there are several strong correlations in the data (Figure 4). First, as the percent of people with no high school diploma increases, the percent of people with a bachelor's degree decreases (r = -.559) and the percent of people below the poverty line increases (r = .501). However, the percent of people who walk to work is not significantly correlated with the percent with no high school diploma (r = .050, p = .384); its only significant correlation is with the percent below the poverty line (r = .354). There is a strong positive relationship between percent Hispanic and percent with no high school diploma (r = .759). There is also a weak but significant positive correlation between percent black and percent with no high school diploma (r = .171). On the other hand, there is a strong positive correlation between percent white and percent with a bachelor's degree (r = .650).
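The N of 306 for Walk (versus 307 everywhere else) suggests that one tract has no Walk value and that cases were excluded pairwise, which, as far as I know, is the default for SPSS bivariate correlations. A minimal stdlib-only sketch of a pairwise-deletion correlation matrix follows; the function names and the tract values are hypothetical, `None` marks a missing value, and each variable is assumed to vary (a constant column would divide by zero).

```python
import math

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences with no missing values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise-deletion correlation matrix.

    columns maps a variable name to a list of values (None = missing).
    Returns {(name_a, name_b): (r, n)}, where n counts only the cases
    that have values for both variables.
    """
    names = list(columns)
    result = {}
    for a in names:
        for b in names:
            # Keep a case only if both variables are present for it
            pairs = [(u, v) for u, v in zip(columns[a], columns[b])
                     if u is not None and v is not None]
            if a == b:
                r = 1.0
            else:
                r = pearson_r([u for u, _ in pairs], [v for _, v in pairs])
            result[(a, b)] = (r, len(pairs))
    return result

# Hypothetical tract values; the second tract is missing its Walk value
tracts = {
    "NO_HS": [10, 20, 30, 40],
    "BS": [40, 30, 20, 10],
    "Walk": [1, None, 3, 2],
}
matrix = correlation_matrix(tracts)
print(matrix[("NO_HS", "BS")])       # prints (-1.0, 4)
print(matrix[("NO_HS", "Walk")][1])  # prints 3
```

As in Figure 4, only the pairs involving Walk lose a case; every other pair keeps its full N.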


Part 2: Spatial Autocorrelation



Introduction

   To determine the clustering patterns of voting habits throughout the state of Texas, this exercise examines the spatial autocorrelation of percent Democratic vote and percent voter turnout by county. The aim is to determine where clustering of the two variables exists and how it changed between 1980 and 2008.


Methods

    While most of the data is provided by the Texas Election Commission, the percent Hispanic population per county must be downloaded from the U.S. Census Bureau. The voting pattern table must then be joined to the Texas county shapefile. After the table is joined, it must be exported as a .dbf file and imported into GeoDa.
    The first step in determining the clustering patterns of Texas voting is to create a Moran's I scatterplot for each variable provided. The Moran's I scatterplot, based on Equation 1, plots each county's value of a variable against the spatial lag of that variable, that is, the weighted average of the values in the surrounding counties.
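An attribute join like the one described above matches rows on a shared key field. The sketch below is a minimal illustration of a one-to-one join that keeps every county row (as a left outer join does), not the implementation of any particular GIS join tool; the key name `FIPS`, the field `Pres08D` as a join column, and all values are hypothetical. Listing unmatched keys is a cheap check, since a key formatted differently in the two tables (for example, stored as a number in one and as text in the other) will silently leave a county without data.

```python
def attribute_join(target_rows, join_rows, key):
    """Attach the fields of join_rows to target_rows where `key` matches.

    Every target row is kept; keys with no match are returned separately.
    """
    lookup = {}
    for row in join_rows:
        if row[key] in lookup:
            raise ValueError(f"duplicate key in join table: {row[key]!r}")
        lookup[row[key]] = row
    joined, unmatched = [], []
    for row in target_rows:
        merged = dict(row)  # copy so the original rows are not modified
        match = lookup.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            merged.update(match)
        joined.append(merged)
    return joined, unmatched

# Hypothetical county rows and vote rows sharing a FIPS-style key
counties = [{"FIPS": "001", "NAME": "County A"}, {"FIPS": "003", "NAME": "County B"}]
votes = [{"FIPS": "001", "Pres08D": 0.30}]
joined, unmatched = attribute_join(counties, votes, "FIPS")
print(unmatched)  # prints ['003']
```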

    Eq. 1
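For reference, the standard definition of global Moran's I, assumed here to be the formula that Eq. 1 refers to, is

$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$$

where n is the number of counties, x_i is the value of the variable in county i, \bar{x} is its mean, and w_{ij} is the spatial weight between counties i and j (nonzero only when they are neighbors, with w_{ii} = 0). With row-standardized weights each row of w sums to 1, so S_0 = n and, writing z_i = x_i - \bar{x},

$$I = \frac{\sum_{i=1}^{n} z_i \sum_{j=1}^{n} w_{ij} z_j}{\sum_{i=1}^{n} z_i^2},$$

which is the slope of the fitted line in the Moran's I scatterplot.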

The Moran's I scatterplot also reports the global Moran's I value, which measures the overall spatial autocorrelation of a variable. A positive value indicates that similar values cluster together, a negative value indicates that neighboring counties tend to have dissimilar values, and a value close to 0 indicates no spatial autocorrelation, meaning the variable is randomly distributed across the state. Because the global Moran's I summarizes the degree of clustering for the whole state, a map must be created to show where the spatial autocorrelation occurs. This can be done using the Univariate LISA (local indicators of spatial association) tool in GeoDa.
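To make the global/local distinction concrete, here is a minimal stdlib-only sketch of both statistics. It assumes row-standardized contiguity weights supplied as a hypothetical neighbor list (`neighbors[i]` holds the indices of the counties bordering county i, and every county has at least one neighbor). The global value is one number for the whole map; the local values, one per county, are what a LISA map classifies into High-High, Low-Low, High-Low, and Low-High. GeoDa additionally tests each local value with conditional permutations and colors only the significant counties on its cluster map; that step is omitted here.

```python
def deviations(values):
    """Deviations from the mean, z_i = x_i - mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def spatial_lag(z, neighbors):
    """Row-standardized spatial lag: the average of each unit's neighbors."""
    # Assumes every unit has at least one neighbor (no islands)
    return [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]

def global_morans_i(values, neighbors):
    z = deviations(values)
    lag = spatial_lag(z, neighbors)
    # With row-standardized weights S0 = n, so I = sum(z * lag) / sum(z^2),
    # which is also the slope of the Moran scatterplot (lag against z)
    return sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)

def local_morans_i(values, neighbors):
    """Local Moran's I_i and its scatterplot quadrant for every unit."""
    z = deviations(values)
    lag = spatial_lag(z, neighbors)
    m2 = sum(a * a for a in z) / len(z)
    results = []
    for zi, li in zip(z, lag):
        quadrant = ("High" if zi > 0 else "Low") + "-" + ("High" if li > 0 else "Low")
        results.append((zi * li / m2, quadrant))
    return results

# Hypothetical map: six counties in a row, low values at one end, high at the other
values = [1, 1, 1, 5, 5, 5]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(global_morans_i(values, neighbors))    # prints 0.6666666666666666
print(local_morans_i(values, neighbors)[0])  # prints (1.0, 'Low-Low')
```

With row-standardized weights the average of the local values equals the global value, which is one way to see that LISA breaks the global statistic down county by county.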

    Finally, to determine the strength of the relationships between variables, a correlation matrix is created in SPSS. The correlation matrix compares each pair of variables and reports a Pearson correlation coefficient (r) for each pair: the sign of r gives the direction of the relationship and its magnitude gives its strength.

Results

As you can see from the scatterplot below, a Moran's I value of .6957 indicates strong clustering of Hispanic populations in the state of Texas. Counties with a high percentage of Hispanic residents tend to neighbor one another, as do counties with a low percentage. Figure X shows where these clusters are located. As one would expect, counties in southern Texas contain a large percentage of Hispanic residents and counties in northern Texas contain a smaller percentage.
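A Moran's I value on its own, such as the .6957 above, does not carry a p-value. One common way to test it, and the principle behind GeoDa's randomization option, is a permutation test: shuffle the values across the fixed set of counties many times, recompute I each time, and count how often a random arrangement clusters at least as strongly as the real map. The sketch below assumes row-standardized contiguity weights given as a hypothetical neighbor list and uses a toy map, not the Texas data; `permutation_p_value` is a hypothetical helper name.

```python
import random

def global_morans_i(values, neighbors):
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Row-standardized spatial lag; assumes every unit has at least one neighbor
    lag = [sum(z[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]
    return sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)

def permutation_p_value(values, neighbors, permutations=999, seed=0):
    """Observed I and pseudo p-value (M + 1) / (R + 1) for positive clustering.

    M is the number of shuffled maps whose I is at least the observed I,
    and R is the number of permutations.
    """
    rng = random.Random(seed)
    observed = global_morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if global_morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)

# Hypothetical map: 20 counties in a row, ten low values followed by ten high values
neighbors = [[1]] + [[i - 1, i + 1] for i in range(1, 19)] + [[18]]
observed, p = permutation_p_value([0] * 10 + [1] * 10, neighbors)
print(observed, p)
```

With R permutations the smallest possible pseudo p-value is 1/(R + 1), so 999 permutations cannot report anything below 0.001.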



 
 


 
 
 
    Based on the Moran's I value of .5752, there was a large amount of clustering of Democratic vote in Texas in 1980. Southern Texas contains many neighboring counties with a high percentage of Democratic vote, and there is also a large cluster of high Democratic vote in eastern Texas. In contrast, counties with a low Democratic vote cluster in northern and western Texas.


 


 
    Compared to 1980, there is more clustering of Democratic vote in 2008. Clusters of high Democratic vote occur in southern and western Texas, while clusters of low Democratic vote occur in northern and north-central Texas. The clusters of high Democratic vote also appear to coincide with the areas of high Hispanic clustering.

 
 


   Voter turnout was also clustered in 1980, as indicated by the Moran's I value of .4681. Areas with clusters of high Democratic vote tended to be clusters of low voter turnout, while areas with clusters of low Democratic vote tended to be clusters of high voter turnout.



    Voter turnout was less clustered in 2008 than in 1980, as indicated by the lower Moran's I value of .3634. As in 1980, clusters of high Democratic vote coincide with clusters of low voter turnout, and clusters of low Democratic vote coincide with clusters of high voter turnout.




    Although the local indicators of spatial association (LISA) show where spatial autocorrelation is high, they do not show the strength of the correlation between two variables. To determine the strength and direction of the correlation between the variables above, it is necessary to create a correlation matrix (Figure ). This correlation matrix shows some clear trends in the voting results. First, the correlation coefficient between percent Hispanic and percent Democratic vote increased substantially, from .093 (not significant, p = .139) in 1980 to .669 in 2008. This means that by 2008, counties with larger Hispanic populations tended to have a higher percentage of Democratic vote. However, percent Hispanic is negatively correlated with voter turnout in both 1980 (r = -.407) and 2008 (r = -.668), meaning that counties with large Hispanic populations tend to have lower voter turnout.
 
    Another strong negative correlation exists between voter turnout and percent Democratic vote (r = -.612 in 1980 and r = -.604 in 2008), meaning that in both years, counties with higher voter turnout tended to have a lower percentage of Democratic vote.



Correlations
                                 hd02_s114  Pres80D    Pres08D    vtp80      vtp08
hd02_s114   Pearson Correlation  1          .093       .669**     -.407**    -.668**
            Sig. (2-tailed)                 .139       .000       .000       .000
            N                    254        254        254        254        254
Pres80D     Pearson Correlation  .093       1          .540**     -.612**    -.484**
            Sig. (2-tailed)      .139                  .000       .000       .000
            N                    254        254        254        254        254
Pres08D     Pearson Correlation  .669**     .540**     1          -.600**    -.604**
            Sig. (2-tailed)      .000       .000                  .000       .000
            N                    254        254        254        254        254
vtp80       Pearson Correlation  -.407**    -.612**    -.600**    1          .664**
            Sig. (2-tailed)      .000       .000       .000                  .000
            N                    254        254        254        254        254
vtp08       Pearson Correlation  -.668**    -.484**    -.604**    .664**     1
            Sig. (2-tailed)      .000       .000       .000       .000
            N                    254        254        254        254        254
**. Correlation is significant at the 0.01 level (2-tailed).
Variables: hd02_s114 = percent Hispanic; Pres80D and Pres08D = percent Democratic vote in 1980 and 2008; vtp80 and vtp08 = voter turnout in 1980 and 2008.



 


Conclusion

    In conclusion, there is definite clustering of voting results and voter turnout throughout the state of Texas. There have also been some slight changes in the clustering patterns of both voting results and voter turnout between 1980 and 2008. The only area with consistent clustering of Democratic vote and voter turnout in both 1980 and 2008 is the southern tip of Texas, which had a high percentage of Democratic vote and low voter turnout in both years.

