Tuesday, April 28, 2015

Regression Analysis

Part 1:


Question: A study on crime rates and poverty was conducted for Town X. The local news station got hold of some data and claimed that as the number of kids who get free lunches increases, so does crime. You feel this might be a bit silly, so you run a regression equation with the given data. Using SPSS to do your regression, determine whether the news is correct. What percentage of persons will get free lunch with a crime rate of 79.7? How confident are you in your results? Provide the data and a response addressing these questions.

In this particular scenario, the null hypothesis states that there is no linear relationship between free lunches and crime rate.
On the other hand, the alternative hypothesis states that there is a linear relationship between the number of free lunches given out at schools and the crime rate.


Figure 1 shows the results of the linear regression performed in SPSS. Using the Unstandardized Coefficient values, we can determine the regression equation. In this case the constant is 21.819 and the regression coefficient is 1.685, giving y = 21.819 + 1.685x, where x is the percentage of people receiving free lunch and y is the crime rate. Therefore, based on the data provided, there does appear to be a positive linear relationship between free lunches and crime rate.

In order to determine the percentage of people who will be getting a free lunch if the crime rate is 79.7, we need to solve the regression equation for x, as shown below.
y=a+bx
where,
y=79.7
a=21.819
b=1.685
So,
79.7=21.819+1.685x
and,
x= 34.351


Therefore, if the crime rate is 79.7, the expected percentage of people getting free lunches is 34.351.
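
The same rearrangement can be checked with a few lines of Python. This is only a sketch: the function name `solve_for_x` is made up for illustration, and the constants are the unstandardized coefficients reported above.

```python
def solve_for_x(y, a, b):
    """Invert y = a + b*x to recover x for a given y."""
    if b == 0:
        raise ValueError("slope is zero; x cannot be recovered from y")
    return (y - a) / b


# Values taken from the SPSS output described above.
a = 21.819         # constant (intercept)
b = 1.685          # regression coefficient (slope)
crime_rate = 79.7  # the y value given in the question

free_lunch = solve_for_x(crime_rate, a, b)
print(round(free_lunch, 3))  # 34.351
```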


Figure 1  Results of the linear regression performed in SPSS. 



    The coefficient of determination (R-squared) is extremely low in this case, which signifies that the model is not a good predictor of what is actually occurring. However, when the data are plotted on a scatterplot (Figure 2), one can see that the majority of points have small residuals. There is one exception that causes the coefficient of determination to be so small. If that particular sample had not been collected, or if more samples had been collected in total, that one outlier would not have played such a large role in reducing the R-squared value.

Figure 2  Scatterplot showing the residuals, strength of the model, and the regression equation of the trend line. 
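
To illustrate how a single point far from the trend can pull R-squared down, here is a rough sketch in Python. The numbers are invented for the example and are NOT the Town X data, and the helper `r_squared` is a hypothetical name. It computes R-squared for a simple least-squares line with and without one outlier.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # For one predictor, R-squared equals the squared correlation.
    return (sxy ** 2) / (sxx * syy)


# Invented points that follow a nearly straight line (not the Town X data).
xs = [10, 20, 30, 40, 50]
ys = [38, 56, 71, 90, 105]
r2_clean = r_squared(xs, ys)

# The same points plus one sample far above the trend.
r2_outlier = r_squared(xs + [30], ys + [400])

print(round(r2_clean, 3), round(r2_outlier, 3))
```

With these made-up values the fit is nearly perfect until the extra point is added, after which R-squared falls close to zero even though the other residuals are unchanged.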


Part 2:

Introduction

    It is important to determine where students are attending school and the factors that aid in their decision. Each university can use this information to try to increase attendance in certain areas. This exercise is aimed at determining significant variables that affect attendance at UW-Eau Claire and UW-Platteville. The main variables used are population/distance from each university, percent of each county with a BS degree, and median household income. One would think that all three of these variables play a large role in determining where students will choose to go to college. The significance of each variable is determined by running a linear regression of the variables for the two schools. The residuals for each significant variable will then be mapped to show which areas have higher than expected attendance.


Methods

    Regression analysis will be used to determine which variables play a significant role in determining where students go for their undergraduate education. In this exercise several variables are provided: the population attending each University of Wisconsin branch from each county, the distance of each county from the individual universities, the total population of each county, the percent of the population with a BS degree for each county, the number of residents between the ages of 18 and 24 per county, and the median household income for each county.

    The first step in the regression analysis is to normalize the population data. This is done by dividing the total number of residents per county by the distance from the individual university. This process will remove any bias for counties with large populations. Next, the regression analysis will be performed in SPSS. For two different schools (UW-Eau Claire and UW-Platteville), the population attending the university will be set as the dependent variable, and a regression will be run for each of the population/distance, percent BS degree, and median household income variables (six regressions in total). These regressions will show which variables are significant at the 95% confidence level (Figures 1-6).
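
The normalization and a single-variable fit can be sketched outside SPSS as well. The Python below is only an illustration under stated assumptions: the county records, field names, and distance units are invented, and the helpers `normalize_population` and `fit_line` are hypothetical names. It does not reproduce the significance values that SPSS reports.

```python
# Invented county records (NOT the assignment data); distance units are assumed.
counties = [
    {"name": "A", "population": 100000, "distance": 10, "students": 1000},
    {"name": "B", "population": 60000, "distance": 30, "students": 260},
    {"name": "C", "population": 20000, "distance": 50, "students": 80},
    {"name": "D", "population": 500000, "distance": 200, "students": 310},
]


def normalize_population(population, distance):
    """County population divided by its distance from the university."""
    return population / distance


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Independent variable: population/distance. Dependent variable: students.
xs = [normalize_population(c["population"], c["distance"]) for c in counties]
ys = [c["students"] for c in counties]
a, b = fit_line(xs, ys)
print(xs)
print(round(a, 3), round(b, 5))
```

A county recorded at a distance of zero would cause a division by zero here, so any real data would need that case handled before normalizing.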

   After determining which variables play a significant role in university enrollment, another regression analysis will be performed. This time, the values of the residuals will be added to the tabular data. The table will then be exported into ArcMap. Finally, the residuals for the four significant variables will be mapped (Figures 7-10).
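
As a rough sketch of what the residual column contains, the Python below computes observed minus predicted attendance for each county and writes the table as CSV text, a format that can be joined to a county layer. The coefficients, county rows, and column names are all invented for illustration, and `add_residuals` and `write_csv` are hypothetical helpers, not part of SPSS or ArcMap.

```python
import csv
import io


def add_residuals(rows, a, b):
    """Return copies of rows with predicted attendance and residual added."""
    out = []
    for row in rows:
        predicted = a + b * row["x"]
        # Residual = observed - predicted; positive means more students than expected.
        out.append({**row, "predicted": predicted,
                    "residual": row["students"] - predicted})
    return out


def write_csv(rows, handle):
    """Write the rows, including the new columns, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Invented regression coefficients and counties (not the assignment data).
a, b = 50.0, 0.5
rows = [
    {"county": "A", "x": 2000.0, "students": 1200},
    {"county": "B", "x": 400.0, "students": 200},
]

with_residuals = add_residuals(rows, a, b)
buffer = io.StringIO()  # an open file could be passed instead
write_csv(with_residuals, buffer)
print(buffer.getvalue())
```

Counties with positive residuals are the ones that would show up on a map as having higher than expected attendance.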


Results

   After the regression analysis was performed on the three variables for each school, it is clear which variables play a significant role. For the University of Wisconsin-Eau Claire, median household income has a significance value of .104 and is therefore not statistically significant at the 95% confidence level (Figure 1). The percent of each county's population with a BS degree does play a significant role in attendance at UWEC, with a significance value of .003 (Figure 2). Finally, population/distance is highly significant in determining enrollment at UWEC (Figure 3).
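
The decision rule being applied is that a variable counts as significant when its significance value falls below 1 minus the confidence level (0.05 for 95%). A minimal sketch in Python, using the two UWEC significance values quoted above; the function name `is_significant` is made up for the example.

```python
def is_significant(p_value, confidence=0.95):
    """True when the significance value is below alpha = 1 - confidence."""
    alpha = 1 - confidence
    return p_value < alpha


# Significance values quoted above for UWEC.
uwec = {
    "median household income": 0.104,
    "percent with BS degree": 0.003,
}

for name, p_value in uwec.items():
    verdict = "significant" if is_significant(p_value) else "not significant"
    print(f"{name}: {p_value} -> {verdict} at 95%")
```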


Figure 1  Regression analysis results based on the number of students who attend UWEC from each county based on the median household income of each county.


Figure 2  Regression results for the number of students attending UWEC from each county based on the percent population of each county with a BS degree.


Figure 3  Regression results for the number of students who attend UWEC from each county based on the population/distance of each county away from UWEC. 



   UW-Platteville has similar results to UWEC. For UW-Platteville, median household income does not play a significant role in determining where students attend college, based on a 95% confidence level. If the results were based on a 90% confidence level, median household income would be a significant factor (Figure 4). The other two variables examined do, however, play a significant role at the 95% confidence level. The percent of population with a BS degree has a significance value of .019, which means that the percent of each county's population with a BS degree does play a role in attendance at UW-Platteville (Figure 5). Also, population/distance has a significance value of .000, which means that it is highly significant in determining whether or not students will attend UW-Platteville (Figure 6).


Figure 4  Regression results for the number of students who attend UW-Platteville from each county based on that county's median household income.


Figure 5  Regression results for the number of students who attend UW-Platteville from each county based on the percent of each county's population with a BS degree.


Figure 6  Regression results for the number of students who attend UW-Platteville from each county based on the population/distance of each county from the university.



   After determining which variables play a significant role in whether students will attend a specific university, it is important to create a graphic representation of how each county compares with what the regression predicts. This is done by looking at the residuals for each county based on the regression equation for each variable. In Figure 7, Marathon County, Brown County, Dane County, and Waukesha County all have higher attendance at UWEC than expected based on population/distance. On the other hand, many counties have a smaller representation at UWEC than expected based on population/distance. Overall, this map shows areas that have high representation at UWEC even though they are located farther away from UWEC.

Figure 7  Attendance at UWEC shown as residuals based on the regression equation for population/distance from UWEC.


   Figure 8 represents the residuals associated with attendance at UWEC per county based on the percent of that county with a BS degree. Eau Claire County has a much higher representation at UWEC than would be expected for its percent of population with a BS degree. Similarly, counties surrounding Eau Claire County also have a higher number of attendees at UWEC than expected. This is possibly caused by the reluctance of people in small towns with few BS degrees to travel farther to attend other schools.

Figure 8 Representation of residuals for attendance at UWEC based on expected number of attendees and percent of each county with BS degrees. 

While UWEC has a large number of attendees from counties far away from Eau Claire County, UW-Platteville has very few people travelling long distances to attend. As Figure 9 shows, there are only three counties that have more attendees than expected based on the regression equation. In addition, most of the counties far away from UW-Platteville have lower attendance per county than would be expected. This suggests that people aren't willing to travel long distances to attend UW-Platteville.


Figure 9  Mapping residuals associated with expected attendance based on population/distance from UW-Platteville. 

As with population/distance, UW-Platteville has higher than expected attendance from counties adjacent to UW-Platteville based on the percent of each county with a BS degree. Areas farther away from UW-Platteville have a lower representation at the university than expected based on the regression equation. From this information, counties that have a higher percent with a BS degree are not likely to have high attendance at UW-Platteville unless they are located really close to the university.

Figure 10  Mapping residuals for expected attendance at UW-Platteville for each county based on the percent with BS degree.


Conclusion

   Overall, it appears that the quality of a school is a large indicator of how far people will travel to attend. Since UW-Eau Claire is a much better school than UW-Platteville, people are willing to travel from all over the state. On the other hand, UW-Platteville is not a very good school, so people aren't willing to travel as far. When looking at the percent of population of each county with a BS degree, both UWEC and UW-Platteville have higher than expected attendance from counties adjacent to the universities. This means that people who come from areas with a low BS degree percentage are less likely to travel farther distances to go to college.








