I'm trying to automate the process of running hundreds of different regression analyses within the same table, based on different fields. I'm using the lm function.
My data has a number of different election results at the county level. I would like to compare the support for every candidate vs. the percentage of voters over the age of 65 to see if there's a relationship between the two variables. For example: "Is there a relationship between the percentage of older voters in a county and their support for candidate x?" I have hundreds of different elections, each with multiple candidates, and hundreds of counties for each election. I would like to run a regression analysis for each candidate, in every race, for every county, and export a table that gives the slope and intercept for each analysis.
My input table is:
contest county candidate percent_over_65 percent_support
1 1 1 .65 .44
1 1 2 .65 .34
1 1 3 .65 .22
1 2 1 .70 .60
1 2 2 .70 .30
1 2 3 .70 .10
2 1 4 .65 .70
2 1 5 .65 .30
2 2 4 .70 .60
2 2 5 .70 .40
My ideal output would be something like:
contest county candidate slope_value intercept_value
1 1 1 .05 .65
1 1 2 -.01 .23
1 1 3 .02 .17
1 2 1 .25 .36
1 2 2 .15 .45
1 2 3 -.02 .12
2 1 4 .75 .33
2 1 5 -.10 .18
This question, and the answer by Hadley towards the bottom with 57 upvotes, were very helpful; he used a plyr function that "deconstructed" the process once. But now I essentially want to nest another plyr function within the original function (if that makes sense). It seems like I could add a couple of for-loops to the mix to get the desired result, but I haven't been able to figure it out. Any help would be much appreciated. Thanks!
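For concreteness, here is a minimal base-R sketch of the kind of grouped fit described above. It is only one possible reading of the problem: it assumes each regression uses the counties as observations, so the result has one row per contest/candidate pair rather than one per county. The names elections, groups, fit_one and results are made up for illustration, and the data is just the ten sample rows from the input table.

```r
# Example data copied from the input table in the question.
elections <- read.table(header = TRUE, text = "
contest county candidate percent_over_65 percent_support
1 1 1 .65 .44
1 1 2 .65 .34
1 1 3 .65 .22
1 2 1 .70 .60
1 2 2 .70 .30
1 2 3 .70 .10
2 1 4 .65 .70
2 1 5 .65 .30
2 2 4 .70 .60
2 2 5 .70 .40
")

# Split the rows into one group per (contest, candidate) pair.
# drop = TRUE skips combinations that never occur in the data
# (for example candidate 4 in contest 1).
groups <- split(elections,
                list(elections$contest, elections$candidate),
                drop = TRUE)

# Hypothetical helper: fit percent_support ~ percent_over_65 within one
# group and return the group keys together with the two coefficients.
fit_one <- function(d) {
  fit <- lm(percent_support ~ percent_over_65, data = d)
  data.frame(contest         = d$contest[1],
             candidate       = d$candidate[1],
             slope_value     = unname(coef(fit)[2]),
             intercept_value = unname(coef(fit)[1]))
}

# Run the fit for every group and stack the one-row results.
results <- do.call(rbind, lapply(groups, fit_one))
rownames(results) <- NULL
print(results)
```

With only two counties per candidate in the sample, each line passes exactly through both points, so the coefficients only become meaningful with the full set of counties. The same grouping could presumably also be expressed with plyr's ddply by listing both contest and candidate as the splitting variables, instead of nesting one call inside another.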