
I have two different data sets: one has the annual unemployment rate by state (listed under a single column), and the other has the minimum wage for each state. Both only have data for 2003-2020.

The problems are:

  1. They are in different data sets
  2. The X variable (minimum wage) spans 18 different columns, one per year

Questions

  1. How can I regress data from 2 different data sets?
  2. How can I regress on 18 columns without having to type minwage$`2003` + minwage$`2004` + . . . + minwage$`2020`?

I tried this, but again, it's very inefficient.

unemp_minwage <- lm(unemployment_03_20$`U-3` ~ minwage$`2003` + minwage$`2004` + minwage$`2005` + minwage$`2006` + minwage$`2007` + minwage$`2008` + minwage$`2009` + minwage$`2010` + minwage$`2011` + minwage$`2012` + minwage$`2013` + minwage$`2014` + minwage$`2015` + minwage$`2016` + minwage$`2017` + minwage$`2018` + minwage$`2019` + minwage$`2020`)

Not to mention, I got this error:

Error in model.frame.default(formula = unemployment_03_20$`U-3` ~ minwage$`2003` +  : variable lengths differ (found for 'minwage$`2003`')
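The error can be reproduced with a small toy example. This is a sketch assuming (as the comments below the answer suggest) that the unemployment data has one row per state and year, while the minimum-wage data has one row per state with one column per year; the data frames `unemp` and `minwage` and all their values are made up for illustration.

```r
# Hypothetical toy data: unemployment in long format (one row per state-year),
# minimum wage in wide format (one row per state, one column per year).
unemp <- data.frame(
  State = rep(c("A", "B", "C"), each = 2),
  Year  = rep(c(2003, 2004), times = 3),
  `U-3` = c(5.1, 5.4, 6.0, 6.2, 4.8, 4.9),
  check.names = FALSE
)
minwage <- data.frame(
  State  = c("A", "B", "C"),
  `2003` = c(5.15, 6.00, 5.50),
  `2004` = c(5.15, 6.25, 5.75),
  check.names = FALSE
)

# 6 response values vs 3 predictor values: model.frame() cannot pair them up,
# so lm() stops with "variable lengths differ".
res <- tryCatch(
  lm(unemp$`U-3` ~ minwage$`2003`),
  error = function(e) conditionMessage(e)
)
print(res)
```

The vectors are simply different lengths, so no formula trick fixes this; the rows have to be matched by state and year first.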

Then I tried just regressing on one year of minimum wage, but got a similar error.

Suggestions?

bandcar

1 Answer


To get the exact formula in your question:

as.formula(paste("unemployment_03_20$`U-3` ~", paste(paste0("minwage$`", 2003:2020, "`"), collapse = " + ")))
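Before fitting, it can help to check what the pasted string actually turns into. This sketch only builds and inspects the formula, so it does not need the data; the object names are the ones from the question.

```r
# Build the formula text from the year range instead of typing every term.
years <- 2003:2020
rhs <- paste(paste0("minwage$`", years, "`"), collapse = " + ")
f <- as.formula(paste("unemployment_03_20$`U-3` ~", rhs))

# terms() lists the right-hand-side terms without evaluating any data.
term_labels <- attr(terms(f), "term.labels")
length(term_labels)  # 18, one term per year from 2003 to 2020
```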

So you can do something like this (for clarity):

model <- as.formula(paste("unemployment_03_20$`U-3` ~", paste(paste0("minwage$`", 2003:2020, "`"), collapse = " + ")))

unemp_minwage <- lm(model)
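As an alternative to pasting the whole formula string, base R's `reformulate()` (stats package) takes the right-hand-side terms as a character vector and the response as a symbol or call. This is a sketch using the question's object names; it builds the same 18-term formula, so it still references two separate data sets and will hit the same length error until the data are merged.

```r
years <- 2003:2020

# Right-hand side as a character vector; the response is passed as a call,
# which reformulate() accepts as-is for the left-hand side.
f <- reformulate(
  termlabels = paste0("minwage$`", years, "`"),
  response   = quote(unemployment_03_20$`U-3`)
)
print(f)
```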

I strongly suggest merging the data first so you don't inadvertently pair the wrong rows, and then supplying lm() with that data (rather than individual vectors from multiple datasets).
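One way to do that in base R is to reshape the minimum-wage data from wide (one column per year) to long (one row per state-year) with `reshape()`, then `merge()` it with the unemployment data on state and year. This is a minimal sketch assuming the unemployment data has `State` and `Year` columns and the minimum-wage data has a `State` column plus one column per year; those column names and the toy values are hypothetical. Note it fits one pooled minimum-wage slope, which is a different model from the 18-term formula above.

```r
# Hypothetical toy data standing in for the question's two data sets.
unemployment_03_20 <- data.frame(
  State = rep(c("A", "B", "C"), each = 2),
  Year  = rep(c(2003, 2004), times = 3),
  `U-3` = c(5.1, 5.4, 6.0, 6.2, 4.8, 4.9),
  check.names = FALSE
)
minwage <- data.frame(
  State  = c("A", "B", "C"),
  `2003` = c(5.15, 6.00, 5.50),
  `2004` = c(5.15, 6.25, 5.75),
  check.names = FALSE
)

# 1. Wide -> long: one row per state-year with a single MinWage column.
year_cols <- setdiff(names(minwage), "State")
minwage_long <- reshape(
  minwage,
  direction = "long",
  varying   = year_cols,
  v.names   = "MinWage",
  timevar   = "Year",
  times     = as.numeric(year_cols),
  idvar     = "State"
)

# 2. Merge on the shared keys so each unemployment rate sits next to
#    the minimum wage for the same state and year.
combined <- merge(unemployment_03_20, minwage_long, by = c("State", "Year"))

# 3. One regression on one data frame; no $ references, no length mismatch.
fit <- lm(`U-3` ~ MinWage, data = combined)
summary(fit)
```

Because `merge()` matches rows by key, rows only line up where state and year agree, which is exactly what the `$`-based formula could not guarantee.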

Brigadeiro
  • I got this error: Error in model.frame.default(formula = model, drop.unused.levels = TRUE) : variable lengths differ (found for 'minwage$`2003`'). How do I merge the data? – bandcar May 07 '21 at 03:29
  • Is there a way to regress multiple columns without having to use the paste function first and then regressing? – bandcar May 07 '21 at 03:31
  • @ihaveaquestion you are getting that error because your DV (U-3) is a longer vector than your IVs (the minimum-wage columns). This is likely a big problem, because your cases aren't lining up with each other the way you would expect. See the `merge` function to learn how to merge data. – Brigadeiro May 07 '21 at 03:46
  • @ihaveaquestion There are other ways. For example, if all you have in the dataset you pass to the data argument of lm() is your DV and your IVs, you can do `lm(DV ~ ., data = myDataset)` and you'll get what you're looking for. – Brigadeiro May 07 '21 at 03:47
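A sketch of the `lm(DV ~ ., data = myDataset)` approach, assuming a single data frame in which each row already pairs a state's U-3 value with that state's year columns (for example, after merging). The data frame `wide`, its columns, and the random values are hypothetical; only three year columns are used to keep it short.

```r
set.seed(1)
# Hypothetical combined data: one row per state, response plus year columns.
wide <- data.frame(
  State  = paste0("S", 1:20),
  `U-3`  = rnorm(20, mean = 5),
  `2003` = rnorm(20, mean = 6),
  `2004` = rnorm(20, mean = 6),
  `2005` = rnorm(20, mean = 6),
  check.names = FALSE
)

# Keep only the response and the predictors; otherwise "." would also
# pull State in as a predictor.
model_data <- wide[, c("U-3", as.character(2003:2005))]

# "." expands to every column in model_data except the response.
fit <- lm(`U-3` ~ ., data = model_data)
coef(fit)
```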