Hi I am trying to run a time series regression based on different counties

Question

I am trying to create a time series database containing information on different counties.

I am receiving an error where only the first three rows are returned.

I was wondering if anyone had any thoughts on this: What I am receiving

I am trying to produce with 5 different value columns

I have tried using:

TimeSeries <- ts(DataCommons1, start=c(2018), end=c(2020), frequency=1)

to no avail

The head() of my data is:

df <- data.frame(
  placeName = rep(c("Autauga County", "Baldwin County"), each = 3L),
  Date.dc.wc8q05drd74bd = rep(2018:2020, 2),
  Value.dc.wc8q05drd74bd = c(2108L, 1994L, 1768L, 6791L, 7242L, 7733L),
  Date.Count_Person_16OrMoreYears_NoHealthInsurance_NAICSEducationalServicesHealthCareSocialAssistance_Worker = c(NA, NA, NA, 2018L, 2019L, NA),
  Value.Count_Person_16OrMoreYears_NoHealthInsurance_NAICSEducationalServicesHealthCareSocialAssistance_Worker = c(NA, NA, NA, 1200.004, 1063.796, NA),
  Date.Count_Person_USCitizenByNaturalization_DateOfEntry2000To2009_ForeignBorn = rep(NA_integer_, 6L),
  Value.Count_Person_USCitizenByNaturalization_DateOfEntry2000To2009_ForeignBorn = rep(NA_real_, 6L),
  Date.dc.cyc1f26msdct4 = rep(2018:2020, 2),
  Value.dc.cyc1f26msdct4 = c(887L, 682L, 706L, 2476L, 2388L, 2795L),
  Date.Count_Household_Householder1To2OwnChildren_SingleMotherFamilyHousehold = rep(c(2018L, 2019L, NA), 2),
  Value.Count_Household_Householder1To2OwnChildren_SingleMotherFamilyHousehold = c(997L, 968L, NA, 2977L, 3076L, NA)
)

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064) — MrFlick, Apr 25 '23 at 21:26
Please read about [how to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and update your question accordingly. Include a sample of your data by pasting the output of `dput()` into your post or `dput(head())` if you have a large data frame. Also include code you have tried, any relevant errors, and expected output. If you cannot post your data, then please post code for creating representative data. — LMc, Apr 25 '23 at 21:26
Hi @MrFlick sorry for the poor formatting of the question. I have tried to rectify this in line with the rules and make it easier to view. Is this better presentation? — Will, Apr 25 '23 at 21:41
Hi @LMc is this easier to view? Thanks for any help you are able to provide and for the advice so far! — Will, Apr 25 '23 at 21:42
@Will so in your expected output how is `Value` defined from the data you have shared? Where does date come from there are a lot of variables in your data that start with "Date"? — LMc, Apr 25 '23 at 21:48
@LMc so the database is basically "county", "date", "value" and then it just repeats date and value across the columns. The dates are all 2018, 2019 and 2020 and the values are the result of surveys into the specific variable. — Will, Apr 25 '23 at 21:50
So you want to put all the `Date...` columns into one column and all the `Value...` columns into another? — LMc, Apr 25 '23 at 21:52
yeah so a column holding the 3 dates repeating for all of the counties followed by the 5 value columns. I am then hoping to use this to create a time series. My current database gets corrupted by the TS function as shown in the photo. Sorry for the slow reply I was working on the code and didn't see a notification. — Will, Apr 25 '23 at 22:01

score 0 · Accepted Answer · answered Apr 25 '23 at 22:00

Based on what you have described, you are trying to pivot your data into a long format where the columns that start with "Date" go into one column. Those that start with "Value" go into another:

library(tidyr)
library(dplyr)

df |> 
  pivot_longer(cols = -placeName,
               names_pattern = "^(Date|Value).(.*)",
               names_to = c(".value", "description")) |>
  mutate(Date = as.Date(paste0(Date, "-01-01")))

From here you can use dplyr::filter to subset your rows. It looks like you convert the county name to numeric possibly with something like as.numeric(factor(placeName)). This should get you started.

Output

   placeName      description                                         Date       Value
   <chr>          <chr>                                               <date>     <dbl>
 1 Autauga County dc.wc8q05drd74bd                                    2018-01-01  2108
 2 Autauga County Count_Person_16OrMoreYears_NoHealthInsurance_NAICS~ NA            NA
 3 Autauga County Count_Person_USCitizenByNaturalization_DateOfEntry~ NA            NA
 4 Autauga County dc.cyc1f26msdct4                                    2018-01-01   887
 5 Autauga County Count_Household_Householder1To2OwnChildren_SingleM~ 2018-01-01   997
 6 Autauga County dc.wc8q05drd74bd                                    2019-01-01  1994
 7 Autauga County Count_Person_16OrMoreYears_NoHealthInsurance_NAICS~ NA            NA
 8 Autauga County Count_Person_USCitizenByNaturalization_DateOfEntry~ NA            NA
 9 Autauga County dc.cyc1f26msdct4                                    2019-01-01   682
10 Autauga County Count_Household_Householder1To2OwnChildren_SingleM~ 2019-01-01   968
# ... with 20 more rows

Thank you! I will give this a go. And see if i can make it work. I appreciate you taking the time to look into this for me. — Will, Apr 25 '23 at 22:06

Hi I am trying to run a time series regression based on different counties

1 Answers1