I am trying to reshape a large data frame (34645 x 11619) from wide to long. I would like to reshape the years 99 to 16. This means I have variables such as "edu99", "edu00", ..., "edu16" or variables such as "p99d61", "p00d61", ..., "p16d61". The year string is not always in the same position.

Is there a way to tell R to look for the year strings "99" to "16" in the variable names when reshaping (given, of course, that these digits uniquely identify the year)?

Or, more generally, is there an efficient strategy for reshaping a big dataset?

Thank you so much for your help!

Best, Patrick

1 Answer

I would use tidyr instead of reshape for this one:

  1. wide to long using gather() function: https://tidyr.tidyverse.org/reference/gather.html
  2. extract the year using extract() function: https://tidyr.tidyverse.org/reference/extract.html

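You can use this regex for step 2 to extract the year: "(99|0[0-9]|1[0-6])". It matches any digit pair equal to 99, between 00 and 09, or between 10 and 16. The leftmost match is used, so this works as long as the year is the first such digit pair in the variable name (in "p16d61" it picks up 16, because "61" does not match the pattern).

extract(<long_data_name>, <column_name>, <string_name_of_result_column>, regex = "(99|0[0-9]|1[0-6])", remove = TRUE)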
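
Put together, the two steps might look like the sketch below. The data frame `wide`, its `id` column, and the values in it are made up for illustration; only the variable-name pattern comes from the question. It assumes tidyr is installed. `remove = FALSE` is used here so the original name stays available for deriving the variable stem.

```r
library(tidyr)

# Toy data (hypothetical values); column names follow the question's pattern
wide <- data.frame(
  id     = 1:2,
  edu99  = c(10, 11),
  edu00  = c(12, 13),
  p99d61 = c(1, 0),
  p00d61 = c(0, 1)
)

year_regex <- "(99|0[0-9]|1[0-6])"

# Step 1: wide to long; every column except id becomes a (variable, value) row
long <- gather(wide, key = "variable", value = "value", -id)

# Step 2: pull the two-digit year out of the variable name
long <- extract(long, variable, into = "year",
                regex = year_regex, remove = FALSE)

# Optional: drop the year from the name to get the variable stem
# ("edu99" -> "edu", "p99d61" -> "pd61")
long$stem <- sub(year_regex, "", long$variable)
```

In recent tidyr versions gather() is superseded by pivot_longer(), which can be swapped in for step 1. With 11619 columns it may also help to reshape only the columns you need rather than the whole data frame at once.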

byouness