Create new variables based on multiple columns if value in one column match part of variable name

Question

I'm quite new to R, so please be gentle.

I have a dataset that contains multiple columns that are repeated measures for several years. I have and ID-variable that identify the specific individual per row that these measures relate to.

I also have a variable that include a year per row.

So my data looks something like this

ID	Y2005_N0X	Y2006_N0X	Y2007_N0X	Y2008_N0X	Y2005_N06	Y2006_N06	Y2007_N06	Y2008_N0X	YEAR
1	5	6	7	8	9	94	29	69	2005
2	6	7	8	9	9	39	59	39	2007

and so forth...

I want to create a new column that takes the value from 2005 for ID 1, and the value from 2007 from ID 2, and names this new column prioryear_N0X.

I have several of these columns so I want to create a command that generate multiple new columns based on those conditions, so that I get a new column called prioryear_N1X, prioryear_N06, and so forth. It is also crucial that the ID column are intact.

After this I also want to create new columns called thisyear_N0X and so forth which takes the value from YEAR+1 (for example 2004+1=2005) and match to the present variables called Y2005_N0X and so forth.

ID	Y2005_N0X	Y2006_N0X	Y2007_N0X	Y2008_N0X	Y2005_N06	Y2006_N06	Y2007_N06	Y2008_N0X	YEAR	prioryear_N0X	prioryear_N06	thisyear_N0X	thisyear_N06
1	5	6	7	8	9	94	29	69	2005	5	9	6	94
2	6	7	8	9	9	39	59	39	2007	8	59	9	39

Suggestions?

How do I use the command if I have variables that also starts with R?

I have tried the following, but receive an error.

df %>%
pivot_longer(cols = -c(ID, YEAR), names_to = c("header_year", ".value"), names_pattern = "Y(\\d+)_(\\w+)", names_transform = list(header_year = as.integer)) %>%
group_by(ID) %>%
mutate(across(starts_with("N", "R"), function(x) { x[YEAR == header_year] }, .names = "prioryear_{.col}"),
across(starts_with("N", "R"), function(x) { x[YEAR + 1 == header_year] }, .names = "thisyear_{.col}")) %>%
pivot_wider(id_cols = c(ID, YEAR, starts_with("prioryear"), starts_with("thisyear")), names_from = header_year, values_from = starts_with("N", "R"))

Welcome to SO! It might help if you provide a table of what your result should look like in the end, with your new columns added. Please see [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example. — Ben, Mar 03 '22 at 20:29
Thank you! I will try to provide a table and update my post! — T Kalin, Mar 04 '22 at 07:33
I'm aware of the function in SPSS where I can write one command for each column like this IF (YEAR= 2005) prioryear_N06=Y2005_N06. EXECUTE. However, it takes an extremely long time to create commands for roughly 200 variables per year, so I'm thinking that R can do this smoother. — T Kalin, Mar 04 '22 at 10:03
I'm sorry, it still is not clear to me. It would help to have a complete reproducible example. For your final table, I would fill in "(number)" with the actual value that belongs there. If you need to add more example data, would recommend doing so. Also, all the columns should be present you need. Is "up to Y2019" an actual column? Or does this refer to more columns that have been left out, increasing by 1 year, up to 2019? — Ben, Mar 04 '22 at 12:35
Also, in your original data table, you have columns such as "Y2005_N0X" - but those columns are not appearing in your final end result table. Was that intentional? Since you're adding columns, shouldn't your original columns be present? In addition, you might want to read about [tidy data](https://r4ds.had.co.nz/tidy-data.html). It might be easier to manipulate in "long" form using something like `pivot_longer`. However, other strategies can create new columns in tidyverse: e.g.,[these](https://stackoverflow.com/questions/70096683/mutate-across-multiple-columns-to-create-new-variable-sets). — Ben, Mar 04 '22 at 12:39
Thanks! I have tried to update it now to be more clear on how I'm thinking! — T Kalin, Mar 04 '22 at 13:22

score 0 · Accepted Answer · answered Mar 04 '22 at 18:54

Here is one approach using tidyverse. You can use pivot_longer to put into long form, and then mutate to dynamically create new columns based on those start with "N". This includes two new columns, one where the header year matches the YEAR column, and one where it matches YEAR + 1. Finally, you can put your data into wide format so resembles your desired table.

library(tidyverse)

df %>%
  pivot_longer(cols = -c(ID, YEAR), names_to = c("header_year", ".value"), names_pattern = "Y(\\d+)_(\\w+)", names_transform = list(header_year = as.integer)) %>%
  group_by(ID) %>%
  mutate(across(starts_with("N"), function(x) { x[YEAR == header_year] }, .names = "prioryear_{.col}"),
         across(starts_with("N"), function(x) { x[YEAR + 1 == header_year] }, .names = "thisyear_{.col}")) %>%
  pivot_wider(id_cols = c(ID, YEAR, starts_with("prioryear"), starts_with("thisyear")), names_from = header_year, values_from = starts_with("N"))

Output

     ID  YEAR prioryear_N0X prioryear_N06 thisyear_N0X thisyear_N06 N0X_2005 N0X_2006 N0X_2007 N0X_2008 N06_2005 N06_2006 N06_2007 N06_2008
  <int> <int>         <int>         <int>        <int>        <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
1     1  2005             5             9            6           94        5        6        7        8        9       94       29       69
2     2  2007             8            59            9           39        6        7        8        9        9       39       59       39

score 0 · Answer 2 · answered Mar 07 '22 at 09:33

I think I solved it. I had missed that I had to include c when using multiple matches.

df %>%
    pivot_longer(cols = -c(ID, YEAR), names_to = c("header_year", ".value"), names_pattern = "Y(\\d+)_(\\w+)", names_transform = list(header_year = as.integer)) %>%
    group_by(ID) %>%
    mutate(across(starts_with(c("N", "R")), function(x) { x[YEAR == header_year] }, .names = "prioryear_{.col}"),
           across(starts_with(c("N", "R")), function(x) { x[YEAR + 1 == header_year] }, .names = "thisyear_{.col}")) %>%
    pivot_wider(id_cols = c(ID, YEAR, starts_with("prioryear"), starts_with("thisyear")), names_from = header_year, values_from = starts_with(c("N", "R")))

Create new variables based on multiple columns if value in one column match part of variable name

2 Answers2