Pulling data from a row up to a new column for each ID

Question

I have data in a dataframe that contains values across multiple years for different IDs. Currently, it appears as follows:

(can be recreated with the following code):

df<-data.frame(WLH_ID=c("15-7318","15-7318","15-7319","15-7319","15-7320","15-7320","15-7320"),
year=c("2017","2018","2017","2018","2017","2018","2019"),
overlap_95=c("1","1","0.626311190934023","0.968386735773874","0.713286882088087","0.824103998854928","0.451493099154607"))

I hope to reshape it to appear as follows::

Essentially I want to pull the value from the row so each year is in its own column- but keep the data in the same row as the ID if that makes sense so I can compare across each ID between years

Some IDs may have more years' worth of data than others, and in such cases, I would like the extra years to be represented as NAs or NULLs.

I am assuming this is something that can be done- just don't know where to start. I couldn't find this question already answered either but I could have worded it wrong.

Thanks in advance!

For reference, what you're trying to do is take your data from a long format (lots of rows) to a wide format (lots of columns). joran's answer down below is spot on. — Stewart Macdonald, Mar 12 '23 at 01:49

score 1 · Answer 1 · answered Mar 12 '23 at 01:45

I think maybe you just need dplyr::pivot_wider()?

> df %>%
   pivot_wider(names_from = year,values_from = overlap_95,names_prefix = "overlap_95_")

# A tibble: 3 × 4
  WLH_ID  overlap_95_2017   overlap_95_2018   overlap_95_2019  
  <chr>   <chr>             <chr>             <chr>            
1 15-7318 1                 1                 NA               
2 15-7319 0.626311190934023 0.968386735773874 NA               
3 15-7320 0.713286882088087 0.824103998854928 0.451493099154607

score 0 · Answer 2 · answered Mar 12 '23 at 02:04

One way is using pivot_wider(), following rename() your columns, and finally, replace() to replace missing values with blank cells:

library(tidyverse)

dfw <- df %>% pivot_wider(names_from = "year", values_from = "overlap_95") %>%
      rename(verlap_95Y1='2017',verlap_95Y2='2018',verlap_95Y3='2019') %>%
      replace(is.na(.), "")

The dfw should look like this:

View(dfw)

Pulling data from a row up to a new column for each ID

2 Answers2