0

I have a dataset with 40+ variables, which I'm trying to convert into a long format. Below is an example of what I am working with: (I can't use the original data due to sensitivity)

w_1   = "X20200101"
x_1 = "2020-01-01 05:00:01"
x_2 = "2020-01-01 05:01:01"
x_3 = "2020-01-01 05:02:01"
x_4 = "2020-01-01 05:04:01"
y_1 = "ETI"
y_2 = "EPI"
y_3 = "AUSCULATION"
y_4 = "CPR"

df_wide <- as.data.frame(cbind(w_1,x_1, x_2, x_3,x_4,y_1,y_2, y_3,y_4))

df_wide
        w_1                 x_1                 x_2                 x_3                 x_4 y_1 y_2         y_3 y_4
1 X20200101 2020-01-01 05:00:01 2020-01-01 05:01:01 2020-01-01 05:02:01 2020-01-01 05:04:01 ETI EPI AUSCULATION CPR

This is what I'm trying to achieve:

w = c("X20200101",
      "X20200101",
      "X20200101",
      "X20200101")
x = c("2020-01-01 05:00:01",
      "2020-01-01 05:01:01",
      "2020-01-01 05:02:01",
      "2020-01-01 05:04:01")
y = c("ETI",
      "EPI",
      "AUSCULTATION",
      "CPR")

df_need <- as.data.frame(cbind(w, x, y))

        w                   x            y
1 X20200101 2020-01-01 05:00:01          ETI
2 X20200101 2020-01-01 05:01:01          EPI
3 X20200101 2020-01-01 05:02:01 AUSCULTATION
4 X20200101 2020-01-01 05:04:01          CPR

The approach that I have taken so far looks like this:


fake_df <- df_wide %>% 
  pivot_longer(cols = starts_with("x"),
               names_to = "x_t",
               values_to = "x_time",
               values_drop_na = T) %>% 
  pivot_longer(cols = starts_with("y"),
               names_to = "y_p",
               values_to = "y_proc",
               values_drop_na = T)

The above code then produces:

> fake_df
# A tibble: 16 x 5
   w_1       x_t   x_time              y_p   y_proc     
   <fct>     <chr> <fct>               <chr> <fct>      
 1 X20200101 x_1   2020-01-01 05:00:01 y_1   ETI        
 2 X20200101 x_1   2020-01-01 05:00:01 y_2   EPI        
 3 X20200101 x_1   2020-01-01 05:00:01 y_3   AUSCULATION
 4 X20200101 x_1   2020-01-01 05:00:01 y_4   CPR        
 5 X20200101 x_2   2020-01-01 05:01:01 y_1   ETI        
 6 X20200101 x_2   2020-01-01 05:01:01 y_2   EPI        
 7 X20200101 x_2   2020-01-01 05:01:01 y_3   AUSCULATION
 8 X20200101 x_2   2020-01-01 05:01:01 y_4   CPR        
 9 X20200101 x_3   2020-01-01 05:02:01 y_1   ETI        
10 X20200101 x_3   2020-01-01 05:02:01 y_2   EPI        
11 X20200101 x_3   2020-01-01 05:02:01 y_3   AUSCULATION
12 X20200101 x_3   2020-01-01 05:02:01 y_4   CPR        
13 X20200101 x_4   2020-01-01 05:04:01 y_1   ETI        
14 X20200101 x_4   2020-01-01 05:04:01 y_2   EPI        
15 X20200101 x_4   2020-01-01 05:04:01 y_3   AUSCULATION
16 X20200101 x_4   2020-01-01 05:04:01 y_4   CPR 

When I do this though, there data doesn't seem to come out right.

Marshall
  • 85
  • 7

3 Answers3

2

This question is done. But here is another way for you. I modified your data a bit. I rather constructed a data frame. I created w by repeating w_1 based on the number of columns containing "x", which is 4. I also created two columns (i.e., x and y) using select() and unlist().

library(dplyr)
library(tibble)

tibble(w = rep(df_wide$w_1, times = sum(grepl(x = names(df_wide), pattern = "x"))),
       x = unlist(select(df_wide, contains("x"))),
       y = unlist(select(df_wide, contains("y"))))

  w         x                   y          
  <chr>     <chr>               <chr>      
1 X20200101 2020-01-01 05:00:01 ETI        
2 X20200102 2019-01-01 05:00:01 EEE        
3 X20200101 2020-01-01 05:01:01 EPI        
4 X20200102 2019-01-01 05:01:01 FFF        
5 X20200101 2020-01-01 05:02:01 AUSCULATION
6 X20200102 2019-01-01 05:02:01 GGG        
7 X20200101 2020-01-01 05:04:01 CPR        
8 X20200102 2019-01-01 05:04:01 HHH  

DATA

df_wide <- structure(list(w_1 = c("X20200101", "X20200102"), x_1 = c("2020-01-01 05:00:01", 
"2019-01-01 05:00:01"), x_2 = c("2020-01-01 05:01:01", "2019-01-01 05:01:01"
), x_3 = c("2020-01-01 05:02:01", "2019-01-01 05:02:01"), x_4 = c("2020-01-01 05:04:01", 
"2019-01-01 05:04:01"), y_1 = c("ETI", "EEE"), y_2 = c("EPI", 
"FFF"), y_3 = c("AUSCULATION", "GGG"), y_4 = c("CPR", "HHH")), class = "data.frame", row.names = c(NA, 
-2L))

        w_1                 x_1                 x_2                 x_3                 x_4 y_1 y_2         y_3 y_4
1 X20200101 2020-01-01 05:00:01 2020-01-01 05:01:01 2020-01-01 05:02:01 2020-01-01 05:04:01 ETI EPI AUSCULATION CPR
2 X20200102 2019-01-01 05:00:01 2019-01-01 05:01:01 2019-01-01 05:02:01 2019-01-01 05:04:01 EEE FFF         GGG HHH
jazzurro
  • 23,179
  • 35
  • 66
  • 76
1

You need to quote the letter inside starts_with().

What you want to do is:

df_clean <- df %>%
   pivot_longer(cols           = starts_with("x"),
                names_to       = "x_t",
                values_to      = "x_time",
                values_drop_na = TRUE) %>%
   pivot_longer(cols           = starts_with("y"),
                names_to       = "y_p",
                values_to      = "y_proc",
                values_drop_na = TRUE)
prolifebel
  • 103
  • 1
  • 2
  • 6
1

try this way

library(tidyverse)
df_wide %>% pivot_longer(-w_1) %>% 
  separate(name, "name") %>% 
  group_by(name) %>% 
  mutate(id = row_number()) %>% 
  pivot_wider(c(id, w_1), names_from = name, values_from = value) %>% 
  select(-id)
Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14