1

I am new to R and I am trying to build my first regression model. However, I am struggling with transforming my data.

I have my data organized in the following format:

resp_id  task_id  alt  A_1  B_1   C_1   D_1   E_1
1        25       1    3    0.4   0.15  0     0
1        25       2    2    0.7   0.05  0.05  0
1        26       1    1    0.4   0     0     0
1        26       2    3    0.4   0.05  0.1   0.05

I am looking for a way to transform my data from format above to the format below:

resp_id  task_id  alt  A_1  B_1   C_1   D_1   E_1   A_2  B_2  C_2  D_2  E_2
1        25       1    3    0.4   0.15  0     0     2    0.7  0.05 0.05 0
1        26       1    1    0.4   0     0     0     3    0.4  0.05 0.1  0.05 

Conceptually I understand that I need to loop through each row until we get to the column 'alt' with value 2. Then all all next column values in that row need to be copied as new columns to the row before and the row from which the values are copied needs to be deleted.

I looked at ways to get from a long dataset to a wide dataset in R, but I couldn't manage to transform my dataset to what I want.

Given my lack of programming experience, could someone help me out?

Martijn
  • 27
  • 5

2 Answers2

2

This is a job for pivot_wider from the tidyr package:

library(tidyverse)

df %>%
  # remove the existing suffix and instead use alt to enumerate the columns
  rename_at(vars(A_1:E_1), ~gsub("_[0-9]*$", "", .)) %>%
  pivot_wider(names_from = alt, values_from = A:E)

Result:

# A tibble: 2 x 12
  resp_id task_id   A_1   A_2   B_1   B_2   C_1   C_2   D_1   D_2   E_1   E_2
    <int>   <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1       1      25     3     2   0.4   0.7  0.15  0.05     0  0.05     0  0   
2       1      26     1     3   0.4   0.4  0     0.05     0  0.1      0  0.05
Aron Strandberg
  • 3,040
  • 9
  • 15
1

Some time ago I was struggling with such transformations too. I'm always looking for the simplest solution. In this case I'd like to recommend reshape

reshape(df, direction = "wide", timevar = "alt", idvar = "task_id", sep = "")

  task_id resp_id1 A_11 B_11 C_11 D_11 E_11 resp_id2 A_12 B_12 C_12 D_12 E_12
1      25        1    3  0.4 0.15    0    0        1    2  0.7 0.05 0.05 0.00
3      26        1    1  0.4 0.00    0    0        1    3  0.4 0.05 0.10 0.05
Adamm
  • 2,150
  • 22
  • 30
  • Thank you both for your suggestions. I managed to get both solutions working, thank you so much! – Martijn Feb 13 '20 at 16:52