0

I am doing some coding in R and have encountered an issue.

I have a data set where participants were given the same question(s) a few different times. There is one id variable, a time variable which records which instance we are dealing with, and one outcome variable.

I did a little research and found a post similar to what I am trying to do.

Turning one row into multiple rows in r

I am trying to do the exact opposite thing of what is being done in this post.

I created this small code to give an idea of what I am dealing with.

A1
id       time  x
1000     1     1
1000     2     2
1000     3     3
1000     4     4
1001     1     1
1001     2     2
1001     3     3
1001     4     4

What I need to do is reorganize the data set so that each case is on one line and I repeat every X variable multiple times (x1 would be the first time point, x2 would be the second time point, etc). Here is a sample code of what I would like the final data frame to look like.

B1
id     x1     x2     x3     x4
1000    1     2      3      4
1001    1     2      3      4

There are a few nuances in my code that make this situation really tricking. Some participants have many more x entries than other participants (some participants only have 1 or 2 different x values while others have 7 or 8). There is some missing data as well.

I have approached the problem in a few ways with no luck. I am not sure what the best way is to handle this situation. The attempts I have tried either require a lot of code, usually the same basic code repeating multiple times, or code that doesn't work. Here is what I have tried.

I tried to use a for loop. I tried to create a new variable to identify the participant by id and then identify the first time they are doing the survey, then I use the first x value. I would then repeat this for each time point(for time 2- find the second x value for a given participant, for time 3- find the third x value for a given participant, etc.). As I have currently anywhere from 1 to 10 time points, this involves a lot of for loops. Because some people don't have an 6 or 7th time, the code often doesn't run. Here is an example of the for loop I have tried.

for (i in A1$id) {
  temp.txt<- paste (
    c ("A1$x1 [A$id ==", i," & A$time == 1] <- A1$x"
    ),   collapse = "")
  eval (parse(text = temp.txt))
}

I tried to subset the data for each time point, then merge the data together at the end. If I try this, I have missing data, and I also encounter issues with variables names no longer being accepted (I think because the names are similar, R has an issue with renaming everything). Here is an example of what that code looks like.

t1 <- subset (A1, A$time == 1)
t2 <- subset (A1, A$time == 2)
t3 <- subset (A1, A$time == 3)
t4 <- subset (A1, A$time == 4)

Z1 <- merge (t1, t2, by = "id")
Z2 <- merge (Z1, t3, by = "id")
Z3 <- merge (Z2, t4, by = "id")

Is there a different/easier way to approach this issue? Thanks, I really appreciate it.

  • Thanks. That worked perfectly. In case anyone else has this issue, here is the documentation on reshape () https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/reshape – Eric Boorman Mar 26 '21 at 18:42
  • Can you provide a better example and the corresponding output. The one in the question is ambiguous since the x and time columns are the same so we can't really tell which is which. – G. Grothendieck Mar 26 '21 at 18:46

2 Answers2

2

1) reshape This is referred to as converting long form to wide form. In base R we can use reshape giving the following data frame. Note that reshape assumes that if there are columns named id and time then those are the id and time columns but if they had been named something else we would have had to specify them using the appropriate reshape arguments.

reshape(DF, dir = "wide")
##     id x.1 x.2 x.3 x.4
## 1 1000   1   2   3   4
## 5 1001   1   2   3   4

2) xtabs Another base R solution is to use xtabs which gives the following table object:

xtabs(x ~ ., DF)
##       time
## id     1 2 3 4
##   1000 1 2 3 4
##   1001 1 2 3 4

3) tapply or tapply which gives this matrix:

with(DF, tapply(x, list(id, time), c))
##      1 2 3 4
## 1000 1 2 3 4
## 1001 1 2 3 4

4) pivot wider The tidyr package has pivot_wider to do this:

library(tidyr)

pivot_wider(DF, names_from = "time", values_from = x)
## # A tibble: 2 x 5
##      id   `1`   `2`   `3`   `4`
##   <int> <int> <int> <int> <int>
## 1  1000     1     2     3     4
## 2  1001     1     2     3     4

Note

The input in reproducible form:

Lines <- "id       time  x
1000     1     1
1000     2     2
1000     3     3
1000     4     4
1001     1     1
1001     2     2
1001     3     3
1001     4     4"
DF <- read.table(text = Lines, header = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

Using data.table you can try

library(data.table)
setDT(A1) #Converting into data.table
result <- dcast(A1, id~x, value.var= "time") #long to wide conversion
names(result)[-1]<- paste0("x.",names(result)[-1]) #setting the names accordingly
result #your result

     id x.1 x.2 x.3 x.4
1: 1000   1   2   3   4
2: 1001   1   2   3   4
Chriss Paul
  • 1,101
  • 6
  • 19