1

Currently my data looks like this:

wide.df <- read.table(header = T, sep = ",", text = "
ID, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain, score, group
100, 18 , 4, 29, 30, 40, 0
101, 19,  7, 33, 40, 29, 0
103, 19, 19, 22, 30, 33, 0
200, 29, 30, 22, 33, 11, 1
233, 100, 33, 22, 44, 55, 1")

I need to convert my data into long format to look something like this:

ID  group  left.or.right  mid.or.lat    brain     score
100   0          0             0           29        40   # 0 = left, 0=lat 
100   0          1             0           30        40   # 1 = right, 0=lat
100   0          0             1           18        40   # 0 = left, 1 = mid
100   0          1             1            4        40   # 1 = right, 1 = mid
101   0          0             0           33        29   # 0 = left, 0 = lat
.
.
.
.
.
233   1           1            1            33        55   # 1= right, 1= mid

Where left.mid.brain , right.mid.brain , left.lat.brain, right.lat.brain are changed into factors but their values are still kept and the each participant has four lines each.

neilfws
  • 32,751
  • 5
  • 50
  • 63
crich
  • 99
  • 3
  • 8

2 Answers2

3

The tidyverse (specifically the dplyr and tidyr packages) are very good at operations like this:

library(tidyverse)

long.df <- wide.df %>% 
  gather(variable, brain, left.mid.brain, right.mid.brain, left.lat.brain, right.lat.brain) %>% 
  mutate(
    left.or.right = ifelse(grepl('left', variable), 0, 1),
    mid.or.lat = ifelse(grepl('lat', variable), 0, 1)
  ) %>% 
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>% 
  arrange(ID)

    ID group left.or.right mid.or.lat brain score
1  100     0             0          1    18    40
2  100     0             1          1     4    40
3  100     0             0          0    29    40
4  100     0             1          0    30    40
5  101     0             0          1    19    29
6  101     0             1          1     7    29
7  101     0             0          0    33    29
8  101     0             1          0    40    29
9  103     0             0          1    19    33
10 103     0             1          1    19    33
jdobres
  • 11,339
  • 1
  • 17
  • 37
  • 1
    Since `grepl` returns logical values, you can just convert those to numeric instead of assigning manually in `ifelse`: `left.or.right = as.numeric(grepl("left", variable))` – camille Aug 29 '19 at 13:21
1

Another dplyr/tidyr-based method that should scale fairly well. After you create long-shaped data, you'll have columns with values like "right.mid.brain" that you want to split into "right" and "mid"dplyr::separate does this easily, splitting on "\\." and avoiding too much hardcoding. It sticks you with a dummy column, which I'll drop later.

At that point, you'll have this:

library(dplyr)
library(tidyr)

# 0 = left, 0 = lat 
wide %>%
  gather(key, value = brain, -ID, -score, -group) %>%
  separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
  head()
#>    ID score group left.or.right mid.or.lat dummy brain
#> 1 100    40     0          left        mid brain    18
#> 2 101    29     0          left        mid brain    19
#> 3 103    33     0          left        mid brain    19
#> 4 200    11     1          left        mid brain    29
#> 5 233    55     1          left        mid brain   100
#> 6 100    40     0         right        mid brain     4

If you had more complicated recoding you needed to do, you could use some forcats functions to recode factor levels. In this case, it's simple enough to just convert columns based on conditions like left.or.right == "right", which if true becomes 1, and if false (i.e. if it's left), 0. Select the columns in the order you want.

long <- wide %>%
  gather(key, value = brain, -ID, -score, -group) %>%
  separate(key, into = c("left.or.right", "mid.or.lat", "dummy"), sep = "\\.") %>%
  mutate(left.or.right = as.numeric(left.or.right == "right"),
         mid.or.lat = as.numeric(mid.or.lat == "mid")) %>%
  select(ID, group, left.or.right, mid.or.lat, brain, score) %>%
  arrange(ID)

head(long)
#>    ID group left.or.right mid.or.lat brain score
#> 1 100     0             0          1    18    40
#> 2 100     0             1          1     4    40
#> 3 100     0             0          0    29    40
#> 4 100     0             1          0    30    40
#> 5 101     0             0          1    19    29
#> 6 101     0             1          1     7    29
camille
  • 16,432
  • 18
  • 38
  • 60