0

I have a dataset that is quite in a wide-format and I want it to be in a long format. Usually I use melt for that cases but here I don't know if this will work. That's my dataset:

> Data <-
+   data.table(
+     ID = 1:6,
+     N1 = round(rnorm(6)),
+     E1 = round(rnorm(6)),
+     N2 = round(rnorm(6, 5)),
+     E2 = round(rnorm(6, 5)),
+     Class1 = 1,
+     Class2 = 2
+   )
> 
> Data
   ID N1 E1 N2 E2 Class1 Class2
1:  1  0  0  4  5      1      2
2:  2 -1  0  5  5      1      2
3:  3  0 -1  5  5      1      2
4:  4  1  0  5  5      1      2
5:  5 -1 -1  4  7      1      2
6:  6 -2 -1  6  6      1      2

My desired dataset is that one:

> Data.Long <- 
+   rbind(
+     Data[, .(ID, N = N1, E = E1, Class = Class1)],
+     Data[, .(ID, N = N2, E = E2, Class = Class2)]
+   )
> Data.Long
    ID  N  E Class
 1:  1  0  0     1
 2:  2 -1  0     1
 3:  3  0 -1     1
 4:  4  1  0     1
 5:  5 -1 -1     1
 6:  6 -2 -1     1
 7:  1  4  5     2
 8:  2  5  5     2
 9:  3  5  5     2
10:  4  5  5     2
11:  5  4  7     2
12:  6  6  6     2

For this case my attempt with rbind and variable selection is quite OK. But in my real dataset I have more variables like ID and probably I have more than two classes. Can you think of a better code that won't be repetitive even if there are many classes?

TobiSonne
  • 1,044
  • 7
  • 22
  • it's not quite clear about the "Class": in your original dataset, `Class1` column equals `1` for all records, `Class2` always equals `2` etc. - is that right? – Vasily A Oct 31 '20 at 08:48
  • No not always, that's only in this minimal example. – TobiSonne Oct 31 '20 at 08:53

1 Answers1

2

We can use pivot_longer from tidyr :

tidyr::pivot_longer(Data, 
                    cols = -ID,
                    names_to = '.value',
                    names_pattern = '([A-Za-z]+)') 

#      ID     N     E Class
#   <int> <dbl> <dbl> <dbl>
# 1     1     0     0     1
# 2     1     5     6     2
# 3     2    -1    -2     1
# 4     2     6     6     2
# 5     3     2     0     1
# 6     3     6     5     2
# 7     4     0     0     1
# 8     4     4     6     2
# 9     5    -2     1     1
#10     5     5     6     2
#11     6    -1     0     1
#12     6     6     5     2

.value has a special meaning in pivot_longer which means that the new columns in the long format would have names from the original column names. How are those name derived is defined using names_pattern argument. In names_pattern we mentioned that extract all the characters ([A-Za-z]+) from the name as new name. So N1, N2 become N and they are combined into one column. Same happens with E1, E2 and Class1, Class2 pair.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Great, thank you! Could you explain this a bit please? `names_pattern` is a regex for one ore more letters (why?) and what does `'.value'` in `names_to` do? – TobiSonne Oct 31 '20 at 08:55
  • 1
    I have updated the answer to include some explanation. – Ronak Shah Oct 31 '20 at 09:09