1

I am trying to change my data from wide to long on r using pivot_longer. There appear to be a few people having similar issues on here but I have been unable to adapt their solutions to my data. I have attached a picture of example data in the wide data example and what I am trying to achieve in the long data example.

In summary I have a time and reference column which refer to all columns, I also have multiple columns of group, subject, ID, xcoordinate ycoordinate in the form of: group1, subject1. ID1, xcoord1 ycoord1, group2, subject2, ID2, xcoord2, ycoord2 and so on... What I want is a long table with columns:

time, reference, group, subject, ID, xcoord, ycoord.

With the 5 columns stacking their respective numbered columns, and the time and reference columns repeating for the relevant stacks.

 df %>%
 pivot_longer(cols = -c(time, reference),
names_to = c("group", "subject", "ID", "xcoord", "ycoord")

My understanding is that I require to use the names_pattern function, although I cant seem to get that to work, and I cant find anything clear describing How I should be using it. I should say my data is much wider than the example data, so can't really rely on numbering of columns.

appreciate any help

wide data example

    time reference group1 subject1 ID1 xcoord1 ycoord1 group2 subject2 ID2 xcoord2 ycoord2 group3 subject3 ID3 xcoord3 ycoord3
1  00:01   4097365      1        4   1    7.44   38.16      0       21   2   33.90   47.26      1       15   3   21.53    2.67
2  00:02   4097366      1        4   1    9.84   37.03      0       21   2   32.98   48.47      1       15   3   21.82    2.95
3  00:03   4097367      1        4   1   12.01   35.83      0       21   2   30.17   50.33      1       15   3   22.06    4.45
4  00:04   4097368      1        4   1   12.15   34.17      0       21   2   29.85   50.52      1       15   3   23.50    4.75
5  00:05   4097369      1        4   1   15.27   32.94      0       21   2   28.39   51.30      1       15   3   24.25    4.76
6  00:06   4097370      1        4   1   18.96   31.98      0       21   2   28.39   52.36      1       15   3   25.31    6.57
7  00:07   4097371      1        4   1   22.50   31.13      0       21   2   26.59   53.14      1       15   3   26.05    7.04
8  00:08   4097372      1        4   1   27.47   30.15      0       21   2   25.89   53.94      1       15   3   27.29    7.91
9  00:09   4097373      1        4   1   32.17   29.92      0       21   2   24.64   54.42      1       15   3   27.47    8.44
10 00:10   4097374      1        4   1   33.77   27.49      0       21   2   24.61   55.23      1       15   3   28.59    8.71

Long data example

   time reference group subject ID xcoord ycoord
1  00:01   4097365     1       4  1   7.44  38.16
2  00:01   4097365     0      21  2  33.90  47.26
3  00:01   4097365     1      15  3  21.53   2.67
4  00:02   4097366     1       4  1   9.84  37.03
5  00:02   4097367     0      21  2  32.98  48.47
6  00:02   4097368     1      15  3  21.82   2.95
7  00:03   4097369     1       4  1  12.01  35.83
8  00:03   4097370     0      21  2  30.17  50.33
9  00:03   4097371     1      15  3  22.06   4.45
10 00:04   4097372     1       4  1  12.15  34.17

edit: playing about a bit with the data I have managed to achieve this odd solution which is a mixture of long and wide data.

dput(head(df1))

structure(list(time = c(0, 0, 0, 0, 0, 0), state = structure(c(2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Alive", "Alive;:", "Dead", "Dead;:"
), class = "factor"), reference = c("1880439", "1880439", "1880439", 
"1880439", "1880439", "1880439"), num = c("1", NA, "2", "3", 
"4", "5"), group = c("1", NA, "1", "4", "0", "0"), X = c(NA, 
NA, NA, NA, NA, NA), ID = c(1L, NA, 2L, 4L, 5L, 6L), subect = c(21L, 
NA, 7L, -1L, 2L, 6L), x = c(3514L, NA, 2807L, 5550L, 3956L, 3686L
), y = c(-1644L, NA, -510L, 4400L, 1297L, -55L), speed = c("5.23", 
NA, "3.24", "0.00", "2.31", "3.57"), group1 = c("0", NA, "4", 
"1", "1", "0"), ID1 = c(13L, NA, 14L, 15L, 16L, 17L), subect1 = c(9L, 
NA, -1L, 13L, 14L, 11L), x1 = c(882L, NA, 5550L, 3004L, 761L, 
3317L), y1 = c(-1468L, NA, 4400L, 1633L, 559L, 1443L), speed1 = c("1.70", 
NA, "0.00", "3.06", "2.92", "3.30"), group2 = c("4", NA, "0", 
"1", "0", "0"), ID2 = c(24L, NA, 25L, 26L, 27L, 28L), subect2 = c(-1L, 
NA, 1L, 18L, 5L, 10L), x2 = c(5550L, NA, 5031L, 3936L, 3972L, 
3623L), y2 = c(4400L, NA, -74L, 190L, 686L, 356L), speed2 = c("0.00", 
NA, "0.54", "1.06", "0.95", "2.49"), speed.group2 = c(NA, NA, 
NA, NA, NA, NA)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-6L)).

the dataframe the code results in looks like this

> head(df1)
# A tibble: 6 x 24
   time state reference num   group X        ID subect     x     y speed group1   ID1 subect1    x1    y1 speed1 group2   ID2 subect2    x2    y2 speed2
  <dbl> <fct> <chr>     <chr> <chr> <lgl> <int>  <int> <int> <int> <chr> <chr>  <int>   <int> <int> <int> <chr>  <chr>  <int>   <int> <int> <int> <chr> 
1     0 Aliv~ 1880439   1     1     NA        1     21  3514 -1644 5.23  0         13       9   882 -1468 1.70   4         24      -1  5550  4400 0.00  
2     0 Aliv~ 1880439   NA    NA    NA       NA     NA    NA    NA NA    NA        NA      NA    NA    NA NA     NA        NA      NA    NA    NA NA    
3     0 Aliv~ 1880439   2     1     NA        2      7  2807  -510 3.24  4         14      -1  5550  4400 0.00   0         25       1  5031   -74 0.54  
4     0 Aliv~ 1880439   3     4     NA        4     -1  5550  4400 0.00  1         15      13  3004  1633 3.06   1         26      18  3936   190 1.06  
5     0 Aliv~ 1880439   4     0     NA        5      2  3956  1297 2.31  1         16      14   761   559 2.92   0         27       5  3972   686 0.95  
6     0 Aliv~ 1880439   5     0     NA        6      6  3686   -55 3.57  0         17      11  3317  1443 3.30   0         28      10  3623   356 2.49  
# ... with 1 more variable: speed.group2 <lgl>
Stavrum
  • 51
  • 1
  • 7
  • 1
    Welcome to stackoverflow. Providing links to sample data is generally discouraged because it makes it difficult for people to work with the data, and there is a non-trivial risk involved in clicking unknown links. – Giovanni Colitti Feb 04 '20 at 16:20
  • sorry, that should be sorted now. still a relative novice so was a bit unsure how to do this initially but after thinking a bit it seems quite obvious now – Stavrum Feb 04 '20 at 16:32

1 Answers1

1

Would first rename columns and insert underscore right before number, then use that as separator in pivot_longer.

library(tidyverse)

df %>%
  rename_at(-c(1:2), ~ str_replace(., "(\\w+)(\\d)", "\\1_\\2")) %>% 
  pivot_longer(cols = -c(1:2), names_to = c(".value", "num"), names_sep = "_")

Edit (2/7/20):

With your updated dataset, it appears that some of the variable column names don't have a number at the end. We can add 0 for those.

Also, I assume you want: group, ID, subect, x, y, speed that are repeated (with the first group in column 5 separated from its related variables in columns 7-11).

df1 %>%
  rename_at(c(5,7:11), ~ paste0(., "0")) %>%
  rename_at(-c(1:4, 6, 24), ~ str_replace(., "(\\w+)(\\d+)", "\\1_\\2")) %>%
  pivot_longer(cols = -c(1:4, 6, 24), names_to = c(".value", "val"), names_sep = "_")

Output (Revised):

# A tibble: 18 x 13
    time state   reference num   X     speed.group2 val   group    ID subect     x     y speed
   <dbl> <fct>   <chr>     <chr> <lgl> <lgl>        <chr> <chr> <int>  <int> <int> <int> <chr>
 1     0 Alive;: 1880439   1     NA    NA           0     1         1     21  3514 -1644 5.23 
 2     0 Alive;: 1880439   1     NA    NA           1     0        13      9   882 -1468 1.70 
 3     0 Alive;: 1880439   1     NA    NA           2     4        24     -1  5550  4400 0.00 
 4     0 Alive;: 1880439   NA    NA    NA           0     NA       NA     NA    NA    NA NA   
 5     0 Alive;: 1880439   NA    NA    NA           1     NA       NA     NA    NA    NA NA   
 6     0 Alive;: 1880439   NA    NA    NA           2     NA       NA     NA    NA    NA NA   
 7     0 Alive;: 1880439   2     NA    NA           0     1         2      7  2807  -510 3.24 
 8     0 Alive;: 1880439   2     NA    NA           1     4        14     -1  5550  4400 0.00 
 9     0 Alive;: 1880439   2     NA    NA           2     0        25      1  5031   -74 0.54 
10     0 Alive;: 1880439   3     NA    NA           0     4         4     -1  5550  4400 0.00 
11     0 Alive;: 1880439   3     NA    NA           1     1        15     13  3004  1633 3.06 
12     0 Alive;: 1880439   3     NA    NA           2     1        26     18  3936   190 1.06 
13     0 Alive;: 1880439   4     NA    NA           0     0         5      2  3956  1297 2.31 
14     0 Alive;: 1880439   4     NA    NA           1     1        16     14   761   559 2.92 
15     0 Alive;: 1880439   4     NA    NA           2     0        27      5  3972   686 0.95 
16     0 Alive;: 1880439   5     NA    NA           0     0         6      6  3686   -55 3.57 
17     0 Alive;: 1880439   5     NA    NA           1     0        17     11  3317  1443 3.30 
18     0 Alive;: 1880439   5     NA    NA           2     0        28     10  3623   356 2.49
Ben
  • 28,684
  • 5
  • 23
  • 45
  • thanks for the help however this isn't working on my full data set. the error I am receiving is: Error: Expected a vector, not NULL Run `rlang::last_error()` to see where the error occurred. In addition: Warning message: Expected 2 pieces. Additional pieces discarded in 174 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...]. – Stavrum Feb 05 '20 at 15:31
  • Sorry to hear - can you share info on your full data set such as `dput(head(df))`? And edit the question above with the results of this? This would at least provide the first 6 rows of data and show the structure of your columns and names. – Ben Feb 05 '20 at 15:46
  • I have managed to play about with it a bit, however there appears to still be an error. There are three blocks or chunks which I have tried (unsuccessfully to reorganise this further, Ideally, I would still like a simpler solution – Stavrum Feb 07 '20 at 16:49
  • @Stavrum see edited answer, I hope this might be helpful. I'm happy to try again if this is closer to what you need. – Ben Feb 07 '20 at 17:54