Wide to Long Form Regex R

Question

I'm trying to convert data from wide to long form with tidyr, but also open to other options. This is fake dataset with repeating values but it has the same structure as real dataset

structure(list(Category = c("Pre", "Pre", "Pre", "post_med_1", 
"post_med_1", "post_med_1", "post_med_2", "post_med_2", "post_med_2"
), Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), Subj_1_tox = c(4.2, 
5, 2.3, 4.2, 5, 2.3, 4.2, 5, 2.3), Subj_2_tox = c(23L, 1L, 4L, 
23L, 1L, 4L, 23L, 1L, 4L), Subj_3_tox = c(6, 4.9, 3.2, 6, 4.9, 
3.2, 6, 4.9, 3.2), Subj_1_a1 = c(4.2, 5, 2.3, 4.2, 5, 2.3, 4.2, 
5, 2.3), Subj_2_a1 = c(23L, 1L, 4L, 23L, 1L, 4L, 23L, 1L, 4L), 
    Subj_3_a1 = c(6, 4.9, 3.2, 6, 4.9, 3.2, 6, 4.9, 3.2)), class = "data.frame", row.names = c(NA, 
-9L))

The confusing part for me is how in one call could I convert the tox columns and the a1 columns to long form and maintain the category and time columns. First off is the regex for the name pattern. I have looked up regex patterns, but not clear how to get it and second how to include 2 different value columns in 1 call?

Basically something like this in one call

df_longer<-df %>% 
  pivot_longer(
    cols=contains("tox") & contains("a1"), 
    names_to = c("subject", "tox", "a1"), 
    names_pattern = "(Subj_['all_numbers') (tox and a1) "
    values_to = c("tox_value", "a1"))

With the end result being the Subject(#) being in one column called subject and the tox values and the a1 values being in other columns. Is it possible to do this in one call? I'm also open to other solutions but am trying to learn tidyr more

The final result should look something like this, but values are not right in this one but the other parts are accurate.

score 2 · Accepted Answer · answered Jun 30 '21 at 01:11

You can use names_to and names_pattern as -

tidyr::pivot_longer(df, 
                    cols = starts_with("Subj"), 
                    names_to = c("subject", ".value"), 
                    names_pattern = "(Subj_\\d+)_(.*)")

#  Category    Time subject   tox    a1
#   <chr>      <int> <chr>   <dbl> <dbl>
# 1 Pre            1 Subj_1    4.2   4.2
# 2 Pre            1 Subj_2   23    23  
# 3 Pre            1 Subj_3    6     6  
# 4 Pre            2 Subj_1    5     5  
# 5 Pre            2 Subj_2    1     1  
# 6 Pre            2 Subj_3    4.9   4.9
# 7 Pre            3 Subj_1    2.3   2.3
# 8 Pre            3 Subj_2    4     4  
# 9 Pre            3 Subj_3    3.2   3.2
#10 post_med_1     1 Subj_1    4.2   4.2
# … with 17 more rows

`.value` indicates that part of the column names in the original dataframe will comprise of new column names. We define the part in `names_pattern`. — Ronak Shah, Jun 30 '21 at 02:34

score 0 · Answer 2 · answered Jun 29 '21 at 23:11

If I understood your post correctly, then this is possible in one line. See my solution below,

# load library;
library(tidyverse)

# Store data;
tmpData <- structure(
                list(
                        Category = c(
                                "Pre",
                                "Pre",
                                "Pre",
                                "post_med_1",
                                "post_med_1",
                                "post_med_1",
                                "post_med_2",
                                "post_med_2",
                                "post_med_2"
                        ),
                        Time = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
                        Subj_1_tox = c(4.2,
                                       5, 2.3, 4.2, 5, 2.3, 4.2, 5, 2.3),
                        Subj_2_tox = c(23L, 1L, 4L,
                                       23L, 1L, 4L, 23L, 1L, 4L),
                        Subj_3_tox = c(6, 4.9, 3.2, 6, 4.9,
                                       3.2, 6, 4.9, 3.2),
                        Subj_1_a1 = c(4.2, 5, 2.3, 4.2, 5, 2.3, 4.2,
                                      5, 2.3),
                        Subj_2_a1 = c(23L, 1L, 4L, 23L, 1L, 4L, 23L, 1L, 4L),
                        Subj_3_a1 = c(6, 4.9, 3.2, 6, 4.9, 3.2, 6, 4.9, 3.2)
                ),
                class = "data.frame",
                row.names = c(NA,
                              -9L)
        )

# Pivot longer;
tmpData %>% pivot_longer(cols = contains("Subj"),
                         names_to = "subject")

You do not necessarily need regex if your colums have common, and distinc, prefixes as is the case for you data where all colums you need starts with Subj. Hence, you can just use contains() from dplyr.

Please let me know if I misunderstood your question.

you need 2 columns for the values which are tox and a1. Furthermore, you need to get the Subjects by themselves in their own column without the _tox or _a1 added on. — rj44, Jun 29 '21 at 23:26

Wide to Long Form Regex R

2 Answers2