Sorting out the data with specific headers in R

Question

A small sample of the data is as follows:

df<-read.table (text=" ID   Class1a Time1a  MD1a    MD2a    Class1b Time1b  MD1b    MD2b    Class2a Time2a  MD3a    MD4a    Class2b Time2b  MD3b    MD4b    Class3a Time3a  MD5a    MD6a    Class3b Time3b  MD5b    MD6b
1   1   1   1   2   2   1   1   2   9   2   2   2   10  2   1   1   17  3   2   2   18  3   1   1
2   3   1   1   1   4   1   2   1   11  2   2   1   12  2   1   1   19  3   2   1   20  3   1   1
3   5   1   2   1   6   1   2   2   13  2   1   1   14  2   2   2   21  3   1   1   22  3   2   2
4   7   1   1   1   8   1   2   2   15  2   1   1   16  2   1   1   23  3   1   1   24  3   1   1
", header=TRUE)

I want to get the following output, especially the headers:

ID  Class   Time    MD  MD1 MD2
1   1   1   1-2 1   2
2   3   1   1-2 1   1
3   5   1   1-2 2   1
4   7   1   1-2 1   1
1   2   1   1-2 1   2
2   4   1   1-2 2   2
3   6   1   1-2 2   2
4   8   1   1-2 2   2
1   9   2   3-4 2   2
2   11  2   3-4 2   1
3   13  2   3-4 1   1
4   15  2   3-4 1   1
1   10  2   3-4 2   1
2   12  2   3-4 2   1
3   14  2   3-4 2   2
4   16  2   3-4 2   1
1   17  3   5-6 2   2
2   19  3   5-6 2   2
3   21  3   5-6 1   2
4   23  3   5-6 1   2
1   18  3   5-6 1   1
2   20  3   5-6 1   1
3   22  3   5-6 2   2
4   24  3   5-6 1   1

df1 <- df %>% pivot_longer(
  cols = starts_with("Time"),
  names_to = "Q",
  values_to = "Score",
  values_drop_na = TRUE)
df2 <- df1 %>% pivot_longer(
  cols = starts_with("Class"),
  names_prefix = "MD",
  values_drop_na = TRUE
) %>% dplyr::select(-value)

But I have failed to get the output of interest.

r2evans · Accepted Answer · 2022-12-22T16:48:16.940

This answer started as a pivot_longer example using names_pattern. Renaming some of the columns that way made sense, but it was not obvious how to extract the MD column (e.g., 1-2, 3-4) during the pivot.
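For context, here is a minimal sketch of what the names_pattern approach looks like when restricted to the Class and Time columns. The reduced data frame `small` and the column name `grp` are made up for illustration. The MD columns are left out on purpose: their numbering (1-2, 3-4, ...) does not line up with the group suffix, which is the difficulty described above.

```r
library(tidyr)

# Hypothetical reduced sample: two IDs, groups 1a and 1b, no MD columns
small <- data.frame(
  ID = 1:2,
  Class1a = c(1, 3), Time1a = c(1, 1),
  Class1b = c(2, 4), Time1b = c(1, 1)
)

# ".value" turns the first capture group (Class/Time) into column names;
# the second capture group (1a, 1b) is stored in the new "grp" column
long <- pivot_longer(
  small,
  cols = -ID,
  names_to = c(".value", "grp"),
  names_pattern = "^(Class|Time)(\\d+[ab])$"
)
long
```

This gives one row per ID and group, with columns ID, grp, Class and Time.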

Instead, let's split the frame by column-group, rename the columns as you'd like, then bind_rows them.

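library(dplyr)  # provides bind_rows()

bind_rows(
  # split the non-ID columns into groups; a new group starts at every Class* column
  lapply(split.default(df[,-1], cumsum(grepl("Class", names(df)[-1]))), 
         function(Z) {
           out <- transform(Z, 
             ID = df$ID,
             # MD label built from the digits in this group's MD* column names, e.g. "1-2"
             MD = paste(gsub("\\D", "", grep("^MD", names(Z), value = TRUE)), collapse = "-"))
           # rename by position: each group is Class*, Time*, then two MD* columns
           names(out)[1:4] <- c("Class", "Time", "MD1", "MD2")
           out
         })
)
#    Class Time MD1 MD2 ID  MD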
# 1      1    1   1   2  1 1-2
# 2      3    1   1   1  2 1-2
# 3      5    1   2   1  3 1-2
# 4      7    1   1   1  4 1-2
# 5      2    1   1   2  1 1-2
# 6      4    1   2   1  2 1-2
# 7      6    1   2   2  3 1-2
# 8      8    1   2   2  4 1-2
# 9      9    2   2   2  1 3-4
# 10    11    2   2   1  2 3-4
# 11    13    2   1   1  3 3-4
# 12    15    2   1   1  4 3-4
# 13    10    2   1   1  1 3-4
# 14    12    2   1   1  2 3-4
# 15    14    2   2   2  3 3-4
# 16    16    2   1   1  4 3-4
# 17    17    3   2   2  1 5-6
# 18    19    3   2   1  2 5-6
# 19    21    3   1   1  3 5-6
# 20    23    3   1   1  4 5-6
# 21    18    3   1   1  1 5-6
# 22    20    3   1   1  2 5-6
# 23    22    3   2   2  3 5-6
# 24    24    3   1   1  4 5-6
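
To see what the split is doing, the two building blocks can be run on the column names alone. This is a base-R sketch: the vector `nms` is a hand-typed copy of the first twelve non-ID column names from the question, and `md_label` is a helper name introduced here for illustration.

```r
# First twelve non-ID column names from the question (typed by hand)
nms <- c("Class1a", "Time1a", "MD1a", "MD2a",
         "Class1b", "Time1b", "MD1b", "MD2b",
         "Class2a", "Time2a", "MD3a", "MD4a")

# The running count of Class* columns gives one group id per column
grp <- cumsum(grepl("Class", nms))
grp
# [1] 1 1 1 1 2 2 2 2 3 3 3 3

# Hypothetical helper: keep the MD* names, strip non-digits, join with "-"
md_label <- function(n) {
  paste(gsub("\\D", "", grep("^MD", n, value = TRUE)), collapse = "-")
}
labels <- vapply(split(nms, grp), md_label, character(1))
unname(labels)
# [1] "1-2" "1-2" "3-4"
```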

This relies on:

- ID being the first column (ergo df[,-1] and names(df)[-1]), and
- each group of columns starting with a Class* column.
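
If renaming by position feels too fragile, one option is to pick each column by its name prefix and to check the assumptions up front. Below is a minimal base-R sketch on a reduced, made-up sample (two IDs, three groups). `sample_df` and `tidy_group` are hypothetical names, and the helper still assumes exactly two MD columns per group.

```r
# Hypothetical reduced sample with the same layout as the question's data
sample_df <- data.frame(
  ID = 1:2,
  Class1a = c(1, 3),  Time1a = c(1, 1), MD1a = c(1, 1), MD2a = c(2, 1),
  Class1b = c(2, 4),  Time1b = c(1, 1), MD1b = c(1, 2), MD2b = c(2, 1),
  Class2a = c(9, 11), Time2a = c(2, 2), MD3a = c(2, 2), MD4a = c(2, 1)
)

# Check the two assumptions before splitting
stopifnot(names(sample_df)[1] == "ID", grepl("^Class", names(sample_df)[2]))

# One data frame per column group; a new group starts at every Class* column
groups <- split.default(sample_df[, -1],
                        cumsum(grepl("^Class", names(sample_df)[-1])))

# Hypothetical helper: select columns by prefix instead of by position
tidy_group <- function(Z) {
  md_cols <- grep("^MD", names(Z), value = TRUE)
  stopifnot(length(md_cols) == 2)
  data.frame(
    ID    = sample_df$ID,
    Class = Z[[grep("^Class", names(Z))]],
    Time  = Z[[grep("^Time", names(Z))]],
    MD    = paste(gsub("\\D", "", md_cols), collapse = "-"),
    MD1   = Z[[md_cols[1]]],
    MD2   = Z[[md_cols[2]]]
  )
}

res <- do.call(rbind, lapply(groups, tidy_group))
rownames(res) <- NULL
res
```

Building each piece with data.frame() also puts the columns in the order requested in the question (ID, Class, Time, MD, MD1, MD2).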

edited Dec 22 '22 at 16:48

answered Dec 22 '22 at 15:00


If you type `rename` by itself (no parens, no args) and hit enter, does it say it is from `namespace:dplyr`? It seems possible that it may instead say `namespace:plyr`; if so, unless you _know_ you are using something from `plyr`, I recommend against even loading it. See https://stackoverflow.com/q/26106146/3358272. (I'm replying to a since-deleted comment about a `rename` error.) – r2evans Dec 22 '22 at 15:22
Sure, anything is possible, but it is a lot of work. It would require selecting `ID` and each group of columns (e.g., `Class1a`, `Class1b`, etc), pivoting each individually, then iteratively joining all of them together based on `ID` and whatever you call the `1a`/`1b` component. The efforts that are necessary to do this are one reason why `names_pattern=` exists: more concise code (and performance). Is there a reason you do not want to use this regex-based approach? – r2evans Dec 22 '22 at 15:41
Even that is going to have difficulty, though: if you create a list of pivoted frames, we either "join" them together (produces 5000+ rows, since `ID` alone is insufficient to identify individual rows), or we "column-bind" them together, ***HOPING*** that the order of rows is assured to be the same (it may be, but I don't know ... I don't like the fragility). – r2evans Dec 22 '22 at 15:52
Many thanks for your help. I made a mistake, and I am sorry. I missed adding the column MD to my output. I have updated the output, and I wonder if you could update your code. If not, I understand it. Apologetically – user330 Dec 22 '22 at 16:11
See my edit. I hope the structure of your data is such that this is safe enough. The most fragile part of this (in my head) is the renaming of the columns; it could be made more robust. I hope this helps. – r2evans Dec 22 '22 at 16:48
Superb! Much appreciated, upvoted – user330 Dec 22 '22 at 17:08
