How to unpivot specific columns in R dataframe

Question

I have a dataframe in R formatted as shown in the first table below. I would like to merge columns "M1.1," "M1.2," and "M1.3" into a single column "M1" so that entries are on their own rows (id and values in other columns would be repeated) as shown in the second table. What function(s) could I use to accomplish this?

id	M1.1	M1.2	M1.3	M2	M3	M4	M5	M6
test a	test t	test a	test y	test	test t	test y	test u	test w
test s	test r	test a	test h	test r	test j	test j	test w	test d

id	M1	M2	M3	M4	M5	M6
test a	test t	test	test t	test y	test u	test w
test a	test a	test	test t	test y	test u	test w
test a	test y	test	test t	test y	test u	test w
test s	test r	test r	test j	test j	test w	test d
test s	test a	test r	test j	test j	test w	test d
test s	test h	test r	test j	test j	test w	test d

Does this answer your question? [Reshaping data.frame from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) — Martin Gal, Aug 23 '21 at 19:18

TarJae · Accepted Answer · 2021-08-23T19:02:50.540

5

We could use pivot_longer:

library(dplyr)
library(tidyr)
df %>% 
    pivot_longer(
        cols = c(M1.1, M1.2, M1.3),
        names_to = "names",
        values_to = "M1"
    ) %>% 
    select(id, M1, M2:M6)

 A tibble: 6 x 7
  id     M1     M2     M3     M4     M5     M6    
  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1 test a test t test   test t test y test u test w
2 test a test a test   test t test y test u test w
3 test a test y test   test t test y test u test w
4 test s test r test r test j test j test w test d
5 test s test a test r test j test j test w test d
6 test s test h test r test j test j test w test d

data:

structure(list(id = c("test a", "test s"), M1.1 = c("test t", 
"test r"), M1.2 = c("test a", "test a"), M1.3 = c("test y", "test h"
), M2 = c("test", "test r"), M3 = c("test t", "test j"), M4 = c("test y", 
"test j"), M5 = c("test u", "test w"), M6 = c("test w", "test d"
)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))

edited Aug 23 '21 at 19:02

answered Aug 23 '21 at 19:01

TarJae

72,363
6
19
66

I did it by hand. I have added the data for you :-) – TarJae Aug 23 '21 at 19:04
@AnoushiravanR Just mark the answer as a solution, it is considered as double upvote. – utubun Aug 23 '21 at 19:09
1

@AnoushiravanR You could use a `tribble`, but you need some manual effort for this, which is kind of annoying. In this case you could hit the `Edit` button and copy the source of the shown table. Next use `readr::read_delim([COPY-PASTE-CODE-in-""], delim = "|", trim_ws = TRUE)`. Now you just need to remove two columns. – Martin Gal Aug 23 '21 at 19:12
2

Sorry, I am becoming blind :) Doublevoted! – utubun Aug 23 '21 at 19:12
3

@AnoushiravanR Best Practice: Wait for some like TarJae to import the data by hand and provide a `dput()`-structure. ;-) – Martin Gal Aug 23 '21 at 19:13
@TarJae thank you, seems to be working except for in cases where there are NA's in the data. How would I keep the NA values in this case? For example if column M1.2 had "NA" instead of "test a" in the first row? – piper180 Aug 23 '21 at 19:33
It works also with NA. I have tested it. Same code gives: ` id M1 M2 M3 M4 M5 M6 1 test a test t test test t test y test u test w 2 test a NA test test t test y test u test w 3 test a test y test test t test y test u test w 4 test s test r test r test j test j test w test d 5 test s test a test r test j test j test w test d 6 test s test h test r test j test j test w test d` – TarJae Aug 23 '21 at 19:37
1

@TarJae you're correct, it works great -- my bad! Thanks for the help! – piper180 Aug 23 '21 at 20:00
@TarJae, I was able to implement your method on a small dataset, however am running into the following error on my actual dataset: "Error: cannot allocate vector of size 261.5 Mb." Do you have any suggestions on how to accomplish this without running into the memory issue? Thank you. – piper180 Aug 30 '21 at 13:46
1

@ava. Please have a look here: . My guess is that your new data must be huge? Try to manage with the given answers here and tell me. – TarJae Aug 30 '21 at 14:12

score 1 · Answer 2 · answered Aug 23 '21 at 19:35

Using @TarJae data, and presumably faster data.table:

library(data.table)

dat <- data.table(dat)

melt(dat, , paste('M1', 1:3, sep = '.'), , 'M1')[
  order(id),
  c('id', paste('M', 1:6, sep = ''))
]

#        id     M1     M2     M3     M4     M5     M6
# 1: test a test t   test test t test y test u test w
# 2: test a test a   test test t test y test u test w
# 3: test a test y   test test t test y test u test w
# 4: test s test r test r test j test j test w test d
# 5: test s test a test r test j test j test w test d
# 6: test s test h test r test j test j test w test d

How to unpivot specific columns in R dataframe

2 Answers2