1

I have a dataframe in R formatted as shown in the first table below. I would like to merge columns "M1.1," "M1.2," and "M1.3" into a single column "M1" so that entries are on their own rows (id and values in other columns would be repeated) as shown in the second table. What function(s) could I use to accomplish this?

id M1.1 M1.2 M1.3 M2 M3 M4 M5 M6
test a test t test a test y test test t test y test u test w
test s test r test a test h test r test j test j test w test d
id M1 M2 M3 M4 M5 M6
test a test t test test t test y test u test w
test a test a test test t test y test u test w
test a test y test test t test y test u test w
test s test r test r test j test j test w test d
test s test a test r test j test j test w test d
test s test h test r test j test j test w test d
piper180
  • 329
  • 2
  • 12
  • Does this answer your question? [Reshaping data.frame from wide to long format](https://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) – Martin Gal Aug 23 '21 at 19:18

2 Answers2

5

We could use pivot_longer:

library(dplyr)
library(tidyr)
df %>% 
    pivot_longer(
        cols = c(M1.1, M1.2, M1.3),
        names_to = "names",
        values_to = "M1"
    ) %>% 
    select(id, M1, M2:M6)
 A tibble: 6 x 7
  id     M1     M2     M3     M4     M5     M6    
  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr> 
1 test a test t test   test t test y test u test w
2 test a test a test   test t test y test u test w
3 test a test y test   test t test y test u test w
4 test s test r test r test j test j test w test d
5 test s test a test r test j test j test w test d
6 test s test h test r test j test j test w test d

data:

structure(list(id = c("test a", "test s"), M1.1 = c("test t", 
"test r"), M1.2 = c("test a", "test a"), M1.3 = c("test y", "test h"
), M2 = c("test", "test r"), M3 = c("test t", "test j"), M4 = c("test y", 
"test j"), M5 = c("test u", "test w"), M6 = c("test w", "test d"
)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • I did it by hand. I have added the data for you :-) – TarJae Aug 23 '21 at 19:04
  • @AnoushiravanR Just mark the answer as a solution, it is considered as double upvote. – utubun Aug 23 '21 at 19:09
  • 1
    @AnoushiravanR You could use a `tribble`, but you need some manual effort for this, which is kind of annoying. In this case you could hit the `Edit` button and copy the source of the shown table. Next use `readr::read_delim([COPY-PASTE-CODE-in-""], delim = "|", trim_ws = TRUE)`. Now you just need to remove two columns. – Martin Gal Aug 23 '21 at 19:12
  • 2
    Sorry, I am becoming blind :) Doublevoted! – utubun Aug 23 '21 at 19:12
  • 3
    @AnoushiravanR Best Practice: Wait for some like TarJae to import the data by hand and provide a `dput()`-structure. ;-) – Martin Gal Aug 23 '21 at 19:13
  • @TarJae thank you, seems to be working except for in cases where there are NA's in the data. How would I keep the NA values in this case? For example if column M1.2 had "NA" instead of "test a" in the first row? – piper180 Aug 23 '21 at 19:33
  • It works also with NA. I have tested it. Same code gives: ` id M1 M2 M3 M4 M5 M6 1 test a test t test test t test y test u test w 2 test a NA test test t test y test u test w 3 test a test y test test t test y test u test w 4 test s test r test r test j test j test w test d 5 test s test a test r test j test j test w test d 6 test s test h test r test j test j test w test d` – TarJae Aug 23 '21 at 19:37
  • 1
    @TarJae you're correct, it works great -- my bad! Thanks for the help! – piper180 Aug 23 '21 at 20:00
  • @TarJae, I was able to implement your method on a small dataset, however am running into the following error on my actual dataset: "Error: cannot allocate vector of size 261.5 Mb." Do you have any suggestions on how to accomplish this without running into the memory issue? Thank you. – piper180 Aug 30 '21 at 13:46
  • 1
    @ava. Please have a look here: . My guess is that your new data must be huge? Try to manage with the given answers here and tell me. – TarJae Aug 30 '21 at 14:12
1

Using @TarJae data, and presumably faster data.table:

library(data.table)

dat <- data.table(dat)

melt(dat, , paste('M1', 1:3, sep = '.'), , 'M1')[
  order(id),
  c('id', paste('M', 1:6, sep = ''))
]

#        id     M1     M2     M3     M4     M5     M6
# 1: test a test t   test test t test y test u test w
# 2: test a test a   test test t test y test u test w
# 3: test a test y   test test t test y test u test w
# 4: test s test r test r test j test j test w test d
# 5: test s test a test r test j test j test w test d
# 6: test s test h test r test j test j test w test d
utubun
  • 4,400
  • 1
  • 14
  • 17