R wide to long form based on numbers in column name

Question

I have a wide form dataframe

structure(list(Id = structure(2:1, .Label = c("Jake", "Jim"
), class = "factor"), Q2.2 = c(NA, NA), Q2.4 = c(NA, NA), Q2.5 = c(NA, 
NA), Q2.6 = c(NA, 4L), Q3.2 = c(NA, NA), Q3.4 = c(NA, NA), Q3.5 = c(NA, 
NA), Q3.6 = c(4L, NA), Q4.2 = c(NA, NA), Q4.4 = c(2L, NA), Q4.5 = c(NA, 
3L), Q4.6 = c(4L, NA), Q5.2 = c(NA, NA), Q5.4 = c(2L, NA), Q5.5 = c(NA, 
NA), Q5.6 = c(NA, NA)), class = "data.frame", row.names = c(NA, 
-2L))

I'm trying to turn this into a long form dataframe based only on the number after the period. So all the values in columns with a .2 should go in one column and all values in columns with a .4 should go in another column. This goes on for each of the suffixes. To explain more clearly, I want something that looks like this

You need to create a new column with the relevant substring, then follow the link in @jay.sf's comment. There really isn't an elegant way of shortcutting that step, unfortunately. — r2evans, Mar 01 '19 at 07:11
@jay.sf i understand wide to long format a bit, but I think this is not a duplicate. The main issue is the shaping based on the number after the `.` — Jin, Mar 01 '19 at 07:38

rove · Accepted Answer · 2019-03-01T07:30:57.780

I think OPs question has an added detail to the reshape question posted in the comments

You can use data.table:

library(data.table)                                                                                                                                                                                                                                                                                                                                       

setDT(dt) # Convert to data.table object

# Melt it into long format
dt <- melt(dt, id.vars = "Id")
dt[, variable := as.character(variable)]
dt[, value := as.character(value) ]

# make a substring of the type ID
type_id <- sapply((strsplit(dt$variable,
 split = ".", fixed = TRUE)),
 '[[', 2) # This is a subsetting function passed to sapply() - gets the second element of the output list

type_id <- paste0("type", type_id)

# Make a substring of the Quarter ID
quarter_id <- sapply((strsplit(dt$variable, split = ".", fixed = TRUE)), '[[', 1)

# Use data.table's dcast() to reshape the data again to get the shape you require
dt[, type_id := type_id]
dt[, quarter_id := quarter_id]
dcast(dt, Id + quarter_id ~ type_id, value.var = "value")

## Output
    Id quarter_id type2 type4 type5 type6
1: Jake         Q2  <NA>  <NA>  <NA>     4
2: Jake         Q3  <NA>  <NA>  <NA>  <NA>
3: Jake         Q4  <NA>  <NA>     3  <NA>
4: Jake         Q5  <NA>  <NA>  <NA>  <NA>
5:  Jim         Q2  <NA>  <NA>  <NA>  <NA>
6:  Jim         Q3  <NA>  <NA>  <NA>     4
7:  Jim         Q4  <NA>     2  <NA>     4
8:  Jim         Q5  <NA>     2  <NA>  <NA>

You can actually do this quite elegantly using `patterns()` in `melt`: `cols <- names(DT)[-1]; unique_cols <- unique(substr( x = cols, nchar(cols), nchar(cols))); pattern <- paste0(unique_cols, "$"); melt(DT, id.vars = "Id", measure.vars = patterns(pattern), value.name = paste0("type_", unique_cols))` — markus, Mar 01 '19 at 08:22

score 1 · Answer 2 · answered Mar 01 '19 at 07:36

@rove beat me to it, but wanted to post another approach as long as I had 80% of it:

library(reshape2)

x.melted <- melt(x, id.vars=c("Id"))

x.melted$quarter.id <- gsub("\\..*","", x.melted$variable)

x.melted$number.after.period <- gsub("Q[0-9]\\.", "", x.melted$variable)

x.casted <- dcast(x.melted, Id + quarter.id ~ number.after.period, )

colnames(x.casted)[3:ncol(x.casted)] <- paste("Type.", 
                                  colnames(x.casted)[3:ncol(x.casted)], sep="")

x.casted

    Id quarter.id Type.2 Type.4 Type.5 Type.6
1 Jake         Q2     NA     NA     NA      4
2 Jake         Q3     NA     NA     NA     NA
3 Jake         Q4     NA     NA      3     NA
4 Jake         Q5     NA     NA     NA     NA
5  Jim         Q2     NA     NA     NA     NA
6  Jim         Q3     NA     NA     NA      4
7  Jim         Q4     NA      2     NA      4
8  Jim         Q5     NA      2     NA     NA

R wide to long form based on numbers in column name

2 Answers2