1

I have a data.frame with colnames: A01, A02, ..., A25, ..., Z01, ..., Z25 (altogether 26*25). For example:

set.seed(1)
df <- data.frame(matrix(rnorm(26*25),ncol=26*25,nrow=1))
cols <- c(paste("0",1:9,sep=""),10:25)
colnames(df) <- c(sapply(LETTERS,function(l) paste(l,cols,sep="")))

and I want to dcast it to a data.frame of 26x25 (rows will be A-Z and columns 01-25). Any idea what would be the formula for this dcast?

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
dan
  • 6,048
  • 10
  • 57
  • 125
  • With base R `reshape` you can get a good start with `reshape(df, direction="long", sep="", varying=TRUE, timevar=NULL)` – thelatemail May 22 '17 at 05:02
  • How about `melt` `(data.table)` first to get 2 columns, the `value` and the the `original column` name. Then, use `tstrsplit` to separate the letters and numbers, then paste the numbers together, then use `dcast`. Is this a bad idea? – din May 22 '17 at 05:13
  • @din - that sounds like a reasonable solution to me. – thelatemail May 22 '17 at 05:21

3 Answers3

1

The removing of columns doesn't look nice (still learning data.table). Someone needs to make that one nice.

# convert to data.table
df <- data.table(df)

# melt all the columns first
test <- melt(df, measure.vars = names(df))

# split the original column name by letter
# paste the numbers together
# then remove the other columns
test[ , c("ch1", "ch2", "ch3") := tstrsplit(variable, "")][ , "ch2" := 
paste(ch2, ch3, sep = "")][ , c("ch3", "variable") := NULL]

# dcast with the letters (ch1) as rows and numbers (ch2) as columns
dcastOut <- dcast(test, ch1 ~ ch2 , value.var = "value")

Then just remove the first column which contains the number?

din
  • 692
  • 5
  • 12
1

We can use tidyverse

library(tidyverse)
res <- gather(df) %>%
           group_by(key = sub("\\D+", "", key))  %>% 
           mutate(n = row_number()) %>%
           spread(key, value) %>%
           select(-n)
dim(res)
#[1] 26 25
akrun
  • 874,273
  • 37
  • 540
  • 662
1

The "formula" you're looking for can come from the patterns argument in the "data.table" implementation of melt. dcast is for going from a "long" form to a "wide" form, while melt is for going from a wide form to a long(er) form. melt() does not use a formula approach.

Essentially, you would need to do something like:

library(data.table)
setDT(df)                     ## convert to a data.table
cols <- sprintf("%02d", 1:25) ## Easier way for you to make cols in the future
melt(df, measure.vars = patterns(cols), variable.name = "ID")[, ID := LETTERS][]
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485