0

need to reshape a data.frame from this

  TestID Machine1Measure Machine1Count Machine2Measure Machine2Count
1  10006              11            14              16            24
2  10007              23            27              32            35

To this:

  TestID Machine Measure Count
1  10006       1      11    14
2  10006       2      16    24
3  10007       1      23    27
4  10007       2      32    35

Below is code to create each. Looked at reshape in R but couldn't figure out how to split the names

Note: this is a subset of the columns - there are 70-140 machines. How can I make this simpler?

b <-data.frame(10006:10007, matrix(c(11,23,14,27,16,32,24,35),2,4)) 
colnames(b) <- c("TestID", "Machine1Measure", "Machine1Count", "Machine2Measure", "Machine2Count") 

a<-data.frame(matrix(c(10006,10006,10007,10007,1,2,1,2,11,16,23,32,14,24,27,35),4,4)) 
colnames(a) <- c("TestID", "Machine", "Measure", "Count") 

b
a
eAndy
  • 323
  • 2
  • 9
  • similar: https://stackoverflow.com/q/2185252/5977215 – SymbolixAU Sep 27 '18 at 05:56
  • 1
    See also [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns) and the 'Linked' posts therein. – Henrik Sep 27 '18 at 07:09

3 Answers3

5

The following reproduces your expected output:

df %>%
    gather(key, value, -TestID) %>%
    separate(key, into = c("tmp", "what"), sep = "(?<=\\d)") %>%
    separate(tmp, into = c("tmp", "Machine"), sep = "(?=\\d+)") %>%
    spread(what, value) %>%
    select(-tmp)
#  TestID Machine Count Measure
#1  10006       1    14      11
#2  10006       2    24      16
#3  10007       1    27      23
#4  10007       2    35      32

Explanation: We reshape data from wide to long, and use two separate calls to separate the various values and ids before reshaping again from long to wide. (We use a positive look-ahead and positive look-behind to separate the keys into the required fields.)


Sample data

df <- read.table(text =
    "  TestID Machine1Measure Machine1Count Machine2Measure Machine2Count
1  10006              11            14              16            24
2  10007              23            27              32            35", header = T)
Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
3

data.table can do all this within one melt, which is almost 30x faster than the (perfectly working) tidyverse solution provided by MauritsEvers.

It uses patterns to define the columns with 'Measure' and 'Count' in their names, and then melts these columns to the columns names in value.name

library( data.table )
melt( setDT( b), 
      id.vars = c("TestID"), 
      measure.vars = patterns( ".*Measure", ".*Count"), 
      variable.name = "Machine", 
      value.name = c("Measure", "Count") )

#    TestID Machine Measure Count
# 1:  10006       1      11    14
# 2:  10007       1      23    27
# 3:  10006       2      16    24
# 4:  10007       2      32    35

Benchmarking

# Unit: microseconds
#       expr      min        lq      mean    median        uq        max neval
# data.table  182.265  200.3405  245.0403  234.0825  264.6605   3137.967  1000
# reshape    1757.575 1840.7240 2180.4957 1938.3335 2011.3895 100429.392  1000
# tidyverse  6173.203 6430.7830 6925.6034 6569.9670 6763.9810  29722.714  1000
Wimpel
  • 26,031
  • 1
  • 20
  • 37
3

And since nobody else likes reshape() any longer, I'll add an answer:

reshape(
  setNames(b, sub("^.+(\\d+)(.+)$", "\\2.\\1", names(b))),
  idvar="TestID", direction="long", varying=-1, timevar="Machine"
)

#        TestID Machine Measure Count
#10006.1  10006       1      11    14
#10007.1  10007       1      23    27
#10006.2  10006       2      16    24
#10007.2  10007       2      32    35

It'll never compete with data.table for pure speed, but brief testing on 2M rows using:

bbig <- b[rep(1:2,each=1e6),]
bbig$TestID <- make.unique(as.character(bbig$TestID))

#data.table -  0.06 secs
#reshape    -  2.30 secs
#tidyverse  - 56.60 secs
thelatemail
  • 91,185
  • 12
  • 128
  • 188