
I have the following data frame with temperature and pressure data from 3 sensors:

df <- data.frame(
         Test = 1:10, 
         temperature_sensor1=rnorm(10,25,5), 
         temperature_sensor2 = rnorm(10,25,5), 
         temperature_sensor1 = rnorm(10,25,5), 
         pressure_sensor1 = rnorm(10,10,2),
         pressure_sensor2 = rnorm(10,10,2), 
         pressure_sensor3 = rnorm(10,10,2))

How can I reshape it into long format so that each row holds the temperature and pressure data for a single sensor, with these columns:

Test Sensor Temperature Pressure

Thanks!

jazzurro
Sasha

1 Answer


Here are a couple of approaches:

1) dplyr/tidyr. Convert df to long form using gather, then separate the generated variable column at the underscore into two columns. Finally, convert from long back to wide based on the variable column (which contains the strings pressure and temperature) and the value column (which contains the numbers):

library(dplyr)
library(tidyr)
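df %>% 
   gather("variable", "value", -Test) %>%                      # one row per Test/column pair
   separate(variable, c("variable", "sensor"), sep = "_") %>%  # "temperature_sensor1" -> "temperature", "sensor1"
   spread(variable, value)                                     # one column each for pressure and temperature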
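In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer(), which can do this reshape in a single call. A minimal sketch, assuming tidyr >= 1.0.0 and the corrected df from the Note below (temperature_sensor3 in place of the second temperature_sensor1): the special ".value" entry in names_to sends the part of each column name before "_" to its own output column and the part after "_" to a sensor column.

```r
library(tidyr)

# Corrected df, as in the Note (assumption: third temperature column is sensor3)
set.seed(123)
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor3 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))

# Split every column name except Test at "_": the prefix (".value")
# becomes an output column, the suffix goes into a new "sensor" column
long <- pivot_longer(df, -Test,
                     names_to = c(".value", "sensor"),
                     names_sep = "_")
long
```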

2) Base R reshape. No packages are needed. The line marked optional removes the row names; it can be omitted if they do not matter.

unames <- grep("_", names(df), value = TRUE)

varying <- split(unames, sub("_.*", "", unames))
sensors <- unique(sub(".*_", "", unames))

long <- reshape(df, direction = "long", varying = varying, v.names = names(varying),
         times = sensors, timevar = "sensor")
rownames(long) <- NULL # optional
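To see what the general code builds, here is a self-contained sketch using the corrected df from the Note below. It prints the two helper objects and spot-checks one reshaped value. Keep in mind that split() orders its groups by sorted factor levels, so pressure comes before temperature, and that reshape() also adds an id column because its idvar argument defaults to "id".

```r
# Corrected df, as in the Note (assumption: third temperature column is sensor3)
set.seed(123)
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor3 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))

unames <- grep("_", names(df), value = TRUE)
# Group the wide column names by their prefix (pressure / temperature)
varying <- split(unames, sub("_.*", "", unames))
# The suffixes become the values of the new sensor column
sensors <- unique(sub(".*_", "", unames))
print(varying)
print(sensors)

long <- reshape(df, direction = "long", varying = varying,
                v.names = names(varying), times = sensors, timevar = "sensor")
rownames(long) <- NULL

# Spot check: sensor2's temperature for Test 1 came from temperature_sensor2
stopifnot(long$temperature[long$Test == 1 & long$sensor == "sensor2"] ==
            df$temperature_sensor2[1])
```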

If the layout of df's columns is fixed, we can simplify this a bit by hard-coding varying and sensors with the following definitions instead of computing them:

varying <- list(pressure = 5:7, temperature = 2:4)
sensors <- c("sensor1", "sensor2", "sensor3")
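Hard-coded positions only work while the column order stays exactly as created. A middle ground, sketched here with the corrected df from the Note, looks the positions up by name prefix with grep(); reshape() accepts indices into names(data) in varying, just as it accepts names. The sensors vector is still typed in by hand, as in the hard-coded version.

```r
# Corrected df, as in the Note (assumption: third temperature column is sensor3)
set.seed(123)
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor3 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))

# Column positions found by prefix, so each list element points at the
# columns its name says it does, whatever their order in df
varying <- list(pressure = grep("^pressure_", names(df)),
                temperature = grep("^temperature_", names(df)))
sensors <- c("sensor1", "sensor2", "sensor3")

long <- reshape(df, direction = "long", varying = varying,
                v.names = names(varying), times = sensors, timevar = "sensor")
rownames(long) <- NULL
```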

Note: Because df is built from random numbers, the seed must be set first for the example to be reproducible, so to be definite we created df as shown below. Also, the question uses temperature_sensor1 for two columns; we assumed the second occurrence was meant to be temperature_sensor3.

set.seed(123)
df <- data.frame(
         Test = 1:10, 
         temperature_sensor1=rnorm(10,25,5), 
         temperature_sensor2 = rnorm(10,25,5), 
         temperature_sensor3 = rnorm(10,25,5), 
         pressure_sensor1 = rnorm(10,10,2),
         pressure_sensor2 = rnorm(10,10,2), 
         pressure_sensor3 = rnorm(10,10,2))
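If df was already built with the question's code, it need not be regenerated: data.frame(), with its default check.names = TRUE, makes the repeated name unique as temperature_sensor1.1, which can then be renamed. A minimal sketch, assuming (as above) that the duplicated column is the one meant to be temperature_sensor3:

```r
set.seed(123)
# The question's original code, with temperature_sensor1 repeated
df <- data.frame(
  Test = 1:10,
  temperature_sensor1 = rnorm(10, 25, 5),
  temperature_sensor2 = rnorm(10, 25, 5),
  temperature_sensor1 = rnorm(10, 25, 5),
  pressure_sensor1 = rnorm(10, 10, 2),
  pressure_sensor2 = rnorm(10, 10, 2),
  pressure_sensor3 = rnorm(10, 10, 2))

# check.names = TRUE de-duplicated the repeat to "temperature_sensor1.1";
# rename it to the name we assume was intended
names(df)[names(df) == "temperature_sensor1.1"] <- "temperature_sensor3"
names(df)
```

With the same seed, the values are identical to the df created above, so either approach can be used with the reshaping code.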
G. Grothendieck