2

I am trying to (efficiently) rearrange a dataframe in R.

My data are experimental data gathered over four different experiments from two populations of participants (1 or 0, i.e. disease and control groups).

Example dataframe:

Subject type    Experiment 1    Experiment 2    Experiment 3    Experiment 4
           0             4.6             2.5             1.4             5.3
           0             4.7             2.4             1.8             5.1
           1             3.5             1.2             5.6             7.5
           1             3.8             1.7             6.2             8.1

I would like to rearrange my dataframe so that it is structured as follows (the reason being, it makes it easier for me to run functions on the data when they are structured like this in R):

Subject type    Experiment    Measure
           0             1        4.6
           0             2        2.5
           0             3        1.4
           0             4        5.3
           0             1        4.7
           0             2        2.4
           0             3        1.8
           0             4        5.1
           1             1        3.5
           1             2        1.2
           1             3        5.6
           1             4        7.5
           1             1        3.8
           1             2        1.7
           1             3        6.2
           1             4        8.1

As you can see, what has happened is that each subject now occupies four rows; each row now pertains to a single measurement rather than a single subject. This is (at least for now) more convenient for me to plug into R functions. Maybe in time I'll figure out a way of skipping this step altogether, but I'm new to R and this seems like the best way of doing things.

Anyway - the question is, what is the most efficient way of doing this dataframe transformation? At present I'm doing it like this:

# Input dframe1
dframe1 <- structure(list(subject_type = c(0L, 0L, 1L, 1L), experiment_1 = c(4.6, 
4.7, 3.5, 3.8), experiment_2 = c(2.5, 2.4, 1.2, 1.7), experiment_3 = c(1.4, 
1.8, 5.6, 6.2), experiment_4 = c(5.3, 5.1, 7.5, 8.1)), .Names = c("subject_type", 
"experiment_1", "experiment_2", "experiment_3", "experiment_4"
), class = "data.frame", row.names = c(NA, -4L))

# Create a matrix
temporary_matrix <- matrix(ncol=3, nrow=nrow(dframe1) * 4)
colnames(temporary_matrix) <- c("subject_type","experiment","measure")

# Rearrange dframe1 so that a different measure is in each column
for(i in 1:nrow(dframe1)) {
  temporary_matrix[i*4-3,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-3,"experiment"] <- 1
  temporary_matrix[i*4-3,"measure"] <- dframe1$experiment_1[i]
  temporary_matrix[i*4-2,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-2,"experiment"] <- 2
  temporary_matrix[i*4-2,"measure"] <- dframe1$experiment_2[i]
  temporary_matrix[i*4-1,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-1,"experiment"] <- 3
  temporary_matrix[i*4-1,"measure"] <- dframe1$experiment_3[i]
  temporary_matrix[i*4-0,"subject_type"] <- dframe1$subject_type[i]
  temporary_matrix[i*4-0,"experiment"] <- 4
  temporary_matrix[i*4-0,"measure"] <- dframe1$experiment_4[i]
}

# Convert matrix to a data frame
dframe2 <- data.frame(temporary_matrix)

# NOTE: For some reason, this has to be converted back into a double (at some point above it becomes a factor)
dframe2$measure <- as.double(as.character(dframe2$measure))

Surely there's a better way of doing this?!

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
CaptainProg
  • 5,610
  • 23
  • 71
  • 116

3 Answers3

5

Using the reshape2 package, this is very straightforward.

library(reshape2)

# assuming your data.frame is called `dat`
melt(dat, id.vars=c("Subject type"))

you can make it perrtier if you'd like:

newdat <- melt(dat, id.vars=c("Subject type"), variable.name="Experiment", value.name="Measure")

# remove "experiment " from the names, and convert to numeric
newdat$Experiment <- as.numeric(gsub("Experiment\\s*", "", as.character(newdat$Experiment)))
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
4

Base reshape method:

Get the data:

dframe1 <- structure(list(subject_type = c(0L, 0L, 1L, 1L), experiment_1 = c(4.6, 
4.7, 3.5, 3.8), experiment_2 = c(2.5, 2.4, 1.2, 1.7), experiment_3 = c(1.4, 
1.8, 5.6, 6.2), experiment_4 = c(5.3, 5.1, 7.5, 8.1)), .Names = c("subject_type", 
"experiment_1", "experiment_2", "experiment_3", "experiment_4"
), class = "data.frame", row.names = c(NA, -4L))

Set your variables to stack:

expandvars <- paste('experiment',1:4,sep='_')

Reshape away!

dfrm1res <- reshape(
                   dframe1,
                   idvar="subject_type",
                   varying=list(expandvars),
                   v.names=c("value"),
                   direction="long",
                   new.row.names=1:16
                    )

Result:

> dfrm1res
   subject_type time value
1             0    1   4.6
2             0    1   4.7
3             1    1   3.5
4             1    1   3.8
5             0    2   2.5
6             0    2   2.4
7             1    2   1.2
8             1    2   1.7
9             0    3   1.4
10            0    3   1.8
11            1    3   5.6
12            1    3   6.2
13            0    4   5.3
14            0    4   5.1
15            1    4   7.5
16            1    4   8.1
thelatemail
  • 91,185
  • 12
  • 128
  • 188
4
data.frame(subject_type=dframe1$subject_type, stack(dframe1[2:5] )  )
   subject_type values          ind
1             0    4.6 experiment_1
2             0    4.7 experiment_1
3             1    3.5 experiment_1
4             1    3.8 experiment_1
5             0    2.5 experiment_2
6             0    2.4 experiment_2
7             1    1.2 experiment_2
8             1    1.7 experiment_2
9             0    1.4 experiment_3
10            0    1.8 experiment_3
11            1    5.6 experiment_3
12            1    6.2 experiment_3
13            0    5.3 experiment_4
14            0    5.1 experiment_4
15            1    7.5 experiment_4
16            1    8.1 experiment_4

Or use base reshape ( although it appears my preferences are different than thelatemal's use of it.):

dframe1$subject=1:4
reshape(dframe1, direction="long", idvar=c("subject_type", "subject"),
                 varying=2:5, sep="_", v.names="exp_value")
 #--------------------------
      subject_type subject time exp_value
0.1.1            0       1    1       4.6
0.2.1            0       2    1       4.7
1.3.1            1       3    1       3.5
1.4.1            1       4    1       3.8
0.1.2            0       1    2       2.5
0.2.2            0       2    2       2.4
1.3.2            1       3    2       1.2
1.4.2            1       4    2       1.7
0.1.3            0       1    3       1.4
0.2.3            0       2    3       1.8
1.3.3            1       3    3       5.6
1.4.3            1       4    3       6.2
0.1.4            0       1    4       5.3
0.2.4            0       2    4       5.1
1.3.4            1       3    4       7.5
1.4.4            1       4    4       8.1
IRTFM
  • 258,963
  • 21
  • 364
  • 487