I'm trying to combine multiple data.frame()s in R in a similar fashion to rbind(), but when the new data.frame() is created, I'd like to know which of the original data.frame()s the data came from.

For example, if I have the following data:

Right eye

Vision    Colour    Prescription
  0.30    blue             -1.00
 -0.10    blue             +1.50
 (etc)    (etc)             (etc)

Left eye

Vision    Colour    Prescription
  0.00    blue             +1.00
  0.10    brown            -2.50
 (etc)    (etc)             (etc)

... I would like to end up with a data.frame() that looks like this:

Vision    Colour    Prescription      Eye
  0.30    blue             -1.00      Right
 -0.10    blue             +1.50      Right
  0.00    blue             +1.00      Left
  0.10    brown            -2.50      Left

melt() collapses the data to a long format, which I don't want, and rbind() gives no indication of where each row originally came from. What I need is an extra column that records the original source of each row (i.e. Right and Left in the example above).

I know this would be possible by adding an 'eye' column to each of the original data.frame()s and then using rbind(), but I wonder if there is a neater solution available?

CaptainProg
    You could use the `.id` argument in `bind_rows()` from `dplyr` - `bind_rows(df1, df2, .id = "id")` – Steven Beaupré Dec 20 '16 at 15:17
    Fyi, it's best to make reproducible data that can simply be copy pasted by answerers (to more easily test potential solutions). Some guidance: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 For an example, see allinr's answer below, too, though it should probably use `set.seed`. – Frank Dec 20 '16 at 17:24

2 Answers

If you simply want a numeric identifier for each data.frame, you could do:

library(dplyr)
bind_rows(Right, Left, .id = "Eye")

Which gives:

  Eye Vision Colour Prescription
1   1    0.3   blue         -1.0
2   1   -0.1   blue          1.5
3   2    0.0   blue          1.0
4   2    0.1  brown         -2.5

You could also put your data.frames in a named list and use the list names as the identifier.

From the documentation:

When .id is supplied, a new column of identifiers is created to link each row to its original data frame. The labels are taken from the named arguments to bind_rows(). When a list of data frames is supplied, the labels are taken from the names of the list. If no names are found a numeric sequence is used instead.

Something like:

dat <- c("Right", "Left")
lst <- mget(dat)
bind_rows(lst, .id = "Eye")

Which gives:

    Eye Vision Colour Prescription
1 Right    0.3   blue         -1.0
2 Right   -0.1   blue          1.5
3  Left    0.0   blue          1.0
4  Left    0.1  brown         -2.5
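
The documentation quoted above also mentions named arguments, which avoids building a list first. A minimal sketch, assuming dplyr is installed and using the first two rows of each data.frame from the question as example data:

```r
library(dplyr)

# Example data copied from the question (first two rows of each eye)
Right <- data.frame(Vision = c(0.30, -0.10),
                    Colour = c("blue", "blue"),
                    Prescription = c(-1.0, 1.5))
Left <- data.frame(Vision = c(0.00, 0.10),
                   Colour = c("blue", "brown"),
                   Prescription = c(1.0, -2.5))

# The argument names ("Right", "Left") become the values of the Eye column
combined <- bind_rows(Right = Right, Left = Left, .id = "Eye")
combined
```

The Eye column comes first in the result and is a character vector, so convert it with factor() if you need a factor.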
Steven Beaupré
# Generate random data
set.seed(42)
Right <- setNames(object = data.frame(replicate(3, sample(0:1, 3, replace = TRUE))),
                  nm = c('Vision', 'Color', 'Prescription'))
Left <- setNames(object = data.frame(replicate(3, sample(0:1, 3, replace = TRUE))),
                 nm = c('Vision', 'Color', 'Prescription'))

# Add an Eye column to each data.frame, then stack them
rbind(cbind(Right, Eye = "Right"), cbind(Left, Eye = "Left"))
#  Vision Color Prescription   Eye
#1      1     1            1 Right
#2      1     1            0 Right
#3      0     1            1 Right
#4      1     1            1  Left
#5      0     0            1  Left
#6      1     0            0  Left
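
The same cbind()/rbind() idea should extend to any number of data.frames if they are kept in a named list. A rough base R sketch, where `frames`, `tagged` and `combined` are names made up for illustration and the data is the first two rows of each eye from the question:

```r
# Example data copied from the question (first two rows of each eye)
Right <- data.frame(Vision = c(0.30, -0.10),
                    Colour = c("blue", "blue"),
                    Prescription = c(-1.0, 1.5))
Left <- data.frame(Vision = c(0.00, 0.10),
                   Colour = c("blue", "brown"),
                   Prescription = c(1.0, -2.5))

# Hypothetical named list; the names become the Eye labels
frames <- list(Right = Right, Left = Left)

# Tag each data.frame with its own list name, then stack them all
tagged <- Map(function(df, nm) cbind(df, Eye = nm), frames, names(frames))
combined <- do.call(rbind, tagged)

# rbind() on a named list produces row names like "Right.1"; reset them
rownames(combined) <- NULL
combined
```

This needs no extra packages, at the cost of being a little more verbose than bind_rows().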
d.b