Easy way to add observations to an existing dataframe?

Question

I have an existing dataframe to which I would like to add updated observations. I can identify these updated observations by an ID and a time point variable. I've tried removing the outdated observations from the existing dataframe and then using the merge() function to merge it with a dataframe containing just the updated observations, but I get duplicated columns. Is there an elegant way to do this (particularly using dplyr)?

Here's an example of what I'd like to do. Let's say I have a df called practice:

practice

ID     Time  score 1 score 2 
 1   hour 1        3       7
 1   hour 2        4       2
 2   hour 1        3       4
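A minimal reproducible version of practice, as the comment further down asks for. This is a sketch that assumes the column names really contain spaces; check.names = FALSE stops data.frame() from turning them into score.1 and score.2.

```r
# Reproducible copy of the example data (assumption: real column names contain spaces)
practice <- data.frame(
  ID = c(1, 1, 2),
  Time = c("hour 1", "hour 2", "hour 1"),
  `score 1` = c(3, 4, 3),
  `score 2` = c(7, 2, 4),
  check.names = FALSE,       # keep "score 1" / "score 2" as written
  stringsAsFactors = FALSE   # keep Time as character, not factor
)
practice
```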

Let's say I want to change the score 1 variable for the third observation (where ID == 2 and Time == "hour 1") from 3 to 5.

What I've tried is making a new dataframe, called practice1:

ID     Time  score 1  score 2 
 1   hour 1        3        7
 1   hour 2        4        2

which drops the third observation. I then created another dataframe containing the corrected observation, called practice2:

   ID     Time  score 1  score 2
    2   hour 1        5        4

I've then tried to do something like this:

Practice3 <- merge(practice2, practice1, by = "ID", all = T)

However, I get duplicated columns, and when I try to include multiple variables in the by = argument of merge(), I get this error:

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

Could this be due to the longitudinal nature of the data?
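For context: merge() joins data frames column-wise on key columns, so merging on ID alone pairs up the score columns and produces suffixed duplicates. Because practice1 and practice2 have the same columns, stacking their rows is enough. A minimal base R sketch, assuming both frames share identical column names (rbind() matches data frame columns by name); the data below is a hypothetical reconstruction of the two frames:

```r
# Hypothetical reconstruction: practice1 = rows kept, practice2 = corrected rows
practice1 <- data.frame(ID = c(1, 1), Time = c("hour 1", "hour 2"),
                        `score 1` = c(3, 4), `score 2` = c(7, 2),
                        check.names = FALSE, stringsAsFactors = FALSE)
practice2 <- data.frame(ID = 2, Time = "hour 1",
                        `score 1` = 5, `score 2` = 4,
                        check.names = FALSE, stringsAsFactors = FALSE)

# Append rows instead of joining columns
practice3 <- rbind(practice1, practice2)

# Optional: restore the ID / Time ordering
practice3 <- practice3[order(practice3$ID, practice3$Time), ]
practice3
```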

Thanks

Please provide some example data http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — neilfws, Apr 19 '17 at 23:28

thc · Answer 1 · 2017-04-20T19:34:18.083


You can do in-place substitution on a column of a data frame, e.g.:

practice[["score 1"]][practice$ID == 2 & practice$Time == "hour 1"] <- 5
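A self-contained run of the substitution above, assuming the question's column names (with spaces) and building the data with check.names = FALSE so they survive:

```r
practice <- data.frame(ID = c(1, 1, 2), Time = c("hour 1", "hour 2", "hour 1"),
                       `score 1` = c(3, 4, 3), `score 2` = c(7, 2, 4),
                       check.names = FALSE, stringsAsFactors = FALSE)

# Logical index selecting the row(s) to update
rows <- practice$ID == 2 & practice$Time == "hour 1"

# [[ extracts the column as a vector; [rows] <- then assigns into it
practice[["score 1"]][rows] <- 5
practice[["score 1"]]
# [1] 3 4 5
```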

edited Apr 20 '17 at 19:34

answered Apr 20 '17 at 00:36

thc

Or `practice[practice$ID == 2 & practice$Time == "hour 1", "score 1"] <- 5` to simplify slightly – thelatemail Apr 20 '17 at 00:51
Also, you need to reference `practice$Time` instead of just `Time`. – thelatemail Apr 20 '17 at 00:54
Thanks, this might be just what I need. I thought that the double brackets were only used when dealing with lists. – baldirony Apr 20 '17 at 00:57
@baldirony - data.frames are lists. – thelatemail Apr 20 '17 at 01:02
@thelatemail Oh, so are dfs just lists of object = 1? – baldirony Apr 20 '17 at 01:04
@baldirony - data.frames are a list where each column is a list item of the same length. Compare `as.list(mtcars)` to `mtcars` – thelatemail Apr 20 '17 at 01:07
Yes, data.frames are a "derived class" of lists, and inherit all the list methods: https://www.programiz.com/r-programming/inheritance – thc Apr 20 '17 at 19:35
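To illustrate the point in these comments that a data.frame is stored as a list with one element per column, a small sketch (the data is hypothetical, mirroring the question):

```r
practice <- data.frame(ID = c(1, 1, 2), Time = c("hour 1", "hour 2", "hour 1"),
                       `score 1` = c(3, 4, 3),
                       check.names = FALSE, stringsAsFactors = FALSE)

is.list(practice)    # TRUE: the underlying storage is a list
typeof(practice)     # "list"
length(practice)     # 3: one list element per column

# So [[ ]] and $ pull out a column just as they would a list element
identical(practice[["Time"]], practice$Time)   # TRUE
```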

score 0 · Answer 2 · answered Apr 20 '17 at 00:43

Here's an approach using dplyr::mutate(). Note: I renamed the columns to remove the spaces.

library(dplyr)
practice %>% 
  mutate(score1 = ifelse(ID == 2 & Time == "hour 1", 5, score1))
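A self-contained version of the mutate() answer. The data construction is an assumption mirroring the question, with the spaces removed from the column names; since mutate() returns a new data frame rather than modifying practice, the result is assigned back.

```r
library(dplyr)

# Example data with space-free column names, as in the answer
practice <- data.frame(ID = c(1, 1, 2), Time = c("hour 1", "hour 2", "hour 1"),
                       score1 = c(3, 4, 3), score2 = c(7, 2, 4),
                       stringsAsFactors = FALSE)

# Replace score1 only where the ID/Time condition holds; keep it elsewhere
practice <- practice %>%
  mutate(score1 = ifelse(ID == 2 & Time == "hour 1", 5, score1))
practice$score1
# [1] 3 4 5
```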

answered Apr 20 '17 at 00:43

neilfws

score 0 · Answer 3 · answered Apr 20 '17 at 02:10

If you already have the new data in a data.frame, you can use anti_join to take out old cases and then just use bind_rows to add the new cases:

library(dplyr)

practice <- read.table(text = 'ID     Time  score1 score2 
                                1    hour1       3      7
                                1    hour2       4      2
                                2    hour1       3      4', 
                       header = TRUE, stringsAsFactors = FALSE)

practice2 <- read.table(text = 'ID     Time  score1  score2 
                                 2    hour1       5       5', 
                        header = TRUE, stringsAsFactors = FALSE)

practice %>% 
    anti_join(practice2, by = c('ID', 'Time')) %>% 
    bind_rows(practice2)

#>   ID  Time score1 score2
#> 1  1 hour2      4      2
#> 2  1 hour1      3      7
#> 3  2 hour1      5      5
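On newer dplyr versions (1.0.0 or later, released after these answers were written), rows_update() and rows_upsert() cover this remove-then-add pattern directly, and rows_update() keeps the original row order. A sketch assuming a current dplyr; practice3 is a hypothetical extra observation used to show the insert case:

```r
library(dplyr)  # rows_update() / rows_upsert() need dplyr >= 1.0.0

practice <- data.frame(ID = c(1L, 1L, 2L), Time = c("hour1", "hour2", "hour1"),
                       score1 = c(3L, 4L, 3L), score2 = c(7L, 2L, 4L),
                       stringsAsFactors = FALSE)
practice2 <- data.frame(ID = 2L, Time = "hour1", score1 = 5L, score2 = 5L,
                        stringsAsFactors = FALSE)

# Overwrite the rows of practice whose ID + Time match a row in practice2
updated <- rows_update(practice, practice2, by = c("ID", "Time"))

# rows_upsert() also appends keys that are not in practice yet
practice3 <- data.frame(ID = 3L, Time = "hour1", score1 = 1L, score2 = 1L,
                        stringsAsFactors = FALSE)
upserted <- rows_upsert(practice, practice3, by = c("ID", "Time"))
```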

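However, that won't work well if practice2 is missing some of the new values, since bind_rows() would carry those NAs into the result. In that case you can use coalesce() to overwrite old values with new ones only where a new value exists: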

left_join(practice, practice2, by = c('ID', 'Time')) %>% 
    mutate(score1 = coalesce(score1.y, score1.x), 
           score2 = coalesce(score2.y, score2.x)) %>% 
    select(-contains('.'))

#>   ID  Time score1 score2
#> 1  1 hour1      3      7
#> 2  1 hour2      4      2
#> 3  2 hour1      5      5
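A sketch of the coalesce() approach with a partial update. The NA in score2 is a hypothetical example: coalesce() keeps the old score2 for that row while score1 is still overwritten.

```r
library(dplyr)

practice <- data.frame(ID = c(1L, 1L, 2L), Time = c("hour1", "hour2", "hour1"),
                       score1 = c(3L, 4L, 3L), score2 = c(7L, 2L, 4L),
                       stringsAsFactors = FALSE)
# Hypothetical partial update: only score1 is known for this row
practice2 <- data.frame(ID = 2L, Time = "hour1", score1 = 5L, score2 = NA_integer_,
                        stringsAsFactors = FALSE)

result <- left_join(practice, practice2, by = c("ID", "Time")) %>%
  # take the new (.y) value when present, otherwise fall back to the old (.x) one
  mutate(score1 = coalesce(score1.y, score1.x),
         score2 = coalesce(score2.y, score2.x)) %>%
  # drop the suffixed helper columns (contains() matches the literal ".")
  select(-contains("."))
result
```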
