0

(This question is not answered at the linked Vlookup thread)

I'm looking for a way to replace values in one dataframe (DF2) with values from another (DF1), where DF2 contains duplicate entries, but I want to keep those duplicates.


As a made-up example:

Let's say I've got 2 dataframes. One, called DF1, contains correct numbers for hotel umbrellas across different dates.

We have line items for Hilton_A on 5/20, 5/25, 6/01, and associated umbrella #'s. Same with Hilton_B and Hilton_C.

Here's dput for DF1, the reference dataframe:

structure(list(Date = structure(c(15852, 15859, 15852, 15859, 
15852, 15859, 15852), class = "Date"), Hotel = structure(c(1L, 
1L, 2L, 2L, 3L, 3L, 4L), .Label = c("Hilton_A", "Hilton_B", "Hilton_C", 
"Hilton_D"), class = "factor"), Umbrellas = c(9340L, 6401L, 9089L, 
7716L, 5542L, 5565L, 8158L), datename = c("2013-05-27_Hilton_A", 
"2013-06-03_Hilton_A", "2013-05-27_Hilton_B", "2013-06-03_Hilton_B", 
"2013-05-27_Hilton_C", "2013-06-03_Hilton_C", "2013-05-27_Hilton_D"
)), .Names = c("Date", "Hotel", "Umbrellas", "datename"), row.names = c(NA, 
-7L), class = "data.frame")

DF2 contains information for a bunch of other hotels on different dates, as well as information for the Hiltons in DF1. The problem is, the umbrella #'s in DF2 are wrong for the Hiltons, and I need to replace them with the #'s from DF1.

Here's the dput for DF2, with the incorrect Hilton numbers, along with some other data that I don't want to touch:

structure(list(Date = structure(c(15845, 15852, 15859, 15852, 
15859, 15845, 15859, 15845, 15845, 15852, 15845, 15845, 15882
), class = "Date"), Hotel = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 3L, 4L, 5L, 6L, 7L), .Label = c("Hilton_A", "Hilton_B", 
"Hilton_C", "Hilton_D", "RedRoof_A", "RedRoof_D", "Sheraton_D"
), class = "factor"), Umbrellas = c(263L, 287L, 258L, 110L, 234L, 
212L, 265L, 542L, 81L, 51L, 162L, 232L, 493L), datename = c("2013-05-20_Hilton_A", 
"2013-05-27_Hilton_A", "2013-06-03_Hilton_A", "2013-05-27_Hilton_A", 
"2013-06-03_Hilton_A", "2013-05-20_Hilton_B", "2013-06-03_Hilton_B", 
"2013-05-20_Hilton_B", "2013-05-20_Hilton_C", "2013-05-27_Hilton_D", 
"2013-05-20_RedRoof_A", "2013-05-20_RedRoof_D", "2013-06-26_Sheraton_D"
)), .Names = c("Date", "Hotel", "Umbrellas", "datename"), row.names = c(NA, 
-13L), class = "data.frame")

Normally this would work:

DF2$Umbrellas<- replace(DF2$Umbrellas, DF2$datename%in% DF1$datename, DF1$Umbrellas)

(where "datename" is just a concatenation of hotel and date, since the same hotel has information across multiple dates (so we can "unique=ify" the list))

But DF2 actually has multiple observations for each hotel and date that I want to keep (i.e., Hilton_A on 5/27 shows up 2 times in DF2).

So when I try to replace the Umbrella #'s from DF1 into DF2, I get the error message:

Warning message:
In replace(DF2$Umbrellas, DF2$hoteldatename %in% DF1$hoteldatename ,  :
  number of items to replace is not a multiple of replacement length

And the numbers are all wrong.

Does anybody know what's happening here and how I might go about getting the number in DF1 to replace all of the applicable observations in DF2?

Metrics
  • 15,172
  • 7
  • 54
  • 83
Marc Tulla
  • 1,751
  • 2
  • 20
  • 34
  • 1
    It seems that you already have two data frames (`DF1` & `DF2`). Why can't you please share a sample of them? See [here](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) how to `dput` or `reproduce` a sample your data. People on SO are willing to spend their time helping you with your actual coding problem. But most often reluctant to (re-)create your dummy data. That is your duty. Thanks. – Henrik Oct 26 '13 at 19:09
  • Metrics, that link is where I started with this question, but it doesn't answer what I'm trying to do. Henrik, I had no idea about dput -- sorry about that! I'll edit the original post... – Marc Tulla Oct 26 '13 at 20:04

1 Answers1

1
df3$Umbrellas<-df1$Umbrellas[match(df2$datename,df1$datename)]
> df3
         Date      Hotel Umbrellas              datename
1  2013-05-20   Hilton_A        NA   2013-05-20_Hilton_A
2  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
3  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
4  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
5  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
6  2013-05-20   Hilton_B        NA   2013-05-20_Hilton_B
7  2013-06-03   Hilton_B      7716   2013-06-03_Hilton_B
8  2013-05-20   Hilton_B        NA   2013-05-20_Hilton_B
9  2013-05-20   Hilton_C        NA   2013-05-20_Hilton_C
10 2013-05-27   Hilton_D      8158   2013-05-27_Hilton_D
11 2013-05-20  RedRoof_A        NA  2013-05-20_RedRoof_A
12 2013-05-20  RedRoof_D        NA  2013-05-20_RedRoof_D
13 2013-06-26 Sheraton_D        NA 2013-06-26_Sheraton_D

df3$Umbrellas<-ifelse(is.na(df3$Umbrellas),df2$Umbrellas,df3$Umbrellas)
> df3
         Date      Hotel Umbrellas              datename
1  2013-05-20   Hilton_A       263   2013-05-20_Hilton_A
2  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
3  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
4  2013-05-27   Hilton_A      9340   2013-05-27_Hilton_A
5  2013-06-03   Hilton_A      6401   2013-06-03_Hilton_A
6  2013-05-20   Hilton_B       212   2013-05-20_Hilton_B
7  2013-06-03   Hilton_B      7716   2013-06-03_Hilton_B
8  2013-05-20   Hilton_B       542   2013-05-20_Hilton_B
9  2013-05-20   Hilton_C        81   2013-05-20_Hilton_C
10 2013-05-27   Hilton_D      8158   2013-05-27_Hilton_D
11 2013-05-20  RedRoof_A       162  2013-05-20_RedRoof_A
12 2013-05-20  RedRoof_D       232  2013-05-20_RedRoof_D
13 2013-06-26 Sheraton_D       493 2013-06-26_Sheraton_D
Metrics
  • 15,172
  • 7
  • 54
  • 83
  • Thanks Metrics, this works as advertised. But what about those NA's? I want to keep the original numbers I had (as in, if something isn't covered under the match operation, it should be left as-is (i.e, Hilton_A on 5/20 should still be 263) – Marc Tulla Oct 27 '13 at 01:14
  • No problem. See my updates. – Metrics Oct 27 '13 at 01:42