I'm having difficulty performing a conditional operation across two data frames. To illustrate the problem: I have three variables, Price, State, and Item, stored in a data frame (data1) with those column names. I use ddply to generate a second data frame (data2) containing the columns State and Item, plus the average price (or some other summary function) for each State/Item combination.

What I then want to do is fill in a new column in the original data frame (i.e., a simple prediction vector), where the column's value is the mean price for the given observation's combination of State and Item. (E.g., if an observation in data1 has State = "Arizona" and Item = "pen", I want to retrieve the average price stored in data2 for that State/Item combination and insert it into the column.)
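For illustration, here is a minimal sketch of the setup described above. The data values and the column name MeanPrice are made up for the example; the real data1 is not shown here.

```r
library(plyr)

# Hypothetical toy data standing in for the real data1
data1 <- data.frame(
  State = c("Arizona", "Arizona", "Arizona", "Texas", "Texas"),
  Item  = c("pen", "pen", "ink", "pen", "pen"),
  Price = c(1, 3, 10, 2, 4),
  stringsAsFactors = FALSE
)

# One row per State/Item combination with its mean price
# (MeanPrice is an assumed column name)
data2 <- ddply(data1, c("State", "Item"), summarise, MeanPrice = mean(Price))
```

The goal is then to attach the matching MeanPrice from data2 to every row of data1.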

Thank you for any help.

Thomas
user2187656
  • It's a good idea to provide a reproducible example illustrating your question. You will usually get a quick answer if you supply enough information and at least a sample of your data. – Chargaff Mar 19 '13 at 18:37
  • Hi there! Please make your post reproducible by having a look at [**How to make a great reproducible example**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for us to help you. Thank you. – Arun Mar 19 '13 at 19:54

1 Answer

The plyr package comes with a great little function called join. You can use this to complete your task.

library(plyr)
join(data1, data2, by = c('State', 'Item'))

See ?join for the different types of join available. I'm pretty sure you want a left join, which is the default (type = "left").
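As a rough sketch of how this plays out end to end, assuming made-up data and a hypothetical summary column called MeanPrice (neither comes from the question):

```r
library(plyr)

# Hypothetical toy data; the asker's real data1 is not available
data1 <- data.frame(
  State = c("Arizona", "Arizona", "Arizona", "Texas", "Texas"),
  Item  = c("pen", "pen", "ink", "pen", "pen"),
  Price = c(1, 3, 10, 2, 4),
  stringsAsFactors = FALSE
)

# Per-group means, as the question describes building data2 with ddply
data2 <- ddply(data1, c("State", "Item"), summarise, MeanPrice = mean(Price))

# Left join: every row of data1 is kept, in its original order, and gains
# the MeanPrice of its State/Item combination
result <- join(data1, data2, by = c("State", "Item"), type = "left")
```

Here result has the same number of rows as data1, with MeanPrice repeated for every observation in the same State/Item group. If only the group mean is needed, base R's ave(data1$Price, data1$State, data1$Item) should give the same vector without building data2 at all.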

Brandon Bertelsen