How do I use values from one dataframe to recode values in another?

Question

I have one file named traffic that looks like this:

city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1       6      23       1       1     47 1082010      8     50      1     150
0        6      25       1       1     73 1042010      4      0      1    2200
0        6     652       1       4     71 3282010     28      1      3    1901
1        6    1289       1       2     71 7152010     15     40      7    2140
4        6    1289       1       3     71 7152010     15     40      7    2140
1        6    1289       1       4     71 7152010     15     40      7    2140

and a set of separate files that tell me what the numerical code in each column refers to. For example, I have a file called city that looks like this:

Code     Definition
-1       Blank
0        NA
1        ACAMPO
2        ACTON
3        ADELANTO
4        ADIN

How do I use the codes in the city file to replace the coded values in traffic? The output would look like this:

city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
Blank     6      23       1       1     47 1082010      8     50      1     150
NA        6      25       1       1     73 1042010      4      0      1    2200
NA        6     652       1       4     71 3282010     28      1      3    1901
ACAMPO    6    1289       1       2     71 7152010     15     40      7    2140
ADIN      6    1289       1       3     71 7152010     15     40      7    2140
ACAMPO    6    1289       1       4     71 7152010     15     40      7    2140

All the solutions I've seen using recode or likewise involve explicitly stating which value corresponds to which as in the cars packages example: recode(x, "c(1,2)='A'; else='B'") What I'd like to do, instead, is to have the strings in city$Definition replace the numerical codes in traffic$city if city$Code matches traffic$city.

I could do traffic<-merge(traffic, city, by.x = "city", by.y = "Code") and then traffic$city<-traffic$Definition and then traffic$Definition<-NULL, but it just seems like this would be a common enough operation that there would be a convenient function for doing this.

Bonus points for a solution which allows me to specify multiple columns to be replaced by values from multiple files without repeating myself too much.

The match solution offered certainly appears to be the approach I would have attempted. Your request of a solution in the last sentence appears far too vague for effort. Why not post another question that offers a starting point to make it more concrete? — IRTFM, Jul 07 '12 at 21:50
Let me make the last part more clear - I've got more files than just `city`. I'd love to be able to recode city and county and accmon and so on with their respective values from their respective files, without having to write a separate match statement for each one. Probably more effort than it's worth, though. — William Gunn, Jul 07 '12 at 22:13
Wouldn't this be an obvious case for `merge`? Details (as previously suggested would be needed to be sure. — IRTFM, Jul 08 '12 at 01:38
As I mentioned in my original question, merge doesn't give me the behavior I want. The match() approach below does what I want, but I have to re-do it for every column I want to re-code, which is laborious. I was just hoping since this is surely a common activity, that there was a convenience function for this. — William Gunn, Jul 08 '12 at 02:35
This is basically telling you the same thing with a little more detail: http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right — Rob, Jul 08 '12 at 13:25
@WilliamGunn To your last point: Since R can save each file into a variable as a dataframe object, you can use sqldb in the same way you would for a relational database. You just need to include them in the from statement when using sqldf. Notice I added the [county] table with the alias of 'co' so we can map your county codes to the names. I then added co.county to the sql statement. library(sqldf) t <- file1 c <- file2 co <- file3 sqldf("SELECT c.city, t.vnumber t.pnumber, co.county from traffic t, city c, county co where t.city=c.Code, t.county=co.countycode") — Rob, Jul 08 '12 at 13:38

shhhhimhuntingrabbits · Accepted Answer · 2012-07-07T22:03:30.003

this maybe what you want

traffic<-read.table(header=T,text="city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1       6      23       1       1     47 1082010      8     50      1     150
0        6      25       1       1     73 1042010      4      0      1    2200
0        6     652       1       4     71 3282010     28      1      3    1901
1        6    1289       1       2     71 7152010     15     40      7    2140
4        6    1289       1       3     71 7152010     15     40      7    2140
1        6    1289       1       4     71 7152010     15     40      7    2140")

city<-read.table(header=T,text="Code     Definition
-1       Blank
0        NA
1        ACAMPO
2        ACTON
3        ADELANTO
4        ADIN")

traffic$city<-city$Definition[match(traffic$city,city$Code)]

but I may have mistaken your meaning

or much more fun

library(sqldf)
sqldf("SELECT c.Definition,t.statenum,t.casenum,t.vnumber,t.pnumber,t.county,t.accdate,t.accday,t.accmin,t.accmon from traffic t, city c where t.city=c.Code")

I would advocate sqldf and SQL type SELECTS as maybe answering your last part. I cant comment on how it performs with large dataframes however.

EDIT: I would like to have SELECT c.Definition as city..... here but it throws an error

Aha, I was on the trail of a solution using match. You could also write it `traffic$city<-city$Definition[city$Code %in% traffic$city]` right? Any suggestions towards doing multiple variable replacement from multiple files? — William Gunn, Jul 07 '12 at 21:48
Thanks, I haven't gotten started with sqldf, but I know many people end up there, especially as their data gets bigger, so should probably just get on with it ;-) — William Gunn, Jul 07 '12 at 23:20

score 2 · Answer 2 · answered Jul 08 '12 at 06:23

Perhaps the easiest way is to rename the columns in your lookup tables so that the merge operation just "works":

names(city) <- c("city", "City Name")
merge(traffic, city)

  city statenum casenum vnumber pnumber county accdate
1   -1        6      23       1       1     47 1082010
2    0        6      25       1       1     73 1042010
3    0        6     652       1       4     71 3282010
4    1        6    1289       1       2     71 7152010
5    1        6    1289       1       4     71 7152010
6    4        6    1289       1       3     71 7152010
  accday accmin accmon acctime City Name
1      8     50      1     150     Blank
2      4      0      1    2200      <NA>
3     28      1      3    1901      <NA>
4     15     40      7    2140    ACAMPO
5     15     40      7    2140    ACAMPO
6     15     40      7    2140      ADIN

Since this is the structure that one would expect in relational databases, this should make it easy if you then wish use sqldf or data.table.

How do I use values from one dataframe to recode values in another?

2 Answers2