0

I have one file named traffic that looks like this:

city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1       6      23       1       1     47 1082010      8     50      1     150
0        6      25       1       1     73 1042010      4      0      1    2200
0        6     652       1       4     71 3282010     28      1      3    1901
1        6    1289       1       2     71 7152010     15     40      7    2140
4        6    1289       1       3     71 7152010     15     40      7    2140
1        6    1289       1       4     71 7152010     15     40      7    2140

and a set of separate files that tell me what the numerical code in each column refers to. For example, I have a file called city that looks like this:

Code     Definition
-1       Blank
0        NA
1        ACAMPO
2        ACTON
3        ADELANTO
4        ADIN

How do I use the codes in the city file to replace the coded values in traffic? The output would look like this:

city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
Blank     6      23       1       1     47 1082010      8     50      1     150
NA        6      25       1       1     73 1042010      4      0      1    2200
NA        6     652       1       4     71 3282010     28      1      3    1901
ACAMPO    6    1289       1       2     71 7152010     15     40      7    2140
ADIN      6    1289       1       3     71 7152010     15     40      7    2140
ACAMPO    6    1289       1       4     71 7152010     15     40      7    2140

All the solutions I've seen using recode or likewise involve explicitly stating which value corresponds to which as in the cars packages example: recode(x, "c(1,2)='A'; else='B'") What I'd like to do, instead, is to have the strings in city$Definition replace the numerical codes in traffic$city if city$Code matches traffic$city.

I could do traffic<-merge(traffic, city, by.x = "city", by.y = "Code") and then traffic$city<-traffic$Definition and then traffic$Definition<-NULL, but it just seems like this would be a common enough operation that there would be a convenient function for doing this.

Bonus points for a solution which allows me to specify multiple columns to be replaced by values from multiple files without repeating myself too much.

William Gunn
  • 2,925
  • 8
  • 26
  • 22
  • The match solution offered certainly appears to be the approach I would have attempted. Your request of a solution in the last sentence appears far too vague for effort. Why not post another question that offers a starting point to make it more concrete? – IRTFM Jul 07 '12 at 21:50
  • Let me make the last part more clear - I've got more files than just `city`. I'd love to be able to recode city and county and accmon and so on with their respective values from their respective files, without having to write a separate match statement for each one. Probably more effort than it's worth, though. – William Gunn Jul 07 '12 at 22:13
  • Wouldn't this be an obvious case for `merge`? Details (as previously suggested would be needed to be sure. – IRTFM Jul 08 '12 at 01:38
  • As I mentioned in my original question, merge doesn't give me the behavior I want. The match() approach below does what I want, but I have to re-do it for every column I want to re-code, which is laborious. I was just hoping since this is surely a common activity, that there was a convenience function for this. – William Gunn Jul 08 '12 at 02:35
  • This is basically telling you the same thing with a little more detail: http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right – Rob Jul 08 '12 at 13:25
  • @WilliamGunn To your last point: Since R can save each file into a variable as a dataframe object, you can use sqldb in the same way you would for a relational database. You just need to include them in the from statement when using sqldf. Notice I added the [county] table with the alias of 'co' so we can map your county codes to the names. I then added co.county to the sql statement. library(sqldf) t <- file1 c <- file2 co <- file3 sqldf("SELECT c.city, t.vnumber t.pnumber, co.county from traffic t, city c, county co where t.city=c.Code, t.county=co.countycode") – Rob Jul 08 '12 at 13:38

2 Answers2

3

this maybe what you want

traffic<-read.table(header=T,text="city statenum casenum vnumber pnumber county accdate accday accmin accmon acctime
-1       6      23       1       1     47 1082010      8     50      1     150
0        6      25       1       1     73 1042010      4      0      1    2200
0        6     652       1       4     71 3282010     28      1      3    1901
1        6    1289       1       2     71 7152010     15     40      7    2140
4        6    1289       1       3     71 7152010     15     40      7    2140
1        6    1289       1       4     71 7152010     15     40      7    2140")

city<-read.table(header=T,text="Code     Definition
-1       Blank
0        NA
1        ACAMPO
2        ACTON
3        ADELANTO
4        ADIN")

traffic$city<-city$Definition[match(traffic$city,city$Code)]

but I may have mistaken your meaning

or much more fun

library(sqldf)
sqldf("SELECT c.Definition,t.statenum,t.casenum,t.vnumber,t.pnumber,t.county,t.accdate,t.accday,t.accmin,t.accmon from traffic t, city c where t.city=c.Code")

I would advocate sqldf and SQL type SELECTS as maybe answering your last part. I cant comment on how it performs with large dataframes however.

EDIT: I would like to have SELECT c.Definition as city..... here but it throws an error

shhhhimhuntingrabbits
  • 7,397
  • 2
  • 23
  • 23
  • Aha, I was on the trail of a solution using match. You could also write it `traffic$city<-city$Definition[city$Code %in% traffic$city]` right? Any suggestions towards doing multiple variable replacement from multiple files? – William Gunn Jul 07 '12 at 21:48
  • Thanks, I haven't gotten started with sqldf, but I know many people end up there, especially as their data gets bigger, so should probably just get on with it ;-) – William Gunn Jul 07 '12 at 23:20
2

Perhaps the easiest way is to rename the columns in your lookup tables so that the merge operation just "works":

names(city) <- c("city", "City Name")
merge(traffic, city)

  city statenum casenum vnumber pnumber county accdate
1   -1        6      23       1       1     47 1082010
2    0        6      25       1       1     73 1042010
3    0        6     652       1       4     71 3282010
4    1        6    1289       1       2     71 7152010
5    1        6    1289       1       4     71 7152010
6    4        6    1289       1       3     71 7152010
  accday accmin accmon acctime City Name
1      8     50      1     150     Blank
2      4      0      1    2200      <NA>
3     28      1      3    1901      <NA>
4     15     40      7    2140    ACAMPO
5     15     40      7    2140    ACAMPO
6     15     40      7    2140      ADIN

Since this is the structure that one would expect in relational databases, this should make it easy if you then wish use sqldf or data.table.

Andrie
  • 176,377
  • 47
  • 447
  • 496