I am struggling to do something apparently easy.

So I have a list of codes and their recoding.

> head(codesTv)

  X5000 TV.Diary.Event
1  5001           Play
2  5002   Drama Series
3  5003    Other Drama
4  5004           Film
5  5005      Pop Music
6  5006         Comedy

Then I have a vector that needs to be recoded named ttest.

> head(as.data.frame(ttest))
                ttest
1        SPITTING IMA
2                5999
3        KRAMERVSKRAM
4                NEWS
5           BROOKSIDE
6             NOTHING

What I need is simply to replace every value in ttest that appears in codesTv$X5000 with the matching TV.Diary.Event label, leaving all other values unchanged.

The only way I have found to do this is the following cumbersome code:

ttest[ttest %in% codesTv$X5000] = codesTv$TV.Diary.Event[match(ttest[ttest %in% codesTv$X5000], codesTv$X5000)]

Does anyone have a simpler way of doing this?

data

ttest = c("SPITTING IMA", "5999", "KRAMERVSKRAM", "NEWS", "BROOKSIDE", 
"NOTHING", "NOTHING", "BROOKSIDE", "5004", "5004", "5999", "YANKS", 
"5999", "5999", "5999", "5999", "\"V\"", "GET FRESH", "5999", 
"5999", "HEIDI", "FAME", "SAT  SHOW", "5021", "BLUE PETER", "V", 
"EASTENDERS", "WORLD  CUP", "GRANDSTAND", "SPORT", "WORLD CUP", 
"BLUE PETER", "WORLD CUP", "HORIZON", "REGGIEPERRIN", "5999", 
"BROOKSIDE", "HNKYTNK MAN", "5999", "5999")

codesTv = structure(list(X5000 = c("5001", "5002", "5003", "5004", "5005", 
"5006", "5007", "5008", "5009", "5010", "5011", "5012", "5013", 
"5014", "5015", "5016", "5017", "5019", "5020", "5021", "5022", 
"5023", "5888", "5999"), TV.Diary.Event = c("Play", "Drama Series", 
"Other Drama", "Film", "Pop Music", "Comedy", "Chat Show", "Quiz/Panel Game", 
"Cartoon", "Special L/E Event", "Classical Music", "Contemporary Music", 
"Arts", "News", "Politics", "Consumer Affairs", "Spec Current Affairs", 
"Documentary", "Religious Affairs", "Sport", "Childrens TV", 
"Party Political", "Continuation Event", "Non-event (Missing)"
)), .Names = c("X5000", "TV.Diary.Event"), row.names = c(NA, 
-24L), class = "data.frame")
giac

1 Answer

The OP's solution should work fine. Here's one other way:

library(data.table)

# confirm that there is overlap
intersect(ttest, codesTv$X5000) # "5999" "5004" "5021"  

# replace values in ttest
setDT(list(X5000=ttest))[codesTv, X5000 := i.TV.Diary.Event, on="X5000"]

# confirm that the values were overwritten
intersect(ttest, codesTv$X5000) # character(0)

I borrowed this idea from @eddi. It should be memory-efficient, since it modifies ttest by reference instead of making a copy.
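
For comparison, here is a minimal base R sketch of the same recoding that calls match() only once. It uses a small excerpt of the question's data rather than the full vectors, and the names idx and recoded are my own, not from the original post. Unlike the data.table version, it builds a new vector instead of modifying ttest in place.

```r
# Small excerpt of the data from the question (not the full vectors)
ttest <- c("SPITTING IMA", "5999", "NEWS", "5004", "5021", "BROOKSIDE")
codesTv <- data.frame(
  X5000 = c("5004", "5021", "5999"),
  TV.Diary.Event = c("Film", "Sport", "Non-event (Missing)"),
  stringsAsFactors = FALSE
)

# Look up every element once; idx is NA wherever ttest holds no known code
idx <- match(ttest, codesTv$X5000)

# Use the label where a code matched, otherwise keep the original value
recoded <- ifelse(is.na(idx), ttest, codesTv$TV.Diary.Event[idx])
recoded
```

Assigning recoded back to ttest gives the same result as the OP's one-liner, without evaluating ttest %in% codesTv$X5000 twice.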

Frank
  • Nice solution. Just one question, though: does my cumbersome way work correctly? – giac Oct 07 '15 at 16:59
    @giacomoV I think so. It looks right and also has an empty intersection afterwards. – Frank Oct 07 '15 at 17:00
  • By the way @Frank, I need to cite your help on several things. What do you think is the best way to do it? – giac Oct 07 '15 at 17:04
  • @giacomoV I suspect that it is not necessary to cite online sources like SO in most journals' and grad schools' style guides. Personally, I just put citation comments in the code, containing a user name and a link to a post. It looks like SO does not have built-in citation tools, but you can see what it looks like on math.SE here: http://meta.stackexchange.com/questions/49760/citing-stack-overflow-discussions – Frank Oct 07 '15 at 17:27