VLookup type method in R

Question

I have a df with thousands of tickers for different future contracts. They have the abbreviated name (which appears later) and the long name (which I want to have in other df)

full_list <- structure(
  list(
    Ticker = c("AC", "AIC", "BBS", "BO", "C", "DF"),
    Long_Name = c("Ethanol -- CBOT", "DJ UBS Commodity Index -- CBOT", "South American Soybeans -- CBOT", "Soybean Oil -- CBT", "Corn -- CBT", "Dow Jones Industrial Average -- CBT")
  ),
  .Names = c("Ticker", "Long_Name"),
  row.names = c(NA, 6L),
  class = "data.frame"
)

This df has the list that I receive daily. I have to go and lookup the abbreviated name and match it to the long name.

replace <- structure(
  list(
    Type = c("F", "F", "F", "F", "F", "F"),
    Location = c("US", "US", "US", "US", "US", "US"),
    Symbol = c("BO", "C", "DF", "AIC", "AC", "BBS"),
    Month = c("V13", "U13", "U13", "U13", "U13", "U13")
  ),
  .Names = c("Type", "Location", "Symbol", "Month"),
  row.names = c(NA, 6L),
  class = "data.frame"
)

What I am looking for R to do is take replace$Symbol column and find those values in full_list$Ticker column and add a column, replace$Long_Name, where the respective full_list$Long_Name is copied over. Hope this makes sense. I understand the column names are difficult to follow.

This would be an easy VLookup in excel but I have a script I will use on a daily basis almost completed in R.

score 16 · Accepted Answer · answered Sep 05 '13 at 20:26

16

merge them:

> merge(full_list, replace, by.x="Ticker", by.y="Symbol")
  Ticker                           Long_Name Type Location Month
1     AC                     Ethanol -- CBOT    F       US   U13
2    AIC      DJ UBS Commodity Index -- CBOT    F       US   U13
3    BBS     South American Soybeans -- CBOT    F       US   U13
4     BO                  Soybean Oil -- CBT    F       US   V13
5      C                         Corn -- CBT    F       US   U13
6     DF Dow Jones Industrial Average -- CBT    F       US   U13

answered Sep 05 '13 at 20:26

Joshua Ulrich

173,410
32
338
418

Will this method work if there are duplicate "tickers"? will it use the "long_name" more than once. – Tim Sep 05 '13 at 21:06
@Tim: Yes: `merge(full_list, rbind(replace,transform(replace,Month="Z13")), by.x="Ticker", by.y="Symbol")`. – Joshua Ulrich Sep 05 '13 at 21:10

dayne · Answer 2 · 2013-09-05T20:31:45.527

You could use match - which gives the index of where the first argument falls in the second argument. For example:

arg1 <- c("red","blue")
arg2 <- c("blue","red")

> match(arg1,arg2)
[1] 2 1

Then just create a new column in your replace data frame (note - you should call it something else, because replace is actually a function in r) using the full_list data frame with the matched symbols.

replace$Long_Name <- full_list$Long_Name[match(replace$Symbol,full_list$Ticker)]

> replace
  Type Location Symbol Month                           Long_Name
1    F       US     BO   V13                  Soybean Oil -- CBT
2    F       US      C   U13                         Corn -- CBT
3    F       US     DF   U13 Dow Jones Industrial Average -- CBT
4    F       US    AIC   U13      DJ UBS Commodity Index -- CBOT
5    F       US     AC   U13                     Ethanol -- CBOT
6    F       US    BBS   U13     South American Soybeans -- CBOT

Is it possible to match on two criteria? Like if `Long_Name` varied by year, is it possible to add a `$year` argument in `match`? — Rafael, Apr 28 '17 at 12:20
@RafaelMartins I would look at the `data.table` package. You can do merging on multi-column keys that is easy and very efficient. You can also use `merge` and supply a vector of column names to the 'by' parameters. See: http://stackoverflow.com/questions/6709151/how-do-i-combine-two-data-frames-based-on-two-columns — dayne, Apr 28 '17 at 14:17

score 6 · Answer 3 · answered Sep 05 '13 at 20:33

If it's a big data set you may benefit from an environment lookup:

library(qdap)
replace$Long_Name <- lookup(replace$Symbol, full_list)

## > replace
##   Type Location Symbol Month                           Long_Name
## 1    F       US     BO   V13                  Soybean Oil -- CBT
## 2    F       US      C   U13                         Corn -- CBT
## 3    F       US     DF   U13 Dow Jones Industrial Average -- CBT
## 4    F       US    AIC   U13      DJ UBS Commodity Index -- CBOT
## 5    F       US     AC   U13                     Ethanol -- CBOT
## 6    F       US    BBS   U13     South American Soybeans -- CBOT

score 6 · Answer 4 · edited May 23 '17 at 12:34

6

Obligatory data.table answer

library(data.table)
full_list <- data.table(full_list, key='Symbol')
replace <- data.table(replace, key='Ticker')

replace[full_list]

FWIW on a data set above about 1e5 rows a keyed data.table will be significantly faster than the other approaches listed (except for the qdap version, I haven't tried that). merge timings can be found here

edited May 23 '17 at 12:34

Community

1
1

answered Sep 05 '13 at 21:16

Justin

42,475
9
93
111

Jellio · Answer 5 · 2013-09-05T20:49:50.293

If you're using a large data set, you might run into some time/memory issues, if that's the case, try this:

require(plyr)

colnames(replace)<-c("Type", "Location", "Ticker", "Month")

Full<-join(full_list, replace, by = "Ticker", type = "left", match = "all")

> Full
  Ticker                           Long_Name Type Location Month
1     AC                     Ethanol -- CBOT    F       US   U13
2    AIC      DJ UBS Commodity Index -- CBOT    F       US   U13
3    BBS     South American Soybeans -- CBOT    F       US   U13
4     BO                  Soybean Oil -- CBT    F       US   V13
5      C                         Corn -- CBT    F       US   U13
6     DF Dow Jones Industrial Average -- CBT    F       US   U13

Although its more than just a one line solution, merge can take some time to process with larger dataframes. Also, the plyr package can be your best friend.

VLookup type method in R

5 Answers5

Linked

Related