-1

VLOOKUP in R Programming Language - Problem Statement: I need to perform a VLOOKUP in R, for monthly reports. I currently am doing it in Excel, however, I would like to do it in R.

Background:

1.) df1 = 50,000 Rows / 115 Columns – The column I want to match with: "account_number"

 A.) df1$account_number

2.) df2 = 11,000 lines / 2 columns – The column I want to match against: "account_number_1"

 B.) df2$account_number_1

3.) df1$flag – Where I want a “Y” or “N” for the match of df1$account_number == df2$account_number_1

C.) df1$flag

Research Attempts:

• I've been searching, attempting, and trying to code what I thought would be easy, however, I do not know why it is so difficult to do.

• My searches, and trial-and-error have resulted to Merge, Match, Duplicate and other methods that I cannot remember, and none of them give me exactly what I need. Any assistance on what I am doing wrong, and/or how to complete a VLOOKUP for the desired result is much appreciated! Thanks!

I thought I would get it through this StackOverflow link, however, I still cannot get the desired results:

How to do vlookup and fill down (like in Excel) in R?

Here are my searches and script attempts:

Merge: # COMBINE 2 Dataframes:

Combine two data frames by rows (rbind) when they have different sets of columns by-rows-rbind-when-they-have-different-sets-of-columns

(FYI: cbind, left_join, inner_join - all give me errors, however, "smartbind" WORKS!!!)

df_merge <- smartbind(df1, df2)

My attempt to TRUE/FALSE – cbind and rbind # https://www.youtube.com/watch?v=NFaK1Qn4u3A - Logic Statements (TRUE/FALSE) and cbind and rbind Command in R (R Tutorial 1.9)

 df_merge$flag <- df_merge$account_number == df_merge$account_number_1

https://www.youtube.com/watch?v=LKoknpFOEUw - Search "in R how to match values"

 duplicated(df_merge) which(duplicated(df_merge))

https://www.youtube.com/watch?v=eVEx_pBEkRI

df_merge$flag <- any(df_merge$account_number == df_merge$account_number_1)

any(df_merge$account_number == df_merge$account_number_1)

 if (which(duplicated(df_merge$account_number == df_merge$account_number_1))) {   df_merge$flag <- "Y" } else if (which(duplicated(df_merge$account_number != df_merge$account_number_1)) {   status <- "N" }

I attempted these links, and got lost:

How to do vlookup and fill down (like in Excel) in R?

http://stat.ethz.ch/R-manual/R-devel/library/base/html/match.html

 match(x, table, nomatch = NA_integer_, incomparables = NULL)

 x %in% table

 df_match <- match(df_merge$account_number , df_merge$ account_number_1, nomatch=NA_integer_, incomparables=NULL)
T V
  • 3
  • 4
  • 1
    Look at `join` functions in `dplyr` package. They are great for this kind of applications. – Shree Sep 24 '18 at 20:31
  • 4
    Without [a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) it's hard to say for sure, but I suspect this is a duplicate of https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – alistaire Sep 24 '18 at 20:34
  • Is all you want `df1$flag <- df1$account_number %in% df2$account_number_1`? If not explain what is wrong with it. – Dason Sep 25 '18 at 14:34

3 Answers3

2

You don't need to merge anything.

df1$flag <- df1$account_number %in% df2$account_number_1

This will create a column of TRUE/FALSE values. That should be sufficient for anything you want to do. If you must have Yes/No values:

df1$flag <- ifelse(df1$account_number %in% df2$account_number_1, "Yes", "No")
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
  • Hello Hong Ooi!!! That's it! It worked perfect, the TRUE/FALSE changed to Y/N values. Checked it against my old method of wrestling with Excel, and the results matched! Have been struggling w/these huge Excel reports I inherited. So far, what takes me 1-2 days to get 250+mb down to 30+mb, should now take me a couple hours to prep my data, and w/this last VLOOKUP function as part of my script, it takes 4 mins:10 secs. – T V Sep 28 '18 at 00:27
  • I can't Thank everyone enough for their assistance, & how powerfully simple the function worked. I was sort of close, however, not close enough. I need to further my R tutorials and experience. Thanks again! I know I'll be back here on StackOverflow, as it has been my #1 search for all the weird data wrestling and shortcutting I'm trying to pull off. :) – T V Sep 28 '18 at 00:27
  • Excellent. If it solved your problem, consider upvoting and/or accepting this answer. – Hong Ooi Sep 28 '18 at 01:28
0

If you want to add a flag in the first dataset, then you can do something like this:

LOS_X <- c(1:10,21,22,33,44,55,66,69,77,78,88,89,90,91,92)
LOS_Y <- c(1:20)

LOS_X <- data.frame(LOS_X)
LOS_Y <- data.frame(LOS_Y)

names(LOS_X) <- "X"
names(LOS_Y) <- "Y"

LOS_Y$flag <- "Y" #To add to the dataset

LOS_new <- merge(LOS_X, LOS_Y, by.x = "X", by.y = "Y", all.x = TRUE) #You should have the column flag in the first dataset
LOS_new$flag <- ifelse(is.na(LOS_new$flag), "N", LOS_new$flag)
SmitM
  • 1,366
  • 1
  • 8
  • 14
  • Hello all, Thanks for the assistance. I've tried the above mentioned from Dason, Shree and SmitM. I even looked at the link from alistaire. Here's what I attempted: – T V Sep 27 '18 at 18:31
  • Here are the results: X flag 1 Y 2 Y 3 Y 4 Y 5 Y 6 Y 7 Y 8 Y 9 Y 10 Y – T V Sep 27 '18 at 18:34
  • Sorry for the sloppy commenting here, I'm a newbie to StackOverflow, let me edit the above posts. – T V Sep 27 '18 at 18:43
  • I am going to update my answer with your efforts and minor changes. Try it out – SmitM Sep 27 '18 at 18:59
  • I have been at this for a couple months now, driving me crazy, and just when I posted my questions, I had to work on those reports all week, Thanks all for your patience. – T V Sep 27 '18 at 19:05
0

Here's a demonstration of doing this using dplyr and magrittr (for %>%) packages

library(dplyr)
library(magrittr)

df1 <- data.frame(AccountID = 1:3, AccountName = c("Superman", "Batman", "Joker"))
df2 <- data.frame(AccountID = 1:2, AccountBalance = c(1000, 1000000000))

# joining (or vlookup) to get account balance for all accounts

left_join(df1, df2, by = "AccountID") %>% mutate(Flag = ifelse(is.na(AccountBalance), "N", "Y"))

#   AccountID AccountName AccountBalance Flag
# 1         1    Superman          1e+03    Y
# 2         2      Batman          1e+09    Y
# 3         3       Joker             NA    N

Hope this is what you were looking for. If not, share some sample data and desired output.

Shree
  • 10,835
  • 1
  • 14
  • 36
  • Hi Shree, Thanks for your answer, I thought I was close with that, however, I would need "AccountID", correct? If so, I do not have matching Account ID's for reference. I also need to match the rows of "AccountName" with "AccountBalance", and if there is no match for that "AccountName" Row, then it should show an "N". Hope this makes sense. Thanks! – T V Sep 27 '18 at 19:02