I have 10,000 or more texts in one column of a CSV file, file_1. In another CSV file, file_2, I have some words that I need to search for in file_1, recording in the next column which of those words each text contains. Every word has to be searched for in every text. A single text often contains several words from file_2, and I want all of them in the column next to the text, separated by commas. Case matching may also be a challenge, and I want exact matches only.

Example (file_1 and file_2 were posted as images):

Disney, Hollywood

Desired output (posted as an image)

oguz ismail
  • It would be great if you could supply a minimal reproducible example to go along with your question, something we can work from and use to show you how it might be possible to answer it. That way others can also benefit from your question, and the accompanying answer, in the future. You can have a look at [this SO post](http://stackoverflow.com/help/mcve) on how to make a great reproducible example in R. Also, please describe what you have tried so far or what research you have done to answer your own question. – Eric Fail Jan 15 '16 at 22:07
  • I have uploaded the images of both the data files. Shall I attach the data as a csv file or paste it here? – Anurag Sharma Jan 17 '16 at 11:05
  • Attach/paste the data using `dput()`. This is explained very well in [this slightly longer post](http://stackoverflow.com/a/5963610/1305688). Don't forget to make your example minimal :) – Eric Fail Jan 17 '16 at 13:28
  • > dput(droplevels(head(df1, 2))) structure(list(text = c("RT @dnceparaguay: \"Lamariabeck: #jonas #breakfast #hollywoodlife #celebrity #nickjonas #joejonas #nobigdeal #hollywood\" #DNCENews https://t…", "Lemme Clear This Up .. I Have NO Hoes . It's just me dput(droplevels(head(df2, 3))) structure(list(Name = c("hollywood","celebrity","Disney")), .Names = "Name", row.names = c(NA, 4L), class = "data.frame") – Anurag Sharma Jan 17 '16 at 16:30

1 Answer

I assume you will read the files into two separate data frames such as df1 and df2.
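As a rough sketch of that reading step: the file paths below are temporary stand-ins (hypothetical, not the asker's real files), and the column names `text` and `Name` are taken from the `dput()` output posted in the comments.

```r
# Hypothetical stand-ins for file_1 and file_2; replace f1 and f2 with the real paths.
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("text", "\"I love #hollywood and Disney\"", "\"nothing here\""), f1)
writeLines(c("Name", "hollywood", "celebrity", "Disney"), f2)

# stringsAsFactors = FALSE keeps the columns as character vectors, not factors
df1 <- read.csv(f1, stringsAsFactors = FALSE)
df2 <- read.csv(f2, stringsAsFactors = FALSE)

str(df1)
str(df2)
```

Keeping the text as character rather than factor avoids surprises when the values are later compared or pasted together.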

You can subset your search values from df2 as needed, or turn it into one large vector to search through using:

  df2 <- as.vector(t(df2))
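To illustrate what that line produces, here is a small sketch on a stand-in data frame built from the three words in the question's `dput()` output. The name `words` is introduced here only for illustration.

```r
# Small stand-in for df2, using the column name from the question's dput output
df2 <- data.frame(Name = c("hollywood", "celebrity", "Disney"),
                  stringsAsFactors = FALSE)

# t() converts the data frame to a character matrix; as.vector() drops the dimensions
words <- as.vector(t(df2))
print(words)

# With a single column, extracting that column directly gives the same vector
stopifnot(identical(words, df2$Name))
```

When df2 has only one column, `df2$Name` is the simpler way to get the same result.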

Then create a new column "Match" on df1 containing the items from df2 that matched.

  # df2 is the character vector created above; SearchColumn is the text column of df1
  for (i in seq_len(nrow(df1))) {
    df1$Match[i] <- paste0(df2[which(df2 %in% df1$SearchColumn[i])], collapse = ",")
  }

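This loops from row 1 to the last row of df1, finds the indices of the matching entries in df2 using the which function, and pastes those values together, separated by commas. Note that %in% compares whole values, so an entry of df2 is matched only when it equals the entire contents of the cell; finding words inside longer text needs pattern matching, for example with grepl. I'm sure someone else can find a way to achieve this without a loop, but I hope this works for you.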
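One possible way to search for the words inside longer texts, without an explicit for loop, is sketched below. This is a swapped-in technique (whole-word regular expression matching with `grepl`), not the code from the answer above. `find_words` is a hypothetical helper, the data frames are small stand-ins, and the sketch assumes the search words are plain alphanumeric strings with no regex metacharacters. Case-insensitive matching is also an assumption, controlled by `ignore_case`.

```r
# Stand-ins for the two inputs (column name taken from the dput output in the comments)
df1 <- data.frame(
  text = c("#hollywoodlife #celebrity #nobigdeal #hollywood",
           "Went to DISNEY today",
           "nothing to see here"),
  stringsAsFactors = FALSE
)
words <- c("hollywood", "celebrity", "Disney")

# find_words() is a hypothetical helper, not part of the original answer.
# It returns the words that occur in `text` as whole words, comma separated.
find_words <- function(text, words, ignore_case = TRUE) {
  hit <- vapply(
    words,
    function(w) grepl(paste0("\\b", w, "\\b"), text,
                      ignore.case = ignore_case, perl = TRUE),
    logical(1)
  )
  paste(words[hit], collapse = ",")
}

# Apply the helper to every text; rows without any match get an empty string
df1$Match <- vapply(df1$text, find_words, character(1),
                    words = words, USE.NAMES = FALSE)
print(df1)
```

The `\\b` word boundaries keep "hollywood" from matching inside "#hollywoodlife" while still matching "#hollywood". Setting `ignore_case = FALSE` would require the case to match as well.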

  • Good start, this would show IF there is a match, not which ones were matched which seems to be what he's looking for. – Brandon Bertelsen Jan 15 '16 at 23:52
  • Thanks all for your inputs. Special thanks to "Kahlan M." for the effort. But when I try this code it gives me the following error: "> for (i in 1:nrow(df1)) { + df1$Match[i] <- paste0(df2[which(df2 %in df1$SearchColumn[i])],collapse = ",") Error: unexpected input in: "for (i in 1:nrow(df1)) { df1$Match[i] <- paste0(df2[which(df2 %in df1$SearchColumn[i])],collapse = ",") " > } Error: unexpected '}' in "}" One more thing: the length of the vector in df2 and the length of the content column in df1 are also not the same. Please same. – Anurag Sharma Jan 17 '16 at 10:04
  • Can someone please help me on above question? Same thing we are doing using VBA macro, I want to do this in R. – Anurag Sharma Jan 24 '16 at 16:41