I'm a student and new here. I'm trying to do text analysis for my project. I want to copy a row of data to another data frame whenever a given word appears in that row's sentence.

*df1*
ID    Text

1     This apple is delicious and I like this apple a lot.
2     This orange is nice and sweet. 
3     This apple is too sweet and I dislike this kind of apple. 
4     This apple is worth the price, definitely will purchase it again from this store. 

As you can see, the word "apple" appears in IDs 1, 3 and 4: twice in IDs 1 and 3, and once in ID 4.

My objective is to copy a row to another data frame whenever the word appears in it, no matter whether it appears once or more than once.

The result that I want:

*df2*
ID    Text

1     This apple is delicious and I like this apple a lot.
2     This apple is too sweet and I dislike this kind of apple. 
3     This apple is worth the price, definitely will purchase it again from this store. 

If possible, please also show me how to remove the "ID" column and the column header "Text". As this is text analysis, I don't need the ID column, and I'm not sure whether the column header will affect my analysis.

Thanks a lot!

Edward
Nick

1 Answer

We can use `grepl` to find the rows whose Text contains the word 'apple' and pass the result to `subset`:

subset(df1, grepl('apple', Text))
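
As a minimal sketch of the full round trip, the snippet below rebuilds the example data from the question, stores the matching rows in `df2`, and then pulls out only the sentences. The `data.frame()` construction and the name `texts` are assumptions made for illustration; the question does not show how `df1` was created.

```r
# Rebuild the example data from the question (assumed structure: ID + Text)
df1 <- data.frame(
  ID = 1:4,
  Text = c(
    "This apple is delicious and I like this apple a lot.",
    "This orange is nice and sweet.",
    "This apple is too sweet and I dislike this kind of apple.",
    "This apple is worth the price, definitely will purchase it again from this store."
  ),
  stringsAsFactors = FALSE
)

# grepl() returns one TRUE/FALSE per row; a single match is enough,
# so it does not matter how many times the word occurs in a sentence
df2 <- subset(df1, grepl("apple", Text))

# Optional: reset the row names so that df2 is numbered from 1 again
rownames(df2) <- NULL

# If only the sentences are needed, take the Text column as a plain
# character vector (hypothetical name `texts`); this leaves behind both
# the ID column and the column header
texts <- df2$Text
```

Note that `df2$ID` still holds the original values 1, 3 and 4; only the row names are renumbered. Working with a character vector such as `texts` sidesteps the question of whether the header affects later analysis, since a vector has no header.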
akrun
  • Could you tell me more about subset? And how can I use this to copy the rows to df2? – Nick Jul 31 '20 at 22:41
  • And for grep, I need to install a package, right? – Nick Jul 31 '20 at 22:42
  • @Nick. No package needed. Just use this code exactly as it is. To store the result, you could add `df2 <- subset(df1, grepl('\\bapple\\b', Text))` – Onyambu Jul 31 '20 at 23:05
  • @Onyambu. Should it be 'apple' or '\\bapple\\b'? Won't the latter match "bapple"? I would also like to check whether it works for a term with a space in between, like "apple pie". – Nick Aug 01 '20 at 00:09
  • And how can I use two words, like "apple" and "delicious"? – Nick Aug 01 '20 at 00:10
  • @Nick "\\bapple\\b" means do not match things like pineapple; we just need apple. "\\b" is a word boundary, so "\\bapple\\b" will not match bapple, whereas 'apple' will match 'bapple'. Adding the "\\b" means we only match the whole word apple and not a compound word containing apple. – Onyambu Aug 01 '20 at 00:20
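
The questions raised in the comments (word boundaries, a phrase with a space, and two words at once) can be checked on a small test vector. This is only a sketch: the vector `x` and the result names are made up for illustration, and combining two `grepl()` calls with `&` is one possible way to require both words, not something the answer itself proposes.

```r
# Hypothetical test strings, not taken from the question's data
x <- c("apple", "pineapple", "bapple",
       "I had apple pie", "delicious apple", "delicious orange")

# Plain substring match: also hits "pineapple" and "bapple"
substring_hit <- grepl("apple", x)

# Whole-word match: \\b marks a word boundary, it is not a literal "b"
word_hit <- grepl("\\bapple\\b", x)

# A phrase containing a space works the same way
phrase_hit <- grepl("\\bapple pie\\b", x)

# Both words must appear: combine two tests with &
both_hit <- grepl("\\bapple\\b", x) & grepl("\\bdelicious\\b", x)

# Either word may appear: use alternation inside one pattern
either_hit <- grepl("\\b(apple|delicious)\\b", x)
```

Any of these logical vectors can be used in place of `grepl('apple', Text)` inside `subset()`, for example `subset(df1, grepl("\\bapple\\b", Text) & grepl("\\bdelicious\\b", Text))`.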