
I have this dataframe:

ID      Description    
1     Tree fell on car 
2     Tree was uprooted
3     While cutting tree, it came down
4     Tree came down

I am trying to search a column in a dataframe for weather words. I am doing this with multiple grepl calls separated by an 'OR'. However, I want to combine grepl calls to say "If the description has THIS WORD and THIS WORD, but not THIS WORD, it is weather". Looking at the dataframe above, "Tree came down" should be classified as weather, but "While cutting tree, it came down" is not weather related.

The code that I tried from other stack overflow answers is :

Data$Type<-ifelse(grepl(' Tree|^Tree|- Tree|:Tree',Data$DESCRIPTION,ignore.case=TRUE)&
grepl('^[^Cutting]*[Feel|Fell|Fall|Up Rooted|Uprooted|Came Down| Down|Knocked Onto|Caused Damage] [^Cutting]*$',Data$DESCRIPTION,ignore.case=TRUE)), "weather", "Not Classified")

But this is not working. I tried:

Data$Type<-ifelse(grepl(' Tree|^Tree|- Tree|:Tree',Data$DESCRIPTION,ignore.case=TRUE)&
grepl('Feel|Fell|Fall|Up Rooted|Uprooted|Came Down| Down|Knocked Onto|Caused Damage',Data$DESCRIPTION,ignore.case=TRUE) &
!grepl('Cutting',Data$DESCRIPTION,ignore.case=TRUE)), "Weather", "Not Classified")

I am expecting this outcome:

ID      Description                      Type
1     Tree fell on car                   "Weather"
2     Tree was uprooted                  "Weather"
3     While cutting tree, it came down   "Non-Weather"
4     Tree came down                     "Weather"

But these do not work. Thank you

Reagan
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 27 '18 at 15:12

2 Answers


Since you have only two cases (Weather and Non-Weather), I think a single grepl call is sufficient:

df$Type <- sapply(df$Description, 
                  function(x) ifelse(grepl(pattern = 'Tree|fell|^cutting',x = x),'Weather','Non-Weather'))

[1] "Weather"     "Weather"     "Non-Weather" "Weather"   
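
If the descriptions vary in case or word order, the "this word AND this word, but NOT this word" rule from the question can also be written as separate vectorized conditions. The following is only a minimal sketch, assuming the four sample rows from the question; the helper names `has_tree`, `has_event` and `has_cutting` are made up for illustration, and the event pattern is shortened to the words that appear in the sample.

```r
# Sample data copied from the question
df <- data.frame(
  ID = 1:4,
  Description = c("Tree fell on car",
                  "Tree was uprooted",
                  "While cutting tree, it came down",
                  "Tree came down"),
  stringsAsFactors = FALSE
)

# grepl() is vectorized over the column, so sapply() is not required
has_tree    <- grepl("tree", df$Description, ignore.case = TRUE)
has_event   <- grepl("fell|uprooted|came down", df$Description, ignore.case = TRUE)
has_cutting <- grepl("cutting", df$Description, ignore.case = TRUE)

# Weather = mentions a tree AND an event word, but NOT "cutting"
df$Type <- ifelse(has_tree & has_event & !has_cutting, "Weather", "Non-Weather")
print(df$Type)
```

Keeping each condition in its own logical vector makes it easy to inspect which rule a given row passed or failed.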
YOLO
  • So if I add the "^" in front of cutting, it will search for every description that has "Tree" and "Fell", but not "Cutting"? – Reagan Jul 27 '18 at 15:42
  • When I do this i get, "unexpected '|'" – Reagan Jul 27 '18 at 15:43
  • seems strange, because I get the same output as expected. are you using the exact code? – YOLO Jul 27 '18 at 15:46
  • I fixed it! My mistake, I forgot to add the function part. However, it is not changing the classification, I am still having the "cutting" descriptions classified as weather. – Reagan Jul 27 '18 at 15:52
  • I am still using the two grepl functions because I don't want to have every "Tree" as weather related. – Reagan Jul 27 '18 at 15:55
  • can you add more rows in your data frame? please ensure that they are diverse cases so that I can ensure that no problem happens in the answer. – YOLO Jul 27 '18 at 17:01
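
On the question raised in the comments about what "^" does: in a regular expression, a leading `^` anchors the match to the start of the string, and `[^...]` is a character class matching any single character not listed. Neither form means "does not contain this word". The sketch below illustrates this on three strings; the third string and all variable names are invented for the example.

```r
x <- c("Tree came down",
       "While cutting tree, it came down",
       "Cutting tree fell")   # third string is made up for illustration

# '^cutting' matches only when the string STARTS with "cutting"
starts_with_cutting <- grepl("^cutting", x, ignore.case = TRUE)

# As one alternative in an OR pattern it can only add matches, never exclude
combined <- grepl("Tree|fell|^cutting", x, ignore.case = TRUE)

# '[^Cutting]' means: any ONE character other than C, u, t, i, n, g
one_other_char <- grepl("[^Cutting]", x)

# Excluding a word needs a separate, negated test
no_cutting <- !grepl("cutting", x, ignore.case = TRUE)

print(data.frame(x, starts_with_cutting, combined, one_other_char, no_cutting))
```

With `ignore.case = TRUE`, the combined pattern matches all three strings, which may be why the "cutting" descriptions were still classified as weather.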

I ended up just doing things like this to make sure "Ice" is treated as a weather word, but not when the description also contains "Maker".

Data$Type <- ifelse(grepl('Ice$| Ice |,Ice |^Ice | Ice,',Data$DESCRIPTION,ignore.case=TRUE) &
                    !grepl('Maker',Data$DESCRIPTION,ignore.case=TRUE),
                    "Weather", "Not Classified")
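
A possible shorthand for listing every space and comma combination around "Ice" is the word-boundary escape `\\b`. This is only a sketch under assumptions: the `desc` vector below is hypothetical and not taken from the original data, and the labels reuse "Weather" / "Not Classified" from the question.

```r
# Hypothetical descriptions, invented for illustration only
desc <- c("Ice on power line",
          "Heavy ice, tree down",
          "Ice maker leaked",
          "Office flooded",
          "Slipped on ice")

# \\b marks a word boundary, so "ice" matches only as a whole word
# (the "ice" inside "Office" is not matched)
is_ice   <- grepl("\\bice\\b", desc, ignore.case = TRUE)
is_maker <- grepl("\\bmaker\\b", desc, ignore.case = TRUE)

type <- ifelse(is_ice & !is_maker, "Weather", "Not Classified")
print(type)
```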
Reagan