My data:

Topic   Content
Sunny   "Today is a sunny day."
John    He listened and walked away."Should I visit Dr.Mary today?"
May     May is playing alone.

I want to extract only the text that appears inside double quotation marks.

I would also like to create another column that tags each sentence. For example, if the content contains the keyword "sunny", the new column should be "Sunny"; if the content contains the keyword "visit", the new column should be "Hospital".

I would like to get the output below:

Topic  Content                           Tag
Sunny  "Today is a sunny day."           Sunny
John   "Should I visit Dr.Mary today?"   Hospital

dput:

structure(list(Topic = structure(c(3L, 1L, 2L), .Label = c("John",
"May", "Sunny"), class = "factor"), Content = structure(c(3L,
1L, 2L), .Label = c("He listened and walked away.\"Should I visit Dr.Mary   today?\"",
"May is playing alone.", "Today is a sunny day. "), class = "factor")), .Names = c("Topic",
"Content"), class = "data.frame", row.names = c(NA, -3L))
Gregor Thomas

1 Answer

You could try this:

df <- structure(list(Topic = structure(c(3L, 1L, 2L), .Label = c("John", "May", "Sunny"),
                                      class = "factor"),
                     Content = structure(c(3L, 1L, 2L),
                                         .Label = c("He listened and walked away.\"Should I visit Dr.Mary   today?\"",
                                                    "May is playing alone.",
                                                    "Today is a sunny day. "),
                                         class = "factor")),
                .Names = c("Topic", "Content"), class = "data.frame", row.names = c(NA, -3L))
# keep only the rows whose Content contains a double quote
x <- df[grepl('"', df$Content), ]
# replace each value with the text between the quotes
x$Content <- sub('.*"(.*)".*', "\\1", x$Content)
# tag by keyword: "visit" -> Hospital, "sunny" -> Sunny, otherwise empty
x$Tag <- ifelse(grepl("visit", x$Content), "Hospital", ifelse(grepl("sunny", x$Content), "Sunny", ""))
x
#   Topic                         Content      Tag
# 2  John Should I visit Dr.Mary   today? Hospital
Avinash Raj
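The answer's `sub('.*"(.*)".*', "\\1", ...)` uses a greedy leading `.*`, so a value containing several quoted segments yields only one of them, and rows without quotes are filtered out before the substitution. A minimal sketch of an alternative that keeps every row: `regmatches()` with `gregexpr()` collects every `"..."` segment, and values without quotes become `NA`. The helper name `extract_quoted` and the `" | "` separator are illustrative choices, not part of the original answer, and the sketch assumes plain ASCII `"` quotes that are never nested.

```r
# Hypothetical helper (not from the original answer): return the text inside
# each pair of double quotes. Several quoted segments in one value are joined
# with " | " (an arbitrary separator); values with no quotes become NA.
extract_quoted <- function(x) {
  x <- as.character(x)                          # also works on factor columns
  m <- regmatches(x, gregexpr('"[^"]*"', x))    # every "..." segment per value
  vapply(m, function(s) {
    if (length(s) == 0) return(NA_character_)   # no quoted text in this value
    paste(gsub('"', "", s, fixed = TRUE), collapse = " | ")
  }, character(1))
}

content <- c("Today is a sunny day. ",
             "He listened and walked away.\"Should I visit Dr.Mary today?\"",
             "May is playing alone.")
extract_quoted(content)  # NA, "Should I visit Dr.Mary today?", NA
```

On the question's data frame this would be used as `df$Quoted <- extract_quoted(df$Content)`, which leaves the Sunny and May rows in place with `NA` instead of dropping them.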
If a sentence contains both "visit" and "sunny", only the first matching tag is applied. Can it get both tags, like "Hospital & Sunny" in x$Tag? @AvinashRaj – poppp Nov 03 '15 at 03:03
try `x$Tag <- ifelse(grepl("visit.*sunny|sunny.*visit",x$Content), "Hospital & Sunny", ifelse(grepl("sunny",x$Content), "Sunny", ifelse(grepl("visit",x$Content), "Hospital", "")))` – Avinash Raj Nov 03 '15 at 04:20
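The nested `ifelse()` in that comment needs an extra branch for every combination of keywords. A minimal sketch that scales to more tags, assuming a small lookup table from keyword to tag: every tag whose keyword occurs in the text is kept and joined with " & ", so a sentence mentioning both "visit" and "sunny" gets "Hospital & Sunny". The names `tags` and `tag_content` are hypothetical, keywords are treated as regular expressions, and matching is case-insensitive, which is a departure from the answer chosen so that "Sunny" and "sunny" both match.

```r
# Hypothetical keyword -> tag lookup table; add rows to support more tags.
tags <- c(visit = "Hospital", sunny = "Sunny")

# Hypothetical helper: for each string, collect every tag whose keyword
# (a regular expression) matches, ignoring case, and join them with " & ".
# Strings with no matching keyword get "".
tag_content <- function(x, tags) {
  x <- as.character(x)
  vapply(x, function(s) {
    hit <- vapply(names(tags), grepl, logical(1), x = s, ignore.case = TRUE)
    paste(tags[hit], collapse = " & ")
  }, character(1), USE.NAMES = FALSE)
}

tag_content(c("I will visit on a sunny day",
              "Should I visit Dr.Mary today?",
              "May is playing alone."), tags)
```

For the answer's data frame this would be `x$Tag <- tag_content(x$Content, tags)`. Combined tags appear in the order of `tags`; a keyword such as `"\\bvisit\\b"` could be used if "visitor" should not count as a match.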