Please find below five sentences and look at the word "further". I need logic that picks the two words before "further" and the two words after it.

For instance, for the five sentences below:
Sentence 1: pick "to" and "advance", the two words before "further"; there are no words after "further".
Sentence 2: pick "one" and "morning", the two words after "further"; there are no words before it.
Sentence 3: pick "Then", the only word before "further" (there are not two), and "morning" and "Mills", the two words after it.
Sentence 4: pick "mount" and "refused", the two words before "further", and "advance", the only word after it.
Sentence 5: pick "morning" and "Mills", the two words before "further", and "refused" and "to", the two words after it.

Any help is appreciated; the solution should be in R.

1) Then one morning Mills refused to mount refused to advance further
2) further one morning Mills refused to mount refused to advance
3) Then further morning Mills refused to mount refused to advance
4) Then one morning Mills refused to mount refused further advance
5) Then one morning Mills further refused to mount refused to advance
thrinadhn
    Have a look at [**this question**](https://stackoverflow.com/questions/48727546/extract-n-words-around-defined-term-multicase/48730236#48730236). It offers what you are after. – jazzurro Feb 25 '18 at 13:07

3 Answers

Here's one way with stringr and dplyr:

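library(stringr)
library(dplyr)  # provides the %>% pipe

x %>% 
  # up to two words, "further", then up to two words (case-insensitive)
  str_extract(regex('(?:[^ ]+ ){0,2}further(?: [^ ]+){0,2}', ignore_case = TRUE)) %>% 
  # drop the keyword itself
  str_remove(regex("further", ignore_case = TRUE)) %>% 
  # collapse the double space left behind and trim the ends
  str_squish()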

[1] "to advance"               "one morning"              "Then morning Mills"      
[4] "mount refused advance"    "morning Mills refused to"

Data:

x <- c("Then one morning Mills refused to mount refused to advance further",
       "further one morning Mills refused to mount refused to advance",
       "Then further morning Mills refused to mount refused to advance",
       "Then one morning Mills refused to mount refused further advance", 
       "Then one morning Mills further refused to mount refused to advance")
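If you would rather avoid package dependencies, the same regex idea can be sketched in base R with `regexpr()` and `regmatches()`. This is a variant under stated assumptions rather than the answer's own code: it adds `\\b` word boundaries (so a word such as "furthermore" is not treated as "further"), matches case-insensitively, and redefines `x` so the block runs on its own.

```r
# Sample sentences from the question
x <- c("Then one morning Mills refused to mount refused to advance further",
       "further one morning Mills refused to mount refused to advance",
       "Then further morning Mills refused to mount refused to advance",
       "Then one morning Mills refused to mount refused further advance",
       "Then one morning Mills further refused to mount refused to advance")

# Up to two words, the whole word "further", then up to two words.
# The \\b boundaries are an assumption: they stop "furthermore" from matching.
pattern <- "(?:\\S+\\s+){0,2}\\bfurther\\b(?:\\s+\\S+){0,2}"

# regexpr() finds the first match in each sentence (-1 when there is none);
# regmatches() returns only the matched substrings
m <- regexpr(pattern, x, perl = TRUE, ignore.case = TRUE)
hits <- regmatches(x, m)

# Remove the keyword, collapse leftover runs of whitespace, trim the ends
around <- trimws(gsub("\\s+", " ",
                      gsub("\\bfurther\\b", "", hits, ignore.case = TRUE, perl = TRUE)))
around
```

One design difference from the stringr version: `regmatches()` silently drops sentences with no match, so the result can be shorter than `x`, whereas `str_extract()` returns `NA` for those positions.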
tyluRp

Plenty of ways to do this; here is one possibility in base R:

# Your sample strings
ss <- c("Then one morning Mills refused to mount refused to advance further",
"further one morning Mills refused to mount refused to advance",
"Then further morning Mills refused to mount refused to advance",
"Then one morning Mills refused to mount refused further advance",
"Then one morning Mills further refused to mount refused to advance")

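sapply(ss, function(x) {
    v <- unlist(strsplit(x, " "))                  # split the sentence into words
    idx <- grep("further", v)                      # position(s) of "further"
    idx <- c(idx - 2, idx - 1, idx + 1, idx + 2)   # two words on either side
    idx <- idx[idx > 0 & idx <= length(v)]         # keep positions that exist
    return(v[idx])
})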
#$`Then one morning Mills refused to mount refused to advance further`
#[1] "to"      "advance"
#
#$`further one morning Mills refused to mount refused to advance`
#[1] "one"     "morning"
#
#$`Then further morning Mills refused to mount refused to advance`
#[1] "Then"    "morning" "Mills"
#
#$`Then one morning Mills refused to mount refused further advance`
#[1] "mount"   "refused" "advance"
#
#$`Then one morning Mills further refused to mount refused to advance`
#[1] "morning" "Mills"   "refused" "to"

Explanation: strsplit each sentence into words, find the position of "further", and return the (up to) two preceding and two following words, dropping positions that fall outside the sentence; sapply applies this procedure to every sentence.
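The same split-and-index logic can be generalised into a small helper with the window size as a parameter. `words_around()` and its arguments are hypothetical names used only for this sketch. It also swaps `grep("further", v)` (a substring match, which would hit "furthermore") for an exact, case-insensitive word comparison; that choice is an assumption about what the question wants.

```r
# Hypothetical helper: the words within n positions of every whole-word,
# case-insensitive occurrence of `target` in a single sentence.
words_around <- function(sentence, target = "further", n = 2, keep_target = FALSE) {
  v <- strsplit(trimws(sentence), "\\s+")[[1]]      # split into words
  hits <- which(tolower(v) == tolower(target))      # exact word matches only
  offsets <- seq(-n, n)                             # window -n..n around a hit
  if (!keep_target) offsets <- offsets[offsets != 0]
  # All window positions for all hits, in sentence order, without duplicates
  idx <- sort(unique(as.vector(outer(offsets, hits, `+`))))
  idx <- idx[idx >= 1 & idx <= length(v)]           # drop out-of-range positions
  v[idx]
}

# Sample sentences from the question
ss <- c("Then one morning Mills refused to mount refused to advance further",
        "further one morning Mills refused to mount refused to advance",
        "Then further morning Mills refused to mount refused to advance",
        "Then one morning Mills refused to mount refused further advance",
        "Then one morning Mills further refused to mount refused to advance")

res <- lapply(ss, words_around)
res
```

Setting `keep_target = TRUE` reproduces the "Update" behaviour below, and a sentence without the target returns `character(0)` instead of failing.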


Update

If the word "further" should be included in the output:

sapply(ss, function(x) {
    v <- unlist(strsplit(x, " "))
    idx <- grep("further", v)
    idx <- c(idx - 2, idx - 1, idx, idx + 1, idx + 2)   # idx itself keeps "further"
    idx <- idx[idx > 0 & idx <= length(v)]
    return(v[idx])
})
#$`Then one morning Mills refused to mount refused to advance further`
#[1] "to"      "advance" "further"
#
#$`further one morning Mills refused to mount refused to advance`
#[1] "further" "one"     "morning"
#
#$`Then further morning Mills refused to mount refused to advance`
#[1] "Then"    "further" "morning" "Mills"
#
#$`Then one morning Mills refused to mount refused further advance`
#[1] "mount"   "refused" "further" "advance"
#
#$`Then one morning Mills further refused to mount refused to advance`
#[1] "morning" "Mills"   "further" "refused" "to"
Maurits Evers
  • Thanks for your valuable suggestions. I need the output to include "further" as well, for example: "to advance further", "further one morning", "Then further morning Mills", "mount refused further advance" – thrinadhn Feb 27 '18 at 05:36
  • @thrinadhn Sorry, I don't understand what you're saying. Do you want your output to include the word "further"? In that case, please see my updated solution. If this is a *different* issue, open a new question, and close this one by accepting an answer (place the check mark next to the solution that provides the best answer to your question). – Maurits Evers Feb 27 '18 at 06:08
  • Thanks Maurits Evers! It's working. – thrinadhn Feb 27 '18 at 06:47
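The follow-up comment asks for each result as a single string that keeps "further", such as "to advance further". A minimal sketch, assuming space-separated sentences and a case-sensitive whole-word match on "further", pastes the selected words back together with `paste(collapse = " ")`:

```r
# Sample sentences from the question
ss <- c("Then one morning Mills refused to mount refused to advance further",
        "further one morning Mills refused to mount refused to advance",
        "Then further morning Mills refused to mount refused to advance",
        "Then one morning Mills refused to mount refused further advance",
        "Then one morning Mills further refused to mount refused to advance")

windows <- vapply(ss, function(x) {
  v <- strsplit(x, " ", fixed = TRUE)[[1]]               # words of one sentence
  idx <- which(v == "further")                           # exact word match
  idx <- sort(unique(c(idx - 2, idx - 1, idx, idx + 1, idx + 2)))
  idx <- idx[idx > 0 & idx <= length(v)]                 # stay inside the sentence
  paste(v[idx], collapse = " ")                          # one string per sentence
}, character(1), USE.NAMES = FALSE)

windows
```

Using `vapply()` with `character(1)` guarantees exactly one string per input sentence; a sentence without "further" yields `""` rather than an error.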

Another way:

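library(stringr)

# The five sentences from the question
text <- c("Then one morning Mills refused to mount refused to advance further",
          "further one morning Mills refused to mount refused to advance",
          "Then further morning Mills refused to mount refused to advance",
          "Then one morning Mills refused to mount refused further advance",
          "Then one morning Mills further refused to mount refused to advance")

get_values <- function(str)
{
    # Up to two words, "further", then up to two words
    val <- str_extract(str, "([^\\s]+\\s){0,2}further(\\s[^\\s]+){0,2}")
    # Remove "further" and squish the double space it leaves behind
    val <- str_squish(gsub(pattern = 'further', replacement = '', x = val))
    return (val)
}

# you can further unlist this to get the answer as a vector instead of a list
answer <- lapply(text, get_values)

[[1]]
[1] "to advance"

[[2]]
[1] "one morning"

[[3]]
[1] "Then morning Mills"

[[4]]
[1] "mount refused advance"

[[5]]
[1] "morning Mills refused to"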
YOLO
  • Thanks for your valuable suggestions. I need the output to include "further" as well, for example: "to advance further", "further one morning", "Then further morning Mills", "mount refused further advance" – thrinadhn Feb 27 '18 at 05:36
  • Thanks Manish Saraswat! It's working. – thrinadhn Feb 27 '18 at 06:47