I've done some searching and could not find a solution to this; other packages or methods are welcome. I am extracting a series of job titles from sentences in order to build up a timeline of people's careers from their biographies. I'm using the stringr package to extract these job titles. The problem is that they come out in the order in which they appear in my list, not the order in which they appear in the sentence. Here's a simplified example:

library(stringr)

sentence <- "He was a chief executive officer, chairman of the board and president"
Job <- list("chairman of the board", "chief executive officer", "president")
str_extract_all(sentence, unlist(Job))

The output of this is:

[[1]]
[1] "chairman of the board"

[[2]]
[1] "chief executive officer"

[[3]]
[1] "president"

Ideally these job titles would be in the order in which they appear in the sentence (i.e. "chairman of the board" and "chief executive officer" would swap positions). I can't just change the order of the Job list, since every sentence will be different. Thanks in advance for the help.

Dyem

1 Answer

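You can supply the possible titles as a single regex instead of several separate patterns. Concatenate them with the regex "or" operator, which is |. The string is then scanned once from left to right, so the matches come back in the order in which they occur in the sentence: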

> str_extract_all(sentence, paste0(unlist(Job), collapse = "|"))
[[1]]
[1] "chief executive officer" "chairman of the board"   "president" 
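
If the titles might contain characters that are special in a regex, a possible alternative is to locate each title literally and sort by position. The following is only a minimal sketch in base R, not part of the approach above: the helper name extract_in_order is hypothetical, and it only considers the first occurrence of each title.

```r
# Hypothetical helper: return the titles found in `sentence`,
# ordered by the position of their first occurrence.
extract_in_order <- function(sentence, titles) {
  # regexpr() gives the 1-based start of the first match, or -1 if absent.
  # fixed = TRUE treats each title as a literal string, not as a regex.
  starts <- vapply(
    titles,
    function(title) as.integer(regexpr(title, sentence, fixed = TRUE)),
    integer(1),
    USE.NAMES = FALSE
  )
  found <- starts > 0
  # Keep only the titles that were found, sorted by start position.
  titles[found][order(starts[found])]
}

sentence <- "He was a chief executive officer, chairman of the board and president"
Job <- c("chairman of the board", "chief executive officer", "president")
extract_in_order(sentence, Job)
# [1] "chief executive officer" "chairman of the board"   "president"
```

Because this looks up each title once, a title that occurs twice in the same sentence is reported only at its first position; the single-regex call with str_extract_all returns every occurrence.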
AEF