I am trying to read the data, but my regular expression is not working. Sample data is below:

Country Cross Transaction (ID: 12345)
Country Capital (Id: 23445)
Cross Country Trade Relation (Id:47639)

Each of the above values is in a different document, so while parsing the documents I need to capture "Country Cross Transaction", "Country Capital", and "Cross Country Trade Relation".

I can't specify how many words I need, but I need everything before the (ID: xxxxx) term.

Wiktor Stribiżew

1 Answer

You can use str_remove() from stringr for this, given that the part you want to remove starts with an opening parenthesis, (.

library(stringr)
string <- "Country Cross Transaction (ID: 12345)"
# Remove everything from the opening parenthesis on, plus the whitespace before it
string %>% str_remove(pattern = "\\s*\\(.*")
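
Since the sample data in the question mixes "ID" and "Id" and sometimes omits the space after the colon, a stricter pattern may be useful. The following is only a sketch, assuming the tag always has the form "(ID: digits)" at the end of the line; the names `x`, `pattern`, and `labels` are illustrative and not part of the original answer.

```r
library(stringr)

# Sample strings copied from the question
x <- c("Country Cross Transaction (ID: 12345)",
       "Country Capital (Id: 23445)",
       "Cross Country Trade Relation (Id:47639)")

# Group 1 captures everything before the "(id: digits)" tag;
# ignore_case = TRUE lets "ID" and "Id" both match
pattern <- regex("^(.*?)\\s*\\(id:\\s*\\d+\\)\\s*$", ignore_case = TRUE)

# str_match() returns a matrix: column 1 is the full match, column 2 is group 1
labels <- str_match(x, pattern)[, 2]
labels
```

Strings that do not contain the tag come back as NA, so they can be filtered out afterwards with `labels[!is.na(labels)]`.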

EDIT

Say your document has the following content and is saved as test.txt. You could then do something like this:

test.txt

Country Cross Transaction (ID: 12345)
This should not be fetched
Country Capital (ID: 23445)
Cross Country Trade Relation (ID:47639)

Code

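library(stringr)  # str_detect(), str_remove(), and the %>% pipe
library(plyr)     # compact()
output <- list()
document <- readLines("test.txt") %>% as.list()
# Match "(ID:" regardless of case, since the data also contains "(Id:"
id_tag <- regex("\\(id:", ignore_case = TRUE)
for (line in seq_along(document)){
  if (str_detect(document[[line]], id_tag)){
    # Drop the parenthesised ID and the whitespace before it
    output[[line]] <- str_remove(document[[line]], pattern = "\\s*\\(.*")
  }
}

# Remove the NULL entries left by lines without an ID
output %>% compact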
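
The same filter-then-strip idea can also be written without a loop. This is a minimal base R sketch, not the original answer's method; `doc_lines` is a hypothetical stand-in for `readLines("test.txt")`, with its contents copied from the example file above.

```r
# Stand-in for readLines("test.txt")
doc_lines <- c("Country Cross Transaction (ID: 12345)",
               "This should not be fetched",
               "Country Capital (ID: 23445)",
               "Cross Country Trade Relation (ID:47639)")

# Keep only the lines that contain an "(ID:" tag, in any letter case
has_id <- grepl("\\(id:", doc_lines, ignore.case = TRUE, perl = TRUE)

# Strip the tag and the whitespace before it from the matching lines
labels <- sub("\\s*\\(.*$", "", doc_lines[has_id], perl = TRUE)
labels
```

Because `grepl()` and `sub()` are vectorised, there is no list to build and no NULL entries to compact afterwards.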
MKR
  • But the concern is to detect the string in the documents, and the only way is: I need to take all the info before (Id:xxxxx)... That is main concern – Sunil Raperia Feb 25 '20 at 09:41
  • I don't get it. What is your expected output for the string that I used as an example? – MKR Feb 25 '20 at 09:42
  • I need to detect (Id:xxx), and from the line, I need to capture all the previous info from that line in the document – Sunil Raperia Feb 25 '20 at 09:42
  • My expected Output should be, country cross transaction, country capital, Cross Country Trade Relation – Sunil Raperia Feb 25 '20 at 09:44
  • So, some of your lines don't have (id:xxx) and you don't want to capture those lines, but if there is an (id:xxx), you want to extract everything before (id:xxx)? – MKR Feb 25 '20 at 09:44
  • So basically in the first document, Country Cross Transaction (ID: 12345) is present. I need to capture only Country Cross Transaction from it. In the second document Country Capital (Id: 23445) is present, I need to capture Country Capital from it. – Sunil Raperia Feb 25 '20 at 09:46
  • So, I need to find (Id =xxx) first, and then capture the remaining value (the information to the left of (Id=xxx)) from that line only... – Sunil Raperia Feb 25 '20 at 09:49