Data Extraction while Image Processing in R

Question

I am trying to extract some values from parsed documents, but the regular expressions I have tried so far are not helping. Sample data is below:

Country Cross Transaction (ID: 12345)
Country Capital (Id: 23445)
Cross Country Trade Relation (Id:47639)

Each of the values above is in a different document, so while parsing the documents I need to capture "Country Cross Transaction", "Country Capital", and "Cross Country Trade Relation".

I can't specify how many words to capture; I need everything that comes before the (ID: xxxxx) term.

MKR · Answer 1 · 2020-02-25T09:59:40.260

You can use str_remove for this, given that the part of the string you want to remove starts with (.

library(stringr)
string <- "Country Cross Transaction (ID: 12345)"
# Remove the opening parenthesis, everything after it, and the space before it
string %>% str_remove(pattern = "\\s*\\(.*")
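If you also need to skip lines that do not contain the marker, a capture group can do the detection and the extraction in one step. This is a minimal sketch, assuming the marker is always written as `(ID:` or `(Id:` followed by digits and a closing parenthesis; `samples` simply holds the example strings from the question plus one line without a marker:

```r
library(stringr)

samples <- c(
  "Country Cross Transaction (ID: 12345)",
  "Country Capital (Id: 23445)",
  "Cross Country Trade Relation (Id:47639)",
  "This should not be fetched"
)

# Group 1 captures everything before the "(ID: nnnnn)" marker, however many
# words that is. ignore_case = TRUE accepts both "ID" and "Id", and \\s*
# allows an optional space after the colon. Lines without the marker give NA.
m <- str_match(samples, regex("^(.*?)\\s*\\(id:\\s*\\d+\\)", ignore_case = TRUE))
titles <- m[, 2]

# Keep only the lines where the marker was found
print(titles[!is.na(titles)])
```

The lazy `.*?` stops at the first marker, and because `\\s*` sits outside the group, the captured text has no trailing space.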

EDIT

Suppose your document is saved as test.txt with the following content; you could then do something like this:

test.txt

Country Cross Transaction (ID: 12345)
This should not be fetched
Country Capital (ID: 23445)
Cross Country Trade Relation (ID:47639)

Code

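library(stringr)
library(plyr)  # for compact(), which drops the NULL entries left by skipped lines
output <- list()
document <- readLines("test.txt") %>% as.list()
for (line in seq_along(document)){
  # Match "(ID:" as well as "(Id:", since both spellings occur in the question
  if (str_detect(document[[line]], regex("\\(ID:", ignore_case = TRUE))){
    # Drop the parenthesis, everything after it, and the space before it
    output[[line]] <- str_remove(document[[line]], pattern = "\\s*\\(.*")
  }
}

output %>% compact()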
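Because str_detect and str_remove are both vectorised, the loop plus compact() can also be written without plyr by filtering the lines first and then stripping the marker. A minimal sketch; `extract_titles()` is a hypothetical helper name, and the demo writes the same content as test.txt to a temporary file so it runs on its own:

```r
library(stringr)

# Hypothetical helper: keep only the lines that contain an "(ID:" / "(Id:"
# marker and return the text before it, with surrounding whitespace trimmed.
extract_titles <- function(lines) {
  marker <- regex("\\(id:", ignore_case = TRUE)
  hits <- lines[str_detect(lines, marker)]
  str_trim(str_remove(hits, "\\(.*"))
}

# Demo: recreate the test.txt content in a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Country Cross Transaction (ID: 12345)",
  "This should not be fetched",
  "Country Capital (ID: 23445)",
  "Cross Country Trade Relation (ID:47639)"
), tmp)

titles <- extract_titles(readLines(tmp))
print(titles)
```

This returns a plain character vector instead of a list, which is usually easier to work with downstream.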

edited Feb 25 '20 at 09:59

answered Feb 25 '20 at 09:39

MKR

But the concern is detecting the string in the documents, and the only way to do that is to take all the info before (Id:xxxxx)... That is the main concern – Sunil Raperia Feb 25 '20 at 09:41
I don't get it. What is your expected output for the string that I used as an example? – MKR Feb 25 '20 at 09:42
I need to detect (Id:xxx), and then capture everything before it on that line of the document – Sunil Raperia Feb 25 '20 at 09:42
My expected output is: Country Cross Transaction, Country Capital, Cross Country Trade Relation – Sunil Raperia Feb 25 '20 at 09:44
So, some of your lines don't have (id:xxx) and you don't want to capture those lines, but if there is an (id:xxx), you want to extract everything before (id:xxx)? – MKR Feb 25 '20 at 09:44
So basically, the first document contains Country Cross Transaction (ID: 12345), and I need to capture only Country Cross Transaction from it. The second document contains Country Capital (Id: 23445), and I need to capture Country Capital from it. – Sunil Raperia Feb 25 '20 at 09:46
So I need to find (Id:xxx) first, and then capture the rest of the value (the information to the left of (Id:xxx)) from that line only. – Sunil Raperia Feb 25 '20 at 09:49
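Since each value lives in its own document, the requirement in these comments (find the (Id:xxx) marker first, then keep only the text to its left on that line) can also be applied file by file. This is a sketch in base R using grep() and sub() instead of stringr; `first_title()` is a hypothetical helper, the document files are created in a temporary directory purely for the demo, and it assumes only the first marker line in each file matters:

```r
# Hypothetical helper: for one file, return the text before the first
# "(ID: nnnnn)" / "(Id: nnnnn)" marker, or NA if the file has no marker.
first_title <- function(path) {
  lines <- readLines(path, warn = FALSE)
  hit <- grep("\\(id:", lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  # Drop the whitespace before the marker, the marker, and everything after it
  sub("[[:space:]]*\\(id:.*$", "", hit[1], ignore.case = TRUE)
}

# Demo: one document per value, as described in the question
dir <- tempfile("docs")
dir.create(dir)
contents <- list(
  doc1 = c("Header text", "Country Cross Transaction (ID: 12345)"),
  doc2 = "Country Capital (Id: 23445)",
  doc3 = "Cross Country Trade Relation (Id:47639)"
)
paths <- file.path(dir, paste0(names(contents), ".txt"))
for (i in seq_along(paths)) writeLines(contents[[i]], paths[i])

# One result per document
titles <- vapply(paths, first_title, character(1), USE.NAMES = FALSE)
print(titles)
```

Returning NA for a document without a marker keeps the result aligned with the list of files, so you can tell which documents had nothing to extract.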
