I have the following column 'checks' in my data frame 'B', which holds free-text statements in different rows. Each statement contains a variable 'abc' followed by a corresponding value. The entries were made manually and are not formatted consistently. I need to extract just 'abc' and the value that follows it.
> B$checks
rows Checks
[1] there was no problem reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue
[2] abc(107 to 109) xyz 115 jbo xyz 104 optim
[3] problemm with caller abc 95 19468 4g xyz 103 91960 1 Remarks new loc reqd is problem
[4] abc_107 xyz 116 dor problem
[5] surevy done , no approximation issues abc 103 xyz 109 crux xyz 104
[6] ping test ok abc(86 rxlevel 84
[7] field is clean , can be used to buiild the required set up abc-86 xyz 94 Digital DSL No Building class Residential Building Type Multi
[8] abc 89 xyz 99 so as the user has no problem , check ping test
Expected output
rows Variable Value
[1] abc 96
[2] abc 107
[3] abc 95
[4] abc 107
[5] abc 103
[6] abc 86
[7] abc 86
[8] abc 89
Based on answers to similar questions, I first tried str_match:
library(stringr)
m1 <- str_match(B$checks, "abc.*?([0-200.]{1,})") # intended: a value between 0 and 200
which yielded something like this:
row var value
1 abc-96 xyz 450 0
2 abc(10 10
3 abc 95 1 1
4 abc_10 10
5 abc 10 10
6 NA NA
7 NA NA
8 NA NA
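The truncated values and NA rows above come from the pattern itself: [0-200.] is a regex character class that matches a single character from the set {0, 1, 2, .}, not a number between 0 and 200, so digits outside that set are skipped or cut off. Below is a minimal sketch of a corrected str_match call, assuming the value is always the first run of digits after "abc" and that the 0 to 200 bound is checked numerically after extraction. The column is rebuilt from the rows listed above so the snippet runs on its own.

```r
library(stringr)

# Column rebuilt from the rows shown in the question
B <- data.frame(
  checks = c(
    "there was no problem reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue",
    "abc(107 to 109) xyz 115 jbo xyz 104 optim",
    "problemm with caller abc 95 19468 4g xyz 103 91960 1 Remarks new loc reqd is problem",
    "abc_107 xyz 116 dor problem",
    "surevy done , no approximation issues abc 103 xyz 109 crux xyz 104",
    "ping test ok abc(86 rxlevel 84",
    "field is clean , can be used to buiild the required set up abc-86 xyz 94 Digital DSL No Building class Residential Building Type Multi",
    "abc 89 xyz 99 so as the user has no problem , check ping test"
  ),
  stringsAsFactors = FALSE
)

# [0-200.] matches ONE character from {0, 1, 2, .}; "5" is not in that set
str_detect("5", "[0-200.]")  # FALSE

# "abc", then any non-digit separators (-, (, _, space), then capture the digits
m <- str_match(B$checks, "abc\\D*(\\d+)")
value <- as.numeric(m[, 2])

# Enforce the 0-200 bound on the number itself, not inside the regex
value <- ifelse(value >= 0 & value <= 200, value, NA)

result <- data.frame(rows = seq_along(value), Variable = "abc", Value = value,
                     stringsAsFactors = FALSE)
print(result)
```

For these eight rows the Value column comes out as 96, 107, 95, 107, 103, 86, 86, 89, matching the expected output. Note that \\D* skips any non-digit text, so if another word can sit between "abc" and its value, the pattern would need a tighter separator class.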
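Then I tried the following:
# turn separators into spaces
B$checks <- gsub("-", " ", B$checks)
B$checks <- gsub("/", " ", B$checks)
B$checks <- gsub("_", " ", B$checks)
B$checks <- gsub(":", " ", B$checks)
# brackets are regex metacharacters, so match them literally
B$checks <- gsub(")", " ", B$checks, fixed = TRUE)
B$checks <- gsub("(", " ", B$checks, fixed = TRUE)
# drop everything before "abc", then strip remaining punctuation
B$checks <- gsub(".*abc", "abc", B$checks)
B$checks <- gsub("[[:punct:]]", " ", B$checks)
# the first run of digits is now the value right after "abc"
regexp <- "[[:digit:]]+"
m <- str_extract(B$checks, regexp)
m <- as.data.frame(m)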
This gave me the expected output.
Now I am looking for the following:
1) A simpler set of commands (or a simpler approach) to get the expected output
2) A way to extract values that are given as a range. For example, I want the input row
rows Checks
[2] abc(107 to 109) xyz 115 jbo xyz 104 optim
to be returned as:
rows Variable Value1 Value2
[2] abc 107 109
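For 1) and 2) together, one option is a single str_match pattern with an optional second capture group for the upper end of a range. This is a sketch, not a definitive solution: it assumes ranges are always written as "<number> to <number>" (as in row [2]) and that only the first "abc" in each row matters; other range separators such as "-" would have to be added to the pattern. Only three of the rows are reproduced here.

```r
library(stringr)

# A few rows from the question, including the range in row [2]
checks <- c(
  "abc(107 to 109) xyz 115 jbo xyz 104 optim",
  "abc_107 xyz 116 dor problem",
  "ping test ok abc(86 rxlevel 84"
)

# Group 1: first number after "abc"
# Group 2 (optional): number after "to", present only for ranges
pattern <- "abc\\D*(\\d+)(?:\\s*to\\s*(\\d+))?"
m <- str_match(checks, pattern)

out <- data.frame(
  Variable = "abc",
  Value1 = as.numeric(m[, 2]),
  Value2 = as.numeric(m[, 3]),  # NA when the row holds a single value
  stringsAsFactors = FALSE
)
print(out)
```

Column 1 of the str_match result is the full match, so the capture groups start at column 2. Rows without a range get NA in Value2 because the optional group does not take part in the match.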
I need solutions for both 1) and 2), because I am working with larger data sets that follow the same patterns and contain many mixed variable-value combinations.
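For larger data sets with several variables, the same pattern could be wrapped in a small function and applied to each variable name, stacking the results into one long table. extract_var() below is a hypothetical helper, not part of stringr; it assumes each variable name is a plain word without regex metacharacters and keeps only the first occurrence of each variable per row (collecting repeats, such as the two xyz entries in row [1], would need str_match_all instead).

```r
library(stringr)

checks <- c(
  "there was no problem reported measures abc-96 xyz 450 327bbb11869 xyz 113 aaa 4 poc 470 b 3 surveyor issue",
  "abc(107 to 109) xyz 115 jbo xyz 104 optim",
  "ping test ok abc(86 rxlevel 84",
  "abc 89 xyz 99 so as the user has no problem , check ping test"
)

# Hypothetical helper: first value (and optional "to" upper bound) after `var`
extract_var <- function(x, var) {
  # \\b stops `var` from matching inside a longer word
  pattern <- paste0("\\b", var, "\\D*(\\d+)(?:\\s*to\\s*(\\d+))?")
  m <- str_match(x, pattern)
  data.frame(
    rows = seq_along(x),
    Variable = var,
    Value1 = as.numeric(m[, 2]),
    Value2 = as.numeric(m[, 3]),
    stringsAsFactors = FALSE
  )
}

vars <- c("abc", "xyz")
long <- do.call(rbind, lapply(vars, function(v) extract_var(checks, v)))
print(long)
```

A caveat for messier rows: because \\D* skips any non-digit text, a variable written without a value would pick up the next number in the row, so a narrower separator class such as [\\s(_:-]* may be safer once the real variety of separators is known.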
Thanks in advance.