I have a data.frame with 11717 obs. of 15 variables. See below:
$ SCC : Factor w/ 11717 levels "10100101","10100102",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Data.Category : Factor w/ 6 levels "Biogenic","Event",..: 6 6 6 6 6 6 6 6 6 6 ...
$ Short.Name : Factor w/ 11238 levels "","2,4-D Salts and Esters Prod /Process Vents, 2,4-D Recovery: Filtration",..: 3283 3284 3293 3291 3290 3294 3295 3296 3292 3289 ...
$ EI.Sector : Factor w/ 59 levels "Agriculture - Crops & Livestock Dust",..: 18 18 18 18 18 18 18 18 18 18 ...
$ Option.Group : Factor w/ 25 levels "","C/I Kerosene",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Option.Set : Factor w/ 18 levels "","A","B","B1A",..: 1 1 1 1 1 1 1 1 1 1 ...
$ SCC.Level.One : Factor w/ 17 levels "Brick Kilns",..: 3 3 3 3 3 3 3 3 3 3 ...
$ SCC.Level.Two : Factor w/ 146 levels "","Agricultural Chemicals Production",..: 32 32 32 32 32 32 32 32 32 32 ...
$ SCC.Level.Three : Factor w/ 1061 levels "","100% Biosolids (e.g., sewage sludge, manure, mixtures of these matls)",..: 88 88 156 156 156 156 156 156 156 156 ...
$ SCC.Level.Four : Factor w/ 6084 levels "","(NH4)2 SO4 Acid Bath System and Evaporator",..: 4455 5583 4466 4458 1341 5246 5584 5983 4461 776 ...
$ Map.To : num NA NA NA NA NA NA NA NA NA NA ...
$ Last.Inventory.Year: int NA NA NA NA NA NA NA NA NA NA ...
$ Created_Date : Factor w/ 57 levels "","1/27/2000 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Revised_Date : Factor w/ 44 levels "","1/27/2000 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Usage.Notes : Factor w/ 21 levels ""," ","includes bleaching towers, washer hoods, filtrate tanks, vacuum pump exhausts",..: 1 1 1 1 1 1 1 1 1 1 ...
I want to search for the words "Combustion" and "Coal" and create a subset containing only the rows where both words occur, either in the same sentence (the same cell) or in different columns of the same row, anywhere in the data.frame.
Example of both words used in the same sentence:
Fuel Comb - Electric Generation - Coal.
Example of both words used in the same row but in different columns:
See the screenshot (I don't have enough reputation to attach an image): [screenshot][1]
The RStudio search shows 675 results for "Comb" and 251 results for "Coal", so the final combination should contain at most 251 rows, if I'm correct.
I tried using `grep` and `grepl`. However, the only way I have found to use these functions is to repeat the process on each column before creating the subset (using the `match` function, for instance).
I find this to be a time-consuming process. Is there a better way?
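To make the goal concrete, here is a minimal sketch of the kind of row-wise filter I have in mind, run on a small made-up data frame. The toy values, the object names `df`, `keep` and `result`, and the helper `row_has` are assumptions for illustration only; they are not taken from my real data. The idea is to apply `grepl` to every column once per word, combine the per-column results with `|`, and keep the rows where both words were found somewhere.

```r
# Toy stand-in for the real data (hypothetical values, factor columns as in str())
df <- data.frame(
  Short.Name      = c("Fuel Comb - Electric Generation - Coal",
                      "Fuel Comb - Electric Generation - Oil",
                      "Ext Comb /Electric Gen /Lignite"),
  SCC.Level.One   = c("External Combustion Boilers",
                      "External Combustion Boilers",
                      "External Combustion Boilers"),
  SCC.Level.Three = c("Bituminous", "Residual Oil", "Coal"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: TRUE for each row where `pattern` matches in any column.
# Columns are converted to character so factors and numbers are searched as text;
# NA cells simply count as "no match".
row_has <- function(data, pattern) {
  per_column <- lapply(data, function(col) {
    grepl(pattern, as.character(col), ignore.case = TRUE)
  })
  Reduce(`|`, per_column)
}

# Keep rows that contain both words, in the same cell or in different columns.
# "Comb" is used so that both "Comb" and "Combustion" are matched.
keep   <- row_has(df, "Comb") & row_has(df, "Coal")
result <- df[keep, ]
print(result)
```

In this toy example, row 1 has both words in one cell and row 3 has them in different columns, so both are kept, while row 2 is dropped. Because this is plain substring matching, a cell such as "Charcoal" would also count as a match for "Coal".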
[1]: https://i.stack.imgur.com/YJr5B.png