I'm trying to create association rules for some error messages. Because many of the errors contain numeric measurements in the text, they are read in as about 64k unique errors, when in reality there are only about 200 unique error messages. I want to put the numeric values into categories (10 to 15 bins) to make the data more manageable, but I do not want to edit the text part of the error, only the numeric part.
Example Errors:
error. volt 0.025, system failure sup 22 percent
error. volt 0.0015, aux system failure sup 53 percent
system monitor. bal 882 units. cross is -1.8
Desired output:
error. volt 1, system failure sup 50 percent
error. volt 1, aux system failure sup 50 percent
system monitor. bal 1000 units. cross is -1
I was trying to use gsub, but I ran into problems creating the bins and ended up needing far too many gsub calls chained together:
y <- gsub("\\d\\.\\d\\d", "1", data)
Any ideas on how to create bins for only the numeric part of the error message without affecting the text? I'm not very picky about the number of bins.
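One direction that might work, as a minimal sketch rather than a tested solution: pull every number out of each message with gregexpr and regmatches, bin the numbers with cut, and write the bin labels back in place so the surrounding text is untouched. The bin edges, the labels, and the helper name bin_numbers_in_text are assumptions made up for illustration (they were picked only so that the three example errors above map to the desired output), and the pattern assumes a hyphen directly before a digit is always a minus sign.

```r
# Hypothetical bin edges and labels; adjust these to the real value ranges.
# cut() uses right-closed intervals by default: (-Inf,0], (0,10], (10,100], (100,Inf].
bin_breaks <- c(-Inf, 0, 10, 100, Inf)
bin_labels <- c("-1", "1", "50", "1000")

# Hypothetical helper: replace each number in `text` with the label of its bin.
bin_numbers_in_text <- function(text, breaks = bin_breaks, labels = bin_labels) {
  # Optional minus sign, digits, optional decimal part.
  pattern <- "-?[0-9]+(\\.[0-9]+)?"
  m <- gregexpr(pattern, text)
  # regmatches() returns one character vector of matches per message;
  # assigning back through regmatches<- swaps in the bin labels.
  regmatches(text, m) <- lapply(regmatches(text, m), function(nums) {
    as.character(cut(as.numeric(nums), breaks = breaks, labels = labels))
  })
  text
}

errors <- c(
  "error. volt 0.025, system failure sup 22 percent",
  "error. volt 0.0015, aux system failure sup 53 percent",
  "system monitor. bal 882 units. cross is -1.8"
)

binned <- bin_numbers_in_text(errors)
print(binned)
```

This uses only four bins to keep the example short; getting to 10 to 15 bins would just mean adding more values to bin_breaks with one more label per interval in bin_labels. Because everything happens in a single pass over the matches, it avoids chaining one gsub call per bin.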