
I'm parsing out a bunch of employee incident reports for reporting purposes.

The incident reports themselves are free text, and I have to categorize the injuries by body location. I'm trying to avoid a long if {} else if {} ... else {} chain.

Example incident reports:

Employee slipped on wet stairs and injured her knee and right arm, and struck her head on the handrail.

Should add "knee", "arm", and "head" to affected area.

Employee was lifting boxes without approved protective equipment resulting in a back strain.

Should add "back" to affected area.

While attempting to unjam copier, employee got right index finger caught in machinery resulting in a 1-inch cut.

Should add "finger" to affected area.

Right now, I have:

private static StaffInjuryData setAffectedAreas(String incident, StaffInjuryData sid){
   incident = incident.toUpperCase(); //eliminate case issues

   if(incident.contains("HEAD")){
       sid.addAffectedArea("HEAD");
   }else if(incident.contains("FACE")){
       sid.addAffectedArea("FACE");
   }else if(incident.contains("EYE")){
       sid.addAffectedArea("EYE");
   }else if(incident.contains("NOSE")){
       sid.addAffectedArea("NOSE");
   }
   //etc, etc, etc
   return sid;
}

Is there an easier/more efficient way than if-elseif ad infinitum to do this?

Ousmane D.
Bob Stout
  • Note also that natural language processing is _hard_. A simple dictionary solution will only get you so far (maybe far enough in your case). Unless the context of words is taken into consideration, you'll end up with some false positives from homonyms. – Mick Mnemonic Dec 18 '17 at 19:12
  • Yeah, "Employee injured her knee in the back room" will have a false positive on "back" which you'll have to manually check. – D M Dec 18 '17 at 19:16
  • Yeah, false positives are unavoidable. But it'll still be faster for the admin people than going through every single incident manually. – Bob Stout Dec 18 '17 at 19:20

4 Answers


One approach is to construct a regular expression from the individual body parts, use it for searching the string, and add the individual matches to the list:

Pattern bodyParts = Pattern.compile("\\b(head|face|eye|nose)\\b", Pattern.CASE_INSENSITIVE);

Using \b on both ends restricts matches to whole words, so "head" is not found inside "forehead" and "eye" is not found inside "eyelid".

This Q&A explains how to search text using regex in Java.
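
To make the search step concrete, here is a minimal sketch of how the compiled pattern could be applied with Matcher.find(). The class name BodyPartMatcher, the method findAffectedAreas, and the extended keyword list are assumptions for illustration, and the method returns a plain Set<String> instead of the asker's StaffInjuryData, whose full API is not shown in the question.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original question or answer.
class BodyPartMatcher {
    // Illustrative keyword list (an assumption); extend the alternation as needed.
    private static final Pattern BODY_PARTS = Pattern.compile(
            "\\b(head|face|eye|nose|knee|arm|back|finger)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns each whole-word body part found in the text, upper-cased,
    // in order of first appearance and without duplicates.
    static Set<String> findAffectedAreas(String incident) {
        Set<String> areas = new LinkedHashSet<>();
        Matcher m = BODY_PARTS.matcher(incident);
        while (m.find()) {
            areas.add(m.group(1).toUpperCase(Locale.ROOT));
        }
        return areas;
    }
}
```

Each returned value could then be passed to sid.addAffectedArea(...). Note that with \b on both ends a plural such as "eyes" is not matched by "eye"; if plurals matter, the alternatives would need something like an optional trailing "s".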

Sergey Kalinichenko

Add a Set<String> parameter that supplies all the expected keywords:

private static StaffInjuryData setAffectedAreas(String incident, StaffInjuryData sid,  Set<String> keywords){

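   // Upper-case the text to eliminate case issues; the keywords must be upper case too.
   incident = incident.toUpperCase();

   // Check every keyword independently so that several areas can be recorded.
   for (String keyword : keywords) {
     if (incident.contains(keyword)) {
       sid.addAffectedArea(keyword);
     }
   }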

   return sid;
}
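
A self-contained variant of the same loop might look like the sketch below. It returns the matches as a Set<String> because StaffInjuryData is not shown in the question; the names KeywordScan and affectedAreas are hypothetical. It also upper-cases each keyword, so the caller does not have to remember to supply them in upper case.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone version of the keyword loop.
class KeywordScan {
    // Returns every keyword that occurs anywhere in the incident text,
    // upper-cased and sorted alphabetically.
    static Set<String> affectedAreas(String incident, Set<String> keywords) {
        String upper = incident.toUpperCase(Locale.ROOT);
        Set<String> found = new TreeSet<>();
        for (String keyword : keywords) {
            String k = keyword.toUpperCase(Locale.ROOT);
            // Plain substring test: "FOREHEAD" also counts as a hit for "HEAD".
            if (upper.contains(k)) {
                found.add(k);
            }
        }
        return found;
    }
}
```

Because every keyword is tested independently, a report that mentions several body parts yields all of them, unlike an if/else-if chain that stops at the first hit. The substring test does produce partial-word matches, so it trades some precision for simplicity.
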
davidxxx

Perhaps create a list containing all the parts (neck, shoulder, back, etc.) and then check whether the entry contains any of those values?

moe_nyc

You might be able to create some sort of container (like a list or set) with all of the different parts (e.g. Head, Face, Eye, Nose, Finger, etc.), split the string using the .split() method, and then compare each part of that string to each item in your container.

This might be easier, but could possibly be less efficient.
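
A rough sketch of the split-and-compare idea follows. The names TokenScan and affectedAreas and the keyword list are assumptions for illustration. If the container is a HashSet, each word can be looked up directly instead of being compared against every item, which addresses most of the efficiency concern.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper illustrating the split-and-compare approach.
class TokenScan {
    // Illustrative keyword list (an assumption), stored in upper case.
    private static final Set<String> BODY_PARTS = new HashSet<>(Arrays.asList(
            "HEAD", "FACE", "EYE", "NOSE", "KNEE", "ARM", "BACK", "FINGER"));

    // Splits the text into words and keeps those that are known body parts,
    // in order of first appearance and without duplicates.
    static Set<String> affectedAreas(String incident) {
        Set<String> found = new LinkedHashSet<>();
        // Any run of characters other than A-Z acts as a separator.
        for (String token : incident.toUpperCase(Locale.ROOT).split("[^A-Z]+")) {
            if (BODY_PARTS.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }
}
```

Because whole words are compared, "forehead" does not count as "head" here, but plurals such as "fingers" are also missed unless they are added to the container.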

tranfuria