
I'm attempting to modify a dataset in Java based on rules read from a separate file.

The dataset is in the form of a .dat file and looks like this:

54 59 63 85 86 90 93 98 107 113 Annot_4 Annot_5

39 40 52 55 59 63 85 86 90 93 99 108 114 Annot_1 Annot_4 Annot_5

The generalization rules look like this:

Annot_1, Annot_3 => Annot_X

Annot_2, Annot_5 => Annot_Y

So basically I want to go over every line in the dataset and, for each rule, append the rule's right-hand side if the line contains any annotation from that rule's left-hand side, so that the new dataset looks like this:

54 59 63 85 86 90 93 98 107 113 Annot_4 Annot_5 Annot_Y

39 40 52 55 59 63 85 86 90 93 99 108 114 Annot_1 Annot_4 Annot_5 Annot_X Annot_Y

What I have so far applies only the first rule and then stops.

try {
    BufferedReader rulesBR = new BufferedReader(new FileReader(generalizationRules));
    BufferedReader datasetBR = new BufferedReader(new FileReader(dataset));
    String rulesLine;
    String datasetLine;
    String parts1[];
    String rhs;
    rulesLine = rulesBR.readLine();

    while (rulesLine != null) {
        //System.out.println(rulesLine);
        String parts[] = rulesLine.split("=>");
        String lhs[] = parts[0].split(",");

        rhs = parts[1];
        for (String part : lhs) {
            System.out.println(part);
            while ((datasetLine = datasetBR.readLine()) != null) {
                parts1 = datasetLine.split("\\S+");
                System.out.println(parts1);
                if (datasetLine.contains(part))
                    writer.write(datasetLine.concat(rhs));
                else
                    writer.write(datasetLine);
            }
            ArrayList<String> ruleSetRow = new ArrayList<String>();
        }
        rulesLine = rulesBR.readLine();
    }
    rulesBR.close();
    datasetBR.close();
}

Any help would be greatly appreciated.


1 Answer


Your code has several problems. First and foremost, it is structured wrong: your inner loop only does any work once, because datasetBR runs out of lines during the first pass and is never rewound, while the outer loops are still busy parsing rules.
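To see why the inner loop only runs once, here is a minimal sketch. The class and method names are made up for illustration, and a StringReader stands in for the .dat file. It shows that a BufferedReader that has been read to the end keeps returning null:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ExhaustedReaderDemo {

    // Counts how many lines can still be read from the reader.
    static int countRemainingLines(BufferedReader reader) throws IOException {
        int count = 0;
        while (reader.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the dataset file (assumption: two lines of data).
        BufferedReader datasetBR = new BufferedReader(new StringReader("line one\nline two\n"));
        System.out.println(countRemainingLines(datasetBR)); // first pass: prints 2
        System.out.println(countRemainingLines(datasetBR)); // second pass: prints 0
    }
}
```

In your code the second pass corresponds to every rule part after the first one, which is why nothing more gets written.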

What I'd do:

  • Read all the rules into a HashMap<String, String> that maps each left-hand side to its right-hand side. (Better still would be to use Guava's HashMultimap so you can even store the split version of the rules, but this is not really necessary.)
  • Then in a second loop run through all lines of your dataset (like your inner loop).
  • For each line, call a method findRulesByDatasetLine(rulesMap, datasetLine). It returns a string with the right-hand sides of the rules that were found (e.g. "Annot_X Annot_Y"). You can append this to the end of the line and write the result directly to your writer.
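The first step could be sketched roughly as follows. RuleLoader and loadRules are hypothetical names, and the rule lines are assumed to have exactly the "lhs => rhs" form shown in the question. A LinkedHashMap is used here instead of a plain HashMap only so that the rules keep their file order. A StringReader stands in for the rules file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class RuleLoader {

    // Reads lines such as "Annot_1, Annot_3 => Annot_X" into a map from
    // the left-hand side ("Annot_1, Annot_3") to the right-hand side ("Annot_X").
    static Map<String, String> loadRules(BufferedReader rulesReader) throws IOException {
        Map<String, String> rulesMap = new LinkedHashMap<>(); // preserves file order
        String line;
        while ((line = rulesReader.readLine()) != null) {
            String[] parts = line.split("=>");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            rulesMap.put(parts[0].trim(), parts[1].trim());
        }
        return rulesMap;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the rules file from the question.
        String rules = "Annot_1, Annot_3 => Annot_X\n\nAnnot_2, Annot_5 => Annot_Y\n";
        Map<String, String> rulesMap = loadRules(new BufferedReader(new StringReader(rules)));
        System.out.println(rulesMap);
    }
}
```

Because the whole rules file is now in memory, the dataset only has to be read once, which avoids the exhausted-reader problem.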

The function String findRulesByDatasetLine(Map<String, String> rulesMap, String datasetLine) starts with an empty result string, then loops through each entry in the map, splits the entry key on commas and, if any of the split parts is found in datasetLine, adds the entry value to the result string.
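A minimal sketch of such a function might look like this. The class name RuleMatcher is made up, and the map is filled by hand with the two rules from the question. Note one deliberate deviation from a plain contains() check: the dataset line is split into whitespace-separated tokens first, so that "Annot_1" would not accidentally match something like "Annot_10".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class RuleMatcher {

    // Returns the right-hand sides of all rules whose left-hand side shares
    // at least one annotation with the dataset line, separated by single spaces.
    static String findRulesByDatasetLine(Map<String, String> rulesMap, String datasetLine) {
        // Compare whole tokens rather than substrings.
        Set<String> tokens = new HashSet<>(Arrays.asList(datasetLine.trim().split("\\s+")));
        StringBuilder result = new StringBuilder();
        for (Map.Entry<String, String> rule : rulesMap.entrySet()) {
            for (String annotation : rule.getKey().split(",")) {
                if (tokens.contains(annotation.trim())) {
                    if (result.length() > 0) {
                        result.append(' ');
                    }
                    result.append(rule.getValue().trim());
                    break; // add each rule's right-hand side at most once
                }
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rulesMap = new LinkedHashMap<>();
        rulesMap.put("Annot_1, Annot_3", "Annot_X");
        rulesMap.put("Annot_2, Annot_5", "Annot_Y");

        String line = "54 59 63 85 86 90 93 98 107 113 Annot_4 Annot_5";
        String found = findRulesByDatasetLine(rulesMap, line);
        // Append the result only if at least one rule matched.
        System.out.println(found.isEmpty() ? line : line + " " + found);
    }
}
```

For the first sample line this prints the line followed by Annot_Y, and for the second sample line the function returns "Annot_X Annot_Y", matching the desired output in the question. Remember to write a line separator after each line when writing to the output file.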

Hope this helps. Good luck!

Lodewijk Bogaards