I need to parse log files into an ArrayList of ArrayLists. The regex is working, and I can get the correct results in a variable or .csv output. The problem is that I need to manipulate the output by adding a value in entries where a condition is not true, and appending additional values based on index[0] (filename) matches between the original and to-be-appended rows.

Each log file can have anywhere from 1 to roughly 200 entries, depending on the number of field-collected inputs. Log file entries are multiline and variable, but structured, so all variations are known (n=18 regexes, not all relevant to the snippet below). I need to be able to manipulate row content based on some of those variations.

This means I need to loop through individual, potentially unequal-length rows (i.e., across the table) to edit and append, and loop over each of the rows (i.e., down the table). So simple arrays won't work as well as ArrayLists.

I'm successfully creating an ArrayList of a single ArrayList (all of what should be individual rows are put into a single ArrayList, which then goes into the parent ArrayList...).

Trying to get individual ArrayLists by moving 'covArrayList = new ArrayList(covArrayList);' between the 'while ((corrLine...)' and 'for (String..)' loops, or into the 'if(fileMatcher.find)' block returns multiple outputs per regex match, and changes the order, so values can't each be linked to a specific 'file1Name'...

FYI: I'm using JDK 10. I'll have to refactor down so JRE 8 can run the program, but want to do that later for developmental reasons.

This is a subset of my code, which is all within the main method:

//arraylist of covArrayLists init:
    List<List<String>> coverage = new ArrayList<>();
//coverage arrayList init:
    List<String> covArrayList = new ArrayList<String>();
//log file Reader init:
    File corrFile = new File("D:\\Utilities\\Development\\Java\\HPGPSLogParser\\Correct_2015-10-13_10-51.txt");
    BufferedReader corrReader = new BufferedReader(new InputStreamReader(new FileInputStream(corrFile),"UTF-16LE"));
        //NOTE: PFO differential correction log files are encoded in UTF-16 LE
    String corrText = "";
    String corrLine = "";
//corrWriter init:
    File stateCSV = new File("D:\\Utilities\\Development\\Java\\HPGPSLogParser\\tcov.csv");
    BufferedWriter corrWriter = new BufferedWriter(new FileWriter(stateCSV, true));
    String coverageOutput = "";
    String processingOutput = "";
//regex variables:
        //Coverage Details regex
    Pattern fileName1 = Pattern.compile("Rover file: (?<fileName1>[A-Z]{2}-\\d{3}-\\d{5}-SP\\d\\.SSF)+");
    String firstFileName =  "";
    Pattern noBase = Pattern.compile("(?<noBase>No matching base data found)");
    String noBaseText =  "";
    Pattern totalCoverage = Pattern.compile("(?<totalCoverage>[\\d]{1,3})\\% total coverage");
    String totalCovText =  "";
    Pattern coverageBy = Pattern.compile("(?<coverageBy>[\\d]{1,3})+\\% coverage by (?<baseStation>\\b\\w+\\b\\.[zZ].*)+", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    String covByPct =  "";
    String covByProvider =  "";

    try(corrReader)
    {
        while ((corrLine = corrReader.readLine())!=null)
        {
            corrText = corrLine.trim();
            String delim = " ";
            String[] words = corrLine.split(delim);
            covArrayList = new ArrayList<String>(covArrayList);
            for (String s : words)
            {
            //Coverage details regex search begin - write to coverageOutput
                Matcher file1Matcher = fileName1.matcher(corrText); 
                if(file1Matcher.find())
                {
                    firstFileName = file1Matcher.group("fileName1");
                    covArrayList.add(firstFileName);
                } //end if(file1Matcher)
                Matcher baseMatcher = noBase.matcher(corrText);
                if (baseMatcher.find()) 
                {
                    noBaseText = baseMatcher.group("noBase");
                    covArrayList.add("TRUE");
                } //end if(baseMatcher)
                Matcher totCovMatcher = totalCoverage.matcher(corrText);
                if(totCovMatcher.matches()) 
                {
                    totalCovText = totCovMatcher.group("totalCoverage");
                    covArrayList.add(totalCovText);
                } //end if(totCovMatcher)
                Matcher covByMatcher = coverageBy.matcher(corrText);
                if(covByMatcher.matches()) 
                {
                    covByPct = covByMatcher.group("coverageBy");
                    covArrayList.add(covByPct);
                    covByProvider = covByMatcher.group("baseStation");
                    covArrayList.add(covByProvider);
                } //end if(covByMatcher)
            } //end for(String)
        } //end while loop - regex searches & initial output file end
        coverage.add(covArrayList);
        processing.add(procArrayList);

        corrWriter.write(coverage.toString());
        corrWriter.flush();
        outWriter.write(processing.toString());
        outWriter.flush();

The catch/finally blocks are in the code, not in the snippet.

Here's a snippet of log file with the three potential variations in this section:

--------Coverage Details:-------------------- Rover file: AA-123-12345-SP1.SSF Local time: 2/3/2015 4:06:14 PM to 2/3/2015 4:06:44 PM 0% total coverage. No matching base data found. Rover file: AA-123-12345-SP2.SSF Local time: 2/17/2014 5:51:01 PM to 2/17/2014 6:18:57 PM 100% total coverage 4% coverage by guug04914003.zip 100% coverage by guug04914022.zip Rover file: AA-123-12345-SP3.SSF Local time: 2/17/2014 9:53:40 PM to 2/17/2014 10:45:59 PM 100% total coverage 100% coverage by guug04914044.zip

NOTE: The line endings aren't being recognized: Actual Log File format

The closest match I can get to the log file encoding is UTF-16LE, no other option gets close to the charset/formatting of the log files.

The output I need should look like:

NOTE: Please pretend there isn't an extra line between entries (the algorithm eliminating whitespace is really screwing with the formatting I need to illustrate).

NOTE: When "noBase" is matched, no subsequent regexes will be matched (from this block).

NOTE: "covByPct" and "baseStation" may not occur, or will occur once or twice.

[["fileName1", "totalCoverage", "covByPct", "baseStation"]

["fileName1", "noBase"]

["fileName1", "totalCoverage", "covByPct", "baseStation", "covByPct", "baseStation"]]

The output closest to what I need is:

[["fileName1", "totalCoverage", "covByPct", "baseStation", "fileName1", "noBase", "fileName1", "totalCoverage", "covByPct", "baseStation", "covByPct", "baseStation"]]

I'm a beginner, and I'm working on a project for work that's way above my current skill level. :(

Can someone help me correct my code so that the group of regex matches gets put into a new ArrayList for each entry in a log file?

Thanks so much!!

  • HA!! That linked post is awesome!! Whoever downvoted thoughtfully linked to the Cthulu HTML regex algorithm. ...not quite what I'm attempting, tho. My regex is working perfectly against the target text file. Maybe I should change the title, it might be confusing... – jacobshillman Jul 09 '18 at 16:56
  • "Can someone help me correct my code so that the group of regex matches gets put into a new ArrayList for each entry in a log file?", so you need a code review? – Geno Chen Jul 09 '18 at 17:36
  • @GenoChan if it's not working as they intended then no they don't need a code review, they need assistance with a separate aspect of programming and not a full code review. – Thomas Ward Jul 09 '18 at 17:38

1 Answer

So, I figured it out. :) yay me.

My previous code would add all of the regex matches to a variable (worked great!), then add that variable to an ArrayList, and finally try to add those ArrayLists to an ArrayList of ArrayLists.

...and so I'd end up with multiple copies of all the values in an ArrayList, or all the values in a single ArrayList, inside an ArrayList.

The following code initializes a new ArrayList, "coverageOutput", for each filename found, then puts the subsequent regex matches into that ArrayList; each new ArrayList is added to the ArrayList of ArrayLists as soon as it is created. I'm not sure exactly how this is working, but it is.

If someone smarter/more experienced than me would explain how it's working, I'll upvote your explanation and think you're the BOMB! :)

    //assumes coverageOutput is declared as List<String> (not String) before the try block
    try(corrReader)
    {
        while ((corrLine = corrReader.readLine())!=null)
        {
            corrText = corrLine.trim();
        //Coverage details regex search begin - write to coverageOutput
            Matcher file1Matcher = fileName1.matcher(corrText); 
            if(file1Matcher.find())
            {
                //start a new row for this file and register it in the parent list right away
                coverageOutput = new ArrayList<String>();
                coverageOutput.add(file1Matcher.group("fileName1"));
                coverage.add(coverageOutput);
            } //end if(file1Matcher)

            Matcher baseMatcher = noBase.matcher(corrText);
            if (baseMatcher.find()) 
            {
                //store a fixed label instead of the matched text
                noBaseText = "noBaseData";
                coverageOutput.add(noBaseText);
            } //end if(baseMatcher)
            Matcher totCovMatcher = totalCoverage.matcher(corrText);
            if(totCovMatcher.matches()) 
            {
                totalCovText = totCovMatcher.group("totalCoverage");
                coverageOutput.add(totalCovText);
            } //end if(totCovMatcher)
            Matcher covByMatcher = coverageBy.matcher(corrText);
            if(covByMatcher.matches()) 
            {
                covByPct = covByMatcher.group("coverageBy");
                covByProvider = covByMatcher.group("baseStation");
                coverageOutput.add(covByPct);
                coverageOutput.add(covByProvider);
            } //end if(covByMatcher)
        } //end while loop
    } //end try - as in the question, the catch/finally blocks are not shown

The output I'm getting is the desired output of:

[[fileName1, totalCoverage, covByPct, covByProvider], [fileName1, noBaseData], [fileName1, totalCoverage, covByPct, covByProvider, covByPct, covByProvider]]
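As far as I can tell, the reason this works is that a Java list stores references to objects, not copies. Adding a row to the parent list and then continuing to add values to that same row changes what the parent list shows, and assigning a fresh ArrayList to the variable leaves the earlier row untouched. Here is a minimal sketch of just that behaviour; the class name `SharedRowDemo`, the method `build`, and the sample values are made up for illustration and are not part of my parser:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows that the parent list holds a reference to each row.
class SharedRowDemo {

    static List<List<String>> build() {
        List<List<String>> table = new ArrayList<>();

        List<String> row = new ArrayList<>();
        row.add("AA-123-12345-SP2.SSF");
        table.add(row);            // table now holds a reference to row, not a copy
        row.add("100");            // so this value is visible through table.get(0)

        row = new ArrayList<>();   // rebinds the variable only; the first row is unchanged
        row.add("AA-123-12345-SP3.SSF");
        table.add(row);
        row.add("100");

        return table;
    }

    public static void main(String[] args) {
        // prints [[AA-123-12345-SP2.SSF, 100], [AA-123-12345-SP3.SSF, 100]]
        System.out.println(build());
    }
}
```

In my original code only one ArrayList was ever filled and it was added to the parent once, after the loop, which matches the single long row I was getting.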

I still need to clean up the entries to remove extraneous lines, but wanted to post.
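For anyone who wants to try the grouping without my log files, here is a self-contained sketch of the same idea. It makes several assumptions: the class name `CoverageGrouper` and the method `group` are hypothetical, the patterns are simplified versions of mine (for example, the base-station pattern only accepts `.zip` names), and the sample lines assume the line breaks fall where I believe they do in the real log. It uses `Arrays.asList` so it should also compile on Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups regex matches into one row per "Rover file" entry.
class CoverageGrouper {

    // Simplified patterns (assumption: each log entry sits on its own line).
    private static final Pattern FILE_NAME =
            Pattern.compile("Rover file: (?<fileName1>[A-Z]{2}-\\d{3}-\\d{5}-SP\\d\\.SSF)");
    private static final Pattern NO_BASE =
            Pattern.compile("No matching base data found");
    private static final Pattern TOTAL_COVERAGE =
            Pattern.compile("(?<totalCoverage>\\d{1,3})% total coverage");
    private static final Pattern COVERAGE_BY =
            Pattern.compile("(?<coverageBy>\\d{1,3})% coverage by (?<baseStation>\\w+\\.zip)",
                    Pattern.CASE_INSENSITIVE);

    static List<List<String>> group(List<String> lines) {
        List<List<String>> coverage = new ArrayList<>();
        List<String> current = null; // the row for the file currently being read

        for (String raw : lines) {
            String line = raw.trim();

            Matcher file = FILE_NAME.matcher(line);
            if (file.find()) {
                current = new ArrayList<>();   // new row per file name
                current.add(file.group("fileName1"));
                coverage.add(current);         // register the row immediately
                continue;
            }
            if (current == null) {
                continue;                      // skip anything before the first file name
            }
            if (NO_BASE.matcher(line).find()) {
                current.add("noBaseData");
            }
            // matches() requires the whole line to match, so "0% total coverage."
            // followed by more text is not added here.
            Matcher total = TOTAL_COVERAGE.matcher(line);
            if (total.matches()) {
                current.add(total.group("totalCoverage"));
            }
            Matcher by = COVERAGE_BY.matcher(line);
            if (by.matches()) {
                current.add(by.group("coverageBy"));
                current.add(by.group("baseStation"));
            }
        }
        return coverage;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "--------Coverage Details:--------------------",
                "Rover file: AA-123-12345-SP1.SSF",
                "0% total coverage. No matching base data found.",
                "Rover file: AA-123-12345-SP2.SSF",
                "100% total coverage",
                "4% coverage by guug04914003.zip",
                "100% coverage by guug04914022.zip",
                "Rover file: AA-123-12345-SP3.SSF",
                "100% total coverage",
                "100% coverage by guug04914044.zip");
        System.out.println(group(sample));
    }
}
```

The `current == null` check is there because a line could match one of the other patterns before any file name has been seen; without it, the first `add` would throw a NullPointerException.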