I have the following scenario:

I have a one-liner flat file. The line is structured so that it has a header followed by the corresponding data. It looks something like this:

HEADER1 data data data data data HEADER2 data data HEADER3 data HEADER4 data ....

I have to convert this one-liner to a format where each header is on a separate line, along with its data. It should look like this:

HEADER1 data data data data data
HEADER2 data data 
HEADER3 data

The "HEADER" itself follows a consistent pattern in its length and in the type of characters it can use. So I figured a Java regex Pattern and a Matcher would be the way to go.

I am using a StringBuilder, since it has an insert() method, which I use to insert a line separator.

The problem I am having is that there is always a line at the end of my newly created file (the one with the line separator inserts) that consists of several headers, i.e. they don't get broken onto new lines. The reason seems to be that as soon as Matcher.find() reaches a match whose start index lies outside the Matcher's region, execution leaves the loop where the new line is inserted.

This behavior is very inconsistent. I have flat files that are fairly short (about 50 lines), where the problem does not appear. Then I have flat files that are 20K bytes/characters, where the problem appears.

It seems that Matcher.find() works off the initial data (region) that was supplied when the one-liner was read. Let's say the Matcher region is from 0 to 19688. But as I insert System.lineSeparator(), the size of the StringBuilder grows by 2 characters (\r\n) each time, while the region apparently stays the same.
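The region effect described above can be reproduced in isolation. This is a minimal sketch, assuming a hypothetical header pattern `HEADER\d+` in place of `Constants.AL3GROUP_REGEX_PATTERN`; the class and method names are made up for illustration. It reports the matcher's region end next to the builder's length after one insert.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StaleRegionDemo {
    // Hypothetical header pattern; the real one is not shown in the question.
    private static final Pattern HEADER = Pattern.compile("HEADER\\d+");

    /**
     * Creates a matcher over a StringBuilder, grows the builder afterwards,
     * and returns { matcher.regionEnd(), builder.length() }.
     */
    static int[] regionEndAndLength(String input, String separator) {
        StringBuilder line = new StringBuilder(input);
        // The region is fixed here, from 0 to the current length.
        Matcher matcher = HEADER.matcher(line);
        // Growing the builder does not notify the matcher.
        line.insert(0, separator);
        return new int[] { matcher.regionEnd(), line.length() };
    }
}
```

For the input `"HEADER1 a"` and the separator `"\r\n"`, the region end stays at 9 while the builder's length becomes 11, so every insert pushes two more trailing characters out of the matcher's reach.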

I have tried using Matcher.reset() or modifying the Matcher's region, as was suggested here: Replace text in StringBuilder via regex

How do I deal with this issue in the most efficient and correct way? Thanks.

P.S. The regex is not the problem. My regex matches every single header I have in the one-liner. I just thought I'd point that out to avoid discussing the regex itself.

Here is my code:

    BufferedReader br = new BufferedReader(new FileReader(Constants.SOURCE_LOCATION + fileName));
    try {
        String origLine = br.readLine();

        StringBuilder line = null;

        while (origLine != null) {
            line = new StringBuilder(origLine);
            Pattern pattern = Pattern.compile(Constants.AL3GROUP_REGEX_PATTERN);
            Matcher matcher = pattern.matcher(line);

            while (matcher.find()) {
                line.insert(matcher.start(), System.lineSeparator());
            }

            origLine = br.readLine();
        }

        converterFileContents = line.toString();

        PrintWriter writer = new PrintWriter("sample\\output.txt");
        writer.println(converterFileContents);
        writer.close();

        System.out.println(converterFileContents);
    } finally {
        br.close();
    }
VLAZ
Mechkov
  • I would read the whole file into a `String`, and split on the header `Pattern`. Then output each element of the resulting array on a new line, prepended with your HEADER information. – Blake Yarbrough Jul 14 '15 at 13:58
  • @LanguidSquid This sounds like a decent idea. I was able to make what Evgeniy suggested below work, but I will entertain your idea as well and see what works best for me. I do need the data portion of the one-liner, so having it populated into a String array already might save me a step or two. Thanks for answering! – Mechkov Jul 14 '15 at 14:21
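
The split idea from the comments could look roughly like this. It is a sketch under assumptions: the header pattern `HEADER\d+` is hypothetical, and instead of splitting on the header and prepending it again, this variant splits on a zero-width lookahead so each header stays attached to its data. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class HeaderRecords {
    // Zero-width lookahead: matches the position just before each
    // (hypothetical) header without consuming any characters.
    private static final Pattern BEFORE_HEADER = Pattern.compile("(?=HEADER\\d+)");

    /** Splits the one-liner into one "HEADER data..." record per element. */
    static List<String> split(String oneLiner) {
        List<String> records = new ArrayList<>();
        for (String part : BEFORE_HEADER.split(oneLiner)) {
            String trimmed = part.trim();
            // Skip an empty leading piece, which some Java versions produce
            // when the text starts with a header.
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }
}
```

Each list element can then be written on its own line, and the data portion is available per header without parsing the output file again.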

1 Answer

Try replaceAll:

    str = str.replaceAll(" (HEADER\\d+)", "\r\n$1");
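
A slightly fuller sketch of the same idea, assuming the same hypothetical `HEADER\d+` pattern and a single space before each header; the class and method names are illustrative. Because String is immutable, the result has to be assigned or returned; discarding it leaves the original one-liner unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderSplitter {
    // Hypothetical pattern: one space followed by a header, captured as group 1.
    private static final Pattern SPACE_BEFORE_HEADER = Pattern.compile(" (HEADER\\d+)");

    /** Returns a new string with the space before each header replaced by the separator. */
    static String breakBeforeHeaders(String oneLiner, String separator) {
        // quoteReplacement guards against '$' or '\' in the separator being
        // interpreted as replacement syntax; "$1" re-inserts the header itself.
        return SPACE_BEFORE_HEADER.matcher(oneLiner)
                .replaceAll(Matcher.quoteReplacement(separator) + "$1");
    }
}
```

Calling `HeaderSplitter.breakBeforeHeaders(origLine, System.lineSeparator())` works on the immutable String, so there is no growing buffer for the matcher's region to fall behind.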
Evgeniy Dorofeev
  • I tried this, but it does not seem to add the line separator before each header. Once I open the written file, I still have a single one-liner flat file with no "\r\n" prepended to each header. Any ideas? – Mechkov Jul 14 '15 at 14:06
  • I had to make some modifications to the code listed above, and then your suggestion worked. Thanks for the input, pal! – Mechkov Jul 14 '15 at 14:18