
I have a text file dump that I need to convert to a delimited file. The file contains a series of "records" (for lack of a better word) formatted like this:

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 123456
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

User: abc123 
Date: 7/3/12
Subject: the foo is bar
Project: 234567
Problem: foo bar in multiple lines of text
Resolution: foo un-barred in multiple lines of text

...

My end result is to get a flat file of delimited values. Using the records above, we would see:

abc123;7/3/12;the foo is bar;123456;foo bar in multiple lines of text;foo un-barred in multiple lines of text
abc123;7/3/12;the foo is bar;234567;foo bar in multiple lines of text;foo un-barred in multiple lines of text
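Put another way, each output line is just the record's field values, in order, joined with `;`. Here is a minimal sketch of that final step. It assumes the values have already been pulled out of the text, and the `DelimitedLine` class and `toLine` method are hypothetical names that are not part of the question's code:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns one record's field values into a single delimited line.
final class DelimitedLine {
    static final String DELIM = ";";

    // Joins the values in the order given (User, Date, Subject, Project, Problem, Resolution).
    static String toLine(List<String> fields) {
        return String.join(DELIM, fields);
    }

    public static void main(String[] args) {
        List<String> record = Arrays.asList(
                "abc123", "7/3/12", "the foo is bar", "123456",
                "foo bar in multiple lines of text",
                "foo un-barred in multiple lines of text");
        // Prints the first expected output line shown above.
        System.out.println(toLine(record));
    }
}
```

If a field can itself contain `;`, the output becomes ambiguous, so some quoting or escaping rule would be needed. This sketch does not cover that case.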

Code appears below, and following that, the problem I'm experiencing.

    import java.util.*;
    import java.io.*;
    import java.nio.file.*;
    //
    public class ParseOutlookFolderForSE
    {
        public static void main(String args[])
        {
            String user = "";
            String PDLDate = "";
            String name = "";
            String PDLNum = "";
            String problemDesc = "test";
            String resolutionDesc = "test";
            String delim = ";";
            int recordCounter = 0;
            //
            try
            {
                Path file = Paths.get("testfile2.txt");
                FileInputStream fstream = new FileInputStream("testfile2.txt");
                // Get the object of DataInputStream
                /* DataInputStream in = new DataInputStream(fstream);  */
                BufferedReader br = new BufferedReader(new InputStreamReader(fstream));  //Buffered Reader
                String inputLine = null;     //String
                StringBuffer theText = new StringBuffer();  //StringBuffer
                // problem: output contains last record ONLY. program is cycling through the entire file, overwriting records until the end.
                // add a for loop based on recordCounter
                for(recordCounter=0;recordCounter<10;recordCounter++)
                {
                    while((inputLine=br.readLine())!=null)
                    {
                        if(inputLine.toLowerCase().startsWith("from:"))
                        {
                            /*      recordCounter = recordCounter++;    */  // commented out when I added recordCounter++ to the for loop
                            user = inputLine.trim().substring(5).trim();
                        }
                        else if(inputLine.toLowerCase().startsWith("effective date"))
                        {
                            PDLDate = inputLine.trim().substring(15).trim();
                        }
                        else if(inputLine.toLowerCase().startsWith("to:"))
                        {
                            name = inputLine.trim().substring(3).trim();
                        }
                        else if(inputLine.toLowerCase().startsWith("sir number"))
                        {
                            PDLNum = inputLine.trim().substring(12).trim();
                        }
                    }   // close while
                }   // close for loop
                System.out.println(recordCounter + "\n" + user + "\n" + name + "\n" + PDLNum + "\n" + PDLDate + "\n" + problemDesc + "\n" + resolutionDesc);
                System.out.println(recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc);
                String lineForFile = (recordCounter + ";" + user + ";" + name + ";" + PDLNum + ";" + PDLDate + ";" + problemDesc + ";" + resolutionDesc + System.getProperty("line.separator"));
                System.out.println(lineForFile);
                try
                {
                    BufferedWriter out = new BufferedWriter(new FileWriter("testfileoutput.txt"));
                    out.write(lineForFile);
                    out.close();
                }
                catch (IOException e)
                {
                    System.out.println("Exception ");
                }
            } //close try
            catch (Exception e)
            {
                System.err.println("Error: " + e.getMessage());
            }
        }
    }

My final output is ONLY the last record. I believe the program is reading every record, but each one overwrites the previous one, so only the LAST record survives. Makes sense. So I added a FOR loop, incrementing a counter by 1 if(inputLine.toLowerCase().startsWith("user:")), and output the counter variable with my data to validate what's happening.

My FOR loop begins right after the BufferedReader is created but before my IF statements, and I terminate it after I write to the file. I'm using for(recCounter=0;recCounter<10;recCounter++), and while I get ten records in my output file, they are all instances of the LAST record of the input file, numbered 0-9.

Leaving the for loop in the same place, I modified it to read for(recCounter=0;recCounter<10;) and placed recCounter's increment WITHIN the IF statement, so it increments every time a line starts with User:. In this case I also got ten records in my output file; they were ten instances of the last record in the input file, and all the counters were 0.

EDIT: Given how the file is formatted, the ONLY way to tell one record from the next is a subsequent instance of the word "User:" at the start of a line. Everything from one such line up to the NEXT one constitutes a single record.

It appears as though I'm not setting my "recCounter" appropriately, or I'm not correctly detecting the point that should mean "start a new record".

Anyone have any suggestions for how to read this file as multiple records?

dwwilson66
  • You have 2 lines marked '} //close for loop' but only 1 for loop! The code also does not compile for me without removing one of them. Try removing the first one and let us know if that fixes things. – Colin D Jul 03 '12 at 18:23
  • Also not sure why all the append() calls. The Java String class doesn't have an append() method; you should be working with StringBuilder instead. As @ColinD mentioned, you also have an extra } that doesn't make sense, where you have the //close for loop and //close for while comments. – Chris911 Jul 03 '12 at 18:29
  • @ColinD Code updated to the right version...the problem with having too many windows open! Code still will not work as expected--last record only. – dwwilson66 Jul 03 '12 at 18:33
  • @Chris911 append() was based on this problem http://stackoverflow.com/questions/11311452/parsing-a-file-with-single-and-multi-lines-of-data I was having earlier today. Was I given bad advice on that? – dwwilson66 Jul 03 '12 at 18:36
  • @dwwilson66 Chris911 was pointing out that you were calling append on a string. Your posted question suggested using it on a StringBuilder. – Colin D Jul 03 '12 at 18:37
  • @ColinD Yep...I realized that & still need to fix that issue; that's why the append code's been deleted until I figure out what I'm messing up in the file read/write. Thanks for clarifying – dwwilson66 Jul 03 '12 at 18:40
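As the comments point out, `append()` is a method of `StringBuilder` (and `StringBuffer`), not `String`. Below is a minimal sketch of using it to collect the multi-line Problem and Resolution fields. It assumes that continuation lines never start with one of the known labels and that a blank line ends a record. `MultiLineField`, `collect`, and the label list are hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: build a multi-line field's text with StringBuilder.append().
final class MultiLineField {
    // Assumed label set, taken from the sample records in the question.
    static final List<String> LABELS = Arrays.asList(
            "user:", "date:", "subject:", "project:", "problem:", "resolution:");

    static boolean isLabelLine(String line) {
        String lower = line.trim().toLowerCase();
        for (String label : LABELS) {
            if (lower.startsWith(label)) {
                return true;
            }
        }
        return false;
    }

    // Returns the text of the field that starts at lines.get(start) (assumed to be a
    // "Label: value" line), adding following lines until the next label or a blank line.
    static String collect(List<String> lines, int start) {
        String first = lines.get(start);
        StringBuilder text = new StringBuilder(first.substring(first.indexOf(':') + 1).trim());
        for (int i = start + 1; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty() || isLabelLine(line)) {
                break; // next field or end of record
            }
            if (text.length() > 0) {
                text.append(' '); // join continuation lines with a single space
            }
            text.append(line);
        }
        return text.toString();
    }
}
```

`StringBuilder` is the usual choice here. `StringBuffer`, which the question's code declares, has the same `append()` methods but is synchronized, and a single-threaded parser doesn't need that.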

2 Answers


Okay, so your pseudo-code should go something like this:

    declare variables
    open file
    while not eof
      read input
      if end of set
        format output
        write output
        clear variables
      figure out which variable
      store in correct variable
    end-while

There might be a trick to figuring out when you've finished one set and can start the next. If a set is supposed to be terminated by a blank line as appears from your example, then you could just check for the blank line. Otherwise, how do you know? Does a set always start with "user"?

Also, don't forget to write the last record. You don't want to leave unwritten stuff in your buffer/table.
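The pseudocode above maps fairly directly onto Java. Below is a minimal sketch of it, not the asker's final code. It assumes the sample format from the question: `Key: value` lines, every record beginning with `User:`, and Problem/Resolution text possibly continuing on the following lines. The class name `RecordConverter`, the `KEYS` column order, and the UTF-8 charset are assumptions. Taking a `Reader` and `Writer` instead of file names keeps the loop testable, and `main` wires it to the file names used in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the pseudocode: read lines, detect the end of a set, write it, clear, repeat.
final class RecordConverter {
    // Assumed output column order, taken from the sample records in the question.
    static final List<String> KEYS = Arrays.asList(
            "user", "date", "subject", "project", "problem", "resolution");

    static void convert(Reader in, Writer out) throws IOException {
        BufferedReader br = new BufferedReader(in);
        Map<String, StringBuilder> record = new HashMap<>();
        String currentKey = null;
        String line;
        while ((line = br.readLine()) != null) {
            String trimmed = line.trim();
            // "if end of set": a new User: line means the previous record is complete.
            if (trimmed.toLowerCase().startsWith("user:") && !record.isEmpty()) {
                writeRecord(record, out);
                record.clear(); // "clear variables"
                currentKey = null;
            }
            int colon = trimmed.indexOf(':');
            String key = colon > 0 ? trimmed.substring(0, colon).toLowerCase() : null;
            if (key != null && KEYS.contains(key)) {
                // "figure out which variable" and "store in correct variable"
                currentKey = key;
                record.put(key, new StringBuilder(trimmed.substring(colon + 1).trim()));
            } else if (!trimmed.isEmpty() && currentKey != null) {
                // Assumed: an unlabeled line continues the field above it (e.g. Problem:).
                StringBuilder value = record.get(currentKey);
                if (value.length() > 0) {
                    value.append(' ');
                }
                value.append(trimmed);
            }
        }
        // "don't forget to write the last record": no User: line follows it.
        if (!record.isEmpty()) {
            writeRecord(record, out);
        }
        out.flush();
    }

    // Writes one record as a ';'-delimited line; a missing field becomes an empty column.
    private static void writeRecord(Map<String, StringBuilder> record, Writer out) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < KEYS.size(); i++) {
            if (i > 0) {
                line.append(';');
            }
            StringBuilder value = record.get(KEYS.get(i));
            if (value != null) {
                line.append(value);
            }
        }
        line.append(System.lineSeparator());
        out.write(line.toString());
    }

    public static void main(String[] args) throws IOException {
        // File names from the question; UTF-8 is an assumption about the dump's encoding.
        try (Reader in = Files.newBufferedReader(Paths.get("testfile2.txt"), StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(Paths.get("testfileoutput.txt"), StandardCharsets.UTF_8)) {
            convert(in, out);
        }
    }
}
```

The key design choice is that a record is only written when the *next* `User:` line arrives or the input ends, which is exactly the "if end of set" and "write the last record" advice above. The writer is opened once for the whole run, so earlier records are not overwritten.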

Jay
  • I think the "if end of set" piece is what I'm missing; I'll mess with that a bit. Yes, a set ALWAYS starts with "user:" and that's about the only reliable marker in the file. I've clarified the question to add that note. – dwwilson66 Jul 03 '12 at 18:16

From your description, it sounds like you are not writing each output string as you complete it, but instead doing all of the writing at the end. It does not sound like you are saving the output strings anywhere outside the loop, so each time you find a record, you overwrite the output string you previously calculated.

You should test that you are actually writing to the file after each record is found and has its output string created.
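One way this kind of overwriting can show up: `new FileWriter("testfileoutput.txt")`, as used in the question's code, truncates the file every time it is opened, so reopening it once per record leaves only the last one. Below is a small, hypothetical demonstration (`WriterModes` and its methods are illustrative names) that compares reopening the file per record with opening the writer once before the loop:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical demo of why the output file can end up holding only the last record.
final class WriterModes {
    // Mimics opening a new FileWriter for every record; append=false truncates each time.
    static void reopenEachTime(Path file, List<String> lines, boolean append) throws IOException {
        for (String line : lines) {
            try (BufferedWriter out = new BufferedWriter(new FileWriter(file.toFile(), append))) {
                out.write(line);
                out.newLine();
            }
        }
    }

    // Opens the file once, then writes each line as soon as it is available.
    static void openOnce(Path file, List<String> lines) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        }
    }
}
```

In a real parser the lines would arrive one at a time from the read loop. The point is that the writer is created before the loop and closed after it. `new FileWriter(file, true)` also avoids truncation, but it appends to whatever an earlier run left in the file.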

Without seeing your code, I am not sure I can help you much further.

Colin D
  • that's what I suspect is happening too, but I can't figure out where to make that happen. Code's been posted though...so if you have any further input, I'd be happy to learn what you've got! – dwwilson66 Jul 03 '12 at 18:12