
I have a log file that contains, among many other things, XML messages. I would like to extract an XML message and write it to a file if it contains a specific string (the transID).

For example, I want to search the file below for the string 'TODPG201412041625130415' and, once I find it, grab everything between:

<?xml version = "1.0" encoding = "ISO-8859-1" ?>
<SalesOrderAcknowledgement>
  <HeaderData>
    <TransID>TODPG201412041625130415</TransID>

and:

</SalesOrderAcknowledgement>

File:

05/12/2014 15:07:53  INFO [Search.java 445] - The Trans ID: TODPG201412041625130370 has already been processed.
05/12/2014 15:07:53  INFO [Search.java 316] - The message for Trans ID TODPG201412041625130370 was ALREADY CONSUMED.  Consumed Original Message: <?xml version = "1.0" encoding = "ISO-8859-1" ?>
<SalesOrderAcknowledgement>
  <HeaderData>
    <TransID>TODPG201412041625130415</TransID>
    <Description>Estimate</Description>
    <SiteQueueName>TODPG</SiteQueueName>
    <LineItems>5</LineItems>
    <TimeStamp>201412041625130370</TimeStamp>
  </HeaderData>
  <SalesOrderDetail>
    <SalesID>2002726862</SalesID>
  </SalesOrderDetail>
  <SalesOrderLineItems>    
    <LineItem>
      <SalesLineNum>20</SalesLineNum>
      <UnitPrice>0.4300</UnitPrice>
      <BurdenRate>0.0000</BurdenRate>
      <ExtendedPrice>0.00</ExtendedPrice>
      <RecordStatus>A</RecordStatus>
      <ErrorMessage1>Sales Order 2002726862 modified</ErrorMessage1>
      <ErrorMessage2></ErrorMessage2>
      <ErrorMessage3></ErrorMessage3>
    </LineItem>
    <LineItem>
      <SalesLineNum>30</SalesLineNum>
      <UnitPrice>3.6500</UnitPrice>
      <BurdenRate>0.0000</BurdenRate>
      <ExtendedPrice>0.00</ExtendedPrice>
      <RecordStatus>A</RecordStatus>
      <ErrorMessage1>Sales Order 2002726862 modified</ErrorMessage1>
      <ErrorMessage2></ErrorMessage2>
      <ErrorMessage3></ErrorMessage3>
    </LineItem>    
  </SalesOrderLineItems>
</SalesOrderAcknowledgement>
05/12/2014 15:07:55  INFO [Search.java 232] - ****  XML Message: 
<?xml version = "1.0" encoding = "ISO-8859-1" ?>
<SalesOrderAcknowledgement>
  <HeaderData>
    <TransID>TODPG201412041635120944</TransID>
    <Description>Estimate</Description>
    <SiteQueueName>TODPG</SiteQueueName>
    <LineItems>5</LineItems>
    <TimeStamp>201412041635120944</TimeStamp>
  </HeaderData>
  <SalesOrderDetail>
    <SalesID>2002720443</SalesID>
  </SalesOrderDetail>
  <SalesOrderLineItems>
    <LineItem>
      <SalesLineNum>10</SalesLineNum>
      <UnitPrice>0.0870</UnitPrice>
      <BurdenRate>0.0000</BurdenRate>
      <ExtendedPrice>0.00</ExtendedPrice>
      <RecordStatus>A</RecordStatus>
      <ErrorMessage1>Sales Order 2002720443 modified</ErrorMessage1>
      <ErrorMessage2></ErrorMessage2>
      <ErrorMessage3></ErrorMessage3>
    </LineItem>
  </SalesOrderLineItems>
</SalesOrderAcknowledgement>

The transID is different for every message, and a single file can contain several transIDs.

I got to the point where I print the number of the line on which the string is found, but I don't know how to get the whole text starting from <?xml version = "1.0" ... :

import java.util.ArrayList;
import java.util.Scanner;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.*;


public class installation
{
    public static String searchString = "TODPG201412041625130415";  

    public static void main(String args[])
    {   
        final File folder = new File("C:/Users/Administrator/Desktop/Estimated_Acualized/LogBackup/2014");      
        listFilesForFolder(folder);
    }

    public static void listFilesForFolder(final File folder) 
    {       

        for (final File fileEntry : folder.listFiles()) 
        {           
            findWord(searchString, fileEntry);      

        }
    }


    public static void findWord(String word, File file){
        try (Scanner scanner = new Scanner(file)) // closes the file when done
        {

            int lineNum = 0;
            while (scanner.hasNextLine()) 
            {
                String line = scanner.nextLine();
                lineNum++;
                if(line.indexOf(word) > -1) // use the parameter rather than the static field
                { 
                    System.out.println("found string on line " +lineNum);
                    System.out.println(line);
                }
            }
        }
        catch(Exception ex){
            ex.printStackTrace();
        }
    }
}

Can someone please shed some light? I am stuck.

Angelina
  • Why not use a `SAX` parser or `JAXB`? – Jordi Castilla Aug 18 '15 at 13:58
  • You have a nice tutorial to parse [xml in java](http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser). Have a look at it. – Uma Kanth Aug 18 '15 at 14:05
  • Will this sequence ` ` always be the same? The same exact non-blank characters, the same number of spaces, and the same line breaks? If so, it would lead to a simple workflow. – Serge Ballesta Aug 18 '15 at 14:15
  • As has been said [time and again](http://stackoverflow.com/a/1732454/113632), you need to use an XML parser to parse XML. If you do so, it will be trivial to select the element(s) you're interested in. – dimo414 Aug 18 '15 at 14:30
  • You should probably tag XML ... – Mazino Aug 18 '15 at 14:32

1 Answer


Here you have to find, in this order and on consecutive lines:

  • three lines containing fixed strings (<?xml version = "1.0", <SalesOrderAcknowledgement>, <HeaderData>)
  • the specific searched string (TODPG201412041625130415)

Once you have found them, you copy those lines (except that the first one should start at <?xml...) and everything that follows, up to and including </SalesOrderAcknowledgement>.

I would use two modes: a search mode (copy == false), in which you look for the four strings on consecutive lines, and a copy mode, in which you output every line until the end marker. In search mode, each time the next expected string is found on the next line, you save that line and move on to the following string; on the first mismatch, you go back to searching for the first string.
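
The two modes described above can be sketched as a small method that works on lines already held in memory, which makes the logic easy to test on its own. This is only a sketch: the class `MessageExtractor` and the method `extract` are hypothetical names, not part of your code, and the marker strings are taken from the sample log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the search/copy state machine applied to in-memory lines. */
class MessageExtractor {

    static final String XML_START = "<?xml version = \"1.0\"";
    static final String END_TAG = "</SalesOrderAcknowledgement>";

    /** Returns the lines of every message whose header lines end with one containing transId. */
    static List<String> extract(List<String> lines, String transId) {
        // Strings that must be found on consecutive lines, in this order.
        String[] pre = {XML_START, "<SalesOrderAcknowledgement>", "<HeaderData>", transId};
        List<String> result = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // prefix lines matched so far
        boolean copy = false;

        for (String line : lines) {
            if (copy) {
                // Copy mode: keep everything up to and including the end tag.
                result.add(line);
                if (line.contains(END_TAG)) {
                    copy = false;
                    pending.clear();
                }
                continue;
            }
            // Search mode: the next expected string must be on this line.
            int index = line.indexOf(pre[pending.size()]);
            if (index < 0 && !pending.isEmpty()) {
                // Sequence broken: restart, re-testing this line as a possible new start.
                pending.clear();
                index = line.indexOf(pre[0]);
            }
            if (index >= 0) {
                // The first line is trimmed so that it starts at "<?xml".
                pending.add(pending.isEmpty() ? line.substring(index) : line);
                if (pending.size() == pre.length) {
                    result.addAll(pending);
                    copy = true;
                }
            }
        }
        return result;
    }
}
```

It could be fed with, for example, `Files.readAllLines(path, StandardCharsets.ISO_8859_1)` for each log file. Note that the transID is matched as a plain substring of the fourth line, so an ID that is a prefix of another one would also match.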

Here is a limited adaptation of your code that just writes debug messages to System.err and copies the found text to System.out:

import java.io.File;
import java.util.Scanner;

public class Installation {

    private static String[] preIdents = {"<?xml version = \"1.0\"",
        "<SalesOrderAcknowledgement>", "<HeaderData>", ""};
    private static String postIdent = "</SalesOrderAcknowledgement>";
    public static String searchString = "TODPG201412041625130415";

    public static void main(String args[]) {
        final File folder = new File("Z:/Documents/SO_test/2014");
        preIdents[preIdents.length - 1] = searchString;
        listFilesForFolder(folder);
    }

    public static void listFilesForFolder(final File folder) {

        for (final File fileEntry : folder.listFiles()) {
            findWord(searchString, preIdents, postIdent, fileEntry);

        }
    }

    public static void findWord(String word, String[] pre, String post, File file) {
        try (Scanner scanner = new Scanner(file)) { // closes the file when done

            String[] prefix = new String[pre.length];

            int status = 0;
            boolean copy = false;
            int lineNum = 0;
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                lineNum++;
                if (copy) {
                    System.out.println(line);
                    if (line.indexOf(post) > -1) {
                        copy = false;
                        status = 0;
                    }
                } else {
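                    int index = line.indexOf(pre[status]);
                    if (index < 0 && status > 0) {
                        // sequence broken: restart, re-testing this line as a possible new start
                        status = 0;
                        index = line.indexOf(pre[0]);
                    }
                    if (index > -1) {
                        // System.err.println("found " + pre[status] + " on line " + lineNum); only for tests
                        // save the matched line; the first one is trimmed to start at "<?xml"
                        prefix[status] = (status == 0) ? line.substring(index) : line;
                        if (++status == pre.length) {
                            // all prefix lines matched: print them and switch to copy mode
                            copy = true;
                            for (String p : prefix) {
                                System.out.println(p);
                            }
                        }
                    }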
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
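
Since the question asks for the message to be written to a file, here is a hedged sketch of one way to do that. It swaps in a different technique from the code above: a regular expression over the whole file content instead of the line-by-line state machine. The names `MessageFileWriter`, `findMessages` and `extractToFile` are hypothetical, and the sketch assumes that the logs are ISO-8859-1 (as the XML declaration suggests), that they fit in memory, and that every logged message is complete.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: regex-based variant that writes the matching messages to a file. */
class MessageFileWriter {

    // One whole message, from the XML declaration to the first closing root tag (non-greedy).
    private static final Pattern MESSAGE = Pattern.compile(
            "<\\?xml version = \"1\\.0\".*?</SalesOrderAcknowledgement>", Pattern.DOTALL);

    /** Returns every message in the log text whose TransID element equals transId. */
    static List<String> findMessages(String logText, String transId) {
        String wanted = "<TransID>" + transId + "</TransID>";
        List<String> found = new ArrayList<>();
        Matcher m = MESSAGE.matcher(logText);
        while (m.find()) {
            if (m.group().contains(wanted)) {
                found.add(m.group());
            }
        }
        return found;
    }

    /** Reads one log file, writes the matching messages to outFile, and returns how many were found. */
    static int extractToFile(Path logFile, String transId, Path outFile) throws IOException {
        // Assumption: the log is ISO-8859-1; this charset maps every byte, so decoding cannot fail.
        String logText = new String(Files.readAllBytes(logFile), StandardCharsets.ISO_8859_1);
        List<String> found = findMessages(logText, transId);
        if (!found.isEmpty()) {
            // Each message is written followed by a line separator.
            Files.write(outFile, found, StandardCharsets.ISO_8859_1);
        }
        return found.size();
    }
}
```

Because the whole `<TransID>` element is compared, an ID that is only a prefix of another one does not match. For very large logs, the line-by-line approach above is the safer choice, with the `System.out.println` calls replaced by writes to a `PrintWriter`.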
Serge Ballesta
  • @Angelina: it was just a debug trace for checking against a real file. You just have to comment out the `System.err.println`. Post edited accordingly. – Serge Ballesta Aug 18 '15 at 15:37