20

I've got a file from a vendor that has 115 fixed-width fields per line. How can I parse that file into the 115 fields so I can use them in my code?

My first thought is just to make constants for each field, like NAME_START_POSITION and NAME_LENGTH, and use substring. That seems ugly, so I'm curious about better ways of doing this. None of the couple of libraries a Google search turned up seemed any better, either.

Cody Gray - on strike
MattGrommes

10 Answers

22

I would use a flat file parser like flatworm instead of reinventing the wheel: it has a clean API, is simple to use, has decent error handling and a simple file format descriptor. Another option is jFFP but I prefer the first one.

Marcello Nuccio
Pascal Thivent
  • 1
    I just wanted to follow up with a thanks for a pointer to Flatworm. It works like a champ and my whole team at work is now using it. – MattGrommes Jan 23 '10 at 19:17
  • 1
    @MattGrommes I'm glad to know you liked it. And thank you very much for the follow up, it's very much appreciated! – Pascal Thivent Jan 23 '10 at 19:49
  • I tried the library a few days ago and it was broken beyond repair. I would try the previous version but i do not see any docs for it – Monachus Mar 07 '11 at 14:33
  • This is a great tool! Is there a way to integrate it into some kind of editor - eclipse? – Rekin Apr 10 '12 at 08:56
  • Are you guys still using this flatworm tool? the DTD reference is broken in the file format XML definition. How can I resolve this? – Iofacture Apr 27 '16 at 22:52
  • 1
    Late to the game but https://github.com/ffpojo/ffpojo looks nice as it maps to and from POJOs – Usman Ismail Jun 21 '17 at 13:55
8

I've played around with fixedformat4j and it is quite nice. It is easy to configure converters and the like.

p3t0r
  • 1
    Note that ff4j uses runtime annotations, which makes mass parsing pretty slow. – ron Sep 11 '12 at 14:32
7

uniVocity-parsers comes with a FixedWidthParser and a FixedWidthWriter that can support tricky fixed-width formats, including lines with different fields, paddings, etc.

// creates the sequence of field lengths in the file to be parsed
FixedWidthFields fields = new FixedWidthFields(4, 5, 40, 40, 8);

// creates the default settings for a fixed width parser
FixedWidthParserSettings settings = new FixedWidthParserSettings(fields); // many settings here, check the tutorial.

//sets the character used for padding unwritten spaces in the file
settings.getFormat().setPadding('_');

// creates a fixed-width parser with the given settings
FixedWidthParser parser = new FixedWidthParser(settings);

// parses all rows in one go.
List<String[]> allRows = parser.parseAll(new File("path/to/fixed.txt"));

Here are a few examples for parsing all sorts of fixed-width inputs.

And here are some other examples for writing in general and other fixed-width examples specific to the fixed-width format.

Disclosure: I'm the author of this library, it's open-source and free (Apache 2.0 License)

Jeronimo Backes
1

Here is a basic implementation I use:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

public class FlatFileParser {

  public static void main(String[] args) {
    File inputFile = new File("data.in");
    File outputFile = new File("data.out");
    int columnLengths[] = {7, 4, 10, 1};
    String charset = "ISO-8859-1";
    String delimiter = "~";

    System.out.println(
        convertFixedWidthFile(inputFile, outputFile, columnLengths, delimiter, charset)
        + " lines written to " + outputFile.getAbsolutePath());
  }

  /**
   * Converts a fixed width file to a delimited file.
   * <p>
   * This method ignores (consumes) newline and carriage return
   * characters. Lines returned is based strictly on the aggregated
   * lengths of the columns.
   *
   * A RuntimeException is thrown if run-off characters are detected
   * at eof.
   *
   * @param inputFile the fixed width file
   * @param outputFile the generated delimited file
   * @param columnLengths the array of column lengths
   * @param delimiter the delimiter used to split the columns
   * @param charsetName the charset name of the supplied files
   * @return the number of completed lines
   */
  public static final long convertFixedWidthFile(
      File inputFile,
      File outputFile,
      int columnLengths[],
      String delimiter,
      String charsetName) {

    InputStream inputStream = null;
    Reader inputStreamReader = null;
    OutputStream outputStream = null;
    Writer outputStreamWriter = null;
    String newline = System.getProperty("line.separator");
    String separator;
    int data;
    int currentIndex = 0;
    int currentLength = columnLengths[currentIndex];
    int currentPosition = 0;
    long lines = 0;

    try {
      inputStream = new FileInputStream(inputFile);
      inputStreamReader = new InputStreamReader(inputStream, charsetName);
      outputStream = new FileOutputStream(outputFile);
      outputStreamWriter = new OutputStreamWriter(outputStream, charsetName);

      while((data = inputStreamReader.read()) != -1) {
        if(data != 13 && data != 10) {
          outputStreamWriter.write(data);
          if(++currentPosition > (currentLength - 1)) {
            currentIndex++;
            separator = delimiter;
            if(currentIndex > columnLengths.length - 1) {
              currentIndex = 0;
              separator = newline;
              lines++;
            }
            outputStreamWriter.write(separator);
            currentLength = columnLengths[currentIndex];
            currentPosition = 0;
          }
        }
      }
      if(currentIndex > 0 || currentPosition > 0) {
        String line = "Line " + ((int)lines + 1);
        String column = ", Column " + ((int)currentIndex + 1);
        String position = ", Position " + ((int)currentPosition);
        throw new RuntimeException("Incomplete record detected. " + line + column + position);
      }
      return lines;
    }
    catch (Throwable e) {
      throw new RuntimeException(e);
    }
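    finally {
      try {
        // Either stream may still be null if opening a file failed.
        if (inputStreamReader != null) {
          inputStreamReader.close();
        }
        if (outputStreamWriter != null) {
          outputStreamWriter.close();
        }
      }
      catch (Throwable e) {
        throw new RuntimeException(e);
      }
    }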
  }
}
Constantin
  • 2 years later but I hope you see this. Why do you need to check if the read in character, data, is equal to 13 or 10 if the only possible returns are the character from the inputstream or -1 which denotes the end of a file? – Efie Jul 31 '18 at 13:27
  • You are correct ... This implementation is used for fixed width records that end in newline. – Constantin Sep 24 '18 at 12:36
1

Most suitable for Scala, but probably you could use it in Java

I was so fed up with the fact that there is no proper library for fixed length format that I have created my own. You can check it out here: https://github.com/atais/Fixed-Length

A basic usage is that you create a case class and it's described as an HList (Shapeless):

case class Employee(name: String, number: Option[Int], manager: Boolean)

object Employee {

    import com.github.atais.util.Read._
    import cats.implicits._
    import com.github.atais.util.Write._
    import Codec._

    implicit val employeeCodec: Codec[Employee] = {
      fixed[String](0, 10) <<:
        fixed[Option[Int]](10, 13, Alignment.Right) <<:
        fixed[Boolean](13, 18)
    }.as[Employee]
}

And you can easily decode your lines now or encode your object:

import Employee._
Parser.decode[Employee](exampleString)
Parser.encode(exampleObject)
Atais
1

If your string is called inStr, convert it to a char array and use the String(char[], start, length) constructor

char[] intStrChar = inStr.toCharArray();
// The third argument is a character count, not an end index.
String charfirst10 = new String(intStrChar, 0, 10);
String char10to20 = new String(intStrChar, 10, 10);
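
The same idea extends to a whole record. As a minimal sketch, assuming the field widths are known up front and every line is at least as long as their sum, a small helper can slice a line by widths. The class name `FixedWidthSlicer` and its `slice` method are made up for illustration and do not come from any library. Note that `substring` takes a start and an end index, whereas the `String(char[], int, int)` constructor takes an offset and a count.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from any library): slices one line by field widths.
final class FixedWidthSlicer {

    static List<String> slice(String line, int... widths) {
        List<String> fields = new ArrayList<>(widths.length);
        int start = 0;
        for (int width : widths) {
            int end = start + width;
            // Fail loudly on short lines instead of a bare StringIndexOutOfBoundsException.
            if (end > line.length()) {
                throw new IllegalArgumentException(
                    "Line has " + line.length() + " chars, need at least " + end);
            }
            fields.add(line.substring(start, end));
            start = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Widths 2, 10 and 1: record type, padded name, one-character flag.
        System.out.println(slice("NHJAMES     M", 2, 10, 1));
    }
}
```

The fields are returned untrimmed, so the caller decides whether padding is significant.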
Daniel Puiu
user300778
0

The Apache Commons CSV project can handle fixed-width files.

Looks like the fixed width functionality didn't survive promotion from the sandbox.

Jherico
  • That seems to be "in the sandbox". I'm not familiar with commons, but I get the impression that it means it's not 'done' yet? – Ape-inago Mar 08 '13 at 17:07
  • It means there is no official release. This is significantly different from "doesn't work". Based on the amount of time it's been in the sandbox, no one appears to be pushing it towards release, but it still ends up getting widely used. – Jherico Mar 09 '13 at 01:19
  • Can you elaborate on that? I just had a look at the API and could not find any hint/proof that it actually supports fixed width columns instead of delimiters. BTW the current URL is http://commons.apache.org/proper/commons-csv/ – Gandalf Feb 25 '14 at 13:27
  • You could vote for such a feature https://issues.apache.org/jira/browse/CSV-272 – Holger Brandl Feb 27 '21 at 06:34
0

Here is plain Java code to read a fixed-width file:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class FixedWidth {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        // String S1="NHJAMES TURNER M123-45-67890004224345";
        String FixedLengths = "2,15,15,1,11,10";

        List<String> items = Arrays.asList(FixedLengths.split("\\s*,\\s*"));
        File file = new File("src/sample.txt");

        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line1;
            while ((line1 = br.readLine()) != null) {
                // process the line.

                int n = 0;
                String line = "";
                for (String i : items) {
                    // System.out.println("Before"+n);
                    if (i == items.get(items.size() - 1)) {
                        line = line + line1.substring(n, n + Integer.parseInt(i)).trim();
                    } else {
                        line = line + line1.substring(n, n + Integer.parseInt(i)).trim() + ",";
                    }
                    // System.out.println(
                    // S1.substring(n,n+Integer.parseInt(i)));
                    n = n + Integer.parseInt(i);
                    // System.out.println("After"+n);
                }
                System.out.println(line);
            }
        }

    }

}
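
To avoid keeping a position constant and a length constant per field, which is the concern raised in the question, one option is to describe the layout once in an enum and derive the offsets from the declared order. This is only a sketch: the field names below (`RECORD_TYPE`, `FIRST_NAME` and so on) are guesses made up for illustration from the widths `2,15,15,1,11,10` used above, and the class `PersonRecordParser` is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical layout: the names are illustrative, the widths match "2,15,15,1,11,10".
enum PersonField {
    RECORD_TYPE(2), FIRST_NAME(15), LAST_NAME(15), GENDER(1), SSN(11), ACCOUNT(10);

    final int width;

    PersonField(int width) {
        this.width = width;
    }
}

final class PersonRecordParser {

    // Total record width, derived from the enum so it cannot drift out of sync.
    static int recordWidth() {
        int total = 0;
        for (PersonField field : PersonField.values()) {
            total += field.width;
        }
        return total;
    }

    // Splits one line into trimmed values keyed by field, in declaration order.
    static Map<PersonField, String> parse(String line) {
        if (line.length() < recordWidth()) {
            throw new IllegalArgumentException(
                "Expected at least " + recordWidth() + " chars, got " + line.length());
        }
        Map<PersonField, String> values = new EnumMap<>(PersonField.class);
        int start = 0;
        for (PersonField field : PersonField.values()) {
            values.put(field, line.substring(start, start + field.width).trim());
            start += field.width;
        }
        return values;
    }

    public static void main(String[] args) {
        // Build a padded sample line rather than relying on a hand-typed one.
        String line = String.format("%-2s%-15s%-15s%-1s%-11s%-10s",
            "NH", "JAMES", "TURNER", "M", "123-45-6789", "0004224345");
        Map<PersonField, String> record = parse(line);
        System.out.println(record.get(PersonField.LAST_NAME)); // prints TURNER
    }
}
```

With 115 fields the enum becomes long, but each field is declared exactly once and reordering or resizing a field does not require recomputing any start positions by hand.
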
0
/* The method takes three parameters: the input string, the number of columns per record (taken from the schema, e.g. 10), and the delimiter. */
public class Testing {

    public static void main(String as[]) throws InterruptedException {

        fixedLengthRecordProcessor("1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10", 10, ",");

    }

    public static void fixedLengthRecordProcessor(String input, int reclength, String delimiter) {
        String[] values = input.split(delimiter);
        String record = "";
        int recCounter = 0;
        for (Object O : values) {

            if (recCounter == reclength) {
                System.out.println(record.substring(0, record.length() - 1));// process
                                                                                // your
                                                                                // record
                record = "";
                record = record + O.toString() + ",";
                recCounter = 1;
            } else {

                record = record + O.toString() + ",";

                recCounter++;

            }

        }
        System.out.println(record.substring(0, record.length() - 1)); // process
                                                                        // your
                                                                        // record
    }

}
fregante
0

Another library that can be used to parse a fixed width text source: https://github.com/org-tigris-jsapar/jsapar

Allows you to define a schema in xml or in code and parse fixed width text into java beans or fetch values from an internal format.

Disclosure: I am the author of the jsapar library. If it does not fulfill your needs, on this page you can find a comprehensive list of other parsing libraries. Most of them are only for delimited files but some can parse fixed width as well.

stenix
  • 1
    If you're going to link to a library you wrote, as can be seen on the project's [contributor's page](https://github.com/org-tigris-jsapar/jsapar/graphs/contributors), you **must** disclose that it's yours *directly in your answer*. Posts that link to affiliated content and do not disclose that affiliation will be marked as **spam** and removed. Please read [this guide](https://stackoverflow.com/help/promotion) for how to format your posts. – Das_Geek Jan 31 '20 at 20:15