
I want a library that I can give a file plus a config parameter describing each column's length, name, and possibly type, and get back a map of the columns for each row.

This isn't a difficult thing to do on my own, but I would be surprised if there weren't already a great solution. I've searched for one but had no luck.

mfollett
  • http://stackoverflow.com/questions/1609807/whats-the-best-way-of-parsing-a-fixed-width-formatted-file-in-java – tim_yates Oct 12 '11 at 18:16
  • @tim_yates I did see that when searching, but I didn't know whether there was a cleaner answer for Groovy. – mfollett Oct 12 '11 at 18:20
  • Not that I've heard of. Flatworm looks awful too... Write a complex XML file to parse a simple flat file? Yuck! Sorry about that. I might have to have a go at writing a builder tomorrow ;-) – tim_yates Oct 12 '11 at 18:27

3 Answers


I don't know of anything specifically for Groovy. I've done something similar with regular expressions; here's a quick and dirty parser based on that approach:

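```groovy
def input =
    "JOHN      DOE       123       \n" +
    "JANE      ROE       456       \n"

// Column name -> width, in the order the columns appear in each line
def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]

// Builds ^(.{10})(.{10})(.{10})$ : one capture group per column.
// Lines whose length differs from the total width won't match and are skipped.
def pattern = "^" + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + "\$"
def names = fieldDefs.keySet() as List

def rows = []
input.eachLine { line ->
    def m = line =~ pattern
    if (m) {
        // m[0] is [wholeMatch, group1, group2, ...]; drop the whole match
        def values = m[0][1..-1].collect { it.trim() }
        rows << [names, values].transpose().collectEntries { it }
    }
}
```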
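Groovy's `=~` with a plain String on the right likely compiles the regex again on every call. A minimal variant, assuming the same sample data and `fieldDefs` map, compiles the pattern once with the `~` operator (which yields a `java.util.regex.Pattern`) and reuses it for every line. Because `matches()` requires the whole line to match, the `^`/`$` anchors are no longer needed:

```groovy
import java.util.regex.Pattern

def input =
    "JOHN      DOE       123       \n" +
    "JANE      ROE       456       \n"

def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]

// Compile once: one capture group per column, e.g. (.{10})(.{10})(.{10})
Pattern pattern = ~(fieldDefs.values().collect { w -> "(.{$w})" }.join(''))
def names = fieldDefs.keySet() as List

def rows = []
input.eachLine { line ->
    def m = pattern.matcher(line)
    if (m.matches()) {
        // group(0) is the whole line; groups 1..n are the columns
        def values = (1..m.groupCount()).collect { m.group(it).trim() }
        rows << [names, values].transpose().collectEntries { it }
    }
}
println rows
```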
ataylor
  • Yeah, I said it wouldn't be difficult to do; I was looking for one in a library that had been well reviewed and highly optimized. On that topic, it seems like slicing a string with getAt (e.g. line[start..finish]) would be more efficient than using a regex. – mfollett Oct 13 '11 at 05:46
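Following that suggestion, here is a sketch of the config-driven parser the question describes, using getAt slicing instead of a regex. The `columns` list, its `name`/`width`/`type` keys, and the `parseLine` closure are hypothetical names for illustration, not an existing library API. Character ranges are derived from the widths once, up front, and an optional converter closure handles the "possibly type" part:

```groovy
// Hypothetical column spec: name, width and an optional 'type' converter closure.
def columns = [
    [name: 'firstName', width: 10],
    [name: 'lastName',  width: 10],
    [name: 'someValue', width: 10, type: { String s -> s.toInteger() }]
]

// Work out each column's character range once, from a running offset.
def offset = 0
def slices = columns.collect { col ->
    def range = offset..<(offset + col.width)
    offset += col.width
    [name: col.name, range: range, convert: col.type ?: { String s -> s }]
}
def totalWidth = offset

// Slice one line into a map. Short lines (e.g. trailing spaces stripped by an
// editor) are padded to the full width first so getAt never runs off the end.
def parseLine = { String line ->
    String padded = line.padRight(totalWidth)
    slices.collectEntries { s -> [s.name, s.convert.call(padded[s.range].trim())] }
}

def input = "JOHN      DOE       123       \nJANE      ROE       456\n"
def rows = []
input.eachLine { rows << parseLine(it) }
println rows
```

To read a file rather than a string, the same closure could be driven by `new File(path).eachLine { rows << parseLine(it) }`. Padding short lines is a design choice; a stricter parser might instead reject any line that isn't exactly `totalWidth` characters long.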

You could use FlatFileItemReader from Spring Batch, which (with a FixedLengthTokenizer) splits each line into a FieldSet, a structure similar to a JDBC ResultSet.

But that might be overkill and adds complexity. In Groovy I find code like this easy to read and write:

```groovy
def file = '''\
JOHN      DOE       123       
JANE      ROE       456       
'''

def names = []
file.eachLine { names << [
    first: it[0..9].trim(),
    last:  it[10..19].trim(),
    age:   it[20..22].toInteger()
]}

assert names[0].first == 'JOHN'
assert names[1].age == 456
```
Jonny Heggheim

I just tested the regex method against the String getAt method. getAt seems to be about 2x faster than the regex over roughly 10k lines:

```groovy
// Build roughly 10k identical 30-character lines
def input = ""
for (int i = 1; i < 10000; i++) {
    input += "JOHN      DOE       123       \n"
}

def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]

// Returns the wall-clock time, in ms, taken to run the closure once
def benchmark = { closure ->
    def start = System.currentTimeMillis()
    closure.call()
    System.currentTimeMillis() - start
}

def pattern = "^" + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + "\$"

// String slicing with getAt
def duration = benchmark {
    def rows = []
    input.eachLine { line ->
        String firstName = line.getAt(0..9).trim()
        String lastName = line.getAt(10..19).trim()
        String someValue = line.getAt(20..29).trim()
        rows << [firstName: firstName, lastName: lastName, someValue: someValue]
    }
    //println rows
}

println "execution of string method took ${duration} ms"

// Regex, as in the earlier answer
duration = benchmark {
    def rows = []
    input.eachLine { line ->
        def m = line =~ pattern
        if (m) {
            def names = fieldDefs.keySet() as List
            def values = m[0][1..-1].collect { it.trim() }
            rows << [names, values].transpose().collectEntries { it }
        }
    }
    //println rows
}

println "execution of regex method took ${duration} ms"
```

```
execution of string method took 245 ms
execution of regex method took 505 ms
```
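Those figures come from a single run of each method timed with `currentTimeMillis`, with the string method going first and so also paying for JIT warm-up. A sketch of a slightly fairer harness, assuming nothing beyond the JDK and Groovy's GDK: the `benchmark` helper here is hypothetical; it warms each closure up, times several runs with `System.nanoTime()`, and keeps the fastest. It also checks that both methods produce the same rows before timing them. Timings will vary by JVM and Groovy version.

```groovy
// Hypothetical harness: warm up the JIT, then time several runs with
// System.nanoTime() and keep the fastest, in milliseconds.
def benchmark = { int warmups, int runs, Closure work ->
    warmups.times { work.call() }
    (1..runs).collect {
        long t0 = System.nanoTime()
        work.call()
        (System.nanoTime() - t0) / 1e6
    }.min()
}

// Same 30-character line as above, repeated 10,000 times (built in one step).
def input = (["JOHN      DOE       123       "] * 10000).join('\n')
def pattern = '^(.{10})(.{10})(.{10})$'

def parseBySlicing = {
    def rows = []
    input.eachLine { line -> rows << [line[0..9].trim(), line[10..19].trim(), line[20..29].trim()] }
    rows
}

def parseByRegex = {
    def rows = []
    input.eachLine { line ->
        def m = line =~ pattern
        if (m) rows << m[0][1..-1].collect { it.trim() }
    }
    rows
}

// The two approaches should agree before their timings are compared.
assert parseBySlicing() == parseByRegex()

println "string method: ${benchmark(3, 5, parseBySlicing)} ms"
println "regex method:  ${benchmark(3, 5, parseByRegex)} ms"
```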

Tim H