
I am trying to read a rather large XML file into a String. Currently I am joining the list returned by .readLines() like this:

def is = zipFile.getInputStream(entry)
def content = is.getText('UTF-8')
def xmlBodyList = content.readLines()
return xmlBodyList[1..xmlBodyList.size].join("")

However, I am getting this output in the console:

java.lang.IndexOutOfBoundsException: toIndex = 21859

I don't need any explanation on IndexOutOfBoundsExceptions, but I am having a hard time figuring out how to program around this issue.

How can I implement this differently, so it allows for a large enough file size?

Jonas Praem
  • How big is the file? Can't you just read the file content into text directly? – Rao Jun 09 '17 at 14:23
  • That seems to cause some errors elsewhere in the system. The format is not accepted. As for the file size, it's VERY big. I can't tell you exactly how big right now, but I can get back to you with that if it's important. – Jonas Praem Jun 09 '17 at 14:26
  • Because `content = new File(filename).getText('utf-8')` would get the file content, as you may be aware. So there is no need to read lines and join them. – Rao Jun 09 '17 at 14:36
  • I am aware, but I want to delete the first line with the join (not clear in the question) – Jonas Praem Jun 09 '17 at 14:43
  • Is that the only reason to read lines and join them back into a string? See [max string size](https://stackoverflow.com/questions/1179983/how-many-characters-can-a-java-string-have). Is it a kind of data file with column names in the first line? – Rao Jun 09 '17 at 14:52

1 Answer

About a good way to avoid java.lang.IndexOutOfBoundsException

The error is here:

return xmlBodyList[1..xmlBodyList.size].join("")

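The range 1..xmlBodyList.size is inclusive at both ends, so it asks for the element at index size, which is one past the last valid index. It is a good idea to check variables before accessing them, and you can use a relative range accessor: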

assert xmlBodyList.size>1  //check value
return xmlBodyList[1..-1].join("")  //use relative indexes -1 = the last one
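
To make the difference between the two ranges concrete, here is a minimal sketch. The three-line list is hypothetical sample data standing in for content.readLines(), and drop(1) is shown only as one possible alternative that does not need a size check.

```groovy
// Hypothetical sample data standing in for content.readLines()
def xmlBodyList = ['<?xml version="1.0"?>', '<root>', '</root>']

// 1..size() is inclusive at both ends, so it asks for index size(),
// which is one past the last element
def failed = false
try {
    xmlBodyList[1..xmlBodyList.size()]
} catch (IndexOutOfBoundsException e) {
    failed = true
}

// 1..-1 means "from index 1 up to the last element"
def body = xmlBodyList[1..-1].join("")

// drop(1) also skips the first element, and returns an empty list
// when there is nothing left to return
def safeBody = xmlBodyList.drop(1).join("")
```

With drop(1) the result for a one-element or empty list is simply an empty string, so whether that is acceptable depends on how the caller treats an empty body.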

About large files processing

If you need to iterate through all the lines and perform some operation on each, here is an example:

def stream = zipFile.getInputStream(entry)
stream.eachLine("UTF-8"){line, index->
    if(index>1){ //line numbers start at 1, so this skips the first line
        //do something here with each line from file
        println "$line $index"
    }
}

There are many additional Groovy methods on java.io.InputStream that can help you process a large file without loading it all into memory:

http://docs.groovy-lang.org/latest/html/groovy-jdk/java/io/InputStream.html
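
If the goal is still a single String without the first line, one possible sketch is to append each line to a StringBuilder while streaming. The ByteArrayInputStream here is a hypothetical stand-in for zipFile.getInputStream(entry), used only so the snippet runs on its own.

```groovy
// Hypothetical in-memory stream standing in for zipFile.getInputStream(entry)
def xml = '<?xml version="1.0"?>\n<root>\n</root>'
def stream = new ByteArrayInputStream(xml.getBytes("UTF-8"))

def sb = new StringBuilder()
stream.eachLine("UTF-8") { line, index ->
    if (index > 1) {   // line numbers start at 1, so this skips the first line
        sb << line     // append without a separator, like join("")
    }
}
def body = sb.toString()
```

This avoids the intermediate list and the sublist range entirely, but the final String is still held in memory, so it does not remove the size limit of a Java String.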

daggett
  • How does this answer the question `Good way to avoid java.lang.IndexOutOfBoundsException, when joining a list`? – Jonas Praem Jun 12 '17 at 07:43
  • I see the problem not in joining the list, but in building the sublist. About your question `Good way to avoid java.lang.IndexOutOfBoundsException`: I suggest using the relative accessor `xmlBodyList[1..-1]`, which minimizes the error. The only error that remains in this case is when `xmlBodyList` contains one or zero elements, which can be covered by the following code: `assert xmlBodyList.size>1` – daggett Jun 12 '17 at 07:58