You probably have one huge line in your file containing the array. The exception occurs because you are trying to build a CharBuffer that is too big (most likely a size that overflowed into a negative integer). The maximum array/string size in Java is about Integer.MAX_VALUE, i.e. 2^31 - 1 (see this thread). You say you have a 3GB record; at 1 byte per char that makes roughly 3 billion characters, which is more than 2^31 (roughly 2 billion).
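To see the arithmetic, here is a quick check you can run in the Scala REPL (the exact byte count is just an illustration):
val chars = 3L * 1024 * 1024 * 1024 // 3221225472 single-byte characters
Int.MaxValue                        // 2147483647
chars > Int.MaxValue                // true: does not fit in a Java array index
chars.toInt                         // -1073741824: wraps to a negative size,
                                    // hence the failing buffer allocation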
What you could do is a bit hacky, but since you only have one key with a big array, it may work. Your JSON file might look like:
{
  "key" : ["v0", "v1", "v2", ... ]
}
or like this, but I think in your case it is the former:
{
  "key" : [
    "v0",
    "v1",
    "v2",
    ...
  ]
}
Thus you could try changing the record delimiter used by Hadoop to "," as shown here. Basically, they do it like this:
import org.apache.hadoop.io.LongWritable
import org.apache.hadoop.io.Text
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

def nlFile(path: String) = {
  val conf = new Configuration
  // split records on "," instead of newlines
  conf.set("textinputformat.record.delimiter", ",")
  sc.newAPIHadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text], conf)
    .map(_._2.toString) // keep only the record text, drop the byte offset
}
Then you could read your array; you would just have to remove the JSON brackets yourself with something like this:
nlFile("...")
  .map(_.replaceAll("^.*\\[", "").replaceAll("\\].*$", ""))
Note that you would have to be more careful if your records can contain the characters "[" and "]", but that is the idea.
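If the values themselves may contain "[" or "]", one option (a rough sketch, assuming every value is a double-quoted string without embedded quotes) is to keep only the part of each record between its first and last double quote instead of using the greedy regex:
nlFile("...")
  .map { rec =>
    val first = rec.indexOf('"')
    val last  = rec.lastIndexOf('"')
    // keep the quoted value, dropping any surrounding {, }, [ or ]
    if (first >= 0 && last > first) rec.substring(first, last + 1) else rec
  }
This still relies on the values not containing the "," delimiter itself, which the whole approach already assumes.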