I have a character file of 1.99 GB. I want to extract millions of subsequences from random positions in that file, for example positions 90 to 190, 10 to 110, 50000 to 50100, etc. (each 100 characters long).
I usually do it like this:
FileChannel channel = new RandomAccessFile(file , "r").getChannel();
ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
Charset chars = Charset.forName("ISO-8859-1");
CharBuffer cbuf = chars.decode(buffer);
String sub = cbuf.subSequence(0, 100).toString();
System.out.println(sub);
But for the 1.99 GB file, the above code throws this error:
java.lang.IllegalArgumentException
at java.nio.CharBuffer.allocate(CharBuffer.java:328)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
at java.nio.charset.Charset.decode(Charset.java:791)
So I tried the following code instead:
FileChannel channel = new RandomAccessFile(file , "r").getChannel();
CharBuffer cbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asCharBuffer() ;
String sub = cbuf.subSequence(0, 100).toString();
System.out.println(sub);
This does not give the above error, but it returns the following output:
ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹
The output should be "011111000000........"
Can anybody explain why this is happening and how to solve it?
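For reference, here is a minimal sketch of one possible way around both problems. It assumes the file really is single-byte text (ISO-8859-1), so that character position equals byte offset. Two facts motivate it: `asCharBuffer()` views the mapped bytes as 16-bit chars, two bytes per char, which garbles single-byte text; and decoding the whole mapping at once requires a `CharBuffer` about as long as the file, which is likely what fails at this size. The sketch instead copies only the 100 bytes needed and decodes just those. The class name `SubSequenceReader` and the helper `read` are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class SubSequenceReader {

    // Hypothetical helper: returns `length` characters starting at byte offset `start`.
    // Assumes a single-byte encoding (ISO-8859-1), so one byte is one character.
    static String read(ByteBuffer buf, int start, int length) {
        byte[] bytes = new byte[length];
        // Work on a duplicate so the position of the shared buffer is never changed.
        ByteBuffer view = buf.duplicate();
        view.position(start);
        view.get(bytes);
        // Decode only the small slice, not the whole file.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java SubSequenceReader <file>");
            return;
        }
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel channel = raf.getChannel()) {
            // A single mapping is limited to Integer.MAX_VALUE bytes; a 1.99 GB file fits.
            ByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println(read(map, 90, 100));    // positions 90 to 190
            System.out.println(read(map, 50000, 100)); // positions 50000 to 50100
        }
    }
}
```

Because each call touches only the requested bytes, millions of random extractions should be possible without ever allocating a buffer the size of the file. For a file larger than `Integer.MAX_VALUE` bytes, the mapping would have to be split into several regions.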