
I have a file whose records are terminated by "\n" and whose columns are separated by X"01" (^A), the first non-printing character. It is big, about 7 GB, so loading it all at once would totally screw my laptop's memory.

I have googled how to read a big file line by line using BufferedReader etc., but its definition of a LINE is a bit different: readLine() treats a line as terminated by "\n", by "\r" (^M), or by "\r\n".
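A small sketch of that difference, on an in-memory string rather than the real file. `LineTerminatorDemo`, `viaReadLine` and `viaScanner` are hypothetical helper names. `BufferedReader.readLine()` breaks on a lone carriage return, while a `Scanner` whose delimiter is `\\n` keeps the `\r` inside the token.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineTerminatorDemo {
    /** Splits text the way BufferedReader.readLine() does: on "\n", "\r" or "\r\n". */
    public static List<String> viaReadLine(String text) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new StringReader(text));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    /** Splits text on "\n" only, using a Scanner delimiter as in the question. */
    public static List<String> viaScanner(String text) {
        List<String> tokens = new ArrayList<String>();
        Scanner scanner = new Scanner(text).useDelimiter("\\n");
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        scanner.close();
        return tokens;
    }
}
```

For the input `"a\rb\nc"`, `viaReadLine` yields three lines (`a`, `b`, `c`) while `viaScanner` yields two tokens (`a\rb`, `c`).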

Is there a solution in Java 6/7 to read a big file line by line, where a line is defined as ending with \n ONLY?
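One way to get "\n"-only lines in Java 6/7 is to skip `readLine()` and split the raw bytes yourself. The class below is a minimal sketch, not a JDK API: `NewlineOnlyReader` and `readRecord()` are hypothetical names, and it assumes an ASCII-compatible encoding such as UTF-8, in which the byte 0x0A never occurs inside a multi-byte character. It favors simplicity over speed (one `read()` call per byte through a `BufferedInputStream`).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

/**
 * Hypothetical reader for records terminated by '\n' (0x0A) only.
 * A '\r' (^M) is kept as ordinary record content.
 * Assumes an ASCII-compatible encoding such as UTF-8.
 */
public class NewlineOnlyReader {
    private final InputStream in;
    private final Charset charset;
    // Holds the bytes of the current record only, never the whole file.
    private final ByteArrayOutputStream record = new ByteArrayOutputStream();

    public NewlineOnlyReader(InputStream in, Charset charset) {
        this.in = new BufferedInputStream(in, 1 << 16); // 64 KB read buffer
        this.charset = charset;
    }

    /** Returns the next record without its trailing '\n', or null at end of input. */
    public String readRecord() throws IOException {
        record.reset();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return decode();
            }
            record.write(b);
        }
        // End of input: still return a final record that lacks a trailing '\n'.
        return record.size() > 0 ? decode() : null;
    }

    private String decode() {
        // new String(byte[], Charset) replaces malformed bytes with U+FFFD.
        return new String(record.toByteArray(), charset);
    }

    public void close() throws IOException {
        in.close();
    }
}
```

Usage would be roughly: wrap a `FileInputStream` on the data file, loop while `readRecord()` returns non-null, and take `split(String.valueOf((char) 1))[0]` of each returned string. Only one record is held in memory at a time.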

Thanks!

I have a sample data set here and am wondering if someone could run a solution against it and extract the first-column timestamp of every line.

Here is what I have done, but it only reads in the first line:

import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ParseAdafruit {

    public static void main(String[] args) throws IOException {
        // Predefine the delimiter ^A
        String delimiter = String.valueOf((char) 1);

        Scanner scanner = new Scanner(new File("/Users/.../data")).useDelimiter("\\n");
        while (scanner.hasNext()) {
            String line = scanner.next(); // This is your line
            String[] parts = line.split(delimiter);
            System.out.println(parts[0]);
        }
    }
}

Output

2014-01-28 18:00:41.960205

By the way, this was easy in Python with something like:

for line in sys.stdin: 
    print line.split(chr(1))[0]
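A rough Java counterpart of that Python loop, sketched under the assumption that the input is UTF-8 (or another ASCII-compatible encoding) and that only `\n` ends a record. `FirstColumn.printFirstColumn` is a hypothetical name. Rather than building whole lines, it copies bytes until the first ^A (0x01), skips the rest of the record, and prints the field, so memory use stays small even when one record holds an entire HTML page.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;

public class FirstColumn {
    private static final int COLUMN_SEP = 0x01; // ^A
    private static final int RECORD_SEP = '\n';  // only '\n' ends a record; '\r' does not

    /** Prints the first ^A-separated field of every record; returns the record count. */
    public static long printFirstColumn(InputStream raw, Charset charset, PrintStream out)
            throws IOException {
        InputStream in = new BufferedInputStream(raw, 1 << 16);
        ByteArrayOutputStream field = new ByteArrayOutputStream();
        boolean inFirstField = true;   // still before the first ^A of this record?
        boolean recordStarted = false; // has the current record produced any bytes?
        long records = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b == RECORD_SEP) {
                emit(field, charset, out);
                records++;
                inFirstField = true;
                recordStarted = false;
            } else {
                recordStarted = true;
                if (b == COLUMN_SEP) {
                    inFirstField = false;  // the rest of this record is skipped
                } else if (inFirstField) {
                    field.write(b);
                }
            }
        }
        if (recordStarted) { // last record had no trailing '\n'
            emit(field, charset, out);
            records++;
        }
        return records;
    }

    private static void emit(ByteArrayOutputStream field, Charset charset, PrintStream out) {
        out.print(new String(field.toByteArray(), charset));
        out.print('\n'); // explicit '\n' instead of println(), whose separator is platform-dependent
        field.reset();
    }
}
```

Passing `System.in` and `System.out` mimics `for line in sys.stdin`; for the data file itself you would pass a `FileInputStream` on its path. The caller owns, and should close, the input stream.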
B.Mr.W.

2 Answers


This is how to set up a Scanner that splits the contents of a file on "\n". I don't know what you do with each line, but if you want to collect the file into a single string, use a StringBuilder (or a StringBuffer if you need synchronization), because String is immutable and repeated concatenation copies it every time.

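// needs java.io.File and java.util.Scanner; the constructor throws FileNotFoundException
Scanner scanner = new Scanner(new File("PathToFile")).useDelimiter("\\n");
while (scanner.hasNext()) {
    String line = scanner.next(); // This is your line
}
scanner.close();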
user1803551
  • Sir, I totally got the idea behind this method, but I tried your code and it doesn't work on my data. Would you please try it on my data and see if it works on your side? https://www.dropbox.com/s/xpsw62qjwoab98q/data – B.Mr.W. Apr 20 '14 at 04:34
  • It reads the data and I'm able to print it line by line. If you link a smaller file (5 lines or so) and post what you get and why it is not what you want, it would be easier to pinpoint the problem. – user1803551 Apr 20 '14 at 04:42
  • That is a small file, with only 10 lines. I just posted what I have done in my question, and it only prints the first column of the first line. Thanks for your answer. – B.Mr.W. Apr 20 '14 at 04:44
  • I don't see a file with 10 lines; I see one of 0.9 MB with dozens of lines. – user1803551 Apr 20 '14 at 04:48
  • Yes, that is where the problem is. If you run "wc -l data" you will see only 9. Each line contains a complete HTML file with "\n" removed, but it may still contain other newline characters like ^M. – B.Mr.W. Apr 20 '14 at 04:50
  • I don't use Linux and I don't have "wc -l". Also, `^M` is not a line separator. All my text editors insist that there are many more than 9 lines. I suggest again you use a really short file; no one needs to see the 9 HTML pages when debugging. – user1803551 Apr 20 '14 at 04:55
  • Well, sir, search for ^M on this page: http://en.wikipedia.org/wiki/Newline. I tried your solution with a non-real-world example and it works, so I guess I will keep googling. Thanks again. – B.Mr.W. Apr 20 '14 at 04:58
  • Java does not recognize the token `^M` as a line separator. You can use the string `System.getProperty("line.separator")` to get the line separator for your OS, and use it as a delimiter. – user1803551 Apr 20 '14 at 05:03

It seems that the file encoding matters, so read the file as UTF-8 through an InputStreamReader and hand that reader to the Scanner:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStreamReader;
import java.util.Scanner;

...

String fileDir = "pathtodata";
String delimiter = String.valueOf((char) 1); // ^A (0x01) column separator, as in the question
try
{
    BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream(fileDir), "UTF8"));

    Scanner scanner = new Scanner(in).useDelimiter("\\n");
    while (scanner.hasNext())
    {
        String line = scanner.next(); // This is your line
        String[] parts = line.split(delimiter);
        System.out.println(parts[0]);
    }
    scanner.close();
    in.close();
}
catch (Exception e)
{
    e.printStackTrace();
}
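A likely explanation for why the question's code printed only one line, inferred from how the JDK classes behave rather than confirmed against the actual data: `new Scanner(File)` decodes with the platform default charset through a strict decoder that reports malformed input, and `Scanner` treats the resulting `IOException` as end of input (it is only visible afterwards via `scanner.ioException()`). An `InputStreamReader` instead replaces undecodable bytes with U+FFFD, so the `Scanner` keeps going. The hypothetical `ScannerDecodingDemo` below reproduces both behaviors on an in-memory byte array containing an invalid UTF-8 byte (0xFF); the channel-based `Scanner` constructor is used to mimic what `Scanner(File)` does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.channels.Channels;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerDecodingDemo {
    /** Channel source with a strict decoder (REPORT on malformed input), like Scanner(File). */
    public static Scanner strictScanner(byte[] data) {
        return new Scanner(Channels.newChannel(new ByteArrayInputStream(data)), "UTF-8")
                .useDelimiter("\\n");
    }

    /** InputStreamReader source, which replaces malformed bytes with U+FFFD. */
    public static Scanner lenientScanner(byte[] data) {
        return new Scanner(new InputStreamReader(new ByteArrayInputStream(data), Charset.forName("UTF-8")))
                .useDelimiter("\\n");
    }

    /** Collects all remaining tokens; stops silently if the source fails. */
    public static List<String> drain(Scanner scanner) {
        List<String> tokens = new ArrayList<String>();
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        return tokens;
    }
}
```

If this explanation holds, checking `scanner.ioException()` after the loop is a cheap way to tell a decoding failure from a genuine end of file. Note that replacing bad bytes also means the lenient version can succeed even when UTF-8 is not the file's real encoding.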
Greg