I was just wondering: can you read text files directly, or do you have to import them into Java (for example as a String or an ArrayList) to be able to use the information in the text file?

For example, I have a file that looks similar to this:

1  34  12  43  65
1  44  8   45  77
2  34  10  56  87
6  43  6   76  89
6  65  7   23  90

where each column stands for something (maybe column one is item ID, column two is price, and column three is month). Now let's say I have 20 GB of information laid out this way. Can I use Java to make a data summary of this information, or is the file simply too large? I tried importing the 20 GB file as an ArrayList, but after waiting 10 minutes with the ArrayList still filling, I gave up.

I was thinking that if I could interact directly with the file instead of importing it as an ArrayList, it might work.

Andrew Thompson
Danny
  • 3
    Of course you can read from a file in Java, not just hard-code your numbers! Look up "Java I/O". – Sergey Kalinichenko Jun 11 '13 at 20:12
  • 2
    `i` is for variables, I is you. – Maroun Jun 11 '13 at 20:14
  • 1
    My bad, Andrew Thompson. I was actually editing it and couldn't submit my changes because you already had. Dashblinkenlight, I know I/O. I use Scanner/FileReader/BufferedReader to get the file into Java. But after that, how can I interact with the program (like searching it for when price = 2)? – Danny Jun 11 '13 at 20:15
  • _"Can I use java to make a data summary of this information?"_ Do you have 20 GB of memory? If yes, then of course you can hold all the data in memory simultaneously. If not, you'll have to process the file as you read it and only remember some of the information. – Petr Janeček Jun 11 '13 at 20:16
  • @Slanec When did the max. memory of a Java app. rise (suddenly) to include 20Gb? – Andrew Thompson Jun 11 '13 at 20:17
  • Andrew, I like that idea. Just read all of the item ID 1s, getting that average, then proceeding to the next item ID. Problem is I have no idea how to do that and have no idea how many items there are. – Danny Jun 11 '13 at 20:17
  • Check out this post for ideas on how to proceed. http://stackoverflow.com/questions/2788080/reading-a-text-file-in-java – myqyl4 Jun 11 '13 at 20:24
  • [RandomAccessFile](http://docs.oracle.com/javase/7/docs/api/java/io/RandomAccessFile.html) might be interesting for you. You could calculate the positions of the numbers if you know the dimensions of the matrix. – pvorb Jun 11 '13 at 20:26

2 Answers

You can certainly use Java to summarize this information. For example, if your goal is to compute each column's minimum, maximum, and mean, you might write something like:

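// Requires imports: java.io.BufferedReader, java.io.FileReader, java.util.Arrays.
// The enclosing method must declare or handle IOException.
final BufferedReader br =
    new BufferedReader(new FileReader("/this/is/the/path/to/the/file.txt"));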
final int[] mins = { Integer.MAX_VALUE, Integer.MAX_VALUE, Integer.MAX_VALUE,
                     Integer.MAX_VALUE, Integer.MAX_VALUE };
final int[] maxes = { Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE,
                      Integer.MIN_VALUE, Integer.MIN_VALUE };
final double[] sums = { 0.0, 0.0, 0.0, 0.0, 0.0 };
int count = 0;
try {
    String line;
    while((line = br.readLine()) != null) {
        ++count;
        final String[] values = line.split("\\s+");
        for(int i = 0; i < 5; ++i) {
            final int value = Integer.parseInt(values[i]);
            if(value < mins[i]) {
                mins[i] = value;
            }
            if(value > maxes[i]) {
                maxes[i] = value;
            }
            sums[i] += value;
        }
    }
} finally {
    br.close();
}
final double[] averages = new double[sums.length];
for(int i = 0; i < sums.length; ++i) {
    averages[i] = sums[i] / count;
}
System.out.println(Arrays.toString(mins));
System.out.println(Arrays.toString(maxes));
System.out.println(Arrays.toString(averages));
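
The loop above summarizes whole columns. The comments on the question also ask about a per-item summary when the number of item IDs is not known in advance. A minimal sketch of that variation follows, assuming column one is the item ID and column two is the price (the question only says "maybe"), and assuming Java 8 or later for `computeIfAbsent`. The names `PerItemAverages`, `Stats`, and `summarize` are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

class PerItemAverages {

    /** Running total and row count for one item ID. */
    static final class Stats {
        long sum;
        long count;

        double average() {
            return (double) sum / count;
        }
    }

    /**
     * Reads the input once, line by line, and groups column two
     * (assumed: price) by column one (assumed: item ID).
     */
    static Map<Integer, Stats> summarize(BufferedReader reader) throws IOException {
        // TreeMap keeps the IDs sorted for printing; a HashMap would also work.
        final Map<Integer, Stats> statsById = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            final String[] values = line.split("\\s+");
            final int id = Integer.parseInt(values[0]);
            final int price = Integer.parseInt(values[1]);
            // Create the entry the first time an ID is seen.
            final Stats stats = statsById.computeIfAbsent(id, k -> new Stats());
            stats.sum += price;
            stats.count++;
        }
        return statsById;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PerItemAverages <path-to-data-file>");
            return;
        }
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            for (Map.Entry<Integer, Stats> e : summarize(reader).entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue().average());
            }
        }
    }
}
```

Memory use grows with the number of distinct item IDs rather than with the number of lines, so this stays workable for a 20 GB file as long as the set of IDs fits in memory. For the five sample rows in the question, the averages come out as 39.0, 34.0, and 54.0 for IDs 1, 2, and 6.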
ruakh

The basic approach with a file that large would be to read a little, process that amount, clear the details from memory, then loop through the rest of the file doing the same thing.

> I like that idea. Just read all of the item ID 1s, getting that average, then proceeding to the next item ID. Problem is I have no idea how to do that and have no idea how many items there are.

I don't see how that is a problem if you just want averages for each column. There are 5 columns, so keep 5 attributes (e.g. long columnTotal1 .. columnTotal5). Add the values on each line to the respective column total and increment lineCount.

At the end of the file, divide each column's total by the line count to get the average for that column.

  1. As pointed out, a long might not be big enough to hold the sum, so the problem might need BigInteger instead.
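
One way the running-total approach with BigInteger might look is sketched below. It assumes five whitespace-separated integer columns, as in the question's sample. The names `ColumnAverages` and `averages`, and the choice of four decimal places, are illustrative assumptions and not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;
import java.util.Arrays;

class ColumnAverages {

    // Column count taken from the sample data in the question.
    static final int COLUMNS = 5;

    /** Reads the input once and returns the average of each column. */
    static BigDecimal[] averages(BufferedReader reader) throws IOException {
        // One arbitrary-precision total per column, so the sums cannot overflow.
        final BigInteger[] totals = new BigInteger[COLUMNS];
        Arrays.fill(totals, BigInteger.ZERO);
        long lineCount = 0;

        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            final String[] values = line.split("\\s+");
            for (int i = 0; i < COLUMNS; i++) {
                totals[i] = totals[i].add(new BigInteger(values[i]));
            }
            lineCount++;
        }
        if (lineCount == 0) {
            throw new IllegalArgumentException("no data rows");
        }

        // Divide each total by the line count, rounded to 4 decimal places.
        final BigDecimal divisor = BigDecimal.valueOf(lineCount);
        final BigDecimal[] result = new BigDecimal[COLUMNS];
        for (int i = 0; i < COLUMNS; i++) {
            result[i] = new BigDecimal(totals[i]).divide(divisor, 4, RoundingMode.HALF_UP);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ColumnAverages <path-to-data-file>");
            return;
        }
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            System.out.println(Arrays.toString(averages(reader)));
        }
    }
}
```

Only the five totals and the line count stay in memory, whatever the file size. If the values are known to be small, plain long totals would be faster and are likely sufficient.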
Andrew Thompson