
I have a text file with one record per line, for example:


1245 Dog Husky

2356 Cat Tabby

3476 Dog Pug


with a large amount of fairly arbitrary data repeated per line; about 10,000 lines in practice, but for the sake of argument, let's say the number of lines tends to infinity.

I have code that reads this data and stores it in an object; pseudocode follows:

Pet p = new Pet();
String lineInput = reader.readLine();    // where reader is reading the above-mentioned file
String[] parts = lineInput.split("\t");  // split once instead of once per field
p.id = parts[0];
p.type = parts[1];                       // assigning the parts of the line to its relevant fields
p.breed = parts[2];
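
For completeness, the Pet object here is just a simple holder; a minimal sketch (the actual class isn't shown above, so the field types are my assumption):

class Pet {
    String id;      // first tab-separated column, e.g. "1245"
    String type;    // second column, e.g. "Dog"
    String breed;   // third column, e.g. "Husky"
}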

Now here's the problem: I need to be able to sort, search, and display these values as fast as possible, and I don't know what my best option is. I came up with the two methods shown below.

Method 1: Store all the objects in an ArrayList based on the first digit of their id

ArrayList<Pet> idStartsWith1 = new ArrayList<>();
if (p.id.startsWith("1"))
     idStartsWith1.add(p);    // "1245    Dog    Husky" will be added here

ArrayList<Pet> idStartsWith2 = new ArrayList<>();
if (p.id.startsWith("2"))
     idStartsWith2.add(p);    // "2356    Cat    Tabby" will be added here

ArrayList<Pet> idStartsWith3 = new ArrayList<>();
if (p.id.startsWith("3"))
     idStartsWith3.add(p);    // "3476    Dog    Pug" will be added here

I think this would be the faster method, as these ArrayLists are already in process memory, but I fear it could exhaust memory and cause issues. (Remember, the number of lines in the text file tends to infinity.)

Method 2: Write all the objects to a .dat file based on the first digit of their id

BufferedWriter writer1 = new BufferedWriter(new FileWriter("idStartsWith1.dat"));    // writer1 writes to "idStartsWith1.dat"
if (p.id.startsWith("1"))
     writer1.write(p.toString());    // "1245    Dog    Husky" will be written to this file

BufferedWriter writer2 = new BufferedWriter(new FileWriter("idStartsWith2.dat"));    // writer2 writes to "idStartsWith2.dat"
if (p.id.startsWith("2"))
     writer2.write(p.toString());

BufferedWriter writer3 = new BufferedWriter(new FileWriter("idStartsWith3.dat"));    // writer3 writes to "idStartsWith3.dat"
if (p.id.startsWith("3"))
     writer3.write(p.toString());

This would keep process memory from being overloaded, but I fear that having to open, read, and close a file every time I need to search for and display a Pet will add significant delays at runtime.

Which of these two methods would work better? Or is there another, more efficient method that would not occur to a Java novice like me?

EltoCode
  • You say the lines tend to infinity, but you also say about 10,000 lines... on most devices, for 10K, 100K, 1M, even 10M such lines, you could consider memory infinite as well... do you have any upper limit on the number of lines? – Matteo Mar 02 '20 at 17:28
  • Does this answer your question? [Sort a file with huge volume of data given memory constraint](https://stackoverflow.com/questions/2087469/sort-a-file-with-huge-volume-of-data-given-memory-constraint) – GotoFinal Mar 02 '20 at 20:29
  • You need a database. – rustyx Mar 02 '20 at 21:56
  • @Matteo I suppose an upper limit would be around several million – EltoCode Mar 03 '20 at 18:08

1 Answer


The data of many applications is small enough to fit into the main memory of a desktop computer. When your file is 1 GB, you need some 3 GB of main memory, and that's no problem for most desktops. On mobile, it's different.

Nothing can be as fast as working with main memory, when done right. An ArrayList is not well suited to searching, but a Map is.
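
For example, a minimal sketch of an in-memory index keyed by id (the Pet fields come from the question; the rest is my assumption):

// java.util.TreeMap keeps its keys sorted, so it covers both search and sorted display.
Map<String, Pet> petsById = new TreeMap<>();

// While reading the file, index each pet by its id:
petsById.put(p.id, p);

// Searching is then a single lookup instead of a scan over an ArrayList:
Pet match = petsById.get("2356");

// Iteration comes out already sorted by id, ready for display:
for (Pet pet : petsById.values())
    System.out.println(pet.id + "\t" + pet.type + "\t" + pet.breed);

If you only need exact-id lookups and don't care about ordering, a HashMap does the same job with O(1) lookups.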

You can use a database instead, and you probably should. It's much slower than having all the data in main memory, but still very fast, assuming you do it right (learn about indexes, etc.). Most databases can import a CSV file directly and can answer all your queries; filtering, sorting, and joining other tables are what databases exist for.
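
As a rough illustration, here is a sketch using the java.sql JDBC API against an embedded database. I'm assuming the H2 database (URL jdbc:h2:./pets) here, but any engine with a JDBC driver works the same way; the snippet belongs in a method that declares throws SQLException:

try (Connection conn = DriverManager.getConnection("jdbc:h2:./pets")) {

    // One-time setup: the PRIMARY KEY gives you an index on id for free.
    try (Statement st = conn.createStatement()) {
        st.execute("CREATE TABLE IF NOT EXISTS pet ("
                 + "id VARCHAR PRIMARY KEY, type VARCHAR, breed VARCHAR)");
    }

    // Load one row per line of the text file (shown here with the example row):
    try (PreparedStatement ins =
             conn.prepareStatement("INSERT INTO pet VALUES (?, ?, ?)")) {
        ins.setString(1, "1245");
        ins.setString(2, "Dog");
        ins.setString(3, "Husky");
        ins.executeUpdate();
    }

    // Searching, sorting and displaying are then plain SQL; the index keeps
    // lookups fast even with millions of rows.
    try (PreparedStatement q = conn.prepareStatement(
            "SELECT id, type, breed FROM pet ORDER BY id");
         ResultSet rs = q.executeQuery()) {
        while (rs.next())
            System.out.println(rs.getString("id") + "\t"
                    + rs.getString("type") + "\t" + rs.getString("breed"));
    }
}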

maaartinus