I have a 20 GB file in which each line has the following format:
Read name, Start position, Direction, Sequence
Note that read names are not necessarily unique.
For example, a snippet of my file looks like this:
Read1, 40009348, +, AGTTTTCGTA
Read2, 40009349, -, AGCCCTTCGG
Read1, 50994530, -, AGTTTTCGTA
I want to be able to store these lines in a way that allows me to
- keep the file sorted by the second value (the start position)
- iterate over the sorted file
It seems that databases can be used for this.
The documentation seems to imply that dbm cannot be used to sort the file and iterate over it.
Therefore I'm wondering whether SQLite3 can meet both of the requirements above. I know that I can sort my data with an SQL query and iterate over the result set with sqlite3. However, can I do this without running out of memory on a computer with 4 GB of RAM?
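To make the question concrete, here is a minimal sketch of the approach I have in mind, using Python's sqlite3 module. The table name `reads`, the column names, the index name and the batch size are placeholders I made up, and I am assuming the file has no header line and that fields are separated by a comma followed by a space. The idea is to insert the lines in batches so that only one batch is held in memory at a time, to index the start position, and then to iterate over the cursor row by row instead of calling fetchall().

```python
import csv
import sqlite3
from itertools import islice


def load_reads(conn, path, batch_size=100_000):
    """Insert the file into SQLite in batches of batch_size lines."""
    # Hypothetical schema: table and column names are my own choice.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reads ("
        "read_name TEXT, start_pos INTEGER, direction TEXT, sequence TEXT)"
    )
    with open(path, newline="") as handle:
        # skipinitialspace drops the blank that follows each comma.
        reader = csv.reader(handle, skipinitialspace=True)
        while True:
            # Only batch_size parsed lines are held in memory at once.
            batch = [
                (name, int(start), direction, sequence)
                for name, start, direction, sequence in islice(reader, batch_size)
            ]
            if not batch:
                break
            conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?)", batch)
    # Index the start position so that ORDER BY start_pos can use it.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_reads_start ON reads (start_pos)")
    conn.commit()


def iter_sorted(conn):
    """Yield rows ordered by start position, one row at a time."""
    cursor = conn.execute(
        "SELECT read_name, start_pos, direction, sequence "
        "FROM reads ORDER BY start_pos"
    )
    # Iterating the cursor fetches rows lazily; fetchall() is avoided on purpose.
    for row in cursor:
        yield row
```

I would then call something like `conn = sqlite3.connect("reads.db")`, `load_reads(conn, "reads.txt")` and `for row in iter_sorted(conn): ...`, where both file names are placeholders. The database is kept on disk rather than in ":memory:" because the data is far larger than my RAM.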