I'm trying to find word patterns in a huge input. I was using a dictionary for this, and after a few hours the program crashed with a MemoryError.
I modified the program: I created a database via MySQLdb and now store the pattern-index values there. For every word, I check whether it is already in the index and, if not, I write it into the index with a value. The problem is that the database approach is too slow.
I was wondering whether there is a way to combine a dictionary and a database, for example:
    if ram < 90% usage:
        seek into dict
        append to dict
    else:
        if not (seek into dict):
            seek into database
            append to database
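The idea above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not a tested solution for the 16 GB case: it uses the standard-library `sqlite3` module instead of MySQLdb so that it runs without a server, it replaces the "RAM < 90%" check with a simple cap on the number of dictionary entries, and the names `HybridIndex`, `pattern_index`, and `max_in_memory` are made up for the example. It also assumes the stored values are integers and never `None`.

```python
import sqlite3


class HybridIndex:
    """Keep entries in a dict until it is full, then spill new ones to a database."""

    def __init__(self, db_path=":memory:", max_in_memory=1_000_000):
        self.cache = {}                      # fast in-memory part of the index
        self.max_in_memory = max_in_memory   # assumption: stand-in for the RAM check
        self.spilled = False                 # True once anything was written to the DB
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pattern_index "
            "(word TEXT PRIMARY KEY, value INTEGER)"
        )

    def get(self, word):
        """Return the stored value, or None if the word is not indexed."""
        if word in self.cache:
            return self.cache[word]
        if not self.spilled:
            return None                      # nothing in the DB yet, skip the query
        row = self.db.execute(
            "SELECT value FROM pattern_index WHERE word = ?", (word,)
        ).fetchone()
        return row[0] if row else None

    def add(self, word, value):
        """Index the word if it is new; return the value that is stored for it."""
        existing = self.get(word)
        if existing is not None:
            return existing
        if len(self.cache) < self.max_in_memory:
            self.cache[word] = value
        else:
            self.db.execute(
                "INSERT INTO pattern_index (word, value) VALUES (?, ?)",
                (word, value),
            )
            self.spilled = True
        return value

    def close(self):
        # Commit once at the end instead of after every insert.
        self.db.commit()
        self.db.close()
```

Usage would look like `idx = HybridIndex(max_in_memory=2); idx.add("foo", 1)`. Words that arrive while the dictionary still has room never touch the database, and the single commit in `close()` avoids paying for a transaction on every insert, which is often a noticeable part of the per-row cost.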
Processing 100 KB of input with a dictionary takes ~1.5 s.
Processing the same input with the database takes ~84 s.
The original input is 16 GB. I do not know yet how long it will take to process.
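For a rough sense of scale, the two timings above can be extrapolated linearly to the full input. This is only a back-of-the-envelope sketch: it assumes decimal units (16 GB = 16 × 10^9 bytes, 100 KB = 10^5 bytes) and that throughput stays constant, which is probably optimistic because the index grows as more words are seen. The function name `estimate_seconds` is made up for illustration.

```python
def estimate_seconds(total_bytes, sample_bytes, sample_seconds):
    """Scale a measured sample time linearly up to the full input size."""
    return total_bytes / sample_bytes * sample_seconds


TOTAL = 16 * 10**9    # 16 GB, decimal units (assumption)
SAMPLE = 100 * 10**3  # 100 KB, decimal units (assumption)

dict_seconds = estimate_seconds(TOTAL, SAMPLE, 1.5)
db_seconds = estimate_seconds(TOTAL, SAMPLE, 84)

print(f"dictionary: {dict_seconds / 3600:.1f} hours")
print(f"database:   {db_seconds / 86400:.1f} days")
```

Under those assumptions the dictionary would need roughly 67 hours (if it fit in memory) and the database roughly 156 days.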