
I have an SQLite file with a table of close to 6 million rows. For a certain script I need to read this entire table at once, but when I select the entire table I get a MemoryError.

Loading up to 4 million rows works fine using the query:

query = "select * from event where NAME NOT IN (%s) limit 4000000" % placeholders
cursor.execute(query, list_excluded_events)

Using:

print('The length of the results in bytes = ' + str(sys.getsizeof(results)))

gives me the size of the result: 17873392 bytes, or about 17 MB.

I have 4 GB of memory allocated to PyCharm, so the remaining 2 million rows of the same size should be no problem. Why do I keep getting memory errors?
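One caveat about that 17 MB figure: `sys.getsizeof` on a list reports only the size of the list object itself (essentially its array of pointers), not the row tuples or the values they reference, so the real memory footprint of the results is far larger. A minimal sketch with made-up rows (the real table's columns are not reproduced here):

```python
import sys

# Hypothetical rows standing in for the query results.
rows = [(i, "event_%d" % i) for i in range(1000)]

shallow = sys.getsizeof(rows)  # counts only the outer list object
deep = shallow + sum(
    sys.getsizeof(row) + sum(sys.getsizeof(value) for value in row)
    for row in rows
)

print(shallow, deep)  # deep is many times larger than shallow
```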

import sqlite3
import sys


def connection(table, *args):
    conn = sqlite3.connect(table)  # <- connect to the database using the variable declared in main
    cursor = conn.cursor()

    if args != ():
        list_excluded_events = args[0]

        placeholder = '?'
        placeholders = ', '.join(placeholder for unused in list_excluded_events)

        query = "select * from event where NAME NOT IN (%s) limit 4500000" % placeholders
        cursor.execute(query, list_excluded_events)

    else:
        cursor.execute("select * from event")

    results = cursor.fetchall()
    #print(results)

    results = [list(elem) for elem in results]  # <- change the list of tuples into a list of lists
    print('The length of the results in bytes = ' + str(sys.getsizeof(results)))
    return results
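As an aside, if the rows can be processed one at a time rather than all at once, the `sqlite3` cursor can be consumed as an iterator so the full result set is never materialized in memory. A minimal sketch against a throwaway in-memory database (the `event` table here is a made-up stand-in, not the real schema):

```python
import sqlite3

# Throwaway in-memory database standing in for the real file.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("create table event (NAME text)")
cursor.executemany("insert into event values (?)",
                   [("event_%d" % i,) for i in range(100)])

cursor.execute("select * from event")
count = 0
for row in cursor:   # rows stream from the cursor one at a time
    count += 1       # process each row here instead of collecting them all
print(count)         # prints 100
```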
  • `str(len(str(results)))` — how is this equal to the size of `results`? – Abhinav Mathur Oct 21 '20 at 10:06
  • @AbhinavMathur, I assumed each char was equal to 1 byte, so the len of the results should give the byte size. Any extra memory used for storing the string itself is insignificant. – Bram Oct 21 '20 at 10:58
  • I edited the original post using sys.getsizeof and now the result is even smaller, so my original question still stands. – Bram Oct 21 '20 at 11:02
  • 1
    My guess? You are doing something that is doubling down on the size of your results in your processing code you didn't post. The error is obviously not in the code you posted, or you couldn't print the size of the results. Related: https://stackoverflow.com/questions/4285185/upper-memory-limit – Jared Smith Oct 21 '20 at 11:05
  • @JaredSmith I added the full code in the post. The table variable is just a string pointing to the file, and in the args is a list with a max size of 6. – Bram Oct 21 '20 at 11:11
  • can you run the query in pandas using read_sql method and check if you are able to import that many rows. – Sriram Arvind Lakshmanakumar Oct 21 '20 at 11:12
  • Again though, that can't be everything or you'd never see the size printed. Also note that changing the returned row tuples to lists is likely going to generate a ton of garbage. Also there's no reason (at least in the code you posted) not to deal with the rows one at a time rather than materializing all of them at once. What are you *actually* trying to do here? – Jared Smith Oct 21 '20 at 11:14
  • @SriramArvindLakshmanakumar That gives me this error: "MemoryError: Unable to allocate 40.6 MiB for an array with shape (5320735,) and data type uint64" – Bram Oct 21 '20 at 11:17
  • @JaredSmith I indeed do not see the size printed when not setting a limit; I can however see the size when selecting 4 million rows. And based on that size I know that 5 million rows should fit easily in my memory. – Bram Oct 21 '20 at 11:18
  • Ah ok gotcha. Have you tried running it directly in the terminal instead of through pycharm? – Jared Smith Oct 21 '20 at 11:20
  • @JaredSmith I have not yet tried this. When I have time I will try it and respond with the results – Bram Oct 21 '20 at 11:26
  • @JaredSmith running it in the terminal works fine. It seems to be PyCharm specific. Results ends up being 22 MB large, which PyCharm should be able to handle easily. – Bram Oct 21 '20 at 12:19
  • @Bram my guess then is that PyCharm isn't using the system Python but its own, and it's putting a hard memory cap on processes to prevent them from crashing the editor. – Jared Smith Oct 21 '20 at 12:33
  • @JaredSmith Thanks for your help, but I think I might just be missing some chromosomes or something. I for some reason had 32-bit Python installed, and upgrading to 64-bit Python fixed the issue. – Bram Oct 21 '20 at 14:20

1 Answer


Upgrading to a 64-bit Python version fixed the issue. A 32-bit Python process is limited to a small address space (around 2 GB on Windows) regardless of how much RAM the machine has, and the expanded list-of-lists result set was exceeding it.
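A quick way to check which build is running:

```python
import struct
import sys

# Pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build.
print(struct.calcsize("P") * 8)
print(sys.maxsize > 2**32)  # True only on a 64-bit build
```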
