
I'm using a simple Python list to store words fetched from a file:

words = []
words.append(new_word)

This code snippet works perfectly for files with small word counts. However, when running the script on larger files, it hangs after some time (when the list length is around 111166 and the total letter count in the list is high).

Is there a maximum size for a Python list? Is there a workaround for this?

Thanks in advance.

B378
  • python doesn't produce any memory errors; it just eats the entire RAM and then locks your PC when it starts using swap/pagefile, so it's best to keep an eye on the memory. I think you can work on the data in chunks and read/write to disk every few tens of thousands of words to keep the memory problem manageable (see the sketch after these comments). – Ahmed AEK Feb 12 '22 at 10:00
  • also make sure you are not creating needless copies of data, and keep track of when your code attempts to copy the data, because you are definitely running out of memory. – Ahmed AEK Feb 12 '22 at 10:02
  • *the letter count inside the array is high* - are you appending those words as strings or something more complex? How large is your input file? – tevemadar Feb 12 '22 at 10:17
  • @tevemadar I'm appending the words as strings. The last list length was around 111166. Since each word contains 5-10 letters, the total letter count in the list should be around 111166*5. – B378 Feb 12 '22 at 10:33
  • 111166*5 is barely more than half a megabyte; it's not really a challenge for PCs and languages from the past two decades. You may want to show an [mre]. – tevemadar Feb 12 '22 at 10:45
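
A minimal sketch of the chunked approach Ahmed AEK describes, with hypothetical file names and chunk size; reading line by line avoids loading the whole file at once:

CHUNK_SIZE = 50_000  # flush every few tens of thousands of words

words = []
with open("input.txt", encoding="utf-8") as src, \
        open("processed.txt", "w", encoding="utf-8") as dst:
    for line in src:
        for word in line.split():
            words.append(word)
            if len(words) >= CHUNK_SIZE:
                # write the finished chunk to disk and free the memory
                dst.write("\n".join(words) + "\n")
                words.clear()
    if words:  # write whatever is left over
        dst.write("\n".join(words) + "\n")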

3 Answers


sys.maxsize is the maximum number of elements a list can hold (and hence the maximum index):

An integer giving the maximum value a variable of type Py_ssize_t can take. It’s usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform.

But apparently this shouldn't be your problem: sys.maxsize is much bigger than 111166, so something else is going on in your code.
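
You can check the value on your own machine (output shown for a typical 64-bit build):

import sys

print(sys.maxsize)           # 9223372036854775807 on a 64-bit platform
print(sys.maxsize > 111166)  # True: a 111166-element list is nowhere near the limit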

.append() is also amortized O(1), so it doesn't slow your code down. When a list outgrows the space allocated for it, a new, larger block of memory is allocated and the elements are moved there, but this happens only occasionally.
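
A small sketch that makes the occasional reallocation visible; the exact sizes and growth points vary by Python version and platform:

import sys

words = []
last_size = sys.getsizeof(words)
for _ in range(100):
    words.append("word")
    size = sys.getsizeof(words)
    if size != last_size:
        # the list over-allocates, so its buffer grows in occasional jumps
        print(f"len={len(words):>3}  allocated={size} bytes")
        last_size = size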

S.B

You may consider using a database if your data gets too big. A viable option is SQLite, which is a simple file-based database.

First, create a table for your words:

import sqlite3

connection = None
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    # UNIQUE prevents the same word from being stored twice
    cursor.execute('''
        CREATE TABLE "words" (
            "id"    INTEGER,
            "word"  TEXT NOT NULL UNIQUE,
            PRIMARY KEY("id" AUTOINCREMENT)
        );
    ''')
except sqlite3.Error as error:
    print("Failed to create the table:", error)
finally:
    if connection:
        connection.close()

Now you can start adding words to the table:

my_word = "cat"

connection = None
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    cursor.execute("INSERT INTO words(word) VALUES(?)", [my_word])
    connection.commit()  # without the commit, the insert is rolled back on close
except sqlite3.Error as error:
    print("Failed to insert the word:", error)
finally:
    if connection:
        connection.close()

Now, to look a word up in the table:

search_word = "cat"

connection = None
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM words WHERE word=?", [search_word])
    print(cursor.fetchall())  # list of matching (id, word) rows
except sqlite3.Error as error:
    print("Failed to look up the word:", error)
finally:
    if connection:
        connection.close()
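
If you are inserting many words from a large file, opening one connection per word is slow. A sketch using a single transaction with cursor.executemany; the words list here is a stand-in for your file contents, and OR IGNORE relies on the UNIQUE constraint above to skip duplicates:

import sqlite3

words = ["cat", "dog", "bird"]  # hypothetical list of words read from your file

connection = None
try:
    connection = sqlite3.connect("database.db")
    cursor = connection.cursor()
    # one transaction for the whole batch; OR IGNORE skips words already stored
    cursor.executemany("INSERT OR IGNORE INTO words(word) VALUES(?)",
                       ([w] for w in words))
    connection.commit()
except sqlite3.Error as error:
    print("Failed to insert the batch:", error)
finally:
    if connection:
        connection.close()
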
omar

The issue occurred due to an invalid character in the file from which the words were extracted. For more information about this issue, read UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
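
For anyone hitting the same problem, a minimal sketch of reading the file with an explicit encoding; the file name and encoding are assumptions, and errors="replace" substitutes undecodable bytes instead of raising:

# hypothetical file name; change the encoding to match your data
with open("words.txt", encoding="utf-8", errors="replace") as f:
    words = f.read().split()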

Thank you to everyone who helped find the root cause.

B378