I'm learning Python by working on a project - a Facebook message analyzer. I downloaded my data, which includes a messages.htm file of all my messages. I'm trying to write a program to parse this file and output data (# of messages, most common words, etc.)
However, my messages.htm file is 270MB. When creating a BeautifulSoup object in the shell for testing, any other file (all < 1MB) works just fine. But I can't create a bs object of messages.htm. Here's the error:
>>> mf = open('messages.htm', encoding="utf8")
>>> ms = bs4.BeautifulSoup(mf)
Traceback (most recent call last):
File "<pyshell#73>", line 1, in <module>
ms = bs4.BeautifulSoup(mf)
File "C:\Program Files (x86)\Python\lib\site-packages\bs4\__init__.py", line 161, in __init__
markup = markup.read()
File "C:\Program Files (x86)\Python\lib\codecs.py", line 319, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError
So I can't even begin working with this file. This is my first time tackling something like this and I'm only just learning Python so any suggestions would be much appreciated!