
I am working with a large pandas dataframe (around 180M rows, 6 columns, 6 GB), and an operation on it will suddenly stall in the Python console without any indication of what went wrong.

The only sign of the problem is the Windows Resource Monitor showing 0% CPU utilization for the Python process, while the memory utilization of the same process stays high (99%+). Disk usage is also at 0 MB/s.

After leaving the system as it is for around 10 to 15 minutes, the Resource Monitor shows a drop in memory utilization as the Python process disappears from the list. Meanwhile, the Python console still hasn't finished, and the ">>>" prompt never comes back.

The Python code is probably not at fault, since I can run the command (a groupby on a column followed by transform('count')) on the first 5 rows of the dataframe via .head().
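For reference, the command has roughly the following shape. This is only a minimal sketch on synthetic data: the column names `key` and `value` and the generated frame are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataframe; the column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 1000, size=100_000),
    "value": rng.random(100_000),
})

# The same operation, restricted to the first 5 rows via .head().
small = df.head()
counts_small = small.groupby("key")["value"].transform("count")
print(counts_small)
```

The result has one entry per input row, holding the size of that row's group within the 5-row sample.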

I think this is due to a memory problem. I should point out that this doesn't happen immediately. On executing the code, the resource monitor shows memory usage steadily creep up and then fall again (probably due to swapping to disk, as there is a corresponding increase in disk utilization). This happens multiple times, with CPU utilization in the 15% - 30% range throughout, before the process halts (0% CPU and memory maxed out).
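One way to check the memory hypothesis is to measure how much RAM the dataframe itself occupies and compare it with the 16 GB available. The sketch below uses `DataFrame.memory_usage(deep=True)` on a small synthetic frame; the helper name `footprint_gib` and the column names are made up for illustration.

```python
import numpy as np
import pandas as pd

def footprint_gib(frame: pd.DataFrame) -> float:
    """Return the in-memory size of the frame in GiB.

    deep=True also counts the Python objects held by object-dtype columns,
    which the default shallow estimate leaves out.
    """
    return frame.memory_usage(deep=True).sum() / 1024**3

# Synthetic stand-in; the real frame would be measured the same way.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 1000, size=100_000),
    "value": rng.random(100_000),
})

print(f"dataframe footprint: {footprint_gib(df):.4f} GiB")
```

If the reported size is already a large fraction of physical RAM, any operation that needs intermediate copies is likely to push the process into swap.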

If it's relevant, I am using the PyCharm IDE to run the code. I have a Python console open within PyCharm, where I am running the command. My machine has 16 GB of RAM.

Any suggestions on how to troubleshoot this? My main gripe is the lack of useful error messages.
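One possible way to narrow this down, sketched under assumptions, is to run the same operation on progressively larger slices and record how long each takes, stopping before the size at which the machine starts swapping. The helper `time_transform_count`, the column names, and the row counts below are hypothetical; the real run would use much larger steps.

```python
import time

import numpy as np
import pandas as pd

def time_transform_count(frame, key, value, n_rows):
    """Run groupby(key)[value].transform('count') on the first n_rows rows.

    Returns the number of result rows and the elapsed wall-clock seconds.
    """
    subset = frame.head(n_rows)
    start = time.perf_counter()
    result = subset.groupby(key)[value].transform("count")
    elapsed = time.perf_counter() - start
    return len(result), elapsed

# Synthetic stand-in for the real dataframe; column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 1000, size=200_000),
    "value": rng.random(200_000),
})

timings = []
for n in (5, 100, 10_000, 200_000):
    rows, seconds = time_transform_count(df, "key", "value", n)
    timings.append((rows, seconds))
    print(f"{rows:>8} rows: {seconds:.4f} s")
```

A sharp jump in elapsed time between two sizes would suggest the point at which the working set stops fitting in RAM.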

Similar Question:

Here is a similar question asked earlier on SO. The root cause in that case was that the file was stored on a network drive, which was causing latency issues. In my case, the file is stored locally.

Chaos
  • If "the python code is probably not at fault", this question doesn't belong here. If you hit swap, your computer is going to have a bad time. 6GB object sounds like you're approaching that threshold. – Drise Mar 09 '18 at 20:08
  • You say you can load the first 5 rows. What about 10, 100, 1000 rows? At what limit does it break? – Drise Mar 09 '18 at 20:12
  • Hitting swap is not going to be pleasant, but I would expect the machine to chug along and finish, and not glitch out with no error messages. I was hoping that the SO community could guide me towards debugging this. – Chaos Mar 09 '18 at 20:12
  • Unfortunately, without some *code* issue, this is off-topic. See the [Help others reproduce the problem](https://stackoverflow.com/help/how-to-ask) section. – Drise Mar 09 '18 at 20:16
