
I am unpivoting a pandas DataFrame and I'm running into a MemoryError on the following line of code (it runs after a melt() operation):

delimited_table = df["value"].str.split(",", expand=True)

The dataframe looks a bit like this:

+-----------+---------+
| ContactID | value   |
+-----------+---------+
| pd.Data   | A,C     |
| pd.Data   | D,E,F   |
| pd.Data   | G,H,I,K |
| ...       | ...     |
+-----------+---------+
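To show why the expanded result gets so wide, here is a minimal sketch on a small made-up frame shaped like the table above. The integer ContactID values are hypothetical stand-ins for the `pd.Data` placeholders. With expand=True, pandas creates one column per token of the longest string and pads shorter rows with missing values.

```python
import pandas as pd

# Hypothetical sample mirroring the table above (ContactID values are placeholders)
df = pd.DataFrame({
    "ContactID": [1, 2, 3],
    "value": ["A,C", "D,E,F", "G,H,I,K"],
})

# expand=True makes one column per token of the *longest* string;
# shorter rows are padded with missing values
wide = df["value"].str.split(",", expand=True)
print(wide.shape)  # (3, 4)
```

So a single value with 92 labels forces all 12.5 million rows to be 92 columns wide.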

For kicks and giggles, here's the exact error message:

MemoryError: Unable to allocate array with shape (92, 12513354) and data type object
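A back-of-the-envelope estimate shows the size of that allocation. This sketch assumes 8-byte pointers (one per cell of an object array on a 64-bit build), and it counts only the pointer array, not the string objects it points to. The shape is presumably reported as (columns, rows) because pandas stores its internal blocks transposed, which would mean the longest value splits into 92 pieces.

```python
rows = 12_513_354   # rows reported in the error message
cols = 92           # presumably the token count of the longest comma-separated value
ptr_bytes = 8       # one 64-bit pointer per cell of an object array

cells = rows * cols
total = cells * ptr_bytes
print(f"{cells:,} cells -> {total / 1e9:.1f} GB of pointers alone")
# 1,151,228,568 cells -> 9.2 GB of pointers alone
```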

My problem is that I can't drop any rows because all of the data is needed, and the DataFrame has 12.5 million rows, so expanding the whole column into one wide array in memory is not feasible, even on a 64-bit system. How can I iterate over the DataFrame row by row (or in chunks), apply str.split to each piece, and return the delimited values while keeping the number of columns the same for every row, so that all rows fit the expanded width?
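One way to get consistent columns while working in chunks is to compute the final width once, up front, and then reindex every chunk to that width. The sketch below assumes a literal separator that is also safe as a regex (as "," is), and `split_in_chunks` is a hypothetical helper name, not a pandas API.

```python
import pandas as pd


def split_in_chunks(series, sep=",", chunk_size=1_000_000):
    """Yield fixed-width DataFrames of split tokens, one per chunk of `series`.

    Hypothetical helper: the width is computed once for the whole column, so
    every chunk gets columns 0..n_cols-1 even if its own longest string is shorter.
    """
    # Tokens in the longest string = separators + 1 (str.count treats sep as a regex)
    n_cols = int(series.str.count(sep).max()) + 1
    for start in range(0, len(series), chunk_size):
        chunk = series.iloc[start:start + chunk_size]
        wide = chunk.str.split(sep, expand=True)
        # Add missing columns (filled with NaN) so all chunks line up
        yield wide.reindex(columns=range(n_cols))


df = pd.DataFrame({"value": ["A,C", "D,E,F", "G,H,I,K"]})
chunks = list(split_in_chunks(df["value"], chunk_size=2))
print([c.shape for c in chunks])  # [(2, 4), (1, 4)]
```

With real data you would process each chunk and let it go (for example, write it to disk) instead of collecting them with list(). Peak memory then scales with chunk_size: under the same 8-byte-pointer assumption, roughly 92 × chunk_size × 8 bytes of pointers per chunk.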

Tfmgvi_971
  • Solution 1: buy a lot more RAM. Solution 2: split your data into multiple arrays and handle each one individually (by this I mean: keep at most 1 array in RAM at any given time. Operate on a piece of data, save it to disk, go on with the next chunk, etc.) – Bakuriu Mar 09 '20 at 19:48
  • See https://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas. – AMC Mar 09 '20 at 19:55
  • Is there any way around this operation? Why do you need to split and expand? – AMC Mar 09 '20 at 19:55
  • @AMC thanks for the reference, I'll take a look. The reason I'm doing this is because the data is structured as comma-separated values, and I'm trying to ultimately find out how many A's I have, how many B's, how many C's, etc – Tfmgvi_971 Mar 09 '20 at 20:13
  • @Tfmgvi_971 I'm guessing it can't be read as CSV? By the way, iff that's all you need to do, then the plain old csv module and a loop should be fine. – AMC Mar 09 '20 at 20:13
  • @AMC Also, I don't know how many of each bin I have, so it would seem tedious to find out how many of each thing I have. I could have only A,B,C,D's in my table, but I could have dozens more that I don't have time to read and factor for – Tfmgvi_971 Mar 09 '20 at 20:16
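Because the comments say the real goal is a count per label, and the labels aren't known in advance, the wide expansion may not be needed at all. This sketch combines Bakuriu's chunking idea with split() without expand=True plus explode() (available since pandas 0.25), so each row's list becomes one row per token. The per-chunk counts are merged with Series.add(fill_value=0), which adds new labels as they appear. `count_tokens` is a hypothetical helper name.

```python
import pandas as pd


def count_tokens(series, sep=",", chunk_size=1_000_000):
    """Count every distinct token in `series`, one chunk at a time.

    Hypothetical helper. No list of expected labels is needed: tokens seen for
    the first time simply become new index entries in the running total.
    """
    total = pd.Series(dtype="float64")
    for start in range(0, len(series), chunk_size):
        chunk = series.iloc[start:start + chunk_size]
        # One list per row, then one row per token: no wide frame is built
        part = chunk.str.split(sep).explode().value_counts()
        # fill_value=0 keeps labels that appear in only one of the two Series
        total = total.add(part, fill_value=0)
    return total.astype("int64").sort_values(ascending=False)


# Small made-up sample; "Z" shows that unseen labels are picked up automatically
s = pd.Series(["A,C", "D,E,F", "G,H,I,K", "A,D", "Z"])
counts = count_tokens(s, chunk_size=2)
print(counts)
```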
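AMC's suggestion of the plain csv module and a loop could look roughly like the sketch below. It assumes the data can be read from a CSV file with a `value` column holding quoted, comma-separated labels. That may not match the real pipeline, since the column here comes out of melt(). The file name `data.csv` is hypothetical, and the sample uses an in-memory string instead. collections.Counter creates keys on first sight, so the set of labels does not have to be known beforehand.

```python
import csv
import io
from collections import Counter

# In-memory stand-in for a hypothetical data.csv; for a real file use
# open("data.csv", newline="") instead of io.StringIO(...)
data = io.StringIO('ContactID,value\n1,"A,C"\n2,"D,E,F"\n3,"G,H,I,K"\n4,"A,D"\n')

counts = Counter()
for row in csv.DictReader(data):
    # Only the current row is held in memory; its labels are added to the tally
    counts.update(row["value"].split(","))

print(counts.most_common(3))
```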
