
I have a DataFrame df like so:

      0 1 2 3 4 5 ... 1154161
1     a b c d e f ... A
2     g h i j k l ... B
3     m n o p q r ... C
...
86405 Q V W X Y Z ... ZY

This is an 86405 rows × 1154161 columns DataFrame. Note that the index starts from 1. I am trying to assign a row with index=0:

df.loc[0] = 0

But I run into this error:

MemoryError: Unable to allocate 372. GiB for an array with shape (99725281205,) and data type float32

I want it to look like:

      0 1 2 3 4 5 ... 1154161
0     0 0 0 0 0 0 ... 0       <--- add this row
1     a b c d e f ... A
2     g h i j k l ... B
3     m n o p q r ... C
...
86405 Q V W X Y Z ... ZY

Is there another way to assign the row without running out of memory? Maybe in chunks (though preferably not)?

EDIT: Added DataFrame info as per @hpaulj's request.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1154161 entries, 0 to 1154160
Columns: 86405 entries, 1 to 86405
dtypes: float32(86405)
memory usage: 371.5 GB

EDIT2: Note that the letters in the sample DataFrame are actually numbers (float32).
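The numbers in the error message and in the `df.info` output can be checked against each other with a little arithmetic. The sketch below uses only the figures quoted above (1154161 index entries, 86405 columns, 4 bytes per float32 value); the variable names are illustrative. It suggests that the failed allocation is the size of the data the frame already holds, which would be consistent with pandas trying to build a full copy.

```python
# Figures taken from the df.info output and the MemoryError above.
n_rows, n_cols = 1154161, 86405
itemsize = 4  # bytes per float32 value

n_elements = n_rows * n_cols
size_gib = n_elements * itemsize / 2**30

print(n_elements)          # element count to compare with the shape in the error
print(round(size_gib, 1))  # size in GiB to compare with "memory usage"
```

If the element count matches the shape in the `MemoryError`, the request is for a second array as large as the existing data, not just for one extra row.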

ben
  • Have a look at https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type – Anurag Dabas Apr 18 '21 at 10:14
  • @AnuragDabas thanks for the link. Is there a way to do it only temporarily? (I'm using Linux) – ben Apr 18 '21 at 10:16
  • That is a huge dataframe; having so many columns in any data model is discouraged. However, you can try `arr = np.vstack((np.zeros(df.shape[1]),df.to_numpy()))` and then `pd.DataFrame(arr,columns=df.columns)` – anky Apr 18 '21 at 11:07
  • Any attempt to grow the frame requires making a whole new one. Looks like that request is for the data portion. Is (99725281205,) the product of the new dimensions? – hpaulj Apr 18 '21 at 11:33
  • To further the discussion, show the `df.info` and the full error traceback. – hpaulj Apr 18 '21 at 14:28
  • @anky your solution still runs into the same memory error. – ben Apr 19 '21 at 00:01
  • @hpaulj added the info to the question – ben Apr 19 '21 at 00:03
  • So your dataframe is already large, and adding a row requires making a whole new frame, at least temporarily. – hpaulj Apr 19 '21 at 00:36
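The point made in these comments can be seen on a toy frame. This is a minimal sketch, assuming a recent pandas and NumPy; `small` is a made-up 3 × 4 stand-in for the real data. It checks whether the values after `small.loc[0] = 0` still live in the original buffer, and where the new label ends up in the index.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real frame: 3 rows labelled 1..3, float32 values.
small = pd.DataFrame(
    np.arange(12, dtype=np.float32).reshape(3, 4),
    index=[1, 2, 3],
)
old_values = small.to_numpy()  # array backing the frame before the assignment

small.loc[0] = 0  # setting with enlargement: label 0 does not exist yet

new_values = small.to_numpy()
print(list(small.index))                         # position of the new label
print(np.shares_memory(old_values, new_values))  # is the old buffer reused?
```

On the toy frame the enlarged data lives in a new array and the new label is appended at the end of the index, so `small.sort_index()` would still be needed to move it to the top, which means another copy.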

1 Answer


1. https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#setting-with-enlargement

df.loc[len(df)] = 0
print(df)

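2. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html

# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# on newer versions pd.concat is the replacement.
df = df.append(pd.Series(0, index=df.columns), ignore_index=True)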

Source: Append an empty row in dataframe using pandas

Piotr Żak
  • Your suggestions will run into the same problem. They all require making a new larger dataframe (and numpy arrays to store that data). – hpaulj Apr 18 '21 at 14:57
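Since growing an existing frame needs a second full-size array, one possible workaround is to reserve the extra row when the data is first loaded. This is only a sketch, and it assumes the data can be reloaded in pieces: `frame_with_leading_zero_row` and the `chunks` argument are hypothetical names, not part of pandas. Whether `pd.DataFrame(buf, copy=False)` wraps the buffer without copying depends on the pandas version, so that should be checked on a small case first.

```python
import numpy as np
import pandas as pd


def frame_with_leading_zero_row(chunks, n_rows, n_cols, dtype=np.float32):
    """Hypothetical helper: allocate once, keeping row 0 as zeros."""
    # One allocation for the final shape; row 0 is already zero-filled.
    buf = np.zeros((n_rows + 1, n_cols), dtype=dtype)
    start = 1  # data rows begin after the reserved zero row
    for chunk in chunks:
        stop = start + len(chunk)
        buf[start:stop] = chunk  # in-place fill, no full-size temporary
        start = stop
    # copy=False asks pandas to wrap the buffer instead of duplicating it.
    return pd.DataFrame(buf, copy=False)


# Toy usage: two chunks totalling three data rows of three columns each.
chunks = [
    np.ones((2, 3), dtype=np.float32),
    np.full((1, 3), 2, dtype=np.float32),
]
df_small = frame_with_leading_zero_row(chunks, n_rows=3, n_cols=3)
print(df_small)
```

The full-size buffer still has to fit in memory once, so this avoids the temporary second copy but not the size of the data itself.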