I have an Extension column (e.g. .exe, .py, .xml, .doc) in my dataframe. When I run the script from a terminal on a large dataset, I get the error above.

import os
from sys import getsizeof

import numpy as np
from joblib import dump  # assumed: dump(obj, path) matches joblib's signature
from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(features['Extension'].values.reshape(-1, 1))
temp = encoder.transform(features['Extension'].values.reshape(-1, 1)).toarray()  # MemoryError is raised here
print("Size of array in bytes", getsizeof(temp))
print("Array :-", temp)
print("Shape :- ", features.shape, temp.shape)
dump(encoder, os.path.join(os.getcwd(), 'model_dumps', 'encoder.pkl'))
features.drop(columns=['Extension'], inplace=True)  # drop once; a second drop would raise KeyError
features = featureScaling(features)  # user-defined scaling helper
features = np.concatenate((features, temp), axis=1)

OUTPUT -

1) Size of array in bytes :- 8884558912
2) Array :- 
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]]
3) Shape :- (323310, 8) (323310, 3435)
sheel
  • Does this answer your question? https://stackoverflow.com/questions/57507832/unable-to-allocate-array-with-shape-and-data-type – Ehtisham Ahmed Nov 23 '20 at 09:57
  • @Ehtisham I saw that post, but it's not a good idea to play with kernel settings in production. – sheel Nov 23 '20 at 10:20

1 Answer

that's funny.

MemoryError: Unable to allocate 8.27 GiB for an array with shape (323313, 3435) and data type float64

Many machines have only 8 GB of RAM, and it looks like yours is one of them: the dense float64 array needs 323313 × 3435 × 8 bytes ≈ 8.27 GiB, so Python cannot fit it in memory. Upgrading the RAM or running on a machine with more memory will let this allocation succeed.
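
The 8.27 GiB in the error message follows directly from the array's shape and dtype. A quick check of that arithmetic, using the shape reported in the error and assuming the usual 8 bytes per float64 element:

```python
import numpy as np

rows, cols = 323313, 3435                  # shape reported in the MemoryError
itemsize = np.dtype(np.float64).itemsize   # bytes per float64 element

dense_bytes = rows * cols * itemsize       # memory needed for the dense array
dense_gib = dense_bytes / 2**30            # convert bytes to GiB

print(f"{dense_bytes} bytes = {dense_gib:.2f} GiB")
```

The requirement grows linearly with both the number of rows and the number of distinct extensions, so a larger dataset raises it proportionally.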

  • It's not about the 8 GB of RAM. If I get millions of rows in the future, even 32 GB of RAM will be too little for me. I need a generic solution. – sheel Nov 23 '20 at 10:22
  • @sheel Right now the problem is the RAM. To use the data you need to store it in RAM. If you don't have enough RAM, you can't use the data, and your code crashes. –  Nov 23 '20 at 11:17
  • I remembered one thing: you can write and run your code in a cloud IDE such as CS50 IDE (ide.cs50.io). If the cloud IDE has enough RAM, everything will work as expected. –  Nov 23 '20 at 11:27
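
On the request for a generic solution: one option, not raised in this thread, is to skip `.toarray()` altogether. `OneHotEncoder` returns a SciPy sparse matrix by default, and a one-hot row has at most one non-zero entry, so the sparse form stays small. The sketch below is only illustrative: the tiny dataframe and the `size_kb` column are hypothetical stand-ins for the real `features`, the scaled numeric columns would take the place of `numeric`, and it assumes whatever consumes the result accepts sparse input.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the real dataframe from the question.
features = pd.DataFrame({
    "Extension": [".exe", ".py", ".xml", ".doc", ".py"],
    "size_kb": [10.0, 2.0, 5.0, 7.0, 3.0],
})

encoder = OneHotEncoder(handle_unknown="ignore")
# fit_transform returns a SciPy sparse matrix; no .toarray() call.
encoded = encoder.fit_transform(features[["Extension"]])

# Remaining numeric columns (after scaling, in the original code).
numeric = features.drop(columns=["Extension"]).to_numpy(dtype=np.float64)

# Stack column-wise without ever building the dense one-hot array.
combined = sparse.hstack([sparse.csr_matrix(numeric), encoded], format="csr")

print(combined.shape)
```

Here `sparse.hstack` plays the role of `np.concatenate(..., axis=1)` in the question, but only the non-zero entries are stored.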