The raw data shape is (200000, 15), but after pre-processing the data and applying OneHotEncoding, the dimension has increased to (200000, 300).
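For context, the preprocessing step is roughly this (a minimal sketch; the file name, the target column name, and the use of `pd.get_dummies` stand in for my actual code):

```python
import pandas as pd

# Load the raw CSV (~200000 rows x 15 columns) into a DataFrame;
# "data.csv" and "target" are placeholder names
df = pd.read_csv("data.csv")

# One-hot encode the categorical columns; this is the step that
# expands the feature space from 15 to ~300 dense columns
X = pd.get_dummies(df.drop(columns=["target"]))
y = df["target"]
```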
The data needs to be trained with Linear Regression, XGBoost, and Random Forest (RF) for predictive modeling. LabelEncoder had been used earlier, but the results were not satisfactory. The (200000, 300) matrix consumes a huge amount of RAM and raises a MemoryError while training.
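The training step where the MemoryError shows up looks roughly like this (a sketch, assuming a regression task since Linear Regression is one of the models; RF is shown as an example, and the split ratio and hyperparameters are illustrative):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hold out a test set; the 80/20 split is illustrative
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# RF shown as one of the three models; XGBoost and Linear
# Regression are trained the same way on the same X
model = RandomForestRegressor(n_estimators=100, n_jobs=-1)
model.fit(X_train, y_train)  # this is where the MemoryError is raised
```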
- Running on Jupyter Notebook on AWS with 16 GB RAM
- Using sklearn for most of the ML part
- Data is in CSV format (loaded as a DataFrame in Python)
Would appreciate any suggestions!