While loading my dataset with Python code in Spyder on an AWS server, I get the following error:
  File "<ipython-input-19-7b2e7b5812b3>", line 1, in <module>
    ffemq12 = load_h2odataframe_returns(femq12) #; ffemq12 = add_fold_column(ffemq12)
  File "D:\Ashwin\do\init_sm.py", line 106, in load_h2odataframe_returns
    fr=h2o.H2OFrame(python_obj=returns)
  File "C:\Program Files\Anaconda2\lib\site-packages\h2o\frame.py", line 106, in __init__
    column_names, column_types, na_strings, skipped_columns)
  File "C:\Program Files\Anaconda2\lib\site-packages\h2o\frame.py", line 147, in _upload_python_object
    self._upload_parse(tmp_path, destination_frame, 1, separator, column_names, column_types, na_strings, skipped_columns)
  File "C:\Program Files\Anaconda2\lib\site-packages\h2o\frame.py", line 321, in _upload_parse
    ret = h2o.api("POST /3/PostFile", filename=path)
  File "C:\Program Files\Anaconda2\lib\site-packages\h2o\h2o.py", line 104, in api
    return h2oconn.request(endpoint, data=data, json=json, filename=filename, save_to=save_to)
  File "C:\Program Files\Anaconda2\lib\site-packages\h2o\backend\connection.py", line 415, in request
    raise H2OConnectionError("Unexpected HTTP error: %s" % e)
The code works fine up to half the dataset (1.5 GB of the 3 GB file) but throws this error when I increase the data size. I tried increasing the instance RAM from 61 GB to 122 GB, but I still get the same error.
Loading the data file
femq12 = pd.read_csv(r"H:\Ashwin\dta\datafile.csv")
ffemq12 = load_h2odataframe_returns(femq12)
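Since the failure tracks the data size rather than the machine RAM, it may be worth measuring what the frame actually occupies in memory before the H2O upload: pandas object (string) columns can inflate a frame well past the CSV's on-disk size. A minimal sketch with toy data (the column names here are made up, not from my file):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real 3 GB CSV (hypothetical columns)
df = pd.DataFrame({
    "ticker": ["ABC"] * 1000,      # object column, stored as Python strings
    "ret": np.random.randn(1000),  # float64 column
})

# deep=True counts the actual bytes held by object (string) columns,
# which a shallow count reports only as 8-byte pointers
deep_bytes = df.memory_usage(deep=True).sum()
shallow_bytes = df.memory_usage(deep=False).sum()
print(deep_bytes, shallow_bytes)
```

On my real frame, a large gap between the deep and shallow totals would suggest the in-memory size is much bigger than the 3 GB the file suggests.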
Initializing h2o
h2o.init(nthreads=-1, max_mem_size="150G")
Loading h2o
Connecting to H2O server at http://127.0.0.1:54321... successful.

H2O cluster uptime:         01 secs
H2O cluster timezone:       UTC
H2O data parsing timezone:  UTC
H2O cluster version:        3.22.1.3
H2O cluster version age:    18 days
H2O cluster total nodes:    1
H2O cluster free memory:    133.3 Gb
H2O cluster total cores:    16
H2O cluster allowed cores:  16
H2O cluster status:         accepting new members, healthy
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Algos, AutoML, Core V3, Core V4
Python version:             2.7.15 final
I suspect a memory issue, but even after increasing both the RAM and max_mem_size, the dataset still does not load.
Any ideas to fix the error would be appreciated. Thank you.