
I am working on an AWS EMR cluster. I have data in S3 storage. After I clean the data, I send it back to S3 via the s3fs library. The code works with files between 200 and 500 MB. However, when I upload files between 2.0 and 2.5 GB, the code raises a "MemoryError". Do you have any ideas or experience with this issue?

import s3fs
bytes_to_write = nyc_green_20161.to_csv(None).encode()
fs = s3fs.S3FileSystem(key='#', secret='#')
with fs.open('s3://ludditiesnyctaxi/new/2016/yellow/yellow_1.csv', 'wb') as f:
    f.write(bytes_to_write)
Andrew Gaul
yssefunc

1 Answer


I handled this problem by splitting my CSV files. This post explains how to split a CSV file: splitting one csv into multiple files in python
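
For illustration, here is a minimal sketch of that splitting approach, assuming the nyc_green_20161 DataFrame and bucket path from the question; the chunk count and the part file names are placeholders, not from the original post.

import numpy as np
import s3fs

fs = s3fs.S3FileSystem(key='#', secret='#')

n_parts = 10  # placeholder: pick a count so each part stays well below the size that triggered MemoryError
for i, chunk in enumerate(np.array_split(nyc_green_20161, n_parts)):
    # Serialise and encode only one chunk at a time, so peak memory stays bounded
    part_bytes = chunk.to_csv(None).encode()
    with fs.open('s3://ludditiesnyctaxi/new/2016/yellow/yellow_1_part{}.csv'.format(i), 'wb') as f:
        f.write(part_bytes)

Each part file gets its own header row; if the parts are later reassembled into a single CSV, the headers of all but the first part would need to be dropped.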

yssefunc