I have a DataFrame with shape (4255300, 10). I need to export it to a CSV file and open it, but Excel's size limit makes this impossible. I tried splitting the df row-wise (Pandas: split dataframe into multiple dataframes by number of rows), but only index numbers end up in the splits (I wrote each split to a CSV file). I also tried writing the df to a text file (np.savetxt('desktop/s2.txt', z.values, fmt='%d', delimiter="\t")), but the wrong data ends up in the text file. The width of the df is not a problem; only its length, i.e. the number of rows, is. Can anyone help me with this?
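Regarding the np.savetxt attempt: one likely cause of the wrong data is fmt='%d', which formats every value as an integer. Floats are truncated (9.99 is written as 9), and a frame with string columns raises a TypeError instead. The minimal sketch below uses a small hypothetical frame in place of the real data. It contrasts that call with DataFrame.to_csv using sep='\t', which writes each column with its own formatting:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with mixed dtypes, standing in for the real 4,255,300 x 10 data
df = pd.DataFrame({"id": [1, 2, 3], "price": [9.99, 0.5, 12.75], "name": ["a", "b", "c"]})

# fmt='%d' formats every value as an integer, so 9.99 is written as 9.
# (With the string column included, '%d' would raise a TypeError instead.)
np.savetxt("numeric_only.txt", df[["id", "price"]].values, fmt="%d", delimiter="\t")

# to_csv keeps each column's own formatting; sep="\t" produces a tab-delimited text file
df.to_csv("s2.txt", sep="\t", index=False)
```

Note that a single text file this long still exceeds Excel's row limit, so it helps only if the file is opened with something other than Excel.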

abi bose

You could split the DataFrame into smaller chunks and then export like this:

import numpy as np
import pandas as pd

# Creating a DataFrame with some numbers
df = pd.DataFrame(np.random.randint(0, 100, size=(42000, 10)))
# Setting my chunk size
chunk_size = 10000
# Assigning a chunk number to each row by position: rows 0-9999 -> 0, 10000-19999 -> 1, ...
chunk_ids = np.arange(len(df)) // chunk_size
# groupby chunk and export each chunk to a different csv (index=False keeps the index out of the file)
for i, chunk in df.groupby(chunk_ids):
    chunk.to_csv(f'chunk{i}.csv', index=False)
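As a variant of the same idea, here is a sketch that sizes the chunks from Excel's worksheet limit of 1,048,576 rows (one row goes to the CSV header) and slices by position with iloc, so no helper columns are needed. The function name split_to_csv and the "chunk" file prefix are illustrative, not part of pandas. For the question's 4,255,300 rows this gives ceil(4,255,300 / 1,048,575) = 5 files:

```python
import math

import numpy as np
import pandas as pd

# Excel worksheets hold at most 1,048,576 rows; the CSV header takes one of them
EXCEL_MAX_ROWS = 1_048_576
ROWS_PER_FILE = EXCEL_MAX_ROWS - 1


def split_to_csv(df, rows_per_file=ROWS_PER_FILE, prefix="chunk"):
    """Hypothetical helper: write consecutive row blocks to prefix0.csv, prefix1.csv, ..."""
    n_files = math.ceil(len(df) / rows_per_file)
    paths = []
    for i in range(n_files):
        start = i * rows_per_file
        # iloc slices by position, so the index values are never used or written
        part = df.iloc[start:start + rows_per_file]
        path = f"{prefix}{i}.csv"
        part.to_csv(path, index=False)
        paths.append(path)
    return paths


# Number of files needed for the question's frame
print(math.ceil(4_255_300 / ROWS_PER_FILE))  # → 5

# Small demo: 42,000 rows in blocks of 10,000 gives chunk0.csv ... chunk4.csv
df = pd.DataFrame(np.random.randint(0, 100, size=(42000, 10)))
paths = split_to_csv(df, rows_per_file=10000)
print(paths)
```

With the real data, calling split_to_csv(df) with the default rows_per_file keeps every file just under the limit; pass a smaller value if you want files that open faster.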
pnovotnyq