My task involves appending two dataframes of the same kind (representing different time periods) and applying a lambda function to modify a column on the appended dataframe.
This works as expected when run normally, but fails if the appended dataframe is written to csv and read back again.
Setup
import pandas as pd
import os
os.chdir('/path/to/directory')
df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
def foo(item):
    return item.replace("0" * 11, "")
Applying lambda function individually on each dataframe - works
df['material'] = df.apply(lambda x: foo(x['material']), axis=1) #Works
df2['material'] = df2.apply(lambda x: foo(x['material']), axis=1) #Works
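For comparison, the same replacement can be written without a row-wise apply. The following is only a sketch on made-up sample values (the real CSV files are not shown here), and the helper name strip_zero_run is hypothetical. It relies on Series.str.replace with regex=False for a literal match:

```python
import pandas as pd


def strip_zero_run(series: pd.Series) -> pd.Series:
    """Remove a literal run of eleven zeros from each string in the column."""
    # regex=False makes this a plain substring replacement, like str.replace in foo().
    return series.str.replace("0" * 11, "", regex=False)


# Made-up sample data standing in for the 'material' column.
sample = pd.DataFrame({"material": ["00000000000123", "ABC", "4500000000000"]})
sample["material"] = strip_zero_run(sample["material"])
```

One behavioural difference worth noting: the .str accessor passes missing values through unchanged, whereas foo() would raise an AttributeError on a non-string value such as NaN.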
Applying lambda function on the appended dataframe - works
df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df)
df3['material'] = df3.apply(lambda x: foo(x['material']), axis=1) #Works
Applying lambda function on the dataframe df3 if saved and read back - fails
Assigning the result to a new column works, though; only overwriting the existing 'material' column fails.
df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df)
df3.to_csv('Data-Appended.csv') #Writing to csv
df4 = pd.read_csv('Data-Appended.csv') #Reading it into a dataframe
df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work
df4['new'] = df4.apply(lambda x: foo(x['material']), axis=1) #Works
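One observable difference between df3 and df4 is what the CSV round trip does to the index. The sketch below uses made-up frames and an in-memory buffer instead of the real files, and uses pd.concat in place of DataFrame.append (which was removed in pandas 2.0). Whether this difference is related to the error is an assumption, not something established here:

```python
import io

import pandas as pd

# Made-up stand-ins for Data-May.csv and Data-TillApr.csv.
may = pd.DataFrame({"material": ["00000000000A1", "B2"]})
till_apr = pd.DataFrame({"material": ["00000000000C3", "D4"]})

# Appending keeps each frame's own row labels, so the labels repeat.
appended = pd.concat([till_apr, may])
appended_index = list(appended.index)

# to_csv writes the index by default; read_csv then loads it as an
# extra, unnamed column and builds a fresh 0..n-1 index.
buf = io.StringIO()
appended.to_csv(buf)
buf.seek(0)
round_trip = pd.read_csv(buf)
columns_after = list(round_trip.columns)

# Variant: renumber rows on concat and skip the index on write.
clean_buf = io.StringIO()
pd.concat([till_apr, may], ignore_index=True).to_csv(clean_buf, index=False)
clean_buf.seek(0)
clean_columns = list(pd.read_csv(clean_buf).columns)
```

With these sample frames, the round-tripped frame gains an 'Unnamed: 0' column holding the old row labels, while the variant written with index=False keeps only 'material'.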
The dtype of both df3['material'] and df4['material'] is object ('O').
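An object dtype only says that the column holds arbitrary Python objects, so two object columns can still differ in what they contain. A possible extra check, sketched on a made-up column (the mixed values here are an assumption for illustration, not taken from the real data), is to count the Python type of each element:

```python
import pandas as pd

# Made-up object column that mixes strings with a float.
material = pd.Series(["00000000000A1", 1.5, "B2"], dtype=object)

column_dtype = material.dtype

# Count how many elements there are of each Python type.
type_counts = material.map(lambda value: type(value).__name__).value_counts()
```

Running the same two lines on df3['material'] and df4['material'] would show whether both columns really contain only str values.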
Traceback output for the line that doesn't work:
Traceback (most recent call last):
  File "<ipython-input-42-096f7e61633e>", line 1, in <module>
    df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2331, in __setitem__
    self._set_item(key, value)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2398, in _set_item
    NDFrame._set_item(self, key, value)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1759, in _set_item
    self._data.set(key, value)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3731, in set
    group=True):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4684, in _get_blkno_placements
    for blkno, indexer in lib.get_blkno_indexers(blknos, group):
  File "pandas/_libs/lib.pyx", line 1488, in pandas._libs.lib.get_blkno_indexers
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)