Why does writing a dataframe to a file and reading it back again change the behavior of array assignment?

Question

My task involves appending two dataframes of the same kind (representing different time periods) and applying a lambda function to modify a column on the appended dataframe.

This works as expected when run normally, but fails if the appended dataframe is written to a CSV file and read back again.

Setup

import pandas as pd
import os

os.chdir('/path/to/directory')

df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')

def foo(item):
    return item.replace("0"*11,"")

Applying lambda function individually on each dataframe - works

df['material'] = df.apply(lambda x: foo(x['material']), axis=1) #Works
df2['material'] = df2.apply(lambda x: foo(x['material']), axis=1) #Works

Applying lambda function on the appended dataframe - works

df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df) 
df3['material'] = df3.apply(lambda x: foo(x['material']), axis=1) #Works

Applying lambda function on the dataframe df3 if saved and read back - fails

It works if a new column is created though.

df = pd.read_csv('Data-May.csv')
df2 = pd.read_csv('Data-TillApr.csv')
df3 = df2.append(df)
df3.to_csv('Data-Appended.csv') #Writing to csv
df4 = pd.read_csv('Data-Appended.csv') #Reading it into a dataframe


df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work

df4['new'] = df4.apply(lambda x: foo(x['material']), axis=1) #Works

Both df3['material'] and df4['material'] have dtype object ('O').
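For anyone trying to reproduce this, here is a minimal sketch of the round trip using made-up rows (the real files are not available, so the values below are assumptions). It uses pd.concat instead of DataFrame.append, which newer pandas versions have removed, and an in-memory buffer instead of a file on disk. It only shows how the frame read back differs structurally from the appended one. Whether either difference is related to the ValueError is not established.

```python
import io

import pandas as pd

# Synthetic stand-ins for Data-TillApr.csv and Data-May.csv (made-up values).
df2 = pd.DataFrame({"material": ["00000000000A1", "00000000000B2"]})
df = pd.DataFrame({"material": ["00000000000C3"]})

# pd.concat stands in for df2.append(df); both keep each frame's original index.
df3 = pd.concat([df2, df])

# Round trip through CSV text held in memory rather than a file on disk.
buffer = io.StringIO()
df3.to_csv(buffer)   # the index is written as an extra, unnamed first column
buffer.seek(0)
df4 = pd.read_csv(buffer)

print(df3.index.tolist())    # [0, 1, 0]  (duplicate labels after appending)
print(df4.index.tolist())    # [0, 1, 2]  (fresh default index)
print(df4.columns.tolist())  # ['Unnamed: 0', 'material']
```

Passing index=False to to_csv, or index_col=0 to read_csv, keeps the saved index from reappearing as an "Unnamed: 0" column.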

Traceback output for the line that doesn't work:

Traceback (most recent call last):

  File "<ipython-input-42-096f7e61633e>", line 1, in <module>
    df4['material'] = df4.apply(lambda x: foo(x['material']), axis=1) #Doesn't work

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2331, in __setitem__
    self._set_item(key, value)

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2398, in _set_item
    NDFrame._set_item(self, key, value)

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 1759, in _set_item
    self._data.set(key, value)

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3731, in set
    group=True):

  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4684, in _get_blkno_placements
    for blkno, indexer in lib.get_blkno_indexers(blknos, group):

  File "pandas/_libs/lib.pyx", line 1488, in pandas._libs.lib.get_blkno_indexers

ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

@Hemanth G Please supply some data rows so we can better understand your question. Also, consider passing dtype when reading the CSV to set the column data types. - masasa, Jun 01 '19 at 09:33
Hi @masasa, thanks for replying. I am unable to share rows at the moment. The dtypes for df3 and df4 were identical, and the error remained when I used 'str' to read the offending column. - Hemanth G, Jun 03 '19 at 05:52
I'll wait. A few rows of data (fake if needed) that replicate the problem are extremely important, all the more so with pandas. - masasa, Jun 03 '19 at 05:54
Hi, I am unable to replicate the error elsewhere. On the original machine, I encountered an "environment is inconsistent" error similar to this: https://stackoverflow.com/questions/55527354/the-environment-is-inconsistent-please-check-the-package-plan-carefully . The original error did not recur after I reinstalled Anaconda. All is well for now, and I hope I don't see it again. - Hemanth G, Jun 04 '19 at 07:40
