I am applying a function to a Pandas DataFrame and returning a tuple, which I unpack into multiple DataFrame columns using zip(*). The returned tuple contains a list, which in turn contains one or more tuples.
In cases where at least one of the nested lists contains a different number of tuples from the rest, everything works fine. In rare cases where the function returns nested lists that all contain the same number of tuples, an AssertionError: Shape of new values must be compatible with manager shape is raised.
I suspect Pandas is seeing the consistent nested list lengths and is trying to unpack the list(tuples) into separate columns. How can I force Pandas to always store the returned list as is, regardless of the conditions above?
(Python 3.7.4, Pandas 1.0.3)
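To illustrate the suspicion above, here is a minimal sketch using NumPy alone. It assumes (this is not confirmed from the Pandas source) that the column assignment ends up converting the nested lists into an array. The sample values in `equal` and `ragged` are made up for illustration.

```python
import numpy as np

# Hypothetical sample data: every inner list holds exactly one tuple.
equal = [[(0, 3)], [(0, 1)], [(0, 4)], [(0, 2)]]
# Same data, but the last inner list holds two tuples.
ragged = [[(0, 3)], [(0, 1)], [(0, 4)], [(0, 2), (1, 2)]]

# With consistent lengths, NumPy expands every nesting level into a dimension.
equal_arr = np.array(equal, dtype=object)
# With inconsistent lengths, NumPy stops at the outer level and keeps the lists.
ragged_arr = np.array(ragged, dtype=object)

print(equal_arr.shape)   # (4, 1, 2)
print(ragged_arr.shape)  # (4,)
```

If the assignment does go through a conversion like this, the equal-length case would yield a 3-D block where a 1-D column of four values is expected, which would be consistent with the shape error.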
Code that works:
import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    # One (index, value) tuple per requested type
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list

# 'Jill' has types=2, so her nested list holds a different number of tuples
df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 2},
                   ], columns=['name', 'types'])
df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))
Code that raises AssertionError: Shape of new values must be compatible with manager shape:
import pandas as pd
import numpy as np

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    # One (index, value) tuple per requested type
    types_list = [tuple((x, calculated_value2)) for x in range(0, type_count)]
    return calculated_value1, types_list

# Every row has types=1, so all nested lists hold the same number of tuples
df = pd.DataFrame([{'name': 'Joe', 'types': 1},
                   {'name': 'Beth', 'types': 1},
                   {'name': 'John', 'types': 1},
                   {'name': 'Jill', 'types': 1},
                   ], columns=['name', 'types'])
df['calculated_result'], df['types_list'] = zip(*df['types'].apply(simple_function))
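For reference, here is a sketch of one possible workaround. It assumes the error comes from the equal-length nested lists being expanded into a multi-dimensional array on assignment. Wrapping the lists in a pd.Series first should keep each list as a single cell. The intermediate names `calculated` and `types_lists` are introduced here for illustration, and this has not been verified against every Pandas version.

```python
import numpy as np
import pandas as pd

def simple_function(type_count):
    calculated_value1 = np.random.randint(5)
    calculated_value2 = np.random.randint(5)
    types_list = [(x, calculated_value2) for x in range(type_count)]
    return calculated_value1, types_list

# Same data as the failing case: every row has types=1.
df = pd.DataFrame({'name': ['Joe', 'Beth', 'John', 'Jill'],
                   'types': [1, 1, 1, 1]})

# Unpack into two plain tuples first instead of assigning them directly.
calculated, types_lists = zip(*df['types'].apply(simple_function))

df['calculated_result'] = list(calculated)
# Building a Series from a list keeps one Python object per row,
# so each nested list is stored as-is in a single cell.
df['types_list'] = pd.Series(list(types_lists), index=df.index)

print(df)
```

Passing index=df.index keeps the new Series aligned with the existing rows even if the DataFrame does not use a default integer index.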