I ran into some strange behavior when combining pandas .groupby() with .transform(). Here is the code to generate the dataset:
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
                   "Random_Number": [1223344, 373293832, 32738382392, 7273283232, 8239329, 23938832],
                   "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
This is the function I wrote to pass to transform():
# For a string (object) column, append the number of rows in the city group to each value.
# For a column of any other dtype, return 0 for all rows.
def some(x):
    if x.dtype == 'object':
        return x + '--' + str(len(x))
    else:
        return 0
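To see what the function returns for each kind of column, it can be called directly on a single column, outside of groupby. This is only a minimal sketch: `names` and `numbers` are made-up two-row columns for this check, and `names` is forced to object dtype so the first branch is taken.

```python
import pandas as pd

def some(x):
    if x.dtype == 'object':
        return x + '--' + str(len(x))
    else:
        return 0

# Hypothetical two-row columns, built only for this check.
names = pd.Series(["Alice", "Bob"], dtype=object)
numbers = pd.Series([1223344, 373293832])

name_result = some(names)      # a Series with one string per row
number_result = some(numbers)  # a single scalar, not one value per row

print(type(name_result).__name__, list(name_result))
print(type(number_result).__name__, number_result)
```

So the two branches return different kinds of objects: a Series of the same length as the input for string columns, and a bare scalar for everything else.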
Then I used my function with transform. It works flawlessly and I get what I want.
df_2 = df.groupby(["City"])[['Name','Random_Number']].transform(some)
However, something strange happens when I switch the order of the columns from ['Name','Random_Number']
to ['Random_Number','Name']:
df_2 = df.groupby(["City"])[['Random_Number','Name']].transform(some)
When I look at the cells in the 'Name' column, it seems that pandas has put the whole transformed group into a single cell,
and repeated that for each cell:
df_2.iloc[0,1]
# Return:
# 0 Alice--4
# 1 Bob--4
# 3 Mallory--4
# 4 Bob--4
# Name: Name, dtype: object
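To tell a cell that holds a plain string apart from one that holds an entire Series, a small type check can help. This is just a sketch: `holds_series` is a hypothetical helper name of my own, not part of pandas.

```python
import pandas as pd

def holds_series(value):
    """Return True if a cell value is itself a pandas Series."""
    return isinstance(value, pd.Series)

print(holds_series("Alice--4"))               # a plain string
print(holds_series(pd.Series(["Alice--4"])))  # a whole Series
```

Given the output printed above, I would expect holds_series(df_2.iloc[0, 1]) to be True for the second column order.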
Why is this happening?