I have the following dataframe, and I want to combine the products that share the same value in the Match column.

enter image description here

After some searching, I did that with the following piece of code:

data2['Together'] = data2.groupby(by = ['Match'])['Product'].transform(lambda x : ','.join(x))
req = data2[['Order ID', 'Together']].drop_duplicates()
req

It gives the following result

enter image description here

Question 1
To understand what is happening here, I applied the same transform operation to each group's Series individually. In that case the function operates elementwise and gives something like the result below. So why does pandas produce a different result for the command shown above?

enter image description here

Dhruv
    please provide all the code and data (input/output) as **text**, not images: [how to make reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway May 04 '22 at 12:38

1 Answer

The question might not be very clear, but I think posting an answer is better than deleting it.

As the results above show, when transform is applied to the whole GroupBy object, the function is called on each group's entire Series and the returned value is repeated for every row of that group. When I applied transform to an individual Series (a single group), the function ran on each element instead, just like Series.apply.

After searching through the documentation and checking the output of the custom function below, this is my understanding.
GroupBy.transform passes each group's Series directly to the function and then checks the output: it must either have the same length as the passed object or be a scalar, in which case it is expanded to that length.
Series.transform, by contrast, first tries to apply the function to each element (as Series.apply does) and, only if that fails, calls the function on the whole Series.
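The difference described above can be reproduced with a small sketch. The frame below is a hypothetical stand-in for `data2` (the real data was only posted as an image), so the values are assumptions; only the column names `Match` and `Product` come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's data2; the values are made up.
data2 = pd.DataFrame({
    "Match": [1, 1, 2],
    "Product": ["apple", "banana", "cherry"],
})

# GroupBy.transform: the lambda receives each group's whole Series.
# The joined string is a scalar, so it is repeated for every row of the group.
together = data2.groupby("Match")["Product"].transform(lambda s: ",".join(s))
together_list = together.tolist()
print(together_list)

# Series.transform: the same lambda succeeds on a single string,
# so it is applied to each element and joins the characters instead.
elementwise = data2["Product"].transform(lambda s: ",".join(s))
elementwise_list = elementwise.tolist()
print(elementwise_list)
```

In this sketch the groupby version yields one joined string per group, repeated on each of its rows, while the Series version splits every product name into comma-separated characters.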

This is what I gathered from reading the source code. The output below shows it too: I defined a function and called it through both transforms.

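def func(val):
    # Show which type transform actually passes in
    print(type(val))
    # Join directly so the function works on a single str and on a Series of str
    return ','.join(val)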

# For series transforms
<class 'str'>
<class 'str'>

# For groupby transforms
<class 'pandas.core.series.Series'>

Now, if I modify the function so that it works only on a whole Series and not on individual strings, observe how Series.transform behaves:

# Modified function (works only on a whole Series, not on individual strings)
def func(val):
    print(type(val))
    return val.str.split().str[0]

# For Series transforms
<class 'str'>
<class 'pandas.core.series.Series'>
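The fallback shown in this output can be checked with a self-contained sketch. The sample values and the names `first_word` and `seen` are hypothetical; the function body mirrors the modified `func` above but records the types it receives instead of printing them. This reflects the behaviour I observed and may differ between pandas versions.

```python
import pandas as pd

# Hypothetical sample values; any strings containing spaces would do.
s = pd.Series(["red apple", "green pear"])

seen = []  # records the type name of every argument the function receives

def first_word(val):
    seen.append(type(val).__name__)
    # .str exists on a Series but not on a plain str, so the
    # elementwise attempt raises AttributeError on the first element.
    return val.str.split().str[0]

result = s.transform(first_word)
result_list = result.tolist()

print(seen)         # types passed in, in call order
print(result_list)  # first word of each string
```

The first recorded type is the element-level attempt that fails, and the last is the whole Series that the function is then called on.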
Dhruv