My dataset looks like this:
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})
>>> df
   A  B  C
0  1  a  x
1  1  b  x
2  1  c  y
3  1  c  x
4  2  b  x
5  2  b  y
6  2  d  z
7  3  a  y
8  3  c  z
I want to perform a groupby using the values of the A column. Specifically, this is the desired output:
   A        B             C
0  1  a b c c  [x, x, y, x]
1  2    b b d     [x, y, z]
2  3      a c        [y, z]
In other words, I want to join all the values of the B column using a single space, and I want to create a list with all the values of the C column.
So far I have been able to create the two desired columns in this way:
B = df.groupby("A")["B"].apply(lambda x: " ".join(x))
C = df.groupby("A")["C"].apply(list)
I would like to build both columns with a single groupby operation instead of two separate ones. Is it possible?
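For reference, here is a minimal sketch of one way this might be done, assuming `GroupBy.agg` is given a dict that maps each column to its own aggregation function. The name `result` is only illustrative, and this is one possible approach rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2, 2, 2, 3, 3],
                   "B": ["a", "b", "c", "c", "b", "b", "d", "a", "c"],
                   "C": ["x", "x", "y", "x", "x", "y", "z", "y", "z"]})

# Single groupby: agg applies a different function to each column.
# " ".join concatenates the strings of B, list collects the values of C.
result = (
    df.groupby("A")
      .agg({"B": " ".join, "C": list})
      .reset_index()  # turn the group key A back into a regular column
)

print(result)
```

Note that this returns a new DataFrame with one row per value of A; it does not modify `df` in place, so the result would have to be assigned back if that is what is needed.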