
I have a dataframe:

import pandas as pd

df = pd.DataFrame({
        'Names': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
        'Value': ['A1','A2','A3','B1','B2','C1','C2','C3']})

#  Names Value
#0     A    A1
#1     A    A2
#2     A    A3
#3     B    B1
#4     B    B2
#5     C    C1
#6     C    C2
#7     C    C3

I want to transform it into the following state:

#  Names Values
#0     A    [A1, A2, A3]
#1     B    [B1, B2]
#2     C    [C1, C2, C3]

Are there any built-in functions in pandas or NumPy that can simplify this? Or do I have to iterate over the rows in plain Python?

ycx
  • Yup, I thought about `groupby`, but using `apply` is not really what I was looking for. I suppose it's close. – ycx Oct 31 '19 at 04:45

2 Answers


Try this out:

df.groupby('Names')['Value'].apply(list).reset_index(name='Values')
Ha Bom
  • I was thinking about `groupby` and `apply` too, but I didn't want to do the cop-out answer of `apply`. I think the link provided by the other user gave me the answer in `df.groupby('Names').agg(lambda x: list(x))` – ycx Oct 31 '19 at 04:54
  • @ycx Actually, you could skip the `lambda`. `df.groupby('Names')['Value'].agg(list)` is all you need. – Henry Yik Oct 31 '19 at 04:56
  • Ah, that's nice. Do you have a way to shorten `lambda x: ', '.join(list(x))`? – ycx Oct 31 '19 at 05:04
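
To make the two shortcuts from these comments concrete, here is a minimal sketch using the dataframe from the question. It assumes the goal is still a `Values` column; the names `as_lists` and `as_strings` are only illustrative. The idea is that `agg` accepts any callable, so both `list` and the bound method `', '.join` can be passed directly instead of being wrapped in a `lambda`.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Value': ['A1', 'A2', 'A3', 'B1', 'B2', 'C1', 'C2', 'C3']})

# Passing the callable itself replaces `lambda x: list(x)`.
as_lists = (df.groupby('Names')['Value']
              .agg(list)
              .reset_index(name='Values'))

# A bound string method replaces `lambda x: ', '.join(list(x))`;
# str.join accepts any iterable of strings, so no list() call is needed.
as_strings = (df.groupby('Names')['Value']
                .agg(', '.join)
                .reset_index(name='Values'))

print(as_lists)
print(as_strings)
```

Note that the `', '.join` variant only works when every value in the column is already a string.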

It's very simple:

df.groupby('Names')['Value'].apply(list).reset_index(name='Values')
pissall
  • Thanks for the link. I was thinking about `groupby` and `apply` too, but I didn't want to do the cop-out answer of `apply`. I think your link provided the answer in `df.groupby('Names').agg(lambda x: list(x))` – ycx Oct 31 '19 at 04:52
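
For readers comparing the `agg` form in this comment with the answer above, the following is a rough sketch of how the two differ in output shape. It assumes pandas 0.25 or later for the named-aggregation syntax, and the variable names `wide` and `named` are only illustrative. Calling `agg` on the whole grouped frame aggregates every non-key column and leaves `Names` as the index, whereas named aggregation can produce the renamed `Values` column in one step.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'Value': ['A1', 'A2', 'A3', 'B1', 'B2', 'C1', 'C2', 'C3']})

# Aggregating the whole grouped frame: 'Names' becomes the index and
# every remaining column (here only 'Value') is collected into lists.
wide = df.groupby('Names').agg(list)

# Named aggregation (assumes pandas >= 0.25): the keyword is the output
# column name, the tuple is (input column, aggregation function).
named = df.groupby('Names', as_index=False).agg(Values=('Value', list))

print(wide)
print(named)
```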