I have a CSV file containing names and incomes. Some names appear multiple times. Using pandas, I want to merge these rows so that each unique name appears only once, with its income values next to it.

I thought pivot would be the solution for my problem. I tried the following:

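import pandas as pd

df = pd.read_csv("properties.csv")
df = df.iloc[1:]  # skip the first data row
df = pd.DataFrame(df, columns=['income', 'names'])  # keep only these two columns
df['source'] = df['income'].astype(int)

# returns the average income per name
test = pd.pivot_table(df, index='names', values='income')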

The problem is that I would like the numbers themselves rather than the average.

For example:

name1: 2,3,2,3

name2: 1,2,4,1

Instead of:

name1: 2.5

name2: 2

Hiach
  • Looks like a duplicate of https://stackoverflow.com/questions/22219004/grouping-rows-in-list-in-pandas-groupby – 9dogs May 15 '19 at 19:32
  • default `aggfunc` of pivot_table is numpy.mean, which is why you're getting the average. – Snehaa Ganesan May 15 '19 at 19:34
  • @9dogs In my case I have over 500 names. So applying the solutions described in that post does not seem to be possible. But I could be missing something? – Hiach May 15 '19 at 20:03
  • did you try? you would pass it to the `aggfunc` parameter: `pd.pivot_table(df, index='names', values='income', aggfunc=list)` – dan_g May 15 '19 at 20:13
  • @dan_g Thank you for your solution. I wasn't aware of the aggfunc possibilities besides average and sum. – Hiach May 15 '19 at 20:19
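
For anyone landing here, the two points from the comments (the default `aggfunc` averages, and `aggfunc=list` keeps the raw values) can be sketched as follows. The small in-memory DataFrame is made up to match the example in the question and only stands in for `properties.csv`. The `groupby` line is an alternative along the lines of the linked duplicate, not something the commenters posted.

```python
import pandas as pd

# Hypothetical sample data standing in for properties.csv
df = pd.DataFrame({
    "names": ["name1", "name2", "name1", "name2",
              "name1", "name2", "name1", "name2"],
    "income": [2, 1, 3, 2, 2, 4, 3, 1],
})

# Default aggfunc is the mean: one averaged value per name
averages = pd.pivot_table(df, index="names", values="income")

# aggfunc=list collects every income value per name instead
as_lists = pd.pivot_table(df, index="names", values="income", aggfunc=list)

# Alternative with groupby, giving a Series of lists indexed by name
grouped = df.groupby("names")["income"].agg(list)

print(averages)
print(as_lists)
print(grouped)
```

Neither call needs the names listed by hand, so it should behave the same with 500 names as with two.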
