I have one pandas Dataframe which looks like below:
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','cricket','cricket','cricket','cricket'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','mohit','mahesh','mahesh','mahesh','mahesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','pune','surat','surat','panji','panji'],
'person_symbol': ['ram','mum','mum','ram','moh','mah','sur','sur','mah'],
'person_count': ['10','14','25','20','11','34','23','43','34']})
df = df[['sport_name','person_name','city','person_symbol','person_count']]
print df
sport_name person_name city person_symbol person_count
0 football ramesh mumbai ram 10
1 football ramesh mumbai mum 14
2 football ramesh delhi mum 25
3 football ramesh delhi ram 20
4 football mohit pune moh 11
5 cricket mahesh surat mah 34
6 cricket mahesh surat sur 23
7 cricket mahesh panji sur 43
8 cricket mahesh panji mah 34
From this Dataframe, I want to create two column dataframe named as "derived_symbol" and "person_count". For creating it I need to focus on some condition like below:
- derived_symbol need to form for each unique city and person_symbol.
- person_count is calculated based on what the derived_symbol is.
Example:
Consider First set of Dataframe i.e. sport_name = football and person_name = rakesh Which are four rows:
sport_name person_name city person_symbol person_count
0 football ramesh mumbai ram 10
1 football ramesh mumbai mum 14
2 football ramesh delhi mum 25
3 football ramesh delhi ram 20
In this above four rows, If we look at the city and person_symbol column values we can see four different unique values are available which are mumbai, delhi, ram and mum. so for this above four rows our output looks like below:
derived_symbol person_count
football.ramesh.TOTAL.mumbai_count 24
football.ramesh.TOTAL.delhi_count 45
football.ramesh.TOTAL.ram_count 30
football.ramesh.TOTAL.mum_count 39
Final Expected Output:
derived_symbol person_count
football.ramesh.TOTAL.mumbai_count 24
football.ramesh.TOTAL.delhi_count 45
football.ramesh.TOTAL.ram_count 30
football.ramesh.TOTAL.mum_count 39
football.mohit.TOTAL.pune_count 11
football.mohit.TOTAL.moh_count 11
cricket.mahesh.TOTAL.surat_count 57
cricket.mahesh.TOTAL.panji_count 77
cricket.mahesh.TOTAL.sur_count 66
cricket.mahesh.TOTAL.mah_count 68
Edit : Dataframe has dates too like below:
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi'],
'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram'],
'person_count': ['10','14','25','20','34','23','43','34'],
'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-26','2017-02-26','2017-02-26','2017-02-26']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month']]
print df
sport_name person_name city person_symbol person_count month
0 football ramesh mumbai ram 10 2017-01-23
1 football ramesh mumbai mum 14 2017-01-23
2 football ramesh delhi mum 25 2017-01-23
3 football ramesh delhi ram 20 2017-01-23
4 football ramesh mumbai ram 34 2017-02-26
5 football ramesh mumbai mum 23 2017-02-26
6 football ramesh delhi mum 43 2017-02-26
7 football ramesh delhi ram 34 2017-02-26
Expected output:
derived_symbol person_count month
football.ramesh.TOTAL.mumbai_count 24 2017-01-23
football.ramesh.TOTAL.delhi_count 45 2017-01-23
football.ramesh.TOTAL.ram_count 30 2017-01-23
football.ramesh.TOTAL.mum_count 39 2017-01-23
football.ramesh.TOTAL.mumbai_count 57 2017-02-26
football.ramesh.TOTAL.delhi_count 77 2017-02-26
football.ramesh.TOTAL.ram_count 68 2017-02-26
football.ramesh.TOTAL.mum_count 66 2017-02-26
I did following to calculate person_count month wise:
df = pd.DataFrame({'sport_name': ['football','football','football','football','football','football','football','football'],
'person_name': ['ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh','ramesh'],
'city': ['mumbai', 'mumbai','delhi','delhi','mumbai', 'mumbai','delhi','delhi'],
'person_symbol': ['ram','mum','mum','ram','ram','mum','mum','ram'],
'person_count': ['10','14','25','20','34','23','43','34'],
'month': ['2017-01-23','2017-01-23','2017-01-23','2017-01-23','2017-02-26','2017-02-26','2017-02-26','2017-02-26']})
df = df[['sport_name','person_name','city','person_symbol','person_count','month']]
df['person_count'] = df['person_count'].astype(int)
df1=df.set_index(['sport_name','person_name','person_count','month']).stack().reset_index(name='val')
df1['derived_symbol'] = df1['sport_name'] + '.' + df1['person_name'] + '.TOTAL.' + df1['val'] + '_count'
df2 = df1.groupby(['derived_symbol','month'])['person_count'].sum().reset_index(name='person_count')
print (df2)