My dataframe df
is:
data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005', '2010', '2010','2010','2010','2010','2010','2010','2010', '2010'],
'Votes':[30, 50, 20, 26, 30, 45, 20, 46, 80, 60, 46, 95, 60, 10, 95, 16, 65, 35, 50, 100, 70, 26, 180, 100, 120, 46, 80],
'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
df
Election Year Votes Party Region
0 2000 30 A a
1 2000 50 B a
2 2000 20 C a
3 2000 26 A b
4 2000 30 B b
5 2000 45 C b
6 2000 20 A c
7 2000 46 B c
8 2000 80 C c
9 2005 60 A a
10 2005 66 B a
11 2005 95 C a
12 2005 60 A b
13 2005 10 B b
14 2005 95 C b
15 2005 16 A c
16 2005 65 B c
17 2005 35 C c
18 2010 50 A a
19 2010 100 B a
20 2010 70 C a
21 2010 26 A b
22 2010 180 B b
23 2010 100 C b
24 2010 120 A c
25 2010 46 B c
26 2010 80 C c
I want to the Regions in which top two parties (in terms of getting maximum aggregate votes) of 2000 have continuously increased their votes in the next 2 elections.. So the desired output is:
Party Region
B a
B c
C b
First I tried to get the top two parties based on aggregate total votes of year 2000. This is giving Party "C" and "B".
df1=df['Election Year'].eq('2000')
top_2=df[m].groupby(['Election Year','Party'],as_index=False)
['Votes'].sum().sort_values('Votes',ascending=False).head(2)['Party'].values
top_2
This gives Party "C" and "B".
Now how can I check for the regions in which Votes of these parties have increased in subsequent years?