
I got the idea to visualize election donation data from the FEC website. I would like to create a stacked bar chart with the state on the X-axis, the donated amount on the Y-axis, and one 'stack' per candidate, showing how much each candidate received from each state.

Code:

import matplotlib.pyplot as plt
import pandas as pd
from pathlib import Path

pathName = r"R:\Downloads\indiv20\by_date"
dataDir = Path(pathName)
filename = "itcont_2020_20010425_20190425.txt"
fullName = dataDir / filename
data = pd.read_csv(fullName, low_memory=False, sep="|", usecols=[0, 9, 12, 14])

data.columns = ['Filer ID', 'State', 'Occupation', 'Donation Amount ($)']
data = data.dropna(subset=['Donation Amount ($)'])

# Sum only the donation column; summing every column would also try to add up the text columns
donations_by_state = data.groupby('State')['Donation Amount ($)'].sum()

plt.bar(donations_by_state.index, donations_by_state)
plt.ylabel('Donation Amount ($)')
plt.xlabel('State')
plt.title('Donations per State')

plt.show()

This plots the total contributions per state, and works great. However, when I try this groupby method to group all the data I want, I'm not sure how to plot a stacked bar chart from this data:

donations_per_candidate_per_state = data['Donation Amount ($)'].groupby([data['State'], data['Filer ID']]).sum()

State  Filer ID 
AA     C00005561      350
       C00010603      600
       C00042366      115
       C00309567     1675
       C00331694     2500
       C00365536      270
       C00401224     4495
       C00411330      100
       C00492991      300
       C00540500      300
       C00641381      250
       C00696948     2800
       C00697441      250
       C00699090       67
       C00703108     1400
AB     C00401224     1386
AE     C00000935      295
       C00003418      276
       C00010603     1750
       C00027466      320
       C00193433      105
       C00211037      251
       C00216614      226
       C00341396       20
       C00369033      150
       C00394957       50
       C00401224    26538
       C00438713       50
       C00457325      310
       C00492785      300
                    ...  
ZZ     C00580100     1490
       C00603084       95
       C00607861      750
       C00608380      125
       C00618371     2199
       C00630665     1000
       C00632133      600
       C00632398      400
       C00639500      208
       C00639591     1450
       C00640623     6402
       C00653816     1000
       C00666149     1000
       C00666453     2800
       C00683102     1000
       C00689430     3524
       C00693234    13283
       C00693713     1000
       C00694018     2750
       C00694455    12761
       C00695510     1045
       C00696245      250
       C00696419     3000
       C00696526      500
       C00696948    31296
       C00697441    34396
       C00698050      350
       C00698258     2800
       C00699090     5757
       C00700732      475
Name: Donation Amount ($), Length: 32662, dtype: int64

The data seems to be tabulated the way I need; I'm just not sure how to plot it.

testfire10
  • Does this answer your question? [Pandas - Plotting a stacked Bar Chart](https://stackoverflow.com/questions/23415500/pandas-plotting-a-stacked-bar-chart) – nicoring Dec 08 '19 at 22:24

1 Answer


You can use the following as described here:

df = donations_per_candidate_per_state.unstack('Filer ID')
df.plot(kind='bar', stacked=True)
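
To make the reshaping step visible, here is a minimal sketch on a tiny made-up table. The states, filer IDs, and amounts are hypothetical stand-ins for the FEC file, and `plot_stacked` is just an illustrative helper name, not part of pandas:

```python
import pandas as pd

# Hypothetical toy data standing in for the FEC file (all values are made up).
data = pd.DataFrame({
    "State": ["CA", "CA", "CA", "NY", "NY", "TX"],
    "Filer ID": ["C001", "C002", "C001", "C001", "C003", "C002"],
    "Donation Amount ($)": [100, 250, 50, 300, 75, 40],
})

# Same grouping as in the question: a Series indexed by (State, Filer ID).
totals = data["Donation Amount ($)"].groupby([data["State"], data["Filer ID"]]).sum()

# unstack pivots the 'Filer ID' index level into columns:
# one row per state, one column per filer, NaN where a pair never occurs.
wide = totals.unstack("Filer ID")
print(wide)


def plot_stacked(frame):
    """Draw one bar per row, with one stacked segment per column."""
    ax = frame.plot(kind="bar", stacked=True)
    ax.set_xlabel("State")
    ax.set_ylabel("Donation Amount ($)")
    return ax
```

Calling `plot_stacked(wide)` and then `matplotlib.pyplot.show()` should draw one bar per state. The wide layout is what matters: pandas draws one bar per row and one stacked segment per column.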
nicoring
  • This worked great, thank you. Although, I don't really understand the syntax. What does 'unstack' do? Is there a more pythonic way to plot complex data such as this? – testfire10 Dec 09 '19 at 01:02
  • `unstack` moves a level of the row index into the columns, which in this case gives us one column per `Filer ID` to use for the bar plot. What do you mean by a more pythonic way? What is not pythonic about it? And it would be great if you could mark this as an answer if it worked and answered your question :) – nicoring Dec 09 '19 at 22:37
  • Thanks @nicoring. I just marked it as solved. I'm new to programming (teaching myself as a mechanical engineer), so it's probably just me when I say it's not pythonic. What I mean, though, is that it's not at all clear what's happening, and in particular, things like 'donations_per_candidate_per_state = data['Donation Amount ($)'].groupby([data['State'], data['Filer ID']]).sum()' are very confusing to me syntactically. It's not intuitive in showing how I'm manipulating the data. Every time I use pandas, it's like I'm using it for the first time. Anyway, thank you for the help. – testfire10 Dec 10 '19 at 00:24
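
For readers who find the groupby line hard to follow, one possibly more readable spelling is `pivot_table`, which names the rows, columns, and values explicitly. This is only a sketch on hypothetical toy data (the states, filer IDs, and amounts are made up), not the answerer's method:

```python
import pandas as pd

# Hypothetical toy data standing in for the FEC file (all values are made up).
data = pd.DataFrame({
    "State": ["CA", "CA", "CA", "NY", "NY", "TX"],
    "Filer ID": ["C001", "C002", "C001", "C001", "C003", "C002"],
    "Donation Amount ($)": [100, 250, 50, 300, 75, 40],
})

# Rows are states, columns are filers, cells are summed donations.
# fill_value=0 puts 0 where a filer received nothing from a state.
wide = data.pivot_table(
    index="State",
    columns="Filer ID",
    values="Donation Amount ($)",
    aggfunc="sum",
    fill_value=0,
)

# The same table via groupby: group by both keys using column names,
# select the amount column, sum it, then unstack the last index level.
same = (
    data.groupby(["State", "Filer ID"])["Donation Amount ($)"]
    .sum()
    .unstack(fill_value=0)
)

print(wide)
```

Either table can then be passed to `.plot(kind="bar", stacked=True)`. Passing the column names to `groupby` as a list, rather than the Series objects themselves, may also read more naturally than the form quoted in the comment above.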