I'm working in IPython. I have a YAML file and a list of [thomas] ids that correspond to entries in the file ([thomas:] is the third line of each entry). Below is just a small snippet of the file; the complete file can be found here (https://github.com/108michael/congress-legislators/blob/master/legislators-historical.yaml)
- id:
    bioguide: C000858
    thomas: '00246'
    lis: S215
    govtrack: 300029
    opensecrets: N00002091
    votesmart: 53288
    icpsr: 14809
    fec:
    - S0ID00057
    wikipedia: Larry Craig
    house_history: 11530
  name:
    first: Larry
    middle: E.
    last: Craig
  bio:
    birthday: '1945-07-20'
    gender: M
    religion: Methodist
  terms:
  - type: rep
    start: '1981-01-05'
    end: '1983-01-03'
    state: ID
    district: 1
    party: Republican
  - type: rep
    start: '1983-01-03'
    end: '1985-01-03'
    state: ID
    district: 1
    party: Republican
I want to parse the file and, for every id in my list that matches a [thomas:] value, retrieve the following: [fec:] (there can be more than one of these; I need all of them); [name:] [first:], [middle:], [last:]; [bio:] [birthday:]; and [terms:] (there is likely more than one term; I need all of them) [type:], [start:], [state:], [party:]. Finally, there may also be entries where the fec data is not available.
1) How should I store the data? I am still relatively new to Python (my first programming language). Intuitively, I would say a dictionary, but what matters most is ease of access and data retrieval. Previously, I have stored similarly nested data as CSV, which seems a little bulky. Ideally, I could build a list of dictionaries (the data I am retrieving), one for each thomas id in my list.
2) I'm not sure how to set up the for/while statements so that I only retrieve data for the thomas ids in my list.
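To make the two questions concrete, here is a minimal sketch of the kind of structure I have in mind. It assumes the YAML parses into a list of dictionaries shaped like the snippet above; `records` is a hand-written stand-in for that parsed data (the second record is made up to cover the missing-fec case), and `extract_legislators` is a hypothetical helper name, not something from a library.

```python
# Hypothetical stand-in for the parsed YAML: a list of dicts, one per
# legislator, shaped like the snippet above.
records = [
    {
        "id": {"bioguide": "C000858", "thomas": "00246", "fec": ["S0ID00057"]},
        "name": {"first": "Larry", "middle": "E.", "last": "Craig"},
        "bio": {"birthday": "1945-07-20", "gender": "M", "religion": "Methodist"},
        "terms": [
            {"type": "rep", "start": "1981-01-05", "end": "1983-01-03",
             "state": "ID", "district": 1, "party": "Republican"},
            {"type": "rep", "start": "1983-01-03", "end": "1985-01-03",
             "state": "ID", "district": 1, "party": "Republican"},
        ],
    },
    # Made-up record without thomas or fec entries, to exercise missing keys.
    {
        "id": {"bioguide": "X000000"},
        "name": {"first": "Jane", "last": "Doe"},
        "bio": {"birthday": "1900-01-01"},
        "terms": [{"type": "sen", "start": "1950-01-03",
                   "state": "ZZ", "party": "Independent"}],
    },
]


def extract_legislators(records, thomas_ids):
    """Return {thomas_id: details} for the records whose thomas id is wanted."""
    wanted = set(thomas_ids)
    result = {}
    for rec in records:
        # thomas is nested under 'id'; .get() tolerates records that lack it.
        thomas = rec.get("id", {}).get("thomas")
        if thomas not in wanted:
            continue
        name = rec.get("name", {})
        result[thomas] = {
            "fec": rec["id"].get("fec", []),  # empty list when fec is absent
            "first": name.get("first"),
            "middle": name.get("middle"),
            "last": name.get("last"),
            "birthday": rec.get("bio", {}).get("birthday"),
            # Keep only the four fields of interest from every term.
            "terms": [
                {key: term.get(key) for key in ("type", "start", "state", "party")}
                for term in rec.get("terms", [])
            ],
        }
    return result


details = extract_legislators(records, ["00246", "99999"])
print(details["00246"]["last"], len(details["00246"]["terms"]))
```

Keying the result by thomas id means a single lookup such as `details["00246"]["terms"]` reaches the nested data. One caveat: the thomas values in the file are quoted strings with leading zeros, so ids that were read from CSV as integers would presumably need converting to zero-padded strings before they can match.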
I started by writing what I expected would be the code for writing the info to CSV:
import pandas as pd
import yaml
import glob
import csv

df = pd.concat((pd.read_csv(f, names=['date','bill_id','sponsor_id']) for f in glob.glob('/home/jayaramdas/anaconda3/df/s11?_s_b')))
outputfile = open('sponsor_details', 'w', newline='')
outputwriter = csv.writer(outputfile)

df = df.drop_duplicates('sponsor_id')
sponsor_list = df['sponsor_id'].tolist()

with open('legislators-historical.yaml', 'r') as f:
    data = yaml.load(f)

for sponsor in sponsor_list:
    if sponsor == data[0]['thomas']:
        x = data[0]['thomas']
        a = data[0]['name']['first']
        b = data[0]['name']['middle']
        c = data[0]['name']['last']
        d = data[0]['bio']['gender']
        e = data[0]['bio']['religion']
        for fec in data[0]['id']:
            c = fec.get('fec')
        for terms in data[0]['id']:
            t = terms.get('type')
            s = terms.get('start')
            state = terms.get('state')
            p = terms.get('party')
            outputwriter.writerow([x, a, b, c, d, e, c, t, s, state, p])
outputfile.flush()
I get the following error:
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-48-057d25de7e11> in <module>()
     15 
     16 for sponsor in sponsor_list:
---> 17     if sponsor == data[0]['thomas']:
     18         x = data[0]['thomas']
     19         a = data[0]['name']['first']

KeyError: 'thomas'
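Judging from the snippet, `thomas` sits under the `id` key rather than at the top level of each record, which may be what the KeyError is pointing at. Here is a minimal sketch that reproduces the lookup on a trimmed-down record; `record` is a hand-written stand-in for `data[0]`, assuming the file parses into nested dictionaries shaped like the snippet.

```python
# Trimmed-down record shaped like the YAML snippet (hypothetical stand-in
# for data[0] after parsing the file).
record = {
    "id": {"bioguide": "C000858", "thomas": "00246"},
    "name": {"first": "Larry", "middle": "E.", "last": "Craig"},
}

# 'thomas' is not a top-level key, so indexing the record directly fails.
try:
    record["thomas"]
    top_level_lookup = "ok"
except KeyError as exc:
    top_level_lookup = "KeyError: {}".format(exc)

# Going through the 'id' mapping first reaches the value.
nested_lookup = record["id"]["thomas"]

print(top_level_lookup)  # KeyError: 'thomas'
print(nested_lookup)     # 00246
```

If that assumption holds, the comparison would need to be against `data[0]['id']['thomas']`. Note also that `data[0]` only ever looks at the first record, so matching every id in the list would still require a loop over all records in `data`.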