I'm trying to download the form.idx file from sec.gov for the first quarter of 2016. Since I'm only interested in the 10-Ks, I want to save the file as a .csv file and delete the rows I don't need. I tried to filter by form type, but that didn't work.
My code so far is the following:
import os

import pandas as pd
import requests

years = [2016]
quarters = ['QTR1']
base_path = '/Users/xyz/Desktop'
current_dirs = os.listdir(path=base_path)

for yr in years:
    # Create one folder per year if it does not exist yet.
    if str(yr) not in current_dirs:
        os.mkdir('/'.join([base_path, str(yr)]))
    current_files = os.listdir('/'.join([base_path, str(yr)]))
    for qtr in quarters:
        local_filename = f'{yr}-{qtr}.csv'
        local_file_path = '/'.join([base_path, str(yr), local_filename])
        if local_filename in current_files:
            print(f'Skipping file for {yr}, {qtr} because it is already saved.')
            continue
        # Download form.idx in chunks and save it unchanged under a .csv name.
        url = f'https://www.sec.gov/Archives/edgar/full-index/{yr}/{qtr}/form.idx'
        r = requests.get(url, stream=True)
        with open(local_file_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=128):
                f.write(chunk)

# Read the saved file and try to keep only the 10-K rows.
r2 = pd.read_csv('/Users/xyz/Desktop/2016-QTR1.csv', sep=";", encoding="utf-8")
r2.head()
filt = (r2 ['Form Type'] == '10-K')  # this line raises the KeyError
r2_10K = r2.loc[filt]
r2_10K.head()
r2_10K.to_csv('/Users/xyz/Desktop/modified.csv')
The error message I get is:
Traceback (most recent call last):
  File "<ipython-input-5-f84e3f81f3d1>", line 61, in <module>
    filt = (r2 ['Form Type'] == '10-K')
  File "/Users/xyz/opt/anaconda3/envs/spyder-4.1.5_1/lib/python3.8/site-packages/pandas/core/frame.py", line 2906, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/xyz/opt/anaconda3/envs/spyder-4.1.5_1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    raise KeyError(key) from err
KeyError: 'Form Type'
Is there a way to simply delete the rows I don't need from the file? I'd be grateful for any other help with this problem as well.
Many thanks in advance.
Kind regards, Elena
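
A likely cause of the KeyError is that form.idx is not a semicolon-separated file. As far as I know it is a fixed-width text file with a few descriptive lines, then a column header line starting with "Form Type", then a line of dashes, then the data rows. Read with sep=";", everything most likely ends up in a single column, so no column named 'Form Type' exists. Below is a minimal sketch of one way to keep only the 10-K rows using just the standard library. It assumes that the data columns line up with the header labels, which is worth checking against the downloaded file. The names parse_form_idx and write_10k_csv and the latin-1 encoding are my own choices for illustration, not anything prescribed by the SEC or by pandas.

```python
import csv

# Column labels as they are assumed to appear in the form.idx header line.
COLUMNS = ["Form Type", "Company Name", "CIK", "Date Filed", "File Name"]


def parse_form_idx(lines):
    """Yield one dict per filing from the lines of a form.idx file.

    Assumption: each data column starts at the same character position
    as its label in the header line.
    """
    lines = iter(lines)
    starts = None
    for line in lines:
        # The column header is the first line that begins with "Form Type";
        # the descriptive lines before it are skipped.
        if line.startswith("Form Type"):
            starts = [line.index(name) for name in COLUMNS]
            break
    if starts is None:
        raise ValueError("no header line starting with 'Form Type' found")
    ends = starts[1:] + [None]
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the dashed separator under the header.
        if not line.strip() or set(line.strip()) == {"-"}:
            continue
        yield {name: line[a:b].strip()
               for name, a, b in zip(COLUMNS, starts, ends)}


def write_10k_csv(idx_path, csv_path, form_type="10-K"):
    """Copy only the rows with the given form type into a real CSV file.

    Returns the number of rows written. latin-1 is an assumption chosen
    because it can decode any byte sequence without raising an error.
    """
    kept = 0
    with open(idx_path, encoding="latin-1") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=COLUMNS)
        writer.writeheader()
        for row in parse_form_idx(src):
            if row["Form Type"] == form_type:
                writer.writerow(row)
                kept += 1
    return kept


# Example call with the paths from the question (not executed here):
# write_10k_csv('/Users/xyz/Desktop/2016-QTR1.csv',
#               '/Users/xyz/Desktop/modified.csv')
```

The comparison is an exact match, so variants such as 10-K/A are left out; widening the filter would mean changing the condition inside write_10k_csv. The resulting modified.csv is comma-separated with a header row, so it should load with pd.read_csv using the default separator.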