
I saw the question about pre-2013 13-F filings, but noticed that filings from before 2012 use yet another format. This is the original question: Extracting table of holdings from (Edgar 13-F filings) TXT (pre-2013) with python

Example of a 2012 filing (the pre-2013 format covered in the original question):

https://www.sec.gov/Archives/edgar/data/1067983/000119312512470800/d434976d13fhr.txt

Example of a pre-2012 filing:

https://www.sec.gov/Archives/edgar/data/1067983/000095012905008251/0000950129-05-008251.txt

Before 2012, filers did not repeat the company name, title of class and CUSIP number on every row, which shifts the remaining columns to the left in the parsed output. (See the pre-2012 format in the screenshot: [Pre-2012 13-F Filing].)

Adapting the code from NoobFin and Jack Fleeting in that question gives me the following:

Code:

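import requests
from bs4 import BeautifulSoup as bs

endpoint = r"https://www.sec.gov/Archives/edgar/data/1067983/000095012905008251/0000950129-05-008251.txt"
# EDGAR expects a User-Agent header on every request
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url=endpoint, headers=headers)
response.raise_for_status()

def lst_bunch(l, lenth=4):
    """Merge any line shorter than `lenth` with the line after it, in place."""
    i = 0
    while i < len(l) - 1:  # stop before the last line so pop(i + 1) is valid
        if len(l[i]) < lenth:
            l[i] += l.pop(i + 1)  # join, then re-check the same index
        else:
            i += 1
    return l

# Split the filing into chunks that each start at a <TABLE> tag
tabs = response.text.replace('<TABLE>', 'xxx<TABLE>').split('xxx')
for tab in tabs[1:]:
    soup = bs(tab, 'html')
    table = soup.select_one('table')
    lines = table.text.splitlines()
    lst_bunch(lines, 50)
    for line in lines:
        print(line.strip())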

Output: [screenshot: Jack Fleeting's code applied]
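A possible simplification of the table-splitting step is to pull out the text between each `<TABLE>` and `</TABLE>` tag with a regular expression and keep the lines unmerged, with their leading spaces intact, so the column positions survive for fixed-width parsing. This is a sketch that assumes every `<TABLE>` in the filing has a matching `</TABLE>`; `extract_tables` and `table_lines` are hypothetical helper names, and the sample text is made up.

```python
import re

# Non-greedy match so each <TABLE>...</TABLE> pair is captured separately
TABLE_RE = re.compile(r"<TABLE>(.*?)</TABLE>", re.DOTALL | re.IGNORECASE)


def extract_tables(filing_text):
    """Return the text inside each <TABLE>...</TABLE> block."""
    return TABLE_RE.findall(filing_text)


def table_lines(table_text):
    """Split a table into lines, keeping leading spaces and dropping blanks."""
    return [line.rstrip() for line in table_text.splitlines() if line.strip()]


sample = """<DOCUMENT>
<TABLE>
NAME OF ISSUER   CUSIP
ISSUER ONE       123456789

                 987654321
</TABLE>
some text
<TABLE>
SECOND TABLE
</TABLE>
"""

tables = extract_tables(sample)
print(len(tables))
for line in table_lines(tables[0]):
    print(repr(line))
```

Unlike calling `line.strip()` on every line, `rstrip()` leaves the leading spaces of a continuation row in place, so its CUSIP stays under the CUSIP heading.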

What I am looking for is a DataFrame which I can export to CSV (or SQL or whatever) that looks like this:

[Quick Excel file showing the desired result]
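Once the rows are parsed, the export step is short. A minimal sketch with made-up rows and placeholder column names (not the exact 13-F headings): `DataFrame.to_csv` writes the CSV, and `DataFrame.to_sql` can write to SQLite through the standard-library `sqlite3` module.

```python
import sqlite3

import pandas as pd

# Made-up, already-cleaned rows; column names are placeholders
rows = [
    ["ISSUER ONE", "COM", "123456789", 1000, 100],
    ["ISSUER ONE", "COM", "123456789", 500, 50],
    ["ISSUER TWO", "COM", "987654321", 2000, 200],
]
df = pd.DataFrame(rows, columns=["issuer", "title_of_class", "cusip", "value", "shares"])

# CSV export
df.to_csv("holdings.csv", index=False)

# SQL export; use a file path such as "holdings.db" to persist the database
conn = sqlite3.connect(":memory:")
df.to_sql("holdings", conn, if_exists="replace", index=False)
totals = pd.read_sql(
    "SELECT issuer, SUM(shares) AS shares FROM holdings GROUP BY issuer ORDER BY issuer",
    conn,
)
conn.close()
print(totals)
```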

I was thinking of building one good example and running it through some machine-learning tools, but maybe I am missing something.

Thanks!

  • What have you tried so far? – Corralien Apr 23 '22 at 12:28
  • Apologies, included the code and examples! – Not_a_Robot Apr 23 '22 at 12:52
  • `pandas.read_fwf` will read a table of Fixed-Width Formatted lines into a DataFrame; docs are [here](https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html). This lets you preserve the columnar structure when reading text (.txt) files. – jsmart Apr 23 '22 at 14:48
  • @jsmart thanks! I see how that would work. However, I get a 403 response when I put the link through pd.read_fwf. Is there a solution to this? Saving the BeautifulSoup to .txt first would be fine as well. – Not_a_Robot Apr 23 '22 at 18:45
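A sketch of jsmart's suggestion that avoids handing the URL to pandas directly: download the text with `requests` and an explicit User-Agent (SEC asks automated clients to identify themselves, e.g. with a name and contact email; the string below is a placeholder), then pass the in-memory table text to `pandas.read_fwf` through `io.StringIO`. The `colspecs` offsets, column names, helper names (`fetch_filing`, `table_to_frame`) and sample text are hypothetical; real offsets would have to be measured from the filing's table header. Writing the downloaded text to a .txt file and passing that path to `read_fwf` would work the same way.

```python
from io import StringIO

import pandas as pd
import requests

# Placeholder contact string; replace with your own name and email
HEADERS = {"User-Agent": "Your Name your.email@example.com"}


def fetch_filing(url):
    """Download a filing's text with an explicit User-Agent header."""
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()  # raise on a 403 instead of parsing an error page
    return response.text


def table_to_frame(table_text, colspecs, names):
    """Parse fixed-width table text that is already in memory."""
    # dtype=str keeps CUSIPs with leading zeros intact
    return pd.read_fwf(StringIO(table_text), colspecs=colspecs, names=names, dtype=str)


# Made-up sample: the second row leaves issuer and CUSIP blank
sample = "\n".join([
    f"{'ISSUER ONE':<20}{'123456789':<11}{'100':>6}",
    f"{'':<20}{'':<11}{'50':>6}",
])
colspecs = [(0, 20), (20, 31), (31, 37)]
df = table_to_frame(sample, colspecs, ["issuer", "cusip", "shares"])
df[["issuer", "cusip"]] = df[["issuer", "cusip"]].ffill()  # repeat the issuer above
print(df)
```

Because `read_fwf` cuts each line at the given offsets, the blank issuer and CUSIP cells are read as missing values instead of shifting `shares` to the left.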
