I am working on extracting a table of holdings from 13-F form on EDGAR. Before 2013 holdings were given in a txt file (see example). The output I am aiming for is a pd.DataFrame with same shape as the "Form 13F Information Table" in txt file (10 columns and each line in a separate row).
I have tried to use BeautifulSoup, which turned table into a tag object, but I can't figure out how to format this to go into the dataframe as described above.
This is my code attempt:
soup2 = BeautifulSoup(requests.get(filing_url_13f).content, 'lxml')
holdings = soup2.find_all('table')
#This is my attempt to turn the content into a list:
lixt=[]
for x in soup2.find_all(['c','c','c','c','c','c','c','c','c'] ):
for line in x:
lixt.append(line)
x=lixt[1]
l=[]
for string in x.strings:
l.append(repr(string))
el=l[7]
And this is where I get stuck as el returns the below. I can't split it by \n, as there are often \n within company name (AMERICAN\n EXPRESS CO).
\nAMERICAN\n EXPRESS CO COM 025816109 112,209 1,952,142 Shared-Defined 4 1,952,142 - -\nAMERICAN\n EXPRESS CO COM 025816109 990,116 17,225,400 Shared-Defined 4, 5 17,225,400 - -\nAMERICAN\n EXPRESS CO COM 025816109 48,274 839,832 Shared-Defined 4, 7 839,832 - -\nAMERICAN\n EXPRESS CO COM 025816109 111,689 1,943,100 Shared-Defined 4, 8, 11 1,943,100 - -\nAMERICAN\n EXPRESS CO COM 025816109 459,532 7,994,634 Shared-Defined 4, 10 7,994,634 - -\nAMERICAN\n EXPRESS CO COM 025816109 6,912,308 120,255,879 Shared-Defined 4, 11 120,255,879 - -\nAMERICAN\n EXPRESS CO COM 025816109 80,456 1,399,713 Shared-Defined 4, 13 1,399,713 - -\nARCHER DANIELS\n MIDLAND CO COM 039483102 163,151 5,956,600 Shared-Defined 4, 5 5,956,600 - -\nBANK OF NEW\n YORK MELLON\n CORP COM 064058100 206,661 8,041,300 Shared-Defined 4 8,041,300 - -\nBANK OF NEW\n YORK MELLON\n CORP COM 064058100 46,104 1,793,915 Shared-Defined 2, 4, 11 1,793,915 - -\nBANK OF NEW\n YORK MELLON\n CORP COM 064058100 251,827 9,798,700 Shared-Defined 4, 8, 11 9,798,700 - -\nCOCA COLA CO COM 191216100 29,000 800,000 Shared-Defined 4 800,000 - -\n
I will be very grateful for any suggestions.