Questions tagged [edgar]

EDGAR is the U.S. Securities and Exchange Commission's information system for company filings data. Use this tag for questions about parsing and querying that data and its public APIs.

EDGAR stands for Electronic Data Gathering, Analysis, and Retrieval. The system uses several data formats, including the classic SGML-based format and the XML-based XBRL format for business reporting.
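As a rough illustration of the SGML-based format, the sketch below splits a complete submission text file into its embedded documents. It assumes the commonly seen layout in which each document sits inside a `<DOCUMENT>` block, header fields such as `<TYPE>` and `<FILENAME>` are unclosed one-line tags, and the body is wrapped in `<TEXT>`. The sample string, accession number, file names, and the `split_documents` helper are made up for the example.

```python
import re

# Hypothetical, heavily shortened sample imitating the SGML-style layout of a
# complete submission text file. Real filings are far larger and carry more headers.
SAMPLE = """<SEC-DOCUMENT>0000000000-00-000000.txt : 20200101
<DOCUMENT>
<TYPE>10-Q
<SEQUENCE>1
<FILENAME>example10q.htm
<TEXT>
<html><body>Quarterly report body</body></html>
</TEXT>
</DOCUMENT>
<DOCUMENT>
<TYPE>EX-101.INS
<SEQUENCE>2
<FILENAME>example.xml
<TEXT>
<xbrl>instance data</xbrl>
</TEXT>
</DOCUMENT>
</SEC-DOCUMENT>
"""

# The header tags are not closed, so a strict XML parser would reject the file;
# regular expressions are used here instead (an assumption, not the only option).
DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
HEADER_RE = re.compile(r"^<(TYPE|SEQUENCE|FILENAME)>(.*)$", re.MULTILINE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def split_documents(submission):
    """Return one dict per <DOCUMENT> block with its header fields and body text."""
    docs = []
    for block in DOCUMENT_RE.findall(submission):
        fields = {name.lower(): value.strip() for name, value in HEADER_RE.findall(block)}
        body = TEXT_RE.search(block)
        fields["text"] = body.group(1).strip() if body else ""
        docs.append(fields)
    return docs


documents = split_documents(SAMPLE)
for doc in documents:
    print(doc["type"], doc["filename"], len(doc["text"]))
```

Once a body is isolated this way, an HTML or XML parser can be applied to the extracted `text` alone, which avoids feeding the unclosed header tags to a strict parser.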

120 questions
22
votes
1 answer

Web scraping SEC Edgar 10-K and 10-Q filings

Is there anyone experienced with scraping SEC 10-K and 10-Q filings? I got stuck while trying to scrape monthly realised share repurchases from these filings. Specifically, I would like to get the following information: 1. Period; 2. Total Number of…
Jiayuan Chen
  • 221
  • 1
  • 2
  • 4
5
votes
1 answer

SEC company filings: Is the tag valid SGML? If so, how to parse it?

I tried to parse SEC company filings from sec.gov. Starting from fb 10-Q index.htm let's look at a complete text submission filing like complete submission text filing. It has a structure like: "some…
Michael S
  • 466
  • 1
  • 4
  • 12
5
votes
1 answer

From 10-K -- extract SIC, CIK, create metadata table

I am working with 10-Ks from Edgar. To assist in file management and data analysis, I would like to create a table containing the path to each file, the CIK number of the filing company (this is a unique ID issued by the SEC), and the SIC industry code…
user7317101
5
votes
1 answer

HTML Rendering of EDGAR .txt Filings

Currently, I'm working on a project where one PHP script grabs an index file from ftp://ftp.sec.gov and places all the company information into the database. The second PHP script then grabs the raw text file from the SEC and saves it locally for…
3
votes
2 answers

JSONDecodeError: Expecting value: line 1 column 1 (char 0) when scraping SEC EDGAR

My code is as follows: import requests import urllib from bs4 import BeautifulSoup year_url = r"https://www.sec.gov/Archives/edgar/daily-index/2020/index.json" year_content = requests.get(year_url) decoded_year_url = year_content.json() I could…
Julie
  • 57
  • 4
3
votes
1 answer

Parse XML with Python lxml

I am trying to parse an XML file using the Python library lxml, and would like the resulting output to be in a dataframe. I am relatively new to Python and parsing, so please bear with me as I outline the problem. The original XML that I am trying to parse…
stump
  • 85
  • 1
  • 6
3
votes
2 answers

How to Use Beautiful Soup to Scrape SEC's Edgar Database and Receive Desired Data

Apologies in advance for the long question: I am new to Python and I'm trying to be as explicit as I can with a fairly specific situation. I am trying to identify specific data points from SEC filings on a routine basis; however, I want to automate this…
bvd
  • 51
  • 1
  • 5
3
votes
2 answers

Arelle Webserver - How to extract the income statement from an XBRL filing?

I am trying to extract financial statement information based on the type of statement. Let me explain in a little more detail. I want to extract the income statement, balance sheet and cash flow statement from an XBRL instance – especially…
rbr
  • 51
  • 3
3
votes
0 answers

How would I approach a lot of structured-but-inconsistent data?

I'm attempting to parse EDGAR documents - they're SEC filings. Specifically, I'm attempting to parse both SEC Schedule 13D and Schedule 13G filings. There appear to be lots of failed attempts at parsing these filings, and I assume that's because…
Mr_Spock
  • 3,815
  • 6
  • 25
  • 33
2
votes
1 answer

Parse SEC EDGAR XML Form Data with child nodes using BeautifulSoup

I am attempting to scrape individual fund holdings from the SEC's N-PORT-P/A form using Beautiful Soup and XML. A typical submission, outlined below and linked here, looks like:
therdawg
  • 69
  • 6
2
votes
1 answer

Downloading file from the website - HTTPError: HTTP Error 403: Forbidden

I am trying to download 10-Ks (annual reports of public companies) from EDGAR. I am running the code below (taken from a textbook, I don't understand much of it), but keep getting the following error: (I downloaded 'master.idx' files that are…
Alberto Alvarez
  • 805
  • 3
  • 11
  • 20
2
votes
3 answers

How to get data from SEC Edgar python and a json

On the page below, a JSON link is given as the data source: https://www.sec.gov/edgar/browse/?CIK=1067983&owner=exclude Data source: CIK0001067983.json -> https://data.sec.gov/submissions/CIK0001067983.json This is my code (it works…
JKR
  • 51
  • 4
2
votes
2 answers

How to web scrape SEC Edgar 10-K dynamic data

We are trying to parse an SEC Edgar filing using Python. I'm trying to get the table "Sales By Segment Of Business" at line 21. This is the link to the…
Tarun teja
  • 59
  • 3
  • 7
2
votes
1 answer

Extracting table of holdings from (Edgar 13-F filings) TXT (pre-2013) with python

I am working on extracting a table of holdings from the 13-F form on EDGAR. Before 2013, holdings were given in a txt file (see example). The output I am aiming for is a pd.DataFrame with the same shape as the "Form 13F Information Table" in the txt file (10…
NoobFin
  • 23
  • 5
2
votes
2 answers

Extracting xml from a txt file

I'm trying to extract the XML portion of code from a txt file in Python. The current txt file I'm using is from the EDGAR database and has multiple representations of a 10-K report in one txt file, having HTML, then XML, and then some other…
segfault
  • 65
  • 6