Do you know of any API (paid or free), tool or Python package which can parse individual sections of SEC 10-K filings?

I'm looking for the individual sections of 10-K filings (e.g. ITEM 1: Business, ITEM 1A: Risk Factors, etc.) separated from the entire 10-K filing and preferably cleaned of any page headers (company name), footers (page number) and tables containing mostly numeric data. I've written a parser in Python using BeautifulSoup for entire 10-K statements, but dividing them into individual sections is proving quite challenging, though not impossible.

Before reinventing the wheel, I thought I'd ask the community first whether they know of any existing solutions for this. I've found https://jodie.ai/hi/ which has the 10-K statements divided into sections but only dating back to 2009.

Thanks for the help!

Brian Tompsett - 汤莱恩
Martin
  • FYI, I just posted a related question which, if answered, might answer your question too. We seem to be seeking the same thing. If you found a better solution since posting I'd love to hear about that too. https://stackoverflow.com/questions/62706179/filing-section-locations-in-biquery-sec-filing-dataset – T. Shaffner Jul 02 '20 at 23:01

2 Answers

I had to solve the same problem and developed an item extraction algorithm for 10-K and 10-Q filings. The algo supports all item types and can return standardized clear-text and the original HTML of each item:

  • 1 - Business
  • 1A - Risk Factors
  • 1B - Unresolved Staff Comments
  • 2 - Properties
  • 3 - Legal Proceedings
  • 4 - Mine Safety Disclosures
  • 5 - Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities
  • 6 - Selected Financial Data (prior to February 2021)
  • 7 - Management’s Discussion and Analysis of Financial Condition and Results of Operations
  • 7A - Quantitative and Qualitative Disclosures about Market Risk
  • 8 - Financial Statements and Supplementary Data
  • 9 - Changes in and Disagreements with Accountants on Accounting and Financial Disclosure
  • 9A - Controls and Procedures
  • 9B - Other Information
  • 10 - Directors, Executive Officers and Corporate Governance
  • 11 - Executive Compensation
  • 12 - Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters
  • 13 - Certain Relationships and Related Transactions, and Director Independence
  • 14 - Principal Accountant Fees and Services
  • 15 - Exhibits, Financial Statement Schedules

Request Parameters

You can use the API to retrieve any item by providing the URL of the 10-K or 10-Q filing, the items to be extracted and the type:

  • url (required) - URL of the 10-K or 10-Q filing, e.g. TSLA 10-K https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm
  • items (required) - The item or items to be extracted. Provide multiple items separated by comma, e.g. 1,1A,1B,2,5
  • type (optional) - Can be text or html. text returns clear, formatted text without any XBRL, XML or HTML tags. All tables are removed. html returns the original, cleaned HTML version of the item including tables. Default: text
  • token (required) - Your API key.

If you need to generate a list of most recent 10-K/Q filings, you can use the query API (https://sec-api.io/docs/query-api).

Example Request - Item 1A Risk Factors, Text

https://api.sec-api.io/extractor?
url=https://www.sec.gov/Archives/edgar/data/1318605/000156459021004599/tsla-10k_20201231.htm&
item=1A&
type=text&
token=YOUR_API_KEY
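
For anyone calling this from Python, here is a minimal sketch of how the request above could be assembled using only the standard library. The helper names build_extractor_url and fetch_item are made up for illustration and are not part of any official client. The sketch uses the item spelling from the example request (the parameter list above says items) and assumes the response body is UTF-8 text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the example request above.
EXTRACTOR_ENDPOINT = "https://api.sec-api.io/extractor"


def build_extractor_url(filing_url, item, token, return_type="text"):
    """Assemble the extractor request URL (hypothetical helper)."""
    if return_type not in ("text", "html"):
        raise ValueError("return_type must be 'text' or 'html'")
    # urlencode percent-encodes the filing URL so it survives as a query value.
    query = urlencode({
        "url": filing_url,
        "item": item,
        "type": return_type,
        "token": token,
    })
    return f"{EXTRACTOR_ENDPOINT}?{query}"


def fetch_item(filing_url, item, token, return_type="text"):
    """Download one item; needs network access and a valid API key.

    Assumption: the response body is UTF-8 encoded.
    """
    request_url = build_extractor_url(filing_url, item, token, return_type)
    with urlopen(request_url) as response:
        return response.read().decode("utf-8")
```

Calling fetch_item with the TSLA filing URL, "1A" and your own key should then return text like the example response below, provided the API behaves as described in this answer.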

Example Response - Item 1A Risk Factors, Text

You should carefully consider the risks described below together with the other information set forth in this report, which could materially affect our business, financial condition and future results. The risks described below are not the only risks facing our company. Risks and uncertainties not currently known to us or that we currently deem to be immaterial also may materially adversely affect our business, financial condition and operating results.

Risks Related to Our Ability to Grow Our Business

We may be impacted by macroeconomic conditions resulting from the global COVID-19 pandemic.

Since the first quarter of 2020, there has been a worldwide impact from the COVID-19 pandemic. Government regulations and shifting social behaviors have limited or closed non-essential transportation, government functions, business activities and person-to-person interactions. In some cases, the relaxation of such trends has recently been followed by actual or contemplated returns to stringent restrictions on gatherings or commerce, including in parts of the U.S. and a number of areas in Europe.

We temporarily suspended operations at each of our manufacturing facilities worldwide for a part of the first half of 2020. Some of our suppliers and partners also experienced temporary suspensions before resuming, including Panasonic, which manufactures battery cells for our products at our Gigafactory Nevada. We also instituted temporary employee furloughs and compensation reductions while our U.S. operations were scaled back. Reduced operations or closures at motor vehicle departments, vehicle auction houses and municipal and utility company inspectors have resulted in challenges in or postponements for our new vehicle deliveries, used vehicle sales and energy product deployments. Global trade conditions and consumer trends may further adversely impact us and our industries. For example, pandemic-related issues have exacerbated port congestion and intermittent supplier shutdowns and delays, resulting in additional expenses to expedite delivery of critical parts. Similarly, increased demand for personal electronics has created a shortfall of microchip supply, and it is yet unknown how we may be impacted. Sustaining our production trajectory will require the readiness and solvency of our suppliers and vendors, a stable and motivated production workforce and ongoing government cooperation, including for travel and visa allowances. The contingencies inherent in the construction of and ramp at new facilities such as Gigafactory Shanghai, Gigafactory Berlin and Gigafactory Texas may be exacerbated by these challenges.

We cannot predict the duration or direction of current global trends, the sustained impact of which is largely unknown, is rapidly evolving and has varied across geographic regions. Ultimately, we continue to monitor macroeconomic conditions to remain flexible and to optimize and evolve our business as appropriate, and we will have to accurately project demand and infrastructure requirements globally and deploy our production, workforce and other resources accordingly. If current global market conditions continue or worsen, or if we cannot or do not maintain operations at a scope that is commensurate with such conditions or are later required to or choose to suspend such operations again, our business, prospects, financial condition and operating results may be harmed.

We may experience delays in launching and ramping the production of our products and features, or we may be unable to control our manufacturing costs.

We have previously experienced and may in the future experience launch and production ramp delays for new products and features. For example, we encountered unanticipated supplier issues that led to delays during the ramp of Model X and experienced challenges with a supplier and with ramping full automation for certain of our initial Model 3 manufacturing processes. In addition, we may introduce in the future new or unique manufacturing processes and design features for our products. There is no guarantee that we will be able to successfully and timely introduce and scale such processes or features.

In particular, our future business depends in large part on increasing the production of mass-market vehicles including Model 3 and Model Y, which we are planning to achieve through multiple factories worldwide. We have relatively limited experience to date in manufacturing Model 3 and Model Y at high volumes and even less experience building and ramping vehicle production lines across multiple factories in different geographies. In order to be successful, we will need to implement, maintain and ramp efficient and cost-effective manufacturing capabilities, processes and supply chains and achieve the design tolerances, high quality and output rates we have planned at our manufacturing facilities in California, Nevada, Texas, China and Germany. We will also need to hire, train and compensate skilled employees to operate these facilities. Bottlenecks and other unexpected challenges such as those we experienced in the past may arise during our production ramps, and we must address them promptly while continuing to improve manufacturing processes and reducing costs. If we are not successful in achieving these goals, we could face delays in establishing and/or sustaining our Model 3 and Model Y ramps or be unable to meet our related cost and profitability targets.

We may also experience similar future delays in launching and/or ramping production of our energy storage products and Solar Roof; new product versions or variants; new vehicles such as Tesla Semi, Cybertruck and the new Tesla Roadster; and future features and services such as new Autopilot or FSD features and the autonomous Tesla ride-hailing network. Likewise, we may encounter delays with the design, construction and regulatory or other approvals necessary to build and bring online future manufacturing facilities and products.

Any delay or other complication in ramping the production of our current products or the development, manufacture, launch and production ramp of our future products, features and services, or in doing so cost-effectively and with high quality, may harm our brand, business, prospects, financial condition and operating results.

We may be unable to grow our global product sales, delivery and installation capabilities and our servicing and vehicle charging networks, or we may be unable to accurately project and effectively manage our growth.

Our success will depend on our ability to continue to expand our sales capabilities. We also frequently adjust our retail operations and product offerings in order to optimize our reach, costs, product line-up and model differentiation and customer experience. However, there is no guarantee that such steps will be accepted by consumers accustomed to traditional sales strategies. For example, marketing methods such as touchless test drives that we have pioneered in certain markets have not been proven at scale. We are targeting with Model 3 and Model Y a global mass demographic with a broad range of potential customers, in which we have relatively limited experience projecting demand and pricing our products. We currently produce numerous international variants at a limited number of factories, and if our specific demand expectations for these variants prove inaccurate, we may not be able to timely generate deliveries matched to the vehicles that we produce in the same timeframe or that are commensurate with the size of our operations in a given region. Likewise, as we develop and grow our energy products and services worldwide, our success will depend on our ability to correctly forecast demand in various markets.

Because we do not have independent dealer networks, we are responsible for delivering all of our vehicles to our customers. While we have improved our delivery logistics, we may face difficulties with deliveries at increasing volumes, particularly in international markets requiring significant transit times. For example, we saw challenges in ramping our logistics channels in China and Europe to initially deliver Model 3 there in the first quarter of 2019. We have deployed a number of delivery models, such as deliveries to customers’ homes and workplaces and touchless deliveries, but there is no guarantee that such models will be scalable or be accepted globally. Likewise, as we ramp Solar Roof, we are working to substantially increase installation personnel and decrease installation times. If we are not successful in matching such capabilities with actual production, or if we experience unforeseen production delays or inaccurately forecast demand for the Solar Roof, our business, financial condition and operating results may be harmed.

Moreover, because of our unique expertise with our vehicles, we recommend that our vehicles be serviced by us or by certain authorized professionals. If we experience delays in adding such servicing capacity or servicing our vehicles efficiently, or experience unforeseen issues with the reliability of our vehicles, particularly higher-volume and newer additions to our fleet such as Model 3 and Model Y, it could overburden our servicing capabilities and parts inventory. Similarly, the increasing number of Tesla vehicles also requires us to continue to rapidly increase the number of our Supercharger stations and connectors throughout the world.

There is no assurance that we will be able to ramp our business to meet our sales, delivery, installation, servicing and vehicle charging targets globally, that our projections on which such targets are based will prove accurate or that the pace of growth or coverage of our customer infrastructure network will meet customer expectations. These plans require significant cash investments and management resources and there is no guarantee that they will generate additional sales or installations of our products, or that we will be able to avoid cost overruns or be able to hire additional personnel to support them. As we expand, we will also need to ensure our compliance with regulatory requirements in various jurisdictions applicable to the sale, installation and servicing of our products, the sale or dispatch of electricity related to our energy products and the operation of Superchargers. If we fail to manage our growth effectively, it may harm our brand, business, prospects, financial condition and operating results.

Our future growth and success are dependent upon consumers’ demand for electric vehicles and specifically our vehicles in an automotive industry that is generally competitive, cyclical and volatile.

If the market for electric vehicles in general and Tesla vehicles in particular does not develop as we expect, develops more slowly than we expect, or if demand for our vehicles decreases in our markets or our vehicles compete with each other, our business, prospects, financial condition and operating results may be harmed.

We are still at an earlier stage and have limited resources and production relative to established competitors that offer internal combustion engine vehicles. In addition, electric vehicles still comprise a small percentage of overall vehicle sales. As a result, the market for our vehicles could be negatively affected by numerous factors, such as:

- perceptions about electric vehicle features, quality, safety, performance and cost;

- perceptions about the limited range over which electric vehicles may be driven on a single battery charge, and access to charging facilities;

- competition, including from other types of alternative fuel vehicles, plug-in hybrid electric vehicles and high fuel-economy internal combustion engine vehicles;

- volatility in the cost of oil and gasoline, such as wide fluctuations in crude oil prices during 2020;

- government regulations and economic incentives; and

- concerns about our future viability.

Documentation: https://sec-api.io/docs/sec-filings-item-extraction-api

Jay

I just commented above about a related question of mine; the BigQuery dataset it refers to may be the answer to your question. However, I haven't managed to make it work for extracting individual filing sections myself.

The next option I found, which isn't an API and thus doesn't stay current but does go back to 1993, is the repository at https://sraf.nd.edu/data/. I can't tell yet if the sections are broken out exactly as you're looking for, but a substantial amount of pre-cleaning has been done, which makes it an easier starting point for you, a useful check against your own parsing code, or both. The resources site there includes links to earlier papers analyzing the same data and useful things like dictionaries and related word lists, and the code page includes their own Python cleaning work, which appears to have been quite comprehensive.

Still not the full, clean API I think you and I are both looking for, but the best I've found.

T. Shaffner
  • Thanks for the helpful reply. I've looked at your resources; however, they don't fully solve my problem. The Google BigQuery dataset includes, for example, 10-Q statements dating back to 2009, but I'm looking for 10-Ks dating back to 1995. – Martin Jul 15 '20 at 12:28
  • Anyway, in the meantime, I've written my own script that works surprisingly well. It's not 100% accurate because some companies use their own structure, but it works for approx. 490 of the S&P 500 constituents. The script uses BeautifulSoup, FuzzyWuzzy and the Python-Constraint package, and transforms the text within each HTML tag, to get the most accurate results. But let me tell you, the HTML 10-Ks are a big mess. – Martin Jul 15 '20 at 12:39
  • @Martin Nice! Any plan to make this open source? Also I'm curious, why not take the text files as your starting point instead of the HTML? All the ones I've seen seem to have a txt submission going back to 1994 that's a cleaner starting point with xml-style tags. Not possible for some reason? – T. Shaffner Jul 26 '20 at 18:07
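
For readers wondering what the fuzzy heading matching mentioned in Martin's comment might look like, below is a rough sketch. It is not Martin's script, which has not been published here. It swaps FuzzyWuzzy for the standard library's difflib so that it runs without extra packages, and it assumes the filing has already been reduced to plain-text lines. The function names, the 0.8 similarity threshold and the shortened title table are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher

# A few canonical item titles from the list in the first answer (subset only).
ITEM_TITLES = {
    "1": "Business",
    "1A": "Risk Factors",
    "1B": "Unresolved Staff Comments",
    "2": "Properties",
    "3": "Legal Proceedings",
    "8": "Financial Statements and Supplementary Data",
}

# Matches lines such as "ITEM 1A. RISK FACTORS" or "Item 1A - Risk Factors".
HEADING_RE = re.compile(
    r"^\s*item\s+(\d{1,2}[AB]?)\s*[.:\-]?\s*(.*)$", re.IGNORECASE
)


def match_heading(line, threshold=0.8):
    """Return the item code if the line looks like that item's heading, else None."""
    found = HEADING_RE.match(line)
    if not found:
        return None
    code, title = found.group(1).upper(), found.group(2).strip()
    expected = ITEM_TITLES.get(code)
    if expected is None:
        return None
    # Fuzzy comparison tolerates small typos and case differences in the title.
    score = SequenceMatcher(None, title.lower(), expected.lower()).ratio()
    return code if score >= threshold else None


def split_sections(lines):
    """Group lines under the most recently matched item heading.

    A repeated heading restarts its section, so table-of-contents entries
    are overwritten by the later, real occurrence of the same heading.
    """
    sections, current = {}, None
    for line in lines:
        code = match_heading(line)
        if code is not None:
            current = code
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections
```

Filings that use their own structure, which Martin mentions as the reason his script is not 100% accurate, would likewise defeat a simple matcher like this one.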