
I was pleased to discover recently that BigQuery hosts a dataset of SEC filings. However, I am unable to find the actual text of the filings in the dataset. This seems so obvious that I must be missing something.

As an example, take the 2018 Microsoft 10-K filing. On the SEC website itself, the 10-K text has an Item 7 that includes the phrase "Management’s Discussion and Analysis of Financial Condition and Results". I searched for this phrase in the dataset.

First, the following query should pull all the text from this filing:

SELECT *
FROM `bigquery-public-data.sec_quarterly_financials.txt`
WHERE submission_number="0001564590-18-019062"

However, searching the results of this query for the above phrase finds nothing.

A second attempt, based on another StackOverflow answer, gave me the following, in which I try to search the entire dataset for that phrase in case it's stored in a different table:

SELECT *
FROM `bigquery-public-data.sec_quarterly_financials.*` t
WHERE REGEXP_CONTAINS(LOWER(TO_JSON_STRING(t)), r'/^discussion and analysis of financial condition$/')

No result!
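
One thing worth ruling out for this second query, separately from whether the text is in the dataset at all, is the pattern itself. `REGEXP_CONTAINS` takes a bare RE2 pattern and does a partial match, so the surrounding `/…/` would be treated as literal slashes, and `^`/`$` would require the phrase to span the whole string. The following is a minimal sketch of that behavior in Python, assuming Python's `re` handles these two particular patterns the same way RE2 does; the sample row is made up for illustration and is not taken from the dataset.

```python
import re

# Hypothetical stand-in for LOWER(TO_JSON_STRING(t)) on one row; not real data.
row_json = '{"value":"item 7. discussion and analysis of financial condition and results"}'

# Pattern as written in the query: literal slashes plus ^ and $ anchors.
anchored = r'/^discussion and analysis of financial condition$/'
# The same phrase as a bare substring pattern.
bare = r'discussion and analysis of financial condition'

# re.search looks for the pattern anywhere in the string (a partial match).
anchored_hit = re.search(anchored, row_json) is not None
bare_hit = re.search(bare, row_json) is not None

print(anchored_hit, bare_hit)
```

In this sketch the anchored pattern can never match, because a literal `/` would have to appear before the start of the string, while the bare pattern matches. That would explain an empty result from the wildcard query, but not the missing text for a single filing in the `txt` table.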

I can clearly find the same SEC filing, and yet content within it seems to be missing. I've searched for other phrases and sections too, and the text seems not to be there. Yet, based on all the Google documentation, I think it should be. What am I missing?

Alternatively, does anyone know of another source for parsing sections of SEC 10-K filings or the like? That would be useful too, and you could also answer this question with it.

T. Shaffner
  • According to the description on the value field/column: _"The value, with all whitespace normalized, that is, all sequences of line feeds, carriage returns, tabs, non-breaking spaces, and spaces having been collapsed to a single space, and no leading or trailing spaces. Escaped XML that appears in EDGAR "Text Block" tags is processed to remove all mark-up (comments, processing instructions, elements, attributes). The value is truncated to a maximum number of bytes. The resulting text is not intended for end user display but only for text analysis applications."_ This _might_ explain it. – Graham Polley Jul 03 '20 at 00:26
  • this sounds like an answer :o) – Mikhail Berlyant Jul 03 '20 at 00:30
  • Does it return something for a particular table: `SELECT * FROM bigquery-public-data.sec_quarterly_financials.txt WHERE REGEXP_CONTAINS(LOWER(value),r'discussion and analysis of financial condition')`? – Nick_Kh Jul 03 '20 at 11:06
  • @GrahamPolley, to my understanding that shouldn't mean that any actual textual content is lost but rather that it is cleaned, reduced, and simplified for text analysis applications. After all it's a text analysis application I want this for! And in that case I should still be able to find a phrase if it occurs early in the text, which doesn't seem to be the case with other early phrases I've searched. That's my confusion; based on that I should at least see the first 8k or so characters of content, but I don't. – T. Shaffner Jul 04 '20 at 14:43
  • @mk_sta, my hope is that's searching all the tables in the dataset. I just checked though and you're correct, that doesn't actually seem to be the case. If I change it to just the txt table though I still seem unable to find any actual text content from the body of the filing. – T. Shaffner Jul 04 '20 at 14:50
  • With the `txt` table in particular I'm able to search the content; strange that this doesn't work for you. How about this: `lower(value) like "%discussion and analysis of financial condition%"`? – Nick_Kh Jul 14 '20 at 06:46
  • @mk_sta, if I run that without filtering for the particular filing in question (e.g. without submission_number="0001564590-18-019062") then I get a bunch of results, but if I filter it for the particular one I'm searching for I get no results. I'm guessing that means that phrase occurs in other filings but still means I'm unable to find that phrase in the particular filing I'm searching. Does it show up for you in this particular filing? As best I can tell that still means there's text missing in the database from the filing I'm examining. – T. Shaffner Jul 25 '20 at 14:51
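
When comparing text copied from the SEC website against the `value` column, it may help to apply the same clean-up to the search phrase first, since a phrase that wraps across lines on the website would otherwise not match. The following is a rough Python approximation of the whitespace normalization and truncation in the column description quoted in the first comment. The function name and the `max_bytes` parameter are hypothetical; the actual byte limit is not given in that description.

```python
import re

# Whitespace characters named in the column description: line feeds,
# carriage returns, tabs, non-breaking spaces, and ordinary spaces.
_WS_RUN = re.compile('[\n\r\t\u00a0 ]+')

def normalize_like_value(text, max_bytes=None):
    """Approximate the described clean-up: collapse whitespace runs, trim, truncate.

    max_bytes is a hypothetical parameter; the real limit is not stated in
    the quoted description.
    """
    collapsed = _WS_RUN.sub(' ', text).strip(' ')
    if max_bytes is None:
        return collapsed
    # Truncate on UTF-8 bytes and drop any character that was cut in half.
    return collapsed.encode('utf-8')[:max_bytes].decode('utf-8', errors='ignore')

# A phrase as it might be copied from a web page, with a line break, a tab,
# and a non-breaking space in it.
phrase = 'Discussion and\n  Analysis of\tFinancial\u00a0Condition '
needle = normalize_like_value(phrase).lower()
print(needle)
```

The lowercasing at the end is not part of the column description; it is only there to line up with the `LOWER(value)` comparisons used in the comments above.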

0 Answers