
Is there a reliable way, using Python or JavaScript, to extract data from a text file in the format shown below?

table.txt =

<CAPTION>
ANNUAL
Five fiscal years ended September 30, 1994

                     1994        1993        1992        1991        1990
<S>            <C>         <C>         <C>         <C>         <C>
Net sales      $ 9,188,748 $7,976,954  $ 7,086,542 $ 6,308,849  $5,558,435

Net income     $   310,178 $   86,589  $   530,373 $   309,841  $  474,895

Earnings per
 common and
 common
 equivalent
 share         $     2.61  $     0.73  $     4.33  $     2.58  $     3.77
Cash dividends
 declared
 per common
 share         $     0.48  $     0.48  $     0.48  $     0.48  $     0.44
Common and
 common
 equivalent
 shares used
 in the
 calculations
 of earnings per
 share            118,735     119,125     122,490     120,283     125,813
Cash, cash
 equivalents,
 and short-term
 investments   $1,257,856  $  892,303  $1,435,500  $  892,719  $  997,091
Total assets   $5,302,746  $5,171,412  $4,223,693  $3,493,597  $2,975,707
Long-term debt $  304,472  $    7,117  $   17,740  $   18,131  $    5,437
Deferred tax 
 liabilities   $  670,668  $  629,832  $  610,803  $  509,870  $  501,832


<CAPTION>
RESULTS OF OPERATIONS              1994    Change    1993   Change    1992
<S>                             <C>         <C>   <C>       <C>    <C>
Net sales                       $ 9,189     15%   $ 7,977   13%    $ 7,087
Gross margin                    $ 2,344    -14%   $ 2,728  -12%    $ 3,095
  Percentage of net sales         25.5%             34.2%            43.7%
Operating expenses (excluding
 restructuring costs)           $ 1,948    -15%   $ 2,297    --    $ 2,289
  Percentage of net sales         21.2%             28.8%            32.3%
Restructuring costs             $  (127)  -140%   $   321    --        -- 
  Percentage of net sales         (1.4%)             4.0%              -- 
Net income                      $   310    258%   $    87   -84%   $   530
Earnings per share              $  2.61    258%   $  0.73   -83%   $  4.33

                                                            
<CAPTION>
Operating Expenses                     1994  Change   1993  Change    1992
<S>                                     <C>    <C>     <C>     <C>    <C>
Research and development             $  564    -15%  $  665    10%  $  602
Percentage of net sales                6.1%            8.3%           8.5%

                                                            
<CAPTION>
                                       1994  Change   1993  Change    1992
<S>                                  <C>       <C>    <C>     <C>    <C>
Selling, general and administrative  $1,384    -15%  $1,632   -3%    $1,687
Percentage of net sales               15.1%           20.5%           23.8%

                                                            
<CAPTION>
                                    1994  Change   1993  Change    1992
<S>                                 <C>     <C>    <C>     <C>     <C>
Restructuring costs                 $(127)  -140%  $ 321     --      --                                     
Percentage of net sales             (1.4%)          4.0%             --

                                                            
<CAPTION>
Interest and Other Income             
(Expense), Net                        1994   Change   1993   Change   1992
<S>                                   <C>     <C>      <C>    <C>      <C>
Interest and other income (expense),  
  net                                $ (22)   -175%   $ 29    -41%    $ 50                                      

                                                            
<CAPTION>
Provision for Income Taxes            1994  Change   1993  Change    1992
<S>                                   <C>     <C>    <C>     <C>     <C>
Provision for income taxes            $190    258%   $53     -84%    $325
Effective tax rate                     38%            38%             38%

                                                           
<CAPTION>
Liquidity and Capital Resources                  1994     1993     1992
<S>                                              <C>      <C>      <C>
Cash, cash equivalents, and short-term                                     
  investments, net of short-term   borrowings    $   966  $    69  $ 1,251
Working capital                                  $ 2,532  $ 1,830  $ 2,151
Cash generated by (used for) operations          $   737  $ (651)  $   921
Cash used for investment activities, excluding   
  short-term investments                         $   163  $   228  $   264                            
Cash generated by (used for) financing            
  activities                                     $ (208)  $   336  $ (114) 

                                                       
<CAPTION>
                                                              
INDEX TO CONSOLIDATED FINANCIAL STATEMENTS                     Page
<S>                                                            <C>
Financial Statements:                                            
Report of Ernst & Young LLP, Independent Auditors              22 
Consolidated Balance Sheets at September 30, 1994 and            
  September 24, 1993                                           23
Consolidated Statements of Income for the three fiscal years     
  ended September 30, 1994                                     24
Consolidated Statements of Shareholders' Equity for the          
  three fiscal years ended September 30, 1994                  25
Consolidated Statements of Cash Flows for the three fiscal       
  years ended September 30, 1994                               26
Notes to Consolidated Financial Statements                     27                   
Selected Quarterly Financial Information (Unaudited)           41            
Financial Statement Schedules:                                   
For the three fiscal years ended September 30, 1994              
   Schedule II - Amounts receivable from related parties and     
                 underwriters, promoters and employees         
                 other than related parties                    S-1
   Schedule VIII - Valuation and qualifying accounts and       
                   reserves                                    S-3
   Schedule IX - Short-term borrowings                         S-4
   Schedule X - Supplementary income statement information     S-5

                                                       
<CAPTION>
September 30, 1994, and September 24, 1993             1994             1993
<S>                                              <C>             <C>
Assets:                                                       
Current assets:                                               
  Cash and cash equivalents                      $ 1,203,488     $   676,413
  Short-term investments                              54,368         215,890
  Accounts receivable, net of allowance for                                      
   doubtful accounts of $90,992 ($83,776 in 1993)  1,581,347       1,381,946
  Inventories                                      1,088,434       1,506,638
  Deferred tax assets                                293,048         268,085
  Other current assets                               255,767         289,383
    Total current assets                           4,476,452       4,338,355
Property, plant, and equipment:                                                
  Land and buildings                                 484,592         404,688
  Machinery and equipment                            572,728         578,272
  Office furniture and equipment                     158,160         167,905
  Leasehold improvements                             236,708         261,792
                                                   1,452,188       1,412,657
   Accumulated depreciation and amortization       (785,088)       (753,111)
    Net property, plant, and equipment               667,100         659,546
Other assets                                         159,194         173,511
                                                 $ 5,302,746     $ 5,171,412

Liabilities and Shareholders' Equity:                                          
Current liabilities:                                                           
  Short-term borrowings                          $   292,200     $   823,182
  Accounts payable                                   881,717         742,622
  Accrued compensation and employee benefits         136,895         144,779
  Accrued marketing and distribution                 178,294         174,547
  Accrued restructuring costs                         58,238         307,932
  Other current liabilities                          396,961         315,023
    Total current liabilities                      1,944,305       2,508,085
Long-term debt                                       304,472           7,117
Deferred tax liabilities                             670,668         629,832
Commitments and contingencies                                                  
Shareholders' equity:                                                          
  Common stock, no par value; 320,000,000 shares                                  
   authorized; 119,542,527 shares   issued and        
   outstanding in 1994 (116,147,035 shares in 
   1993)                                             297,929         203,613 
  Retained earnings                                2,096,206       1,842,600
  Accumulated translation adjustment                (10,834)        (19,835)
    Total shareholders' equity                     2,383,301       2,026,378
                                                 $ 5,302,746     $ 5,171,412

                                                      
<CAPTION>
Three fiscal years ended                  1994          1993          1992
September 30, 1994
<S>                                <C>           <C>           <C>       
Net sales                          $ 9,188,748   $ 7,976,954   $ 7,086,542
Costs and expenses:                                                       
  Cost of sales                      6,844,915     5,248,834     3,991,337
  Research and development             564,303       664,564       602,135
  Selling, general and                 
   administrative                    1,384,111     1,632,362     1,687,262
  Restructuring costs                (126,855)       320,856            --
                                     8,666,474     7,866,616     6,280,734
Operating income                       522,274       110,338       805,808
Interest and other income             
  (expense), net                      (21,988)        29,321        49,634 
Income before income taxes             500,286       139,659       855,442
Provision  for income taxes            190,108        53,070       325,069
Net income                         $   310,178   $    86,589   $   530,373                                                    
Earnings  per common and common                                            
 equivalent share                  $      2.61   $       .73   $      4.33
Common and common equivalent                                               
 shares used in the calculations
 of earnings per share                 118,735       119,125       122,490


                                          Accu-
<CAPTION>                                        mulated  Notes
                                                 Trans-   Receivable Total
                                                 lation   from       Share-
                     Common Stock      Retained  Adjust-  Share-     holders'
                  Shares    Amount     Earnings  ment     holders    Equity
<S>              <C>     <C>         <C>        <C>       <C>      <C>
Balance at
September 27,
1991            118,386 $ 278,865  $1,492,024  $ (2,377) $ (1,836) $1,766,676

Common stock
issued under
stock option
and purchase
plans, including
related
tax benefits      4,093   155,388          --         --        --    155,388

Repurchase of
common stock    (4,000) (151,943)    (60,682)         --        --  (212,625)

Repayment of notes
receivable from       
shareholders        --         --          --         --     1,836      1,836

Cash dividends of
$.48 per common
share               --         --    (57,196)         --        --   (57,196)

Accumulated
translation
adjustment          --         --          --      2,918        --      2,918

Net income          --         --     530,373         --        --    530,373

Balance at
September 25,
1992           118,479    282,310   1,904,519        541        --  2,187,370

Common stock
issued under
stock option
and purchase
plans, including
related
tax benefits     2,693    101,842          --         --        --    101,842

Repurchase of
common stock   (5,025)  (180,539)    (92,915)         --        --  (273,454)

Cash dividends of
$.48 per common
share               --         --    (55,593)         --        --   (55,593)

Accumulated
translation
adjustment          --         --          --   (20,376)        --   (20,376)

Net income          --         --      86,589         --        --     86,589

Balance at
September 24,    
1993           116,147    203,613   1,842,600   (19,835)        --  2,026,378

Common stock
issued under
stock option
and purchase
plans, including
related
tax benefits     3,396     94,316          --         --        --     94,316

Cash dividends of
$.48 per common
share               --         --    (56,572)         --        --   (56,572)

Accumulated
translation
adjustment          --         --          --      9,001        --      9,001

Net income          --         --     310,178         --        --    310,178

Balance at
September 30,    
1994           119,543  $ 297,929  $2,096,206 $ (10,834)     $  -- $2,383,301

                                                                   
<CAPTION>
Three fiscal years ended                  1994          1993          1992
  September 30, 1994
<S>                               <C>            <C>           <C>     
Cash and cash equivalents,        
 beginning of the period          $    676,413   $   498,557   $   604,147
Operations:                                                               
  Net income                           310,178        86,589       530,373
  Adjustments to reconcile net                                              
    income to cash generated by  
    (used for) operations:
    Depreciation and amortization      167,958       166,113       217,182
    Net book value of property,                                             
     plant, and equipment retirements   11,130        13,145        14,687
  Changes in assets and liabilities:
    Accounts receivable              (199,401)     (294,761)     (180,026)
    Inventories                        418,204     (926,541)        91,558
    Deferred tax assets               (24,963)      (68,946)        23,841
    Other current assets                33,616      (96,314)      (87,376)
    Accounts payable                   139,095       315,686        69,852
    Income taxes payable                50,045      (54,724)       100,361
    Accrued restructuring costs      (249,694)       202,894      (57,327)
    Other current liabilities           39,991      (13,383)        96,915
    Deferred tax liabilities            40,836        19,029       100,933
        Cash  generated by (used       
         for) operations               736,995     (651,213)       920,973
                                                                          
Investments:                                                              
  Purchase of short-term             
   investments                       (312,073)   (1,431,998)   (2,121,341)
  Proceeds from sale of short-term     
   investments                         473,595     2,153,051     1,472,970
  Purchase of property, plant, and   
   equipment                         (159,587)     (213,118)     (194,853)
  Other                                (3,737)      (15,169)      (69,410)
        Cash generated by (used         
         for) investment activities    (1,802)       492,766     (912,634)
                                                                          
Financing:                                                                
  Increase (decrease) in short-term    
   borrowings                        (530,982)       638,721        35,895
  Increase (decrease) in long-term        
   borrowings                          297,355      (10,624)         (391)
  Increases in common stock, net of
   related tax benefits and changes        
   in notes receivable from
   shareholders                         82,081        85,289       120,388
  Repurchase of common stock                --     (273,454)     (212,625)
  Cash dividends                      (56,572)      (55,593)      (57,196)
  Other                                     --      (48,036)            --
        Cash generated by (used    
         for) financing activities   (208,118)       336,303     (113,929)
                                                                          
Total cash generated (used)            527,075       177,856     (105,590)
Cash and cash equivalents, end of            
 the period                        $ 1,203,488   $   676,413   $   498,557
                                                                            
Supplemental cash flow disclosures:
 Cash paid during the year for:                                          
   Interest                        $    34,387   $    11,748   $     8,778
   Income taxes, net               $    45,692   $   226,080   $    97,667
Schedule of non-cash transactions:
 Tax benefit from stock options    $    12,235   $    16,553   $    36,836
                               

                                                     (in millions)
<CAPTION>
                               1994                    1993(A)
                                                                      
                                               Credit                  Credit
                            Notional   Fair    Risk    Notional  Fair  Risk
                            Principal  Value   Amount  Principal Value Amount
                                                 
Transactions Qualifying as Accounting Hedges
<S>                           <C>      <C>      <C>     <C>      <C>    <C>
Interest rate instruments                                                 
  Swaps                     $  699    $ (40)      --         --    --      --                                
Foreign exchange instruments                                             
  Spot / Forward contracts  $2,385    $ (23)    $ 15    $ 2,114  $  6   $  19
  Purchased options         $1,510    $   17    $ 21    $ 1,637  $ 28   $  33  
  Sold options              $  302    $  (1)      --    $   765  --(B)     --
                                                                                          
Transactions Other Than Accounting Hedges
                                                                          
Interest rate instruments                                                 
  Swaps                        --        --      --     $   112  --(B)     --
  Sold options              $  148     --(B)     --     $    67  --(B)     --
Foreign exchange instruments                                             
  Spot / Forward contracts  $  300     --(B)   --(B)    $   574  $  2   $  10
  Purchased options         $1,600    $   32   $ 32     $ 1,608  $ 24   $  24
  Sold options              $5,511    $ (45)     --     $ 5,282  $(39)     --


                                                 (in millions)
<CAPTION>                        1994                   1993
                           Carrying       Fair    Carrying       Fair
                             Amount      Value      Amount      Value
<S>                        <C>           <C>         <C>         <C>
Cash and cash equivalents  $ 1,203       $ 1,203     $ 676      $ 676           
Short-term investments     $    54       $    54     $ 216      $ 216
Short-term borrowings      $   292       $   292     $ 823      $ 823
Long-term debt:                                                      
  Ten-year unsecured notes $   300       $   259        --         --
  Other                    $     4       $     4     $   7      $   7
                                             

                                                      
<CAPTION>
Inventories                                                           
Inventories consist of the following:                    (In thousands)
                                                                
                                                     1994         1993
<S>                                            <C>          <C>     
Purchased parts                                $  469,420   $  504,201                                                       
Work in process                                   206,654      284,440
Finished goods                                    412,360      717,997
                                               $1,088,434   $1,506,638
                                                                   

                                                              
<CAPTION>                                                  (In thousands)
Short-Term Borrowings                                         
                                                         1994        1993
<S>                                                <C>        <C>          
Commercial paper                                   $   89,817  $  823,182
Notes payable to banks                                202,383          --
                                                   $  292,200  $  823,182

                                                        
<CAPTION>
Income Taxes                                                            

The provision for income taxes consists of the following:                                    
                                                           
                                                           (In thousands)  
                                         
                                    FAS 109 Method          APB 11 Method            
                                           1994          1993          1992
<S>                                  <C>             <C>         <C>
Federal:                                                                 
  Current                            $   60,757      $  13,637   $  108,512                                                      
  Deferred                               19,673       (23,757)      100,355
                                         80,430       (10,120)      208,867
State:                                                                    
  Current                                 5,769          3,144       26,935
  Deferred                               20,352            633       13,891
                                         26,121          3,777       40,826
Foreign:                                                                 
  Current                                71,095         39,512       65,144
  Deferred                               12,462         19,901       10,232
                                         83,557         59,413       75,376
                                                                          
Provision for income taxes           $  190,108      $  53,070   $  325,069
                                                       

                                 
<CAPTION>

The code below raises the error "pandas.errors.ParserError: Expected 3 fields in line 11, saw 4. Error could possibly be due to quotes being ignored when a multi-char delimiter is used."

import re
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

with open("table.txt", "r") as f:
    content = f.read()

pat = r"<PAGE>(.*?)<TABLE>"
sections = [re.sub(r"\s+", " ", m).strip()
            for m in re.findall(pat, content, re.DOTALL)]
tables = BeautifulSoup(content, "html.parser").find_all("table")

dd = {}
for t, s in zip(tables, sections):
    buf = StringIO()
    buf.write(re.sub(r"\$\s*", " ", t.text))
    buf.seek(0)
    dd[s] = pd.read_csv(buf, sep=r"\s\s+", engine="python", index_col=0, thousands=",")

print(dd["Consolidated Balance Sheets (Dollars in thousands)"].loc["Inventories", "1994"])

I also tried to read the data line by line (for example, to print the value of "Inventories" for 1994 in the terminal), but that did not work either.

1 Answer

Here is one approach, using the libraries imported below:

import re
import pandas as pd
from bs4 import BeautifulSoup
from io import StringIO

with open("file.txt", "r") as f:
    content = f.read()

pat = r"<PAGE>(.*?)<TABLE>"
sections = [re.sub(r"\s+", " ", m).strip() for m in re.findall(pat, content, re.DOTALL)]
tables = BeautifulSoup(content, "html.parser").find_all("table")

dd = {}
for t,s in zip(tables, sections):
    buf = StringIO()
    buf.write(re.sub(r"\$\s*", "", t.text))
    buf.seek(0)
    dd[s] = pd.read_csv(buf, sep=r"\s{3,}",
                        engine="python", index_col=0, thousands=",")

Output/Tests:

To look up a specific value, use the loc accessor:

print(dd["Consolidated Balance Sheets (Dollars in thousands)"].loc["Inventories", "1994"])

# 1088434.0

To display a specific table, index the dict by its section name:

print(dd["Consolidated Balance Sheets (Dollars in thousands)"])

                                                   1994       1993
September 30, 1994, and September 24, 1993                        
Assets:                                             NaN        NaN
Current assets:                                     NaN        NaN
Cash and cash equivalents                     1203488.0   676413.0
Short-term investments                          54368.0   215890.0
Accounts receivable, net of allowance for           NaN        NaN
doubtful accounts of 90,992 (83,776 in 1993)  1581347.0  1381946.0
Inventories                                   1088434.0  1506638.0
Deferred tax assets                            293048.0   268085.0
Other current assets                           255767.0   289383.0
Total current assets                          4476452.0  4338355.0

To show all the sections/tables available, call keys():

print(dd.keys())

#dict_keys(['Consolidated Balance Sheets (Dollars in thousands)',
#           'Consolidated Statements of Income (In thousands, except per share amounts)'])
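
A quick way to see where a "ParserError: Expected N fields ..., saw M" comes from is to count how many fields each line yields when split on the same separator. The snippet below is only a sketch that approximates the splitting with re.split (it does not call pandas), and field_counts is a hypothetical helper name. It shows that wrapped labels and rows with blank cells give a different field count on different lines, which is the kind of input a regex separator cannot handle consistently.

```python
import re


def field_counts(text, sep=r"\s{3,}"):
    """Return the number of fields each non-blank line splits into.

    Assumption: this mimics the answer's preprocessing (dropping "$" and the
    spaces after it) and then splits on the same regex separator.
    """
    counts = []
    for line in text.splitlines():
        line = re.sub(r"\$\s*", "", line).strip()
        if line:
            counts.append(len(re.split(sep, line)))
    return counts


# A few rows copied from the "RESULTS OF OPERATIONS" table in the question.
SAMPLE = """\
Net sales                       $ 9,189     15%   $ 7,977   13%    $ 7,087
  Percentage of net sales         25.5%             34.2%            43.7%
Operating expenses (excluding
 restructuring costs)           $ 1,948    -15%   $ 2,297    --    $ 2,289
"""

# Full rows, rows with blank cells, and wrapped label lines all differ.
print(field_counts(SAMPLE))
```

Any table whose rows do not all split into the same number of fields is likely to trip the parser in the same way.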
Timeless
  • Thank you for your reply! I tried something like this (used pattern= r'(.*?)' page_regex = re.compile(pattern, re.DOTALL) page_matches = page_regex.findall(contents) ...) but the problem is that with a small change in the file structure, an error immediately appears. Like "pandas.errors.ParserError: Expected 1 fields in line 9, saw 5. Error could possibly be due to quotes being ignored when a multi-char delimiter is used." – Jerom Ganoć Apr 14 '23 at 19:25
  • That's why you need to share a *reproducible example* that matches your real *text file* as closely as possible and that actually triggers the *error* ;) – Timeless Apr 14 '23 at 19:26
  • I can send a link to this file, but is it allowed to do this? – Jerom Ganoć Apr 14 '23 at 19:37
  • You don't need to send the whole file. Just make sure to make a reproducible example and include the sections of your textfile that triggers the error. – Timeless Apr 14 '23 at 19:42
  • Added some more file structure – Jerom Ganoć Apr 14 '23 at 19:57
  • Thanks, I ran the code with your new text file and got no errors. Can you *double-check*? You need to show the code you use + the error traceback. – Timeless Apr 14 '23 at 20:01
  • Now I have left only a few lines in the text file (what I have just indicated above in my question). And the code doesn't work, outputting an error: pandas.errors.ParserError: Expected 3 fields in line 11, saw 4. Error could possibly be due to quotes being ignored when a multi-char delimiter is used. – Jerom Ganoć Apr 14 '23 at 21:07
  • You'll find me annoying but I still can't reproduce the error. Can you include your code in the question as well ? – Timeless Apr 14 '23 at 21:20
  • Attached the code and the text (with which the error occurs) to the question. – Jerom Ganoć Apr 15 '23 at 11:32
  • Thanks, I made a small *edit/fix* (see [here](https://stackoverflow.com/posts/76018141/revisions)), it should fix the problem I hope ;) – Timeless Apr 15 '23 at 12:25
  • Thank you, now the program works well for part of the text, but for the full text (returned to the question description) there is the same error... – Jerom Ganoć Apr 15 '23 at 13:04
  • Just so you can understand the logic here, we're using [`read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) to split the columns on whitespace (`\s`) and unfortunately, I can't picture exactly what your real *text file* looks like, so I'm trying (*as much as I can*) to write general/dynamic code that can handle/parse your data correctly. Sorry if you feel upset about the repeated errors! I made a small [edit](https://stackoverflow.com/posts/76018141/revisions) by the way. Make sure to check it out when you have time and good luck with your work ;) – Timeless Apr 15 '23 at 13:22
  • Thank you. Applied your changes. But, as far as I understand, such a solution (based on 're') is not scalable and with small changes in the number of spaces in the text, etc., the program stops working... Here is all text of a file, maybe it can help https://www.sec.gov/Archives/edgar/data/320193/0000320193-94-000016.txt – Jerom Ganoć Apr 15 '23 at 13:39
  • Yeah, it is certainly not your best option and I maybe failed to answer/help you. I hope that my naïve approach gave you some insights that you can use to think of another solution (more robust/scalable than mine) and good luck ;) – Timeless Apr 15 '23 at 13:46
  • Perhaps, looking at the full file, you can share suggestions about what problems there are in applying the approach you described, so that I can think about their solution? Thank you and have a nice day! – Jerom Ganoć Apr 15 '23 at 14:13
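
On the scalability concern raised in the comments, one alternative to splitting on runs of whitespace is to read each row from the right: take the trailing numeric tokens as values and treat whatever is left as the label, joining wrapped label lines until a line with values appears. The following is a minimal sketch using only the standard library, assuming each table has an `<S> <C> ...` ruler line as in the sample above. The names to_number, split_row and parse_table are hypothetical helpers, not part of pandas or of the answer. Known gaps: cells such as `--(B)` are not recognised, values are collected in order rather than aligned to columns (so rows with blank cells in the middle lose their positions), and a label line that ends in a bare number such as a year would be misread as a data row.

```python
import re

# Assumption: a numeric cell looks like 9,188,748 / 2.61 / .73 / -14% / 25.5%.
NUMBER = re.compile(r"-?(?:(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?|\.\d+)%?")


def to_number(token):
    """Convert one cell to float; '--' becomes None; '(123)' becomes -123.0."""
    if token == "--":
        return None
    negative = token.startswith("(") and token.endswith(")")
    core = token[1:-1] if negative else token
    if not NUMBER.fullmatch(core):
        raise ValueError(token)
    value = float(core.rstrip("%").replace(",", ""))
    return -value if negative else value


def split_row(line, n_cols):
    """Peel up to n_cols numeric tokens off the right end of a line."""
    tokens = line.split()
    values = []
    while tokens and len(values) < n_cols:
        tok = tokens[-1]
        if tok == "$":              # a lone currency sign carries no data
            tokens.pop()
            continue
        try:
            values.append(to_number(tok.lstrip("$")))
        except ValueError:
            break                   # reached the label part of the line
        tokens.pop()
    while tokens and tokens[-1] == "$":
        tokens.pop()
    values.reverse()
    return " ".join(tokens), values


def parse_table(text):
    """Return [(label, values), ...] for the rows after the <S>/<C> ruler."""
    rows, pending, n_cols = [], [], None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("<S>"):
            n_cols = stripped.count("<C>")   # one <C> per value column
            continue
        if n_cols is None:
            continue                         # still in the caption/header
        if not stripped or stripped.endswith(":"):
            pending = []                     # blank line or section heading
            continue
        label, values = split_row(stripped, n_cols)
        if values:
            rows.append((" ".join(pending + [label]).strip(), values))
            pending = []
        else:
            pending.append(label)            # wrapped label, keep collecting
    return rows


SAMPLE = """\
                     1994        1993        1992        1991        1990
<S>            <C>         <C>         <C>         <C>         <C>
Net sales      $ 9,188,748 $7,976,954  $ 7,086,542 $ 6,308,849  $5,558,435

Earnings per
 common and
 common
 equivalent
 share         $     2.61  $     0.73  $     4.33  $     2.58  $     3.77
"""

for label, values in parse_table(SAMPLE):
    print(label, values)
```

Because the rows come back as plain (label, values) pairs, they can be loaded into a DataFrame afterwards if needed. The column headers (the years) would still have to be taken from the caption lines above the ruler, which this sketch skips.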