1

I am trying to concatenate three separate dataframes (bs_df, income_df, cash_df) of each ticker then append them into a dataframe df. It successfully appends the dataframe the first time it goes through the for loop, but when it tries to append the new dataframe to the original dataframe, it fails to do so and throws an Assertion Error. I think it's because the dataframes have different columns (reference), but I cannot figure out a way to bypass this and still create a dataframe with all the column names and fill in with NaN values if a dataframe does not have the value for a specific column that the other dataframe has.

Here is the code I am running:

from yahoofinancials import YahooFinancials
import pandas as pd

# Tickers whose annual financial statements we want to collect.
tickers = ["PIH","AAPL","AMZN"]
df = pd.DataFrame()
for ticker in tickers:
    print (ticker)
    # Updates every loop
    ticker_yahoo_financials = YahooFinancials(ticker)

    # Get financial info: annual balance sheet, income statement and cash flow.
    bs_hist = ticker_yahoo_financials.get_financial_stmts('annual', 'balance')
    income_hist = ticker_yahoo_financials.get_financial_stmts('annual', 'income')
    cash_hist = ticker_yahoo_financials.get_financial_stmts('annual', 'cash')

    # Create dataframes and combine them.
    # NOTE: only the most recent reporting period ([0]) of each statement is used.
    bs_df = pd.DataFrame(list(bs_hist['balanceSheetHistory'][ticker][0].values()))
    income_df = pd.DataFrame(list(income_hist['incomeStatementHistory'][ticker][0].values()))
    cash_df = pd.DataFrame(list(cash_hist['cashflowStatementHistory'][ticker][0].values()))
    comb = pd.concat([bs_df, income_df, cash_df], axis=1)
    # Fails on the second pass with the AssertionError shown below —
    # presumably because `comb` carries duplicated labels that append/concat
    # cannot reconcile (confirmed in the accepted answer).
    df = df.append(comb)

Here is the error message:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-23-f476ca9fbb70> in <module>()
     19     cash_df = pd.DataFrame(list(cash_hist['cashflowStatementHistory'][ticker][0].values()))
     20     comb = pd.concat([bs_df, income_df, cash_df], axis=1)
---> 21     df = df.append(comb)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in append(self, other, ignore_index, verify_integrity)
   5192             to_concat = [self, other]
   5193         return concat(to_concat, ignore_index=ignore_index,
-> 5194                       verify_integrity=verify_integrity)
   5195 
   5196     def join(self, other, on=None, how='left', lsuffix='', rsuffix='',

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
    211                        verify_integrity=verify_integrity,
    212                        copy=copy)
--> 213     return op.get_result()
    214 
    215 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\concat.py in get_result(self)
    406             new_data = concatenate_block_managers(
    407                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 408                 copy=self.copy)
    409             if not self.copy:
    410                 new_data._consolidate_inplace()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   5205         blocks.append(b)
   5206 
-> 5207     return BlockManager(blocks, axes)
   5208 
   5209 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   3031 
   3032         if do_integrity_check:
-> 3033             self._verify_integrity()
   3034 
   3035         self._consolidate_check()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in _verify_integrity(self)
   3247                                  'block items\n# manager items: {0}, # '
   3248                                  'tot_items: {1}'.format(
-> 3249                                      len(self.items), tot_items))
   3250 
   3251     def apply(self, f, axes=None, filter=None, do_integrity_check=False,

AssertionError: Number of manager items must equal union of block items
# manager items: 67, # tot_items: 68
user9532692
  • 584
  • 7
  • 28

2 Answers

0

The problem is that the DataFrames concatenated inside the loop end up with duplicated index rows, so the final concat raises the error:

PIH
           2017-12-31  2016-12-31  2015-12-31
netIncome    294000.0     11000.0  -1673000.0
netIncome    294000.0     11000.0  -1673000.0
AAPL
             2017-09-30    2016-09-24    2015-09-26
netIncome  4.835100e+10  4.568700e+10  5.339400e+10
netIncome  4.835100e+10  4.568700e+10  5.339400e+10
AMZN
             2017-12-31    2016-12-31   2015-12-31
netIncome  3.033000e+09  2.371000e+09  596000000.0
netIncome  3.033000e+09  2.371000e+09  596000000.0

My solution:

tickers = ["PIH","AAPL","AMZN"]

# Collect one combined statement-DataFrame per ticker, then glue them
# all together once at the end instead of appending inside the loop.
per_ticker = []
for ticker in tickers:
    print (ticker)
    # Updates every loop
    financials = YahooFinancials(ticker)

    # Get financial info and unwrap each payload to its history dict.
    bs_hist = financials.get_financial_stmts('annual', 'balance')['balanceSheetHistory']
    income_hist = financials.get_financial_stmts('annual', 'income')['incomeStatementHistory']
    cash_hist = financials.get_financial_stmts('annual', 'cash')['cashflowStatementHistory']

    # Every history entry is one reporting period; turn each entry into a
    # DataFrame and put the periods side by side (axis=1), then stack the
    # three statements on top of each other.
    statements = []
    for hist in (bs_hist, income_hist, cash_hist):
        statements.append(pd.concat([pd.DataFrame(entry) for entry in hist[ticker]], axis=1))
    comb = pd.concat(statements)

    # Show which index labels are duplicated ...
    print (comb[comb.index.duplicated(keep=False)])
    # ... and drop the repeats, keeping the first occurrence (rows are identical).
    comb = comb[~comb.index.duplicated()]
    per_ticker.append(comb)


df = pd.concat(per_ticker, keys=tickers, axis=1)

print (df.head())

                            PIH                                  AAPL  \
                     2017-12-31  2016-12-31  2015-12-31    2017-09-30   
accountsPayable       9805000.0   7294000.0   5146000.0  7.479300e+10   
capitalExpenditures    -28000.0    -83000.0    -48000.0 -1.245100e+10   
capitalSurplus       47064000.0  46809000.0  48688000.0           NaN   
cash                 23575000.0  43045000.0  47957000.0  2.028900e+10   
changeInCash        -19470000.0  -4912000.0  -5682000.0 -1.950000e+08   

                                                         AMZN                \
                       2016-09-24    2015-09-26    2017-12-31    2016-12-31   
accountsPayable      5.932100e+10  6.067100e+10  5.278600e+10  3.904800e+10   
capitalExpenditures -1.273400e+10 -1.124700e+10 -1.195500e+10 -7.804000e+09   
capitalSurplus                NaN           NaN  2.138900e+10  1.718600e+10   
cash                 2.048400e+10  2.112000e+10  2.052200e+10  1.933400e+10   
changeInCash        -6.360000e+08  7.276000e+09  1.188000e+09  3.444000e+09   


                       2015-12-31  
accountsPayable      3.076900e+10  
capitalExpenditures -5.387000e+09  
capitalSurplus       1.339400e+10  
cash                 1.589000e+10  
changeInCash         1.333000e+09  

My solution produces a MultiIndex in the columns, so you can select values with xs or slicers:

print (df.xs('PIH', axis=1).head())
                     2017-12-31  2016-12-31  2015-12-31
accountsPayable       9805000.0   7294000.0   5146000.0
capitalExpenditures    -28000.0    -83000.0    -48000.0
capitalSurplus       47064000.0  46809000.0  48688000.0
cash                 23575000.0  43045000.0  47957000.0
changeInCash        -19470000.0  -4912000.0  -5682000.0

idx = pd.IndexSlice
print (df.loc['accountsPayable', idx['PIH', ['2017-12-31', '2016-12-31']]])
PIH  2017-12-31    9805000.0
     2016-12-31    7294000.0
Name: accountsPayable, dtype: float64

print (df.loc['accountsPayable', idx['PIH',  '2016-12-31']])
7294000.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
-1

Try this:

# pd.merge() joins exactly two DataFrames (left, right) — passing a single
# list raises a TypeError, so the one-liner cannot work as written.  To merge
# several frames on their indexes, fold merge over the list instead; an outer
# join keeps every row and fills missing values with NaN:
from functools import reduce

result = reduce(
    lambda left, right: pd.merge(left, right, left_index=True, right_index=True, how='outer'),
    [bs_df, income_df, cash_df],
)
Jay Pratap Pandey
  • 352
  • 2
  • 9
  • 19
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – rollstuhlfahrer Apr 12 '18 at 08:34