
I have a pandas DataFrame with df.shape = (36, 17). It contains 17 economic and fiscal indicators for the Netherlands, measured over 36 years.

I am trying to replicate a rules-based European agreement that gives a European member state exact instructions on how large its required annual fiscal adjustment should be. The required adjustment depends on the level of government debt and on the output gap (an economic indicator of the business cycle). The exact instructions are provided in the table below.

SGP flexibility
(source: voxeu.org)

The relevant indicators from the table above have the following names in my dataframe:

  • Debt = gov_debt_perct_mev (60% = 60)
  • Output gap = output_gap_pf_sf
  • Difference between potential and growth = diff_pg_ag
  • Required annual fiscal adjustment = reqsb

I am now trying to replicate the table in Python code, to calculate the required annual fiscal adjustment for every year in my dataframe.

Sadly, when I run the code below I get len(reqsb) = 21456, but it should equal the number of years, which is 36.

Question: I cannot find the bug. Does anybody have tips for getting the correct length?

Sorry for this long story btw, but wanted to provide enough info :-)

reqsb = []

for debt in df.gov_debt_perct_mev:
    if (debt <= 60.0):
        for og in df.output_gap_pf_sf:
            if (og < -4.0):
                reqsb.append(0)
            if (og >= -4.0) & (og < -3.0):
                reqsb.append(0)
            if (og >= -3.0) & (og < -1.5):
                for diff in df.diff_pg_ag:
                    if (diff > 0):
                        reqsb.append(0.25)
                    else:
                        reqsb.append(0)
            if (og >= -1.5) & (og < 1.5):
                reqsb.append(0.5)
            if (og >= 1.5):
                for diff in df.diff_pg_ag:
                    if (diff > 0):
                        reqsb.append(0.76)
                    else:
                        reqsb.append(0.51)
    else:
        for og in df.output_gap_pf_sf:
            if (og < -4.0) :
                reqsb.append(0)
            if (og >= -4.0) & (og < -3.0):
                reqsb.append(0.25)
            if (og >= -3.0) & (og < -1.5):
                for diff in df.diff_pg_ag:
                    if (diff > 0):
                        reqsb.append(0.5)
                    else:
                        reqsb.append(0.25)
            if (og >= -1.5) & (og < 1.5):
                reqsb.append(0.51)
            if (og >= 1.5):
                for diff in df.diff_pg_ag:
                    if (diff > 0):
                        reqsb.append(1.0)
                    else:
                        reqsb.append(0.76)
Glorfindel
Tjeerd Tim
    Instead of one big method, why not break it down into pieces? For example, instead of "if (og >= -4.0) & (og < -3.0):" you could have a method _value_in_valid_range(og); your code then becomes much more readable and testable (you can test each method separately). – user2266449 Oct 27 '15 at 11:14

2 Answers


Your problem is here:

for debt in df.gov_debt_perct_mev:
    if (debt <= 60.0):
        for og in df.output_gap_pf_sf:
            ....
    else:
        for og in df.output_gap_pf_sf:

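Every pass of the outer loop walks the entire output-gap column again, so you append at least x*a + x*b = x*(a + b) entries to the list, where x is the length of df.output_gap_pf_sf, a is the number of entries in df.gov_debt_perct_mev that are at most 60, and b is len(df.gov_debt_perct_mev) - a. With 36 rows and no missing values, that is already at least 36 * 36 = 1296 entries.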

You shouldn't nest one 'for' loop over a column inside another here: each year needs exactly one entry, so a single loop over the rows is enough.
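The multiplication can be checked with a small counting sketch. The helper count_appends and the split of 16 values are hypothetical, since the actual data is not shown; the sketch also assumes every output-gap value matches exactly one band (no missing values).

```python
def count_appends(n_years, n_loop_band):
    """Count list entries produced by the nested-loop version.

    n_years: rows in the DataFrame (36 in the question).
    n_loop_band: hypothetical number of output-gap values that fall in a
    band containing the inner diff loop (-3.0 <= og < -1.5 or og >= 1.5).
    """
    # One outer iteration walks the whole output-gap column: a value in a
    # "loop band" appends n_years entries, every other value appends one.
    per_debt_row = (n_years - n_loop_band) + n_loop_band * n_years
    # The outer loop repeats that once per debt value.
    return n_years * per_debt_row


lower_bound = count_appends(36, 0)   # no value reaches an inner diff loop
matching = count_appends(36, 16)     # hypothetical split giving the reported length
print(lower_bound, matching)
```

Under these assumptions, a split of 16 out of 36 values is the one that reproduces the reported length of 21456.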

You have the same problem in:

if (og >= -3.0) & (og < -1.5):
    for diff in df.diff_pg_ag:
        if (diff > 0):
            reqsb.append(0.25)
        else:
            reqsb.append(0)
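One possible way to avoid the inner loop is to walk the columns in lockstep with zip, so that each year contributes its own diff value. This is a minimal sketch with made-up values in plain lists, covering only the -3.0 to -1.5 band for low debt; the same pattern should work with zip(df.output_gap_pf_sf, df.diff_pg_ag).

```python
# Hypothetical values for three years, all in the -3.0 <= og < -1.5 band.
output_gap = [-2.0, -2.5, -1.8]
diff_pg_ag = [0.3, -0.1, 0.0]

reqsb = []
for og, diff in zip(output_gap, diff_pg_ag):
    if (og >= -3.0) and (og < -1.5):
        # Use this year's diff only, instead of looping over the column.
        reqsb.append(0.25 if diff > 0 else 0)

print(len(reqsb), reqsb)
```

The list now holds one entry per year rather than one per combination of values.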
Mariusz

As it seems, every loop you write evaluates 36 items, because each one iterates over a whole column instead of selecting one specific cell of the dataframe.

for debt in df.gov_debt_perct_mev: # loop 36 times
    # if (debt <= 60.0) is true
        for og in df.output_gap_pf_sf: # loop another 36 times
            # and so on for each loop you write.

Since you want to iterate over each row (translating to each year) in your dataframe and evaluate thresholds for each value in specific cells you should use the iterrows() method on your dataframe df:

for index, row in df.iterrows():
    if row.gov_debt_perct_mev <= 60:
        if (row.output_gap_pf_sf < -4.0):
            reqsb.append(0)
        # rest of if cases...
        if (row.output_gap_pf_sf >= 1.5):
            if row.diff_pg_ag > 0:
                pass  # append again..
    else:
        pass  # similarly replace for loops with if statements.

This means that you evaluate each cell for a specific year to see what value it should take.
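For reference, here is a complete sketch of the iterrows() approach. The thresholds and return values are copied from the code in the question; the four-row DataFrame is made up for illustration, and the if/elif chains are a rewrite of the original separate if statements, not the asker's exact layout.

```python
import pandas as pd

# Hypothetical values for four years; the real DataFrame has 36 rows.
df = pd.DataFrame({
    "gov_debt_perct_mev": [55.0, 58.0, 65.0, 72.0],
    "output_gap_pf_sf": [-3.5, 2.0, -2.0, 0.5],
    "diff_pg_ag": [0.4, -0.2, 0.1, 0.3],
})

reqsb = []
for index, row in df.iterrows():
    og = row.output_gap_pf_sf
    diff_positive = row.diff_pg_ag > 0
    if row.gov_debt_perct_mev <= 60.0:
        if og < -3.0:
            value = 0.0          # both bands below -3.0 give 0
        elif og < -1.5:
            value = 0.25 if diff_positive else 0.0
        elif og < 1.5:
            value = 0.5
        else:
            value = 0.76 if diff_positive else 0.51
    else:
        if og < -4.0:
            value = 0.0
        elif og < -3.0:
            value = 0.25
        elif og < -1.5:
            value = 0.5 if diff_positive else 0.25
        elif og < 1.5:
            value = 0.51
        else:
            value = 1.0 if diff_positive else 0.76
    reqsb.append(value)          # exactly one entry per year

df["reqsb"] = reqsb
print(df)
```

Note that a row with a missing output gap would fall through to the final else branch in this sketch, so missing values would need separate handling.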


An additional note: you're probably confusing the bitwise operator & with the logical operator and.

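In your case this doesn't change the results of your conditions: both operands are plain booleans (True or False), and for those & and and give the same result. The parentheses around each comparison are required with &, though, because & binds more tightly than the comparison operators.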
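A short sketch of where the two operators agree and where they differ. The values are made up; the last part assumes pandas is available and shows the behaviour on a whole column rather than on a single cell.

```python
import pandas as pd

og = -3.5
# On plain booleans both operators agree, as long as the parentheses stay.
with_bitwise = (og >= -4.0) & (og < -3.0)
with_logical = og >= -4.0 and og < -3.0

# Without parentheses & binds tighter than the comparisons:
# 2 > 1 & 2 > 1 is parsed as 2 > (1 & 2) > 1, i.e. 2 > 0 > 1.
unparenthesised = 2 > 1 & 2 > 1
parenthesised = (2 > 1) & (2 > 1)

# On a whole column, & is the operator that works element-wise,
# while 'and' asks for the truth value of the Series and fails.
s = pd.Series([-3.5, 0.0])
mask = (s >= -4.0) & (s < -3.0)
try:
    (s >= -4.0) and (s < -3.0)
    and_on_series = "no error"
except ValueError:
    and_on_series = "ValueError"

print(with_bitwise, with_logical, unparenthesised, parenthesised)
print(mask.tolist(), and_on_series)
```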

Dimitris Fasarakis Hilliard
  • To take this further, you could use an [apply](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html) combined with a function to carry out the logic to remove the loop entirely. This would make the code a lot more portable (and hopefully efficient) and allow the output to also be a pandas object without any conversion. – pbarber Nov 16 '15 at 13:04
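A rough sketch of what the apply suggestion could look like. The helper required_adjustment and the lookup tables are hypothetical names; the band edges and values are taken from the question's code, the sample rows are made up, and the table lookup with bisect is a swapped-in alternative to the if chains, not something the answers propose.

```python
from bisect import bisect_right

import pandas as pd

# Lower edges of the output-gap bands used in the question's code.
OG_EDGES = [-4.0, -3.0, -1.5, 1.5]

# One (diff <= 0, diff > 0) pair per band, values copied from the question.
LOW_DEBT = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.25), (0.5, 0.5), (0.51, 0.76)]
HIGH_DEBT = [(0.0, 0.0), (0.25, 0.25), (0.25, 0.5), (0.51, 0.51), (0.76, 1.0)]


def required_adjustment(row):
    """Hypothetical helper: look up the adjustment for one row (one year)."""
    table = LOW_DEBT if row["gov_debt_perct_mev"] <= 60.0 else HIGH_DEBT
    # bisect_right returns 0 for og < -4.0, 1 for -4.0 <= og < -3.0, etc.
    band = bisect_right(OG_EDGES, row["output_gap_pf_sf"])
    when_not_positive, when_positive = table[band]
    return when_positive if row["diff_pg_ag"] > 0 else when_not_positive


# Hypothetical values for four years.
df = pd.DataFrame({
    "gov_debt_perct_mev": [55.0, 55.0, 70.0, 70.0],
    "output_gap_pf_sf": [-2.0, 0.0, -3.5, 2.0],
    "diff_pg_ag": [0.3, -0.1, 0.2, -0.4],
})

# axis=1 passes one row at a time; the result is a Series aligned with df.
df["reqsb"] = df.apply(required_adjustment, axis=1)
print(df)
```

Because the result is assigned as a column, its length matches the number of years by construction.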