I'm running a loop that appends values to an empty DataFrame defined outside the loop. However, when the loop finishes, the DataFrame is still empty. I'm not sure what's going on. The goal is to find the power value that results in the lowest sum of squared residuals.

Example code below:

import numpy as np
import pandas as pd
import tweedie

power_list = np.arange(1.3, 2, .01)
mean = 353.77
std = 17298.24
size = 860310
x = tweedie.tweedie(mu = mean, p = 1.5, phi = 50).rvs(size)
variance = 299228898.89

sum_ssr_df = pd.DataFrame(columns = ['power', 'dispersion', 'ssr'])

for i in power_list:

    power = i

    phi = variance/(mean**power)

    tvs = tweedie.tweedie(mu = mean, p = power, phi = phi).rvs(len(x))

    sort_tvs = np.sort(tvs)

    df = pd.DataFrame([x, sort_tvs]).transpose()
    df.columns = ['actual', 'random']
    df['residual'] = df['actual'] - df['random']
    ssr = df['residual']**2
    sum_ssr = np.sum(ssr)
    df_i = pd.DataFrame([i, phi, sum_ssr])
    df_i = df_i.transpose()
    df_i.columns = ['power', 'dispersion', 'ssr']
    sum_ssr_df.append(df_i)    

sum_ssr_df[sum_ssr_df['ssr'] == sum_ssr_df['ssr'].min()]

What exactly am I doing incorrectly?

Jordan
  • One doesn't. [You append to a list, then concat after the loop](https://stackoverflow.com/a/37009561/4333359) – ALollz May 16 '19 at 18:10
  • But to explain your real problem, DataFrames aren't like lists. While for a list `my_list.append(other)` changes `my_list`, for DataFrames you would need `my_df = my_df.append(other)` – ALollz May 16 '19 at 18:30

1 Answer

This code isn't as efficient as it could be, as noted by ALollz. Each call to `append` creates a new DataFrame in memory rather than modifying the existing one (I'm oversimplifying here).

The error in your code is:

 sum_ssr_df.append(df_i)

should be:

 sum_ssr_df = sum_ssr_df.append(df_i)
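
For the more efficient pattern ALollz mentions (collect in a list, then concatenate once after the loop), here is a minimal sketch. The helper name `collect_rows` is hypothetical, and the `ssr` line is only a placeholder standing in for your real residual calculation, so the sketch runs without `tweedie`. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on recent versions this list-plus-`pd.concat` approach is the one that still works.

```python
import numpy as np
import pandas as pd

def collect_rows(power_list, mean, variance):
    """Build one small DataFrame per power value, then combine them once."""
    frames = []  # plain Python list; list.append mutates it in place
    for power in power_list:
        phi = variance / (mean ** power)
        # Placeholder (assumption): replace with the real sum of squared residuals.
        ssr = (power - 1.5) ** 2
        frames.append(pd.DataFrame({'power': [power],
                                    'dispersion': [phi],
                                    'ssr': [ssr]}))
    # A single concat after the loop avoids copying the result on every iteration.
    return pd.concat(frames, ignore_index=True)

power_list = np.arange(1.3, 2, .01)
sum_ssr_df = collect_rows(power_list, mean=353.77, variance=299228898.89)

# Row with the smallest ssr.
best = sum_ssr_df.loc[sum_ssr_df['ssr'].idxmin()]
print(best)
```

With `ignore_index=True` the combined frame gets a clean 0..n-1 index, which makes `idxmin` unambiguous.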
Polkaguy6000