I have about 88 columns in a pandas dataframe. I'm trying to apply a formula that calculates a single value for each column. How do I switch out the name of each column and then build a new single-row dataframe from the equation?

Below is the equation (linear mixed model) which results in a single value for each column.

B1 = (((gdf.groupby(['Benthic_Mo'])['SHAPE_Area'].sum())/Area_sum) * 
  (gdf.groupby(['Benthic_Mo'])['W8_629044'].mean())).sum()
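
Written out, the expression above is an area-weighted sum of per-class means. The notation is introduced here for illustration only: g ranges over the values of Benthic_Mo, A_g is the summed SHAPE_Area of class g, A is Area_sum (assumed to be the total of SHAPE_Area, which the question does not show), and \bar{x}_g is the mean of the chosen W8_ column within class g.

```latex
B_1 = \sum_{g} \frac{A_g}{A}\,\bar{x}_g
```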

Below is a sample of the names of the columns

['OBJECTID', 'Benthic_Mo', 'SHAPE_Leng', 'SHAPE_Area', 'geometry', 'tmp', 'Species','W8_629044', 'W8_642938', 'W8_656877', 'W8_670861', 'W8_684891', 'W8_698965', 'W8_713086', 'W8_72726',...]

The W8_## columns need to be substituted into the formula one at a time, and there are about 80 of them. The output I need is a new dataframe with a single row. I would also like to calculate the variance or standard deviation of the values calculated with the formula.

thank you!

Samuel Liew
domarom
  • Can you add some sample data? Cf. [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15873043). There's most likely a solution without any (manual) looping but it's difficult to give concrete advice without any test data to verify potential solutions. – fsimonjetz Sep 01 '22 at 08:03

2 Answers

0

You can loop through the dataframe columns. I think the below code should work.

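import pandas as pd

# gdf and Area_sum are defined as in the question
collist = list(gdf.columns)
emptylist = []
emptydict = {}
for i in collist[7:]:  # the W8_ columns start at position 7
    B1 = (((gdf.groupby(['Benthic_Mo'])['SHAPE_Area'].sum())/Area_sum) * (gdf.groupby(['Benthic_Mo'])[i].mean())).sum()
    emptydict[i] = B1  # one entry per W8_ column
emptylist.append(emptydict)  # append once, after the loop, so the result is a single row
resdf = pd.DataFrame(emptylist)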
  • Both `gdf.groupby(['Benthic_Mo'])` and `gdf.groupby(['Benthic_Mo'])['SHAPE_Area'].sum() / Area_sum` should be computed only once. – BloomShell Sep 01 '22 at 06:16
  • That worked! The emptydict should be []. The only problem is that the new dataframe is still an 81 by 81 matrix with a bunch of NaN and the correct values on the diagonal. How would I make the new df a single row? – domarom Sep 01 '22 at 06:18
  • Actually, I used this and it collapsed everything. Thank you! "resdf1 = resdf.apply(lambda x: pd.Series(x.dropna().values))" – domarom Sep 01 '22 at 06:29
  • I corrected the answer, I think it should be helpful. – Hitesh Rahangdale Sep 01 '22 at 14:18
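
Following BloomShell's comment, the group-by and the area weights can be computed once and applied to all W8_ columns together, without a loop. This is a minimal sketch, not the answerer's method: it assumes `Area_sum` is the total of `SHAPE_Area` (the question does not show how it is defined) and uses a small made-up table in place of `gdf`.

```python
import pandas as pd

# Hypothetical stand-in for gdf; the real data is not shown in the question.
gdf = pd.DataFrame({
    "Benthic_Mo": ["reef", "reef", "sand", "sand"],
    "SHAPE_Area": [1.0, 3.0, 2.0, 4.0],
    "W8_629044": [10.0, 20.0, 30.0, 50.0],
    "W8_642938": [1.0, 3.0, 5.0, 7.0],
})
Area_sum = gdf["SHAPE_Area"].sum()  # assumption: total area over all rows

w8_cols = [c for c in gdf.columns if c.startswith("W8_")]
grouped = gdf.groupby("Benthic_Mo")

# Area share of each benthic class, computed once.
weights = grouped["SHAPE_Area"].sum() / Area_sum

# Per-class mean of every W8_ column, scaled by the class's area share,
# then summed over classes: one value per W8_ column.
weighted = grouped[w8_cols].mean().mul(weights, axis=0).sum()

# Turn the resulting Series into a single-row dataframe.
resdf = weighted.to_frame().T
```

Because the result is built from one Series, it is a single row from the start and needs no `dropna` step to collapse it.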
-1

To create a new df with each result in its own column (one row), you can use something similar to the code below:

import pandas as pd

# gdf and Area_sum are defined as in the question
W8_cols = [col for col in gdf.columns if 'W8_' in col]

df_out = pd.DataFrame()
for col in W8_cols:
    B1 = (((gdf.groupby(['Benthic_Mo'])['SHAPE_Area'].sum()) / Area_sum) *
          (gdf.groupby(['Benthic_Mo'])[col].mean())).sum()
    t_data = [{col: B1}]
    df_temp = pd.DataFrame(t_data)  # 1x1 frame holding this column's value
    data = [df_out, df_temp]
    df_out = pd.concat(data, axis=1)  # add it as a new column
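
The question also asks for the variance or standard deviation of the computed values. A minimal sketch, assuming this means the spread across the W8_ results in the single-row frame; the numbers are made up and stand in for `df_out` from the loop above.

```python
import pandas as pd

# Hypothetical single-row result standing in for df_out.
df_out = pd.DataFrame([{"W8_629044": 2.0, "W8_642938": 4.0, "W8_656877": 6.0}])

values = df_out.iloc[0]  # the row as a Series, one entry per W8_ column

variance = values.var()  # sample variance (pandas default ddof=1)
std_dev = values.std()   # sample standard deviation (ddof=1)
```

Pass `ddof=0` to `var` and `std` if the population versions are wanted instead.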
NoobVB