0

I have this defined function which calculates all of the neccessary statistics i need (e.g. two way anova and multicomparison).

def stats(F1_para1,F2_para2,para3,para4,value):
    #MEAN,SEM,COUNT
    msc = df.groupby([F1_para1,F2_para2,para3,para4])[value].agg(['mean','sem','count'])
    msc.reset_index(inplace=True) #converts any columns in index as columns
    pd.DataFrame(msc)
    #TWO-WAY ANOVE AND MULTICOMP
    df['comb'] = df[F1_para1].map(str) + "+" + df[F2_para2].map(str)
    mod = ols(value+'~'+F1_para1+'+'+F2_para2+'+'+F1_para1+'*'+F2_para2, data = df).fit()
    aov = anova_lm(mod, type=2) #mod needs to be the same text as mod (i.e. mod1,mod2)
    comparison=MultiComparison(df[value], df['comb'])
    tukey_df = pd.read_html(comparison.tukeyhsd().summary().as_html())[0]
    r=tukey_df[tukey_df['reject'] == True]
    df2=aov.append(r) #combines dataframes of aov and r

So when i use the function as follows:

Water_intake = stats('Time','Drug','Diet','Pre_conditions',value='Water_intake') 
food_intake = stats('Time','Drugs','Diet','Pre_conditions',value='Food_intake')

The output dataframes following statistical analysis from anova and multicompaison are combined and becomes a new dataframe - 'df2'. 'Value' is the column header of the dependent variable from the main dataframe (df in the code). so everytime i use this function with a different dependent variable from the main dataframe (e.g. food intake, water intake, etc), the statistics summary is exported to the df2 dataframe, which i want to save it as separate sheets into a "statistics" workbook.

I've looked at the solutions here: Save list of DataFrames to multisheet Excel spreadsheet

with ExcelWriter(r"path\statistics.xlsx") as writer:
        for n, df2 in enumerate(df2):
            df2.to_excel(writer,value)
        writer.save()

But i recieved this error:

AttributeError: 'str' object has no attribute 'to_excel'

Not sure if there is another way to achieve the same goal?

1 Answers1

0

You are using df2 when you're enumerating through df2, which will return the column names, which are strings not df, hence the error. You can check this by running:

for n, df2 in enumerate(df2):
     print(n)
     print(df2)

You're also not changing df2 or calling the function to get df2 in your for loop. I think the whole thing needs re-writing.

Firstly you need to add return df2 at the end of your function, so that you actually get your df2 when it's called.

def stats(F1_para1,F2_para2,para3,para4,value):
    #MEAN,SEM,COUNT
    msc = df.groupby([F1_para1,F2_para2,para3,para4])[value].agg(['mean','sem','count'])
    msc.reset_index(inplace=True) #converts any columns in index as columns
    pd.DataFrame(msc)
    #TWO-WAY ANOVE AND MULTICOMP
    df['comb'] = df[F1_para1].map(str) + "+" + df[F2_para2].map(str)
    mod = ols(value+'~'+F1_para1+'+'+F2_para2+'+'+F1_para1+'*'+F2_para2, data = df).fit()
    aov = anova_lm(mod, type=2) #mod needs to be the same text as mod (i.e. mod1,mod2)
    comparison=MultiComparison(df[value], df['comb'])
    tukey_df = pd.read_html(comparison.tukeyhsd().summary().as_html())[0]
    r=tukey_df[tukey_df['reject'] == True]
    df2=aov.append(r) #combines dataframes of aov and r
    return df2

Then your 2 function calls in the question will actually return something. To add these into an excel document, you can then do:

Water_intake = stats('Time','Drug','Diet','Pre_conditions',value='Water_intake') 
food_intake = stats('Time','Drugs','Diet','Pre_conditions',value='Food_intake')

to export these 2 to excel on different sheets, you can do:

writer = pd.ExcelWriter(r"path\statistics.xlsx")
Water_intake.to_excel(writer, sheet_name='Water_intake')
food_intake.to_excel(writer, sheet_name='Food_intake')
writer.save()

This should give you a spreadsheet with 2 sheets containing the different df2 on each. I don't know how many of these you need, or how you call the function differently for each, but it may be necessary to create a for loop.

Emi OB
  • 2,814
  • 3
  • 13
  • 29
  • I tried it, it seperates the columns from within df2, but i would like to save the entire df2 dataframe as its own excel sheet into the same excel workbook. Value is a dependent variable from the main dataframe (df in the code) and df2 is the dataframes containing the results from the statistical analysis (e.g. aov, tukeys, multicomp). so everytime i run the function with a different dependent variable from the main dataframe (i.e. value for df), it produces a new df2 which i want to save it as separate sheets into the same workbook. I'll add this to the question to clarify it. –  Nov 24 '21 at 10:55
  • @GLit98 ok, but you're not making a new df2 each time in the loop. Using df2 in enumerate will overwrite df2 to the strings of the header columns, so that is why you're getting an error. Could you perhaps add the code where you get your different df2s and running the function into the question? – Emi OB Nov 24 '21 at 11:01
  • Yeah sure, added it to the question now. The excel sheet name can equal to the 'value' inputted so that everytime the function is used, df2 is exported into and saved into seperate sheets into the same workbook. –  Nov 24 '21 at 11:18
  • @GLit98 have updated my answer a bit. Still don't think I've quite done what youre after, but closer? – Emi OB Nov 24 '21 at 11:48
  • Right okay, i tried it and keeps overwriting the same excel file. I think ill stick to writing it into excel seperately from the function. –  Nov 25 '21 at 01:04
  • @GLit98 that code should write to 1 excel document with 2 sheets. You need to put return df2 at the end of your function though – Emi OB Nov 25 '21 at 07:39