How to drop null values in dataframe after groupby while writing to excel

Question

I have the following dfe:-

  ID CATEG   LEVEL     COLS    VALUE COMMENTS
    1   A        2     Apple    428  comment1
    1   A        3     Apple    175  comment1
    1   C        1     Apple    226  comment1
    1   C        2     Apple    884  comment1
    1   C        3     Apple    289  comment1
    1   B        1     Apple    712  comment1
    1   B        2     Apple    849  comment1
    2   B        3     Apple    376  comment1
    2   C        None  Orange   591  comment1
    2   B        None  Orange   135  comment1
    2   D        None  Orange   423  comment1
    2   A        None  Orange   866  comment1
    2            None  Orange   496  comment2

I want to pivot by one column COLS of dfe , groupby ID and write in excel such that each ID data is on one sheet. What I tried :-

df=pd.pivot_table(dfe,index=['ID','CATEG','LEVEL'],columns=['COLS'],values=['VALUE'])
    

with pd.ExcelWriter('file.xlsx',options={'nan_inf_to_errors': True}) as writer :
        df.groupby('ID').apply(lambda x: x.to_excel(writer,sheet_name=str(x.name),na_rep=0,index=True))
writer.save()

The problem I'm facing doing so is after the groupby many columns are 0, I want to remove columns which are null after the groupby and before writing to excel. I cannot remove null column before groupby as the whole column won't be null then

score 1 · Accepted Answer · answered Oct 14 '20 at 08:45

1

You can remove all columns with only missing values by DataFrame.dropna by how='all' and axis=1 parameters:

with pd.ExcelWriter('file.xlsx',options={'nan_inf_to_errors': True}) as writer :
        df.groupby('ID').apply(lambda x: x.dropna(how='all', axis=1).to_excel(writer,sheet_name=str(x.name),na_rep=0,index=True))
writer.save()

answered Oct 14 '20 at 08:45

jezrael

822,522
95
1,334
1,252

Hi. Thanks for the reply, this worked, but I'm unable to get all the headers on one line, how do I do that eg : `0 value` is shown on first row then pivot column headers in second row and `ID`, `CATEG` , `LEVEL` in next row,, I want all the headers in one row how do I do this. `header=false` removes all the 3 rows – Scope Oct 14 '20 at 10:29
1

try change `df=pd.pivot_table(dfe,index=['ID','CATEG','LEVEL'],columns=['COLS'],values=['VALUE'])` to `df=pd.pivot_table(dfe,index=['ID','CATEG','LEVEL'],columns='COLS',values='VALUE')` – jezrael Oct 14 '20 at 10:30
It worked . Im facing an issue where all numbers appearing in scinetifc format `pd.options.display.float_format = '{:.2f}'.format` this makes the numbers as decimals , which im trying to avoid, is there any other way to avoid scientific fromatting this – Scope Oct 14 '20 at 10:36
@Scope - Some ideas - [this](https://stackoverflow.com/questions/17737300/suppressing-scientific-notation-in-pandas). – jezrael Oct 14 '20 at 10:38
All these converts to float which im trying to avoid, as i dont have any float values in my data.. – Scope Oct 14 '20 at 10:45
@Scope - Unfortunately not idea :( It seems data related issue, so for test is necessary create data with same issue. – jezrael Oct 14 '20 at 10:46
thanks no issues. Also could you answer my previous question [here](https://stackoverflow.com/questions/64330740/how-to-group-by-column-in-a-dataframe-and-create-pivot-tables-in-a-loop) if you have an idea how to ,where we need to use 2 levels of pivot ,somewhat related to this question – Scope Oct 14 '20 at 10:51
If `df`grouped by `ID`and values in `LEVEL` column are null/nan, I do not want to add that column `LEVEL` to `columns` parameter of `pivot_table`, how to do this? – Scope Oct 14 '20 at 18:45
@Scope You can filter them before pivoting by `df1 = df[df['LEVEL'].isna()]` and `df2 = df[df['LEVEL'].notna()]` and processing only df2. – jezrael Oct 14 '20 at 19:25
Within an `ID` group if there are `na` `LEVEL`, I want to pivot and give one sheet name and if for all `ID` if there are are `na` `LEVEL` I want to pivot and give another sheet name. How do I achieve this? – Scope Oct 15 '20 at 06:33
Also after I create a `pivot` how do I sort values according to another `df`?Im able to do it before `pivot` and see `CATEG` in the order I want it to be, but after `pivot` orde gets changed, I cannot come to understand why – Scope Oct 15 '20 at 06:38

How to drop null values in dataframe after groupby while writing to excel

1 Answers1

Linked