2

I want to display only filtered rows in the Excel output. Below is example,

df = pd.DataFrame({'Data': [1, 2, 3, 4, 
                               5, 6, 7]}) 

writer = pd.ExcelWriter('pandasEx.xlsx',  
                   engine ='xlsxwriter') 

df.to_excel(writer, sheet_name ='Sheet1') 
writer.save() 

For output I want to hide all rows where 'Data' < 5. How to do this? It is equivalent to applying filter and saving the excel.

I know how to remove dups in pandas or filter in pandas. But I do not want to remove them in pandas, I simply want to apply filter in excel. The use case is that user will have full data in excel but certain rows will be hidden, if user wants they can unhide them in excel and look at data. Hope this explains use case

Thank you

nilesh
  • 327
  • 3
  • 16
  • If dupe is wrong, let me know. – jezrael Jan 23 '19 at 12:22
  • 1
    This isn't a dupe, at least of the linked question. The OP is asking how to filter rows in the XlsxWriter file that is created from Pandas, not how to filter in Pandas itself. – jmcnamara Jan 23 '19 at 13:00
  • Having said that, the solution will involve finding rows that would be filtered in the dataframe and then applying them to the XlsxWriter file. So some parts of the linked question will be relevant. – jmcnamara Jan 23 '19 at 13:02
  • @jezrael, I know how to remove dups in pandas. But I do not want to remove them in pandas, I simply want to apply filter in excel. The use case is that user will have full data in excel but certain rows will be hidden, if user wants they can unhide them in excel and look at data. Hope this explains use case. – nilesh Jan 23 '19 at 13:29
  • @nilesh - reopened – jezrael Jan 23 '19 at 13:38
  • @jmcnamara - reopened – jezrael Jan 23 '19 at 13:38

2 Answers2

4
df = pd.DataFrame({'Data': [1, 2, 3, 4, 
                               5, 6, 7]}) 

writer = pd.ExcelWriter('pandasEx.xlsx',  
                   engine ='xlsxwriter') 

df.to_excel(writer, sheet_name ='Sheet1') 
workbook = writer.book
worksheet1 = writer.sheets['Sheet1']

# Activate autofilter
worksheet1.autofilter(f'B1:B{len(df)}')
worksheet1.filter_column('B', 'x < 5')

# Hide the rows that don't match the filter criteria.
for idx, row_data in df.iterrows():
    region = row_data['Data']
    if not (region < 5):
        # We need to hide rows that don't match the filter.
        worksheet1.set_row(idx + 1, options={'hidden': True})

writer.save() 


   
Roelant
  • 4,508
  • 1
  • 32
  • 62
nilesh
  • 327
  • 3
  • 16
  • 1
    Good answer. One small correction. You need to filter on column B instead of A since the dataframe puts an index in column A and the data in column B. – jmcnamara Jan 24 '19 at 09:01
0

Create a dataframe with the filtered data and write this to the excel file:

import pandas as pd

df = pd.DataFrame({'Data': [1, 2, 3, 4, 5, 6, 7]}) 

writer = pd.ExcelWriter('pandasEx.xlsx') 

df_filtered = df.loc[df.Data >= 5]

df_filtered.to_excel(writer, sheet_name ='Sheet1') 
writer.save() 

Remark: Had to remove xlswriter module since I do not have it on my system, but code should work with it as well.

albert
  • 8,027
  • 10
  • 48
  • 84
  • 1
    The OP wants all the data but a filter in the excel file i think – anky Jan 23 '19 at 12:23
  • 2
    @anky_91: If that's the case, https://xlsxwriter.readthedocs.io/working_with_autofilters.html might be worth a look. – albert Jan 23 '19 at 12:24
  • @albert I know how to filter in pandas. But the question is about how to filter in apply filter in excel writing. – nilesh Jan 23 '19 at 13:31