Display only filtered rows in Excel writer output using pandas

Question

I want to display only filtered rows in the Excel output. Below is example,

df = pd.DataFrame({'Data': [1, 2, 3, 4, 
                               5, 6, 7]}) 

writer = pd.ExcelWriter('pandasEx.xlsx',  
                   engine ='xlsxwriter') 

df.to_excel(writer, sheet_name ='Sheet1') 
writer.save()

For output I want to hide all rows where 'Data' < 5. How to do this? It is equivalent to applying filter and saving the excel.

I know how to remove dups in pandas or filter in pandas. But I do not want to remove them in pandas, I simply want to apply filter in excel. The use case is that user will have full data in excel but certain rows will be hidden, if user wants they can unhide them in excel and look at data. Hope this explains use case

Thank you

This isn't a dupe, at least of the linked question. The OP is asking how to filter rows in the XlsxWriter file that is created from Pandas, not how to filter in Pandas itself. — jmcnamara, Jan 23 '19 at 13:00
Having said that, the solution will involve finding rows that would be filtered in the dataframe and then applying them to the XlsxWriter file. So some parts of the linked question will be relevant. — jmcnamara, Jan 23 '19 at 13:02
@jezrael, I know how to remove dups in pandas. But I do not want to remove them in pandas, I simply want to apply filter in excel. The use case is that user will have full data in excel but certain rows will be hidden, if user wants they can unhide them in excel and look at data. Hope this explains use case. — nilesh, Jan 23 '19 at 13:29

score 4 · Accepted Answer · edited Dec 01 '20 at 10:43

df = pd.DataFrame({'Data': [1, 2, 3, 4, 
                               5, 6, 7]}) 

writer = pd.ExcelWriter('pandasEx.xlsx',  
                   engine ='xlsxwriter') 

df.to_excel(writer, sheet_name ='Sheet1') 
workbook = writer.book
worksheet1 = writer.sheets['Sheet1']

# Activate autofilter
worksheet1.autofilter(f'B1:B{len(df)}')
worksheet1.filter_column('B', 'x < 5')

# Hide the rows that don't match the filter criteria.
for idx, row_data in df.iterrows():
    region = row_data['Data']
    if not (region < 5):
        # We need to hide rows that don't match the filter.
        worksheet1.set_row(idx + 1, options={'hidden': True})

writer.save()

Good answer. One small correction. You need to filter on column B instead of A since the dataframe puts an index in column A and the data in column B. — jmcnamara, Jan 24 '19 at 09:01

score 0 · Answer 2 · answered Jan 23 '19 at 12:21

0

Create a dataframe with the filtered data and write this to the excel file:

import pandas as pd

df = pd.DataFrame({'Data': [1, 2, 3, 4, 5, 6, 7]}) 

writer = pd.ExcelWriter('pandasEx.xlsx') 

df_filtered = df.loc[df.Data >= 5]

df_filtered.to_excel(writer, sheet_name ='Sheet1') 
writer.save()

Remark: Had to remove xlswriter module since I do not have it on my system, but code should work with it as well.

answered Jan 23 '19 at 12:21

albert

8,027
10
48
84

1

The OP wants all the data but a filter in the excel file i think – anky Jan 23 '19 at 12:23
2

@anky_91: If that's the case, https://xlsxwriter.readthedocs.io/working_with_autofilters.html might be worth a look. – albert Jan 23 '19 at 12:24
@albert I know how to filter in pandas. But the question is about how to filter in apply filter in excel writing. – nilesh Jan 23 '19 at 13:31

Display only filtered rows in Excel writer output using pandas

2 Answers2