1

Magicians out there....

I need your help with the best approaches for the below use case.

I have an excel sheet whith lakhs of rows of data and I need to filter it based on some criteria and need to create new multiple tiles. I am in no mood to do it manually hence started working out on a python script that will do the job for me. however, I am facing a few challenges in achieving the end goal. The challenges are the "Color Formatting" and "comment" added to the cell.

Let's recreate the scenario. I have attached a sample excel sheet for your reference here. it includes "Indian Cars" data with 4 headers called (Brand, Model, Fuel Type & Transmission Type). I need to filter the data based on "Brand" and create a new excel file (workbook) with the Brand name as the excel file name.

enter image description here

Approach 1:- First I started with loading an excelsheet into a data frame with Pandas and then filtered the data and exported it, that was quite fast and easy and I loved it. However, I am losing cell colors and added note to the cell (Model & Fuel type)

Note: I tried styling the pandas, however, for some reason, it's not working for me.

Approach 2:- I though of using Openpyxl & Xlsxwriter, however, the issue is I am unable to filter data and keep comments added to the header.

Approach 3:- Logically, I can create a copy of my existing sheet and delete the unwanted rows from it and save it with desired name, that should do the job for me. Unfortunately, I am unable to figure out how to achieve it in python.

Kindly share your thoughts on this and help me with right approach... and If I can get a sample code or full code... that would just make my day... :D

BigBen
  • 46,229
  • 7
  • 24
  • 40
Krishna
  • 53
  • 7
  • Just share the code you already implemented and I can complete it with the filtering and styling part. – kithuto Oct 21 '22 at 14:06
  • What kind of styling are you refering to? Can you specify a filter example? – kithuto Oct 21 '22 at 14:12
  • @NachoR. - Thank you for your help. I tried the code that you wrote and it's working partially... though it's quite helpful, however, below are few things that I am looking for. 1. It should create new excel files(workbook) - your code is creating new sheets in the same workbook. 2. each workbook should have only one brand detail in it. - your code is just duplicating the same information in 3 different sheets. 3. If possible - can we retain the comment added to the headers, if not then it's fine. – Krishna Oct 21 '22 at 16:04
  • I already fixed this issues and you can change the behavor of the program to create diferent excel files. – kithuto Oct 21 '22 at 16:15
  • @NachoR. - This is a great help. Your code is working like magic bro.... thank you so much for all the help.. I am yet to test the "Comment" part. I will do some R&D on that part and update you. Thank you once again for the help man... – Krishna Oct 25 '22 at 05:57
  • Does this answer your question? [How to avoid using Select in Excel VBA](https://stackoverflow.com/questions/10714251/how-to-avoid-using-select-in-excel-vba) – Dominique Oct 26 '22 at 12:09

1 Answers1

1

This should do the trick. You can change the colors of the headers.

Code for custom styling of the excel added.

import pandas as pd

# function to style the dataframe with some conditons (simple condition for an example you can change or add conditions with multiple rows)
def style_df(row):
    values = pd.Series(data=False, index=row.index)
    if not pd.isna(row['Transmission Type']):
        if row['Transmission Type'].strip() == 'Manual':
            return ['background-color : gray; color: red' for _ in values]
        elif row['Transmission Type'].strip() == 'Manual, Automatic':
            return ['background-color : lightblue; color: green' for _ in values]
    
    return ['' for _ in values]

page = pd.read_excel("Cars_in_india.xlsx", 'Cars in India')

# creating an excel file for each brand
for brand in page.Brand.unique():
    writer = pd.ExcelWriter(brand+".xlsx", engine = 'xlsxwriter')
    workbook = writer.book
    border_fmt = workbook.add_format({'bottom':1, 'top':1, 'left':1, 'right':1})

    dataframe = page[page.Brand == brand].copy()

    dataframe = dataframe.style.apply(style_df, axis=1)
    dataframe.to_excel(writer, index=False, sheet_name=brand)
    
    # dynamic columns sizes
    for column in page[page.Brand == brand]:
        column_width = max(page[page.Brand == brand][column].astype(str).map(len).max(), len(column))
        col_idx = page[page.Brand == brand].columns.get_loc(column)
        writer.sheets[brand].set_column(col_idx, col_idx, column_width)
        
    worksheet = writer.sheets[brand]
    
    #applying style to the header columns
    worksheet.write(0, 1, "Model", workbook.add_format({'fg_color': '#00FF00'}))
    worksheet.write(0, 2, "Fuel Type", workbook.add_format({'fg_color': '#FFA500'}))
    
    # applying borders to the table
    worksheet.conditional_format(xlsxwriter.utility.xl_range(0, 0, len(page[page.Brand == brand]), len(page[page.Brand == brand].columns)-1), {'type': 'no_errors', 'format': border_fmt})

    writer.save()

You can use openpyxl to read the coments and then write the comments when creating the excel. But you used a type of comment not compatible with the current version of excel that openpyxl uses (you will see the same error in the google cloud editor). Then, the only option is to change the type of the comment or rewrite them in the python code.

Example code:

from openpyxl import load_workbook

wb = load_workbook("Cars_in_india.xlsx")
ws = wb["Cars in India"]

_, comment, comment2, _ = list(ws.rows)[0]

# then after this code:
# worksheet.write(0, 1, "Model", workbook.add_format({'fg_color': '#00FF00'}))
# worksheet.write(0, 2, "Fuel Type", workbook.add_format({'fg_color': '#FFA500'}))
# you can add:
worksheet.write_comment('B1', comment.text)
worksheet.write_comment('C1', comment2.text)
kithuto
  • 455
  • 2
  • 11
  • `read_excel` will lose the formatting from the original file, as OP has already tried and mentioned. – BigBen Oct 21 '22 at 14:31
  • Fixed with some custom styling – kithuto Oct 21 '22 at 15:43
  • That won't preserve comments from the original file. – BigBen Oct 21 '22 at 15:45
  • @BigBen - Would you have any thoughts on this? – Krishna Oct 21 '22 at 16:21
  • I'd honestly either use xlwings (requires an Excel installation), or VBA, for this. It's very trivial in VBA. – BigBen Oct 21 '22 at 16:25
  • @NachoR. - Your comment code also worked pretty well bro, I am able to add comments to the cell that I want to. As you mentioned, it works on the older version of comments, however, this will work too for me. If it is not to much to ask, could you please help me with one more requirement. I need to filter data based on rows and style the range. for example. In Maruti Brand sheet, filter all the rows which "Transmission Type" as "Manual" and change the background color to "grey" and font color to "Dark grey", also sort them based based on "Transmission Type" Column. – Krishna Oct 25 '22 at 13:49
  • Here is the Sample sheet link. https://docs.google.com/spreadsheets/d/1402kHU3cc99GpMw42ZfcOWVIQ0Ysqm3T/edit?usp=sharing&ouid=100667446761329536779&rtpof=true&sd=true – Krishna Oct 25 '22 at 13:49
  • @Krishna Hi, let me edit the answer with the style changes. The backgrond should be changed in all the row or only some columns? – kithuto Oct 26 '22 at 10:58
  • You can use hexadecimal colors in the style insted of strings if you need a specific color – kithuto Oct 26 '22 at 11:56
  • @NachoR. - The background color should be changed in all the rows.(filtered rows I mean) – Krishna Oct 27 '22 at 07:23
  • 1
    @Krishna is how I already implemented in the code. Hope it works for you – kithuto Oct 27 '22 at 07:35
  • @NachoR. - Bro you are so awesome . Yup it's working as expected.... Thank you so much once again... you are a magician indeed.... – Krishna Oct 27 '22 at 08:49
  • @NachoR. -- Brother, I need quick help from you on the above code. I am using your above code to create different files for a different purposes. However, now I have got a situation. This time I need to keep a set of data fixed in the first few rows in each sheet, and then I need to start writing the brand data. Could you please help me with this? I would really appreciate your help on this, brother. – Krishna Dec 11 '22 at 14:50
  • 1
    @Krishna you can just save the header data in a new dataframe then just after: dataframe = page[page.Brand == brand].copy() Just concatenate using pd.concat: dataframe = pd.concat([header_data, dataframe]) – kithuto Dec 13 '22 at 09:02