1

I've got a set of dataframes that through lots of fiddling have comments and flags on whether or not the data is good:

df = pd.DataFrame({'Name': ['a','a','b','b'],
                   'Measurements': ['temp','pressure','temp','pressure'],
                   'Values': [1, 2, -1, np.nan],
                   'Comment': ['','','Is negative', 'Is NaN'],
                   'IsBad':[False, False, True, True] })
measurements = df.reset_index().pivot(index='Name',columns='Measurements',values='Values')
comments = df.reset_index().pivot(index='Name',columns='Measurements',values='Comment')
bad_cells = df.reset_index().pivot(index='Name',columns='Measurements',values='IsBad')

What I'd like to do is export this to an excel file that has the cells formatted by "IsBad" and a comment stuck in to explain why the cell is bad (ignore that this example starts at A2...): Dummy excel file

I've been looking at a couple different ways to do this and I'm stuck, I can create an excel file with a new sheet for each pivoted dataframe but I can't seem to create comments using other sheets as an input. XlsxWriter also seems to not be too great at creating an excel file in a loop where I could define each cell based on rows in the "df" object.

Any ideas?

RedM
  • 344
  • 2
  • 15

2 Answers2

2

I ended up with this:

import pandas as pd
import xlsxwriter

df = pd.DataFrame({'Name': ['a','a','b','b'],
                   'Measurements': ['temp','pressure','temp','pressure'],
                   'Values': [1, 2, -1, -999],
                   'Comment': ['','','Is negative', 'Is NaN'],
                   'IsBad':[False, False, True, True] })
                   
measurements = df.reset_index().pivot(index='Name',columns='Measurements',values='Values')
comments = df.reset_index().pivot(index='Name',columns='Measurements',values='Comment')
bad_cells = df.reset_index().pivot(index='Name',columns='Measurements',values='IsBad')

wb = xlsxwriter.Workbook('measurements.xlsx')
ws = wb.add_worksheet()
bg = wb.add_format({'bg_color': 'yellow'})

ws.write('A1', 'Name')
ws.write('B1', 'pressure')
ws.write('C1', 'temp')

i = 2
for key, row in measurements.iterrows():
    ws.write('A' + str(i), key)
    ws.write('B' + str(i), row[0])
    ws.write('C' + str(i), row[1])

    if bad_cells['pressure'].index.get_loc(key):
        ws.write('B' + str(i), row[0], bg)
        ws.write_comment('B' + str(i), comments['pressure'][i - 2]) 
    
    if bad_cells['temp'].index.get_loc(key):
        ws.write('C' + str(i), row[1], bg)
        ws.write_comment('C' + str(i), comments['temp'][i - 2]) 
    i += 1

wb.close()

Output

I had to change np.nan with -999, because is not possible to write a NaN value in an excel. Anyway, you could try other replaces: xlsxwriter docs

Alex Montano
  • 343
  • 1
  • 4
  • 11
  • 1
    Good solution. Note there is also a `nan_inf_to_errors` [constructor option](https://xlsxwriter.readthedocs.io/workbook.html#constructor) in XlsxWriter. – jmcnamara Nov 25 '21 at 16:21
  • Ah, okay. This looks promising. The looping through the dataframe helps here. Let me check this out – RedM Nov 26 '21 at 12:07
1

Alex Montano's answer was good but didn't work too well on my real data due to some hard coding on the column names/indexes. I'm posting the more general solution (based off his work) here. I've also pulled in a function, from another StackOverflow question, that would allow tables that exceed 26 columns to have AA, AB, AC... columns:

import pandas as pd
import xlsxwriter

df = pd.DataFrame({'Name': ['a','a','b','b'],
                   'Measurements': ['temp','pressure','temp','pressure'],
                   'Values': [1, 2, -1, -999],
                   'Comment': ['','','Is negative', 'Is NaN'],
                   'IsBad':[False, False, True, True] })
                   
measurements = df.reset_index().pivot(index='Name',columns='Measurements',values='Values')
comments = df.reset_index().pivot(index='Name',columns='Measurements',values='Comment')
bad_cells = df.reset_index().pivot(index='Name',columns='Measurements',values='IsBad')

# https://stackoverflow.com/questions/42176498/repeating-letters-like-excel-columns
def excel_cols():
    n = 1
    while True:
        yield from (''.join(group) for group in itertools.product(string.ascii_uppercase, repeat=n))
        n += 1

wb = xlsxwriter.Workbook('my_file.xlsx',{'strings_to_numbers': True})

ws = wb.add_worksheet('QC')
bad_format = wb.add_format({'bg_color': 'yellow'})

# Write column names, reset index so that they're just another column
xl_df  = measurements.reset_index().astype(object).replace(np.nan, '')
column_index = list(itertools.islice(excel_cols(), len(xl_df.columns)))

# Create cell index values in Excel format ...X,Y,Z,AA,AB... etc
for column_ix, column in zip(column_index, xl_df.columns):
    ws.write(column_ix + '1', column)

i = 2
for key, row in xl_df.iterrows():
    for column_ix, column in zip(column_index, row.index):
        this_index = column_ix + str(i)
        cell_value = row[column]
        this_cell_is_bad = bad_cells.reset_index()[column].iloc[key]
        this_comment = comments.reset_index()[column].iloc[key]
        
        if this_cell_is_bad == True:
            ws.write(this_index, cell_value, bad_format)
            ws.write_comment(this_index, this_comment)
        else:
            ws.write(this_index, cell_value)  
    i += 1

wb.close()
RedM
  • 344
  • 2
  • 15
  • Note, you don't need to use cell ranges like A1, Z1, etc, or the `excel_cols()` function. All methods in XlsxWriter take a (row, col) index as well as an A1 cell range. Have a look at the docs for `write()` and `write_comment()`. https://xlsxwriter.readthedocs.io/worksheet.html – jmcnamara Nov 30 '21 at 15:37
  • Aaah, that would make things much easier. – RedM Dec 01 '21 at 09:07