0

I'm in the midst of writing a iPython notebook that will pull the contents of a .csv file and paste them into a specified tab on an .xlsx file. The tab on the .xlsx is filled with a bunch of pre-programmed formulas so that I might run an analysis on the original content of the .csv file.

I've ran into a snag, however, with the the date fields that I copy over from the .csv into the .xlsx file.

The dates do not get properly processed by the Excel formulas unless I double-click the date cells or apply Excel's "text to columns" function on the column of dates and set a tab as the delimiter (which I should note, does not split the cell).

I'm wondering if there's a way to either...

  • write a helper function that logs the keystrokes of applying the "text to columns" function call
  • write a helper function to double click and return down each row of the column of dates

    from openpyxl import load_workbook
    import pandas as pd
    
    def transfer_hours(report_name, ER_hours_analysis_wb):
    
        df = pd.read_csv(report_name, index_col=0)
    
        book = load_workbook(ER_hours_analysis_wb)
        sheet_name = "ER Work Log"
    
        with pd.ExcelWriter("ER Hours Analysis 248112.xlsx", 
            engine='openpyxl')  as writer:
    
            writer.book = book
            writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    
            df.to_excel(writer, sheet_name=sheet_name, 
                startrow=1, startcol=0, engine='openpyxl')
    
brddawg
  • 434
  • 8
  • 19
Coleman
  • 1
  • 1

2 Answers2

0

As great a tool as pandas is designed to be, in this case there may not be a reason to include.

Here is a shorter structure for what you're trying to accomplish:

import csv
import datetime
from openpyxl import load_workbook

def transfer_hours(report_name, ER_hours_analysis_wb):
    wb = load_workbook(ER_hours_analysis_wb)
    ws = wb['ER Work Log'] 

    csvfile = open(report_name, 'rt')
    reader = csv.reader(csvfile,delimiter=',')

    #iterators
    rownum = 0
    colnum = 0

    for row in reader:       
        for col in row:
            dttm = datetime.datetime.strptime(col, "%m/%d/%Y")
            ws.cell(column=colnum,row=rownum).value = dttm

    wb.save('new_spreadsheet.xlsx')

What you'll be able to do from here is break out which columns should have what format based on the position in the csv. Here is an example:

    for row in reader:       
        ws.cell(column=0,row=rownum,value=row[0])
        dttm = datetime.datetime.strptime(row[1], "%m/%d/%Y")
        ws.cell(column=1,row=rownum).value = dttm

For reference:

https://openpyxl.readthedocs.io/en/stable/usage.html

In Python, how do I read a file line-by-line into a list?

How to format columns with headers using OpenPyXL

brddawg
  • 434
  • 8
  • 19
0

Use the xlsx module

import xlsx
load_workbook  ( filen = (filePath,  read_only=False, data_only=False )

Setting data_only to False will return the formulas whereas data_only=True returns the non-formula values.

RasikhJ
  • 601
  • 5
  • 10