-1

I have to get rid of duplicate rows on a *.xlsx file on a project. I have the code down here. But in the output file, date values turns into "yy-mm-dd hh:mm:ss" format after runnning my code. What would be the cause and solution to that wierd problem?

Running it on Pycharm 2019.2 Pro and Python 3.7.4

import pandas

mExcelFile = pandas.read_excel('Input/ogr.xlsx')
mExcelFile.drop_duplicates(subset=['FName', 'LName', 'Class', '_KDT'], inplace=True)
mExcelFile.to_excel('Output/NoDup.xlsx')

I'm expecting dates stay in original format which is "dd.mm.yy" but values become "yy-mm-dd hh:mm:ss"

  • 3
    Maybe you just have to make an Excel column wider? Probably Pandas turns date into datetime and it is wide to fit in regular column's width. It can be seen on your screenshot in formulas area. – crayxt Aug 31 '19 at 13:06
  • thats embarrassingly true :( @crayxt – Estelle Arquette Aug 31 '19 at 13:10

2 Answers2

0

To control date format when writing to Excel, try this:

writer = pd.ExcelWriter(fileName, engine='xlsxwriter', datetime_format='dd/mm/yy')
df.to_excel(writer)
crayxt
  • 2,367
  • 2
  • 12
  • 17
0

Actually answer from the link below solved it. Since I am new to python programming I didn't realize where the problem was. It was actually pandas converting cellvalues to datetimes. Detailed answer : https://stackoverflow.com/a/49159393/11584604