2

I have a dataframe with date column. Column includes "custom" and "general" datatypes. I want to change it all datetime format. "43891" means "01.03.2020 00:00:00"

TARİH
28.02.2020  00:00:00 -->custom
28.02.2020  00:00:00 -->custom
28.02.2020  00:00:00 -->custom
43891 -->general
43891 -->general
43891 -->general
.
.

Here what I have tried below, same problem with me (ref. changing all dates to standard date time in dataframe)

import pandas as pd
from datetime import datetime, timedelta

def from_excel_ordinal(ordinal, _epoch0=datetime(1899, 12, 31)):
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)

df = pd.read_excel('D:\Documents\Desktop\deneme/deneme1.xlsx', sheet_name='Sheet1')
m = df['TARİH'].astype(str).str.isdigit()

df.loc[m, 'TARİH'] = \
df.loc[m, 'TARİH']\
  .astype(int)\
  .apply(from_excel_ordinal)

df['TARİH'] = pd.to_datetime(df['TARİH'], errors='coerce')

df.to_excel('D:\Documents\Desktop\deneme/deneme1.xlsx',index=False)

When I apply these codes, I am sharing output below. "General type" cells turns to "NaT".

print(df.loc[3280:3286, 'TARİH'])

Output: 
2020-02-28
2020-02-28
2020-02-28
2020-02-28
NaT
NaT
NaT
Name: TARİH, dtype: datetime64[ns]

In this solution, changing all dates to standard date time in dataframe all column is "general" datatype. Due to that problem is solved. But when I apply above codes to my dataframe, Column D format is turning to "datetime" format. Due to that I am taking below error when I run the codes 2nd time:

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

I will use these codes everyday. Due to that I need solve the format cell problem. I can try also other methods if you offer.

Also I have 3000 rows. So I'm not able to apply manual methods.

piseynir
  • 237
  • 1
  • 4
  • 14

2 Answers2

1

IIUC, 43891 is the number of days since a zero-date:

# zero_date = 1899-12-29
zero_date = pd.to_datetime('2020-03-01') - pd.to_timedelta(43891, unit='D')

And then you can do np.select:

# you need dayfist
custom = pd.to_datetime(df['TARİH'], dayfirst=True, errors='coerce')

# general type
df['TARİH'] = np.where(custom.isna(), df['TARİH'],
                       (custom - zero_date)/pd.to_timedelta('1D')
                      )
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Thanks it worked but when I re-execute the code this error raised: TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns]. This code will run everyday, so everyday zero_date will be change. To contribute this code I need to check numerical datetime everytime. This is manual work – piseynir Apr 11 '20 at 14:58
  • You can only execute it once. After that, `df['TARIH']` is already of `datetime` type, and cannot be converted to `timedelta`. – Quang Hoang Apr 11 '20 at 15:00
  • Yes, I understand it but I need repeatable codes. Maybe if I locate the "integer value" in column, codes may be repeatable. But I don't know how to do. – piseynir Apr 11 '20 at 15:01
  • What do you mean repeatable? Don't you want to change the datatype? Repeatability should start from when you read the data. Anyway, try the new code, it should run *repeatedly*. – Quang Hoang Apr 11 '20 at 15:06
  • Repeatable means, I will use these codes for production follow. And I want to automate my work. If I open the excel everytime to check values, it won't be automation. – piseynir Apr 11 '20 at 15:08
  • I tried new codes, again error : TypeError: dtype datetime64[ns] cannot be converted to timedelta64[ns] error line: general = zero_date + pd.to_timedelta(df['TARİH'], unit='D', errors='coerce') also np is not defined. – piseynir Apr 11 '20 at 15:09
  • 1
    Updated, you don't need `general` for the final output. – Quang Hoang Apr 11 '20 at 15:11
  • Thanks for all replies. I' m new at python, sorry for too many questions :) I added "import numpy as np" but I' m taking this error: "TypeError: invalid type promotion" according to this line "(custom - zero_date)/pd.to_timedelta('1D')" – piseynir Apr 11 '20 at 15:46
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211438/discussion-between-caner-ucal-and-quang-hoang). – piseynir Apr 11 '20 at 15:56
0

There are no answers came to question with pandas method. Due to that I used "pynput.mouse" library.

When you change the column style to "short date" with mouse controller method, df['TARİH'] = pd.to_datetime(df['TARİH']) this code runs due to there are no mixed datetimes and integers passed array, whole column have same format.

If you have pandas or any other method, please answer.

from pynput.mouse import Button, Controller
import pandas as pd

#Go to desktop
mouse= Controller ()
mouse.move(1358,751)
mouse.click(Button.left, 1)

#Open folder
mouse.position=(632, 108)
time.sleep(2)
mouse.click(Button.left,2)

#Open excel file
mouse.position=(354, 127)
time.sleep(2)
mouse.click(Button.left,2)

#Select D column in excel
mouse.position=(250, 256)
time.sleep(10)
mouse.click(Button.left,1)

#Go to format cell area
mouse.position=(709, 87)
time.sleep(2)
mouse.click(Button.left,1)

#Change format to short date
mouse.position=(663, 297)
time.sleep(2)
mouse.click(Button.left,1)

#Close excel file
mouse.position=(1337, 11)
time.sleep(2)
mouse.click(Button.left,1)

#Save excel file
mouse.position=(597, 400)
time.sleep(2)
mouse.click(Button.left,1)

#wait till excel close
time.sleep(3)

print("Formula writing operation is starting..")
df = pd.read_excel('D:\Documents\Desktop\deneme/2020 Data_çalışma.xlsx', sheet_name='Sheet1')
df['TARİH'] = pd.to_datetime(df['TARİH'])
print("Formula is written..")


Output:
TARİH
28.02.2020  00:00:00
28.02.2020  00:00:00
28.02.2020  00:00:00
01.03.2020  00:00:00
01.03.2020  00:00:00
01.03.2020  00:00:00
.
.
piseynir
  • 237
  • 1
  • 4
  • 14