0

I have a test CSV below:

COLUMN,VALUE,PRICE,PURCHASEDATE
Banana,Fruit,1,2020-20-02 12:00:00
Tomato,Fruit,2,2020-20-02 12:00:00
Apple,,3,2020-20-02 12:00:00
Cucumber,Vegetable,,2020-20-02 12:00:00
Cabbage, Vegetable,0,2020-20-02 12:00:00
Lettuce,,,2020-20-02 12:00:00

I am using parse_dates = True, but it is returning the DataFrame with an object column type:

df = pd.read_csv('fruit.csv', header=0, parse_dates=True, infer_datetime_format=True)

enter image description here

This didnt work either:

df = pd.read_csv('fruit.csv', header=0, parse_dates=["PURCHASEDATE"], infer_datetime_format=True)
MrRaghav
  • 335
  • 3
  • 11
smackenzie
  • 2,880
  • 7
  • 46
  • 99

1 Answers1

0

One approach is to import the data and change the date format in separate steps:

import pandas as pd

columns = ['COLUMN', 'VALUE', 'PRICE', 'PURCHASEDATE']

data = [
    ('Banana',   'Fruit',     1, '2020-20-02 12:00:00'),
    ('Tomato',   'Fruit',     2, '2020-20-02 12:00:00'),
    ('Apple',    '',          3, '2020-20-02 12:00:00'),
    ('Cucumber', 'Vegetable', 0, '2020-20-02 12:00:00'),
    ('Cabbage',  'Vegetable', 0, '2020-20-02 12:00:00'),
    ('Lettuce',  '',          0, '2020-20-02 12:00:00'),
]  # date format is year-day-month

df = pd.DataFrame(data=data, columns=columns)

Then convert purchase date to datetime format:

df['PURCHASEDATE'] = pd.to_datetime(df['PURCHASEDATE'], 
                                    format='%Y-%d-%m %H:%M:%S', 
                                    errors='coerce')
print(df)

     COLUMN      VALUE  PRICE        PURCHASEDATE
0    Banana      Fruit      1 2020-02-20 12:00:00
1    Tomato      Fruit      2 2020-02-20 12:00:00
2     Apple                 3 2020-02-20 12:00:00
3  Cucumber  Vegetable      0 2020-02-20 12:00:00
4   Cabbage  Vegetable      0 2020-02-20 12:00:00
5   Lettuce                 0 2020-02-20 12:00:00

In this example, the original date had YYYY-DD-MM format.

jsmart
  • 2,921
  • 1
  • 6
  • 13