
I'm new to Python.

I'm trying to prepare a dataset for comparison using pandas, but I need to edit certain parts of it before a comparison is possible. There is an 'A' before each of my dates which needs to be removed. In addition, the dates are in YYYYDDD format (year followed by day of the year), which needs to be changed to DD/MM/YYYY. Alternatively, my other dataset could be changed to YYYYDDD, whichever is easier.

My attempt to remove the 'A' is below. I have no idea where to even begin with converting the date itself, apart from perhaps using the datetime library.

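import pandas as pd

csv = '/home/student/Desktop/Ben_Folder/AirQuality/Test/2002_DDV.csv'

df = pd.read_csv(csv)
# Drop the leading 'A' from each date string, e.g. 'A2002185' -> '2002185'
df['Date'] = df['Date'].str[1:]

# Write the result out, renaming the columns to Date and AOD
df.to_csv('Test.csv', header=['Date', 'AOD'])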

Here is an example of the dataset:

       Date  AOT
0  A2002185  0.0
1  A2002185  0.0
2  A2002185  0.0
3  A2002185  0.0
4  A2002185  0.0
Ben_Wright

1 Answer


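pandas stores datetime columns (dtype datetime64) internally as 64-bit integers, by default counting nanoseconds since the Unix epoch. Anything else you see is just a string representation of those integers. Once you are aware of this, you will appreciate the benefit of converting your date strings to proper datetime objects rather than editing them as text.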
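A minimal sketch to see the underlying integer for yourself, assuming pandas 1.x or 2.x: Timestamp.value returns the nanosecond count since the epoch, and subtracting the epoch and dividing by a one-nanosecond Timedelta recovers the same number.

```python
import pandas as pd

# A single timestamp for the date in the question (day 185 of 2002)
ts = pd.Timestamp('2002-07-04')

# .value is the integer count of nanoseconds since 1970-01-01 00:00:00
print(ts.value)  # 1025740800000000000

# Same integer, computed explicitly: elapsed time since the epoch in whole nanoseconds
ns = (ts - pd.Timestamp('1970-01-01')) // pd.Timedelta(1, unit='ns')
print(ns)  # 1025740800000000000
```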

Here you can use pd.to_datetime, which allows you to specify your format:

df['Date'] = pd.to_datetime(df['Date'], format='A%Y%j')

print(df)

        Date  AOT
0 2002-07-04  0.0
1 2002-07-04  0.0
2 2002-07-04  0.0
3 2002-07-04  0.0
4 2002-07-04  0.0
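A self-contained sketch of the same conversion, rebuilding the sample rows from the question instead of reading the CSV. The extra column name Date_ddmmyyyy is hypothetical; it just shows how .dt.strftime could produce the DD/MM/YYYY strings the question asks for, once the values are real datetimes.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({'Date': ['A2002185'] * 5, 'AOT': [0.0] * 5})

# 'A' is matched literally, %Y is the 4-digit year, %j is the day of the year
df['Date'] = pd.to_datetime(df['Date'], format='A%Y%j')

# Hypothetical column: the same dates rendered as DD/MM/YYYY strings
df['Date_ddmmyyyy'] = df['Date'].dt.strftime('%d/%m/%Y')
print(df)
```

Keeping the datetime column for comparisons and only producing strings for display or export avoids comparing dates as text.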

Python's strftime directives are a useful reference for building custom format strings.
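The same directives work with the standard library's datetime.strptime and strftime, which covers both directions mentioned in the question. A small sketch using the sample value from the question and a hypothetical DD/MM/YYYY string from the other dataset:

```python
from datetime import datetime

# Parse the question's format: literal 'A', 4-digit year (%Y), day of year (%j)
d = datetime.strptime('A2002185', 'A%Y%j')
ddmmyyyy = d.strftime('%d/%m/%Y')
print(ddmmyyyy)  # 04/07/2002

# Reverse direction: turn a (hypothetical) DD/MM/YYYY string into YYYYDDD
yyyyddd = datetime.strptime('04/07/2002', '%d/%m/%Y').strftime('%Y%j')
print(yyyyddd)  # 2002185
```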

jpp
  • So if I wished to reformat to DDMMYYYY it would be %d%m%Y? How would I apply that? – Ben_Wright Sep 11 '18 at 11:50
  • @Ben_Wright, If you have a new question, please [ask it separately](https://stackoverflow.com/questions/ask). But please make sure it hasn't been asked before. – jpp Sep 11 '18 at 11:56
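On the follow-up question: yes, '%d%m%Y' gives DDMMYYYY, and on a datetime column it can be applied with .dt.strftime. A minimal sketch, where the second input date and the other series of DD/MM/YYYY strings are hypothetical values for illustration. For the original goal of comparing two datasets, parsing both sides to datetimes and comparing those is usually simpler than matching formatted strings.

```python
import pandas as pd

# Dates in the question's format (second value is hypothetical)
dates = pd.to_datetime(pd.Series(['A2002185', 'A2002001']), format='A%Y%j')

# .dt.strftime formats every element; the result holds strings, not datetimes
ddmmyyyy = dates.dt.strftime('%d%m%Y')
print(ddmmyyyy.tolist())  # ['04072002', '01012002']

# Hypothetical other dataset stored as DD/MM/YYYY: parse it, then compare datetimes
other = pd.to_datetime(pd.Series(['04/07/2002', '01/01/2002']), format='%d/%m/%Y')
matches = dates == other
print(matches.tolist())  # [True, True]
```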