0

When I read CSV/Excel files, the datetime column has an 'object' type. It would be easy to identify it if it was the only one, however I got few more object type columns -strings- that does not contain time.

I'm looking for a way to identify the time column without knowing its name nor its position. All I could do for the moment is something like this:

dfs = df.loc[:, df.dtypes == object]
#Or 
df = df.select_dtypes(include = ['object']) 

Example of csv file:

timestep,winddirection,windspeed,windspd_10m_gust,Hs,waveheight_max,waveperiod,peakperiod_1d,secondarywavedirection,windwaveheight,secondarywaveperiod,primarywavedirection,swellheigth,primarywaveperiod,temperature
7/15/2021 16:00,161,21.6609024,23.5205124,2.58,3,4.6,6.2,162,1.43,5.2,101,1.21,8.1,24.5
7/15/2021 17:00,165,21.466518,23.326128,2.56,3,4.6,6.2,164,1.41,5.2,100,1.2,8.1,24.7
7/15/2021 18:00,167,20.8833648,22.5485904,2.53,2.9,4.6,6.2,165,1.39,5.2,101,1.19,8.1,24.6
7/15/2021 19:00,167,20.494596,21.9654372,2.5,2.9,4.6,6.2,167,1.34,5.1,103,1.21,8,24.7
7/15/2021 20:00,167,20.6889804,22.354206,2.48,2.8,4.6,6.2,166,1.33,5.1,100,1.19,8,24.8
7/15/2021 21:00,168,21.0777492,22.7429748,2.47,2.8,4.6,6.2,167,1.32,5.1,102,1.17,8.1,24.9
7/15/2021 22:00,168,21.2721336,23.1317436,2.46,2.8,4.6,6.2,167,1.34,5.1,99,1.14,8.2,25
7/15/2021 23:00,166,21.466518,23.326128,2.46,2.8,4.5,6.2,165,1.36,5.1,98,1.12,8.3,24.7
7/16/2021 0:00,169,21.0777492,22.7429748,2.46,2.8,4.5,6.1,167,1.35,5.1,97,1.13,8.3,24.2
7/16/2021 1:00,176,22.0496712,24.1036656,2.48,2.8,4.5,6.1,174,1.36,5,98,1.14,8.3,23.9
7/16/2021 2:00,177,22.43844,24.4924344,2.54,2.9,4.5,6.1,174,1.45,5.1,96,1.12,8.4,24.3
7/16/2021 3:00,181,23.410362,25.6587408,2.62,3.1,4.5,6.2,176,1.56,5.2,96,1.12,8.5,24.3
7/16/2021 4:00,181,24.1878996,26.6306628,2.68,3.2,4.6,6.2,178,1.65,5.3,96,1.1,8.5,24.2
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
Keyser Soze
  • 262
  • 3
  • 11
  • This may also help: [datetime dtypes in pandas read csv](https://stackoverflow.com/questions/21269399/datetime-dtypes-in-pandas-read-csv) – ThePyGuy Aug 31 '21 at 09:14
  • 1
    Do you have any information about the csv/excel file you read ? like the position of the datetime column ? the name of that column ? – Alexandre Léonard Aug 31 '21 at 09:18
  • @AlexandreLéonard Each time I get a different Data file, I don't know the name nor the position of the column. My program should be able to identify it automatically – Keyser Soze Aug 31 '21 at 09:19
  • 1
    Ok, then next question, does your datetime column always has the same format ? If yes, which format ? And could you share with us one excel/csv example file ? – Alexandre Léonard Aug 31 '21 at 09:24
  • @AlexandreLéonard Primarily, it's the following format: '%d-%m-%Y %H:%M:%S'. For the time being, I'll consider this as the only acceptable format – Keyser Soze Aug 31 '21 at 09:42
  • I added an example of csv file in edit -sorry didn't know how to add attachments- – Keyser Soze Aug 31 '21 at 09:42
  • This question is not a duplicate because here, we have no information regarding which column is the datetime. – Alexandre Léonard Aug 31 '21 at 10:26
  • 1
    The example csv and the format you provided are not compatible. However, after making them compatible, you could do something like this. Define a parse: `dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M')`. Then read the csv normally with `read_csv`. Then with the first row, loop over the column and store the name of the columns for which you are able to apply the datetime parse:`>>> is_datetime = [] >>> for column in df: ... try: ... result = dateparse(df[column][0]) ... is_datetime.append(column) ... except: ... pass ... ` – Alexandre Léonard Aug 31 '21 at 10:28
  • Nicely done @Alex ! Even tough you have to adapt to each time format, it is still a decent solution. If only I could reward you with bounty – Keyser Soze Aug 31 '21 at 12:30

1 Answers1

1
  • while reading the csv file set infer_datetime_format to True.

  • Or use df['column'] = pd.datetime(df['column']) on a particular column

Install visions, imagehash packages

  • pip install visions
  • pip install imagehash

Use the following code

import pandas as pd
import visions
from visions.functional import cast_to_inferred, detect_type, infer_type
from visions.typesets import CompleteSet
df = pd.read_csv("file.csv")
typeset = CompleteSet()
cast_df = cast_to_inferred(df, typeset)
print(infer_type(df, typeset))

Check this Link for example

theunknownSAI
  • 300
  • 1
  • 4