
I am using the Azure Python SDK to read a tabular dataset from a Blob Store as follows:

from azureml.core import Dataset
from azureml.data.datapath import DataPath
df = Dataset.Tabular.from_delimited_files(path=[DataPath(ds, blobstore_dir + 'tabular_data.csv')],
                                          separator=',', header=True)

The data has four datetime columns. One of them reads in correctly because it contains values where the day-month order is unambiguous, but the other three are inferred incorrectly as "month-day" instead of "day-month".

When reading in the data I get the following warning:

UserWarning: Ambiguous datetime formats inferred for columns ['Period Start', 'Period End', 'Extracted At'] are resolved as "month-day". Desired format can be specified by set_column_types.
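To see why the warning appears, here is a small stdlib sketch of the ambiguity. It is not Azure's actual inference logic; it only assumes that an ordering is viable for a column when every value in that column parses with it. A value such as 03-04-2021 fits both orderings, while a single value with a day above 12 rules out month-day. The helper names `possible_orders` and `column_orders` and the sample values are hypothetical.

```python
from datetime import datetime
from typing import List

# Candidate orderings, using the format style from the question.
FORMATS = {"day-month": "%d-%m-%Y %H:%M:%S", "month-day": "%m-%d-%Y %H:%M:%S"}

def possible_orders(value: str) -> List[str]:
    """Return the orderings that can parse a single value."""
    matches = []
    for name, fmt in FORMATS.items():
        try:
            datetime.strptime(value, fmt)
            matches.append(name)
        except ValueError:
            pass  # e.g. month 25 does not exist
    return matches

def column_orders(values: List[str]) -> List[str]:
    """An ordering is viable for a column only if every value parses with it."""
    viable = set(FORMATS)
    for v in values:
        viable &= set(possible_orders(v))
    return sorted(viable)

print(possible_orders("03-04-2021 10:00:00"))  # ['day-month', 'month-day']
print(possible_orders("25-04-2021 10:00:00"))  # ['day-month']
print(column_orders(["03-04-2021 10:00:00", "05-06-2021 08:30:00"]))  # ambiguous
print(column_orders(["03-04-2021 10:00:00", "25-04-2021 10:00:00"]))  # ['day-month']
```

This also suggests why the dummy-row workaround helps: one row with a day above 12 leaves day-month as the only viable ordering.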

I have tried setting the column types as below, with a few different formats, but every value ends up as NULL.

from azureml.data.dataset_factory import DataType

df = Dataset.Tabular.from_delimited_files(
        path=[DataPath(ds, blobstore_dir + 'tabular_data.csv')], separator=',', header=True,
        set_column_types={'Period Start': DataType.to_datetime("%d-%m-%Y %H:%M:%S"),
                          'Period End': DataType.to_datetime("%d-%m-%Y %H:%M:%S"),
                          'Extracted At': DataType.to_datetime("%d-%m-%Y %H:%M:%S")})
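The raw values are not shown, so the cause of the NULLs can only be guessed at. One common reason is that the format string does not match the text exactly (for example the file uses / rather than -, or omits the seconds), so every conversion fails. The stdlib sketch below assumes Azure's converter behaves in a similarly strict way. It also includes a hypothetical `first_matching_format` helper for checking candidate formats against a few values copied from the CSV before passing one to DataType.to_datetime.

```python
from datetime import datetime
from typing import List, Optional

def parse_or_none(value: str, fmt: str) -> Optional[datetime]:
    # Strict parse: any mismatch (separator, field order, missing seconds) gives None,
    # similar in spirit to the NULLs produced by a failed column conversion.
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None

def first_matching_format(samples: List[str], candidates: List[str]) -> Optional[str]:
    # Return the first candidate format that parses every sample value, else None.
    for fmt in candidates:
        if all(parse_or_none(s, fmt) is not None for s in samples):
            return fmt
    return None

fmt = "%d-%m-%Y %H:%M:%S"  # the format used in the question
print(parse_or_none("03-04-2021 10:00:00", fmt))  # 2021-04-03 10:00:00
print(parse_or_none("03/04/2021 10:00:00", fmt))  # None: '/' does not match '-'
print(parse_or_none("03-04-2021 10:00", fmt))     # None: no seconds field

# Hypothetical sample values copied from the CSV.
samples = ["03/04/2021 10:00:00", "05/04/2021 18:30:00"]
candidates = ["%d-%m-%Y %H:%M:%S", "%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M"]
print(first_matching_format(samples, candidates))  # %d/%m/%Y %H:%M:%S
```

Whichever format matches the real file is the one worth trying in set_column_types.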

The documentation for from_delimited_files() is in the TabularDatasetFactory section of the Azure Machine Learning Python SDK reference.

Can anyone tell me how to force from_delimited_files() to resolve the ambiguous datetimes as day-month or tell me how to use set_column_types correctly? I've worked around it temporarily by inserting a dummy row with a non-ambiguous datetime.

crumpet

1 Answer


You can import pandas and pass a date_parser that calls pd.to_datetime with an explicit format, so each datetime is converted using exactly that format:

date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p')

Reading from a file:

import pandas as pd
df = pd.read_csv('testresult.csv', parse_dates=['TIME'],
    date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))
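Applied to the day-month data in the question, the same idea might look like the sketch below. The CSV content is hypothetical and the column names are taken from the question. Instead of date_parser (deprecated since pandas 2.0, which added date_format), it reads the columns as strings and converts them with pd.to_datetime and an explicit format, which works on both older and newer pandas versions. It assumes the raw CSV is available to pandas rather than going through from_delimited_files.

```python
import io

import pandas as pd

# Hypothetical CSV in day-month order: "03/04/2021" means 3 April 2021.
csv_text = """Period Start,Period End
03/04/2021 10:00:00,05/04/2021 18:30:00
25/04/2021 09:15:00,26/04/2021 17:45:00
"""

# Read everything as plain strings first, then convert with an explicit format,
# so pandas never has to guess the day/month order.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)
for col in ["Period Start", "Period End"]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M:%S")

print(df.dtypes)
print(df.loc[0, "Period Start"])  # 2021-04-03 10:00:00
```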

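Because the format is passed explicitly through date_parser, pandas does not have to guess it, as it would with the infer_datetime_format parameter. Use a format string that matches your data: the example above is month-day, while day-month data would use %d/%m/%Y.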

Refer here

Delliganesh Sevanesan