I am using the Azure Python SDK to read a tabular dataset from a Blob Store as follows:
df = Dataset.Tabular.from_delimited_files(
    path=[DataPath(ds, blobstore_dir + 'tabular_data.csv')],
    separator=',', header=True)
The data has four datetime columns. One of them reads in with no problem because it contains values where the day-month order is not ambiguous, but the other three are being inferred incorrectly as "month-day" instead of "day-month".
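To illustrate the ambiguity, here is a minimal sketch using Python's standard `datetime.strptime` rather than the Azure SDK. The sample values are made up for illustration and are not taken from my file. A value whose first field is 12 or less parses under both orders, while a value with a first field above 12 can only be day-month, which is presumably why one of my columns is inferred correctly.

```python
from datetime import datetime

# Hypothetical sample value; both orders parse, but to different dates.
sample = "03-04-2021 10:15:00"
as_day_month = datetime.strptime(sample, "%d-%m-%Y %H:%M:%S")
as_month_day = datetime.strptime(sample, "%m-%d-%Y %H:%M:%S")

print(as_day_month.date())  # 3 April 2021
print(as_month_day.date())  # 4 March 2021

# Hypothetical value whose first field exceeds 12: only day-month is valid.
unambiguous = "25-04-2021 10:15:00"
try:
    datetime.strptime(unambiguous, "%m-%d-%Y %H:%M:%S")
    month_day_ok = True
except ValueError:
    month_day_ok = False

print(month_day_ok)
```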
When reading in the data I get the following warning:
UserWarning: Ambiguous datetime formats inferred for columns ['Period Start', 'Period End', 'Extracted At'] are resolved as "month-day". Desired format can be specified by set_column_types.
I have attempted to set the column types as below and have tried a few different formats, but every value in those columns ends up as NULL.
df = Dataset.Tabular.from_delimited_files(
    path=[DataPath(ds, blobstore_dir + 'tabular_data.csv')], separator=',', header=True,
    set_column_types={'Period Start': DataType.to_datetime("%d-%m-%Y %H:%M:%S"),
                      'Period End': DataType.to_datetime("%d-%m-%Y %H:%M:%S"),
                      'Extracted At': DataType.to_datetime("%d-%m-%Y %H:%M:%S")})
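One thing I can check locally is whether the format string matches the raw text exactly, since a different separator or a missing seconds field would make every value fail to parse. The sketch below uses Python's own `datetime.strptime` as a stand-in. That is an assumption: I have not confirmed that `DataType.to_datetime` interprets the specifiers identically. The helper `matches_format` and the sample strings are hypothetical.

```python
from datetime import datetime

def matches_format(value, fmt):
    """Return True if `value` parses under `fmt` with datetime.strptime."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

fmt = "%d-%m-%Y %H:%M:%S"

# Hypothetical raw values; replace with strings copied from the CSV.
samples = [
    "25-04-2021 10:15:00",  # same separators and fields as fmt
    "25/04/2021 10:15:00",  # slashes instead of dashes
    "25-04-2021 10:15",     # no seconds field
]

results = {s: matches_format(s, fmt) for s in samples}
for value, ok in results.items():
    print(value, ok)
```

If the real values only match a different pattern, that would at least explain the NULLs, although it would not prove the SDK accepts the corrected format.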
The documentation for from_delimited_files() is here.
Can anyone tell me how to force from_delimited_files() to resolve the ambiguous datetimes as day-month, or tell me how to use set_column_types correctly? I've worked around it temporarily by inserting a dummy row with a non-ambiguous datetime.