0

I am aware there are multiple answers for string/object to date time conversion. I have tried most of them, but still not able to get the desired result.

I have date in format 2024-08-01 00:00:00.0000000 and I want only the date part 2024-08-01 format.

My dataframe is in format: Date is of type object

Date 
2024-08-01 00:00:00.0000000 
2024-09-01 00:00:00.0000000

Using the answers provided in stackoverflow, I performed:

from dateutil.parser import parse
df['DATE'] = df['DATE'].apply(lambda x : parse(x)) #This will give in format 2024-08-01 00:00:00 of dtype datetime.

Then I use strftime to convert it into %Y-%m-%d format.

def tz_datetime(date_tz):
    date_obj = dt.datetime.strftime(date_tz, '%Y-%m-%d') 
    return date_obj
df['DATE'] = df['DATE'].apply(tz_datetime)

My DATE column is of object dtype now. df.head() gives me:

DATE
2024-08-01
2024-09-01

Then I use pd.to_datetime to convert it into datetime object.

df['DATE'] = pd.to_datetime(df['DATE'], format="%Y-%m-%d")

I also use floor() option to get the date in required date format.

df['DATE'] = df['DATE'].dt.floor('d')

Now the DATE dtype in datetime64[ns]. and df.head() also shows the required format.

But, when I export it to csv format or I display(df) again my DATE column shows different format.

DATE
2024-10-01T00:00:00.000+0000
2024-08-01T00:00:00.000+0000
2024-07-01T00:00:00.000+0000
2024-06-01T00:00:00.000+0000
2017-10-01T00:00:00.000+0000

I have exhausted all option to get this date in "%Y-%m-%d" format. Can someone please explain me what am I doing wrong, why this date format shows correct when I do .head() and shows different/actual value of DATE column when I display() the dataframe.

AMIT BISHT
  • 49
  • 1
  • 9
  • Please clarify your question. Specifi8cally, what is the format of the data in your original dataframe and what is the format of the csv output you desire. For example, to me it looks like the original dataframe contains a datetimest5amp value, but I am not sure. I also think, based on your question, you desire an output in the form of a string containing a date in the format of 'YYYY-mm-dd' is this correct? – itprorh66 Dec 07 '21 at 14:55
  • @itprorh66, I have edited the question. I want to fetch the date part of `2024-08-01 00:00:00.0000000` like this `2024-08-01` in date format. – AMIT BISHT Dec 07 '21 at 15:21
  • convert the date to a string and just do `myData.split(" ")[0]` –  Dec 07 '21 at 15:22
  • pandas datetime does not handle date and time separately (as you could do in native Python). so if you parse the strings to datetime data type, you will have date ***and*** time. If you don't care about the data type (just need it for display or something), do not convert to datetime and split off the time part. – FObersteiner Dec 07 '21 at 17:06
  • @MrFuppes, I need it in date format for my analysis further like missing value imputation and outlier treatment. What confuses me more is why my date is getting converted from `2024-08-01 00:00:00.0000000` to `2024-08-01T00:00:00.000+0000`? Why this timezone and UTC offset occur after all this transformation? – AMIT BISHT Dec 08 '21 at 06:01
  • 1
    To be precise, "date" is not a format. Do you mean you need a pd.Series of Python date objects? Regarding UTC offsets, that's certainly not reproducible with your example. Must be specified in your real data. Also, make sure not to mix up things. Solely use `pd.to_datetime` to parse string to datetime, then `.dt.strftime` on a datetime Series if you need string dtype again. – FObersteiner Dec 08 '21 at 06:21
  • @MrFuppes, I want in Datetime format. Thanks for clarifying that its the datetime format and not the date format only, even If I format it to 'YMD' format. – AMIT BISHT Dec 16 '21 at 06:34

1 Answers1

1

If you are not already using it the date_format parameter might be what you're missing.

df.to_csv(filename, date_format='%Y-%m-%d')

As you have demonstrated above, truncating the data to just 'Y-m-d' still gives the longer format 'Y-m-d H:M:S.f' (and an extra zero) when exporting because the date type hasn't changed. This means the output format for the date type hasn't changed either.

Also a side note: You want to avoid overwriting the data with strftime() just to get a different format as you may need the lost data in later analysis.

moonman4
  • 308
  • 3
  • 12
  • I don't think that is the problem here. The OP succeeds to parse string to datetime just fine. – FObersteiner Dec 07 '21 at 17:07
  • Correct, parsing is not the issue. Overwriting the date column with date data will have no effect on the output format since the column is still a date. OP needs to specify an output format to see what they want. Saving only the Y-m-d of the date does nothing other than destroy data which is likely not a good idea if that is not the explicit intent – moonman4 Dec 07 '21 at 17:19
  • @moonman4, Thanks I think that's what I was missing. Just one more question, Is it possible to convert the date object to float? Do I have to calculate it manually from 1 January 1970? – AMIT BISHT Dec 16 '21 at 06:18
  • @AMITBISHT Not sure, but this question may have something that may help: https://stackoverflow.com/questions/24588437/convert-date-to-float-for-linear-regression-on-pandas-data-frame – moonman4 Dec 21 '21 at 16:30