0

I want to generate time/date format strings from the input data I got. Is there an easy way to do this?

My input data looks like this:

'01.12.2016 23:30:59,123'

So my code should generate the following format string:

'%d.%m.%Y %H:%M:%S,%f'

Background:

I use pandas.to_datetime() to generate datetime objects for further processing. This works great, but the function gets slow (it uses dateutil.parser.parse here) with a lot of data (>~50k). At the moment I provide the format string above hardcoded in my code to speed up to_datetime(), which also works great. Now I want to generate the format string in code to be more flexible regarding the input data.

edit (because the first two answers do not fit my question):

I want to generate the format string, not the datetime string.

edit2:

New approach to formulating the question: I'm reading in a file with a lot of data. Every line of data has a timestamp in the following format: '01.12.2016 23:30:59,123'. I want to convert these timestamps into datetime objects. For this I'm using pandas.to_datetime() at the moment. This function works perfectly, but it gets slow since I have some files with over 50k datasets. To speed this up I'm passing a format string to the function: pandas.to_datetime(format='%d.%m.%Y %H:%M:%S,%f'). This speeds up the process but is less flexible. Therefore I want to determine the format string from the first dataset only and use it for the remaining 50k or more datasets.

How is this possible?

Burner
  • so you are going to reinvent format guessing, which is already implemented in `pandas.to_datetime()`? ;) Do you know beforehand which formats you are going to have? – MaxU - stand with Ukraine Jul 08 '16 at 11:38
  • how do you know whether `01.12.2016` is `1 Dec` or `12 Jan`? – MaxU - stand with Ukraine Jul 08 '16 at 12:01
  • @MaxU: No, I don't want to reinvent it; that's why I'm asking. At the moment I know the format, which is why I hardcoded the format string into my code. But I want to make it more flexible and keep it fast. Maybe you should read the question... – Burner Jul 08 '16 at 12:08
  • Wow getting negative votes because people are not able to read my question or tell me what exactly is unclear with my question. That's great... – Burner Jul 08 '16 at 12:47
  • You should consider restating your question. It is apparent many people are confused by what you are asking. It reads as if you want to format the datetime string, despite your edit. Instead of refuting what others are saying to try and **help you**, maybe you should take a different approach. Just my $0.02 – jape Jul 08 '16 at 12:47
  • @jape: I understand that they don't get my question, but I don't know how to make it clearer. I have a datetime string which I want to evaluate to get the corresponding format string. For me this problem is very simple to understand, but without feedback on what the problem with my question is (I wrote comments on every answer) I'm not able to give additional information. Maybe there is a problem with my English since I'm not a native speaker. – Burner Jul 08 '16 at 12:53
  • possible duplicate of http://stackoverflow.com/questions/9507648/datetime-from-string-in-python-best-guessing-string-format – hansaplast Jul 08 '16 at 13:14
  • @Philipp Keller: Not a duplicate since I'm already using the solution posted in the thread you posted. The problem is this gets very slow with lots of input data. As I wrote in my question... That's why I want the format string and not the datetime object. – Burner Jul 08 '16 at 13:18
  • @Burner, try to make use of `pd.tseries.tools._guess_datetime_format_for_array()` and `pd.tseries.tools._guess_datetime_format()` functions... – MaxU - stand with Ukraine Jul 08 '16 at 13:23
  • @Burner, I also didn't get from your question what is wrong with `infer_datetime_format=True` - it should do exactly what you are going to "re-invent", IMO – MaxU - stand with Ukraine Jul 08 '16 at 13:31
  • @MaxU: There is nothing wrong with it, it's the answer. I just didn't see it. Thanks to you in the end. Just write an answer and I will upvote and accept it. As I stated in my first comment, I don't want to reinvent it, so I asked, and I was right that there is already an implementation for this, more or less. It does not provide a format string, but it solves my problem in the first place. – Burner Jul 08 '16 at 13:38
  • @MaxU: Just tried it using infer_datetime_format=True. But in fact it did not speed up the process. With infer_datetime_format=True my code takes ~16.5s, with infer_datetime_format=False it takes ~16s, with a fixed format string it takes: ~2s – Burner Jul 08 '16 at 13:44
  • @Burner, the funny thing is that you've posted a [link](http://stackoverflow.com/a/32034914/5741205) to the solution you were looking for - just read it carefully ;) – MaxU - stand with Ukraine Jul 08 '16 at 13:45
  • Just saw the problem: infer_datetime_format=True does not work with timestamps using fractions of seconds. So my problem is still there. – Burner Jul 08 '16 at 13:47

4 Answers

1

You can try the infer_datetime_format parameter, but be aware that pd.to_datetime() uses dayfirst=False by default, so an ambiguous date such as 01.12.2016 is read month-first unless you pass dayfirst=True.

Demo:

In [422]: s
Out[422]:
0    01.12.2016 23:30:59,123
1    23.12.2016 03:30:59,123
2    31.12.2016 13:30:59,123
dtype: object

In [423]: pd.to_datetime(s, infer_datetime_format=True)
Out[423]:
0   2016-01-12 23:30:59.123
1   2016-12-23 03:30:59.123
2   2016-12-31 13:30:59.123
dtype: datetime64[ns]

In [424]: pd.to_datetime(s, infer_datetime_format=True, dayfirst=True)
Out[424]:
0   2016-12-01 23:30:59.123
1   2016-12-23 03:30:59.123
2   2016-12-31 13:30:59.123
dtype: datetime64[ns]
MaxU - stand with Ukraine
  • It's right this would solve my problem but I just tried it and it seems infer_datetime_format=True does not work with datetime strings with fractions of seconds :-(. – Burner Jul 08 '16 at 13:50
  • @Burner, well, you may try to dig into `to_datetime()` implementation - maybe it'll help you to figure out how to get format – MaxU - stand with Ukraine Jul 08 '16 at 13:54
  • Yes it seems there is no out of the box solution when using fractions of seconds. So my question is not that basic ;-). I think I will have a closer look because somehow to_datetime() is able to get it right with fractions. – Burner Jul 08 '16 at 13:57
  • @Burner, post an answer to your own question if you'll come with a working solution - it might help others in future... – MaxU - stand with Ukraine Jul 08 '16 at 13:59
  • I will! But I'm not sure if I succeed since I'm new to Python (~3 weeks) – Burner Jul 08 '16 at 14:00
0

Use the datetime module to return the date and time. I think this will help you.

import datetime
# Format the current time with the format string from the question
print(datetime.datetime.now().strftime('%d.%m.%Y %H:%M:%S,%f'))
Subin Gopi
0

You can use datetime.strptime() from the datetime module, which returns a datetime.datetime object.

In your case you should do something like:

from datetime import datetime
datetime.strptime('01.12.2016 23:30:59,123', '%d.%m.%Y %H:%M:%S,%f')

After you have the datetime.datetime object, you can use datetime.strftime() function to get the datetime in the desired string format.
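
If the goal is the format string itself rather than the datetime object, strptime() can also be used to test candidate formats against a few sample rows. A minimal sketch, assuming the set of plausible formats is known in advance; guess_format and CANDIDATE_FORMATS are hypothetical names for this example and are not part of pandas or the standard library:

```python
from datetime import datetime

# Hypothetical candidate list; extend it with the formats you expect to see.
CANDIDATE_FORMATS = [
    "%d.%m.%Y %H:%M:%S,%f",
    "%d.%m.%Y %H:%M:%S.%f",
    "%d.%m.%Y %H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M:%S",
]


def guess_format(samples, candidates=CANDIDATE_FORMATS):
    """Return the first candidate that parses every sample, or None."""
    for fmt in candidates:
        try:
            for sample in samples:
                datetime.strptime(sample, fmt)
        except ValueError:
            continue  # at least one sample does not fit; try the next format
        else:
            return fmt
    return None


fmt = guess_format(["01.12.2016 23:30:59,123", "23.12.2016 03:30:59,123"])
print(fmt)  # %d.%m.%Y %H:%M:%S,%f
```

The returned string can then be passed as the format argument of pandas.to_datetime() for the whole column. Checking several rows rather than only the first reduces the risk of settling on the wrong field order for an ambiguous date.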

shiva
  • I think you did not understand what I want to do. I want to generate the format string, not the datetime string. I want to input '01.12.2016 23:30:59,123' and get '%d.%m.%Y %H:%M:%S,%f'. – Burner Jul 08 '16 at 12:34
  • So is there any particular pattern of string? Like time will be separated by ':' and date by '.'? – shiva Jul 08 '16 at 12:56
  • There are a lot of patterns since around the world the time and date is formatted differently. Date could be separated by '.' or '/'. Day and month could be switched. Month given by name or number. The year could be stated first. And so on. All this work is done by pandas.to_datetime() already. I only want to get the format string of it and not the datetime object. – Burner Jul 08 '16 at 13:05
0

You should probably have a look here: https://github.com/humangeo/DateSense/

From its documentation:

>>> import DateSense
>>> print DateSense.detect_format( ["15 Dec 2014", "9 Jan 2015"] )
%d %b %Y
norok2
  • `DateSense.detect_format( ["01.01.02 15:30:59.123123"] )` -> `%m.%d.%y %H:%M:%S.123123`; `DateSense.detect_format( ["01.01.02 15:30:59,1"] )` -> `%m.%d.%y %H:%M:%S,%w`; `DateSense.detect_format( ["01.01.02 15:30:59,123"] )` -> `%m.%d.%y %H:%M:%S,%Y`. Not even close. – Burner Aug 08 '16 at 08:26
  • maybe you should try with more data? btw it looks pretty close to me, except for whatever the last part is.. to be honest, even as a human I do not fully understand what is to be expected.. – norok2 Aug 08 '16 at 09:11
  • but if it is the microsecond parameter, it looks to me like that is the only thing it is miscalculating. Probably your best shot is then to adapt that code to your needs. – norok2 Aug 08 '16 at 09:33