I have a date-time string t1:

'Sat 02 May 2015 19:54:36 +0530'

I want to remove the first and last words, i.e. Sat and +0530. Here is the behavior of the three regexes I wrote:

(1) re.search(r'(\d{2})([^:]+)([:\d{2}]+)',t1) matches '02 May 2015 19:54:36'
(2) re.search(r'(\d{2})([^:]+)([:\d{2}]{2})',t1) matches '02 May 2015 19:5'
(3) re.search(r'(\d{2})(.+)([\:\d{2}])',t1) matches '02 May 2015 19:54:36 +0530'

Can someone explain what's the problem with number 2 and number 3? I thought all of these should yield the same result.

Rahul
Sumit
  • Why not just parse it to a Date object and then format the Date object? – J.N. May 15 '17 at 02:54
  • @J.N. Could you please show me an example? I am new to Python. Thank you. – Sumit May 15 '17 at 02:55
  • Assuming that you're using Python 2, here is how to parse the strings as datetime objects: http://stackoverflow.com/questions/466345/converting-string-into-datetime And here is how to format the datetime objects: https://docs.python.org/2/library/datetime.html Parsing the dates and times into datetime objects is way easier than creating relatively fragile regular expressions. – J.N. May 15 '17 at 02:57
  • @J.N. Thank you. – Sumit May 15 '17 at 02:59
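
For reference, the parse-then-format approach suggested in the comments might look roughly like the sketch below. It assumes Python 3 (where strptime accepts %z for the +0530 offset) and an English locale for the day and month abbreviations. The names parsed and trimmed are only illustrative.

```python
from datetime import datetime

t1 = 'Sat 02 May 2015 19:54:36 +0530'

# Parse the whole string, including the weekday name and the UTC offset.
# Assumption: Python 3 and an English/C locale for %a and %b.
parsed = datetime.strptime(t1, '%a %d %b %Y %H:%M:%S %z')

# Format it back without the weekday and the offset.
trimmed = parsed.strftime('%d %b %Y %H:%M:%S')
print(trimmed)
```

Unlike the regex, this also rejects strings that are not valid dates, since strptime raises ValueError when the input does not fit the format.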

2 Answers

Can someone explain what's the problem with number 2 and number 3?

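The problem with your second regex, (\d{2})([^:]+)([:\d{2}]{2}), is that the third group, ([:\d{2}]{2}), is a character class. Inside square brackets, [:\d{2}] matches any single one of these characters: a colon, a digit, { or }. The {2} after the class then repeats it exactly twice, so it matches :5 and stops. The third regex has the same character-class problem, and in addition its .+ is greedy: it runs to the end of the string, and the class then matches only the final 0.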

Your first regex, (\d{2})([^:]+)([:\d{2}]+), works because the + quantifier (one or more) lets the character class [:\d{2}] keep consuming characters, so it takes all of :54:36.

Without the character class, your second regex becomes (\d{2})([^:]+)(:\d{2}){2}, which works as intended.
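
To see the difference for yourself, here is a small sketch that runs the three regexes from the question plus the fixed version against the same string. The dictionary keys and variable names are just labels chosen for this example.

```python
import re

t1 = 'Sat 02 May 2015 19:54:36 +0530'

patterns = {
    'regex 1': r'(\d{2})([^:]+)([:\d{2}]+)',
    'regex 2': r'(\d{2})([^:]+)([:\d{2}]{2})',
    'regex 3': r'(\d{2})(.+)([\:\d{2}])',
    'regex 2 fixed': r'(\d{2})([^:]+)(:\d{2}){2}',
}

# group(0) is the full text matched by each pattern.
results = {name: re.search(p, t1).group(0) for name, p in patterns.items()}
for name, matched in results.items():
    print(name, '->', repr(matched))

# Inside [...] the braces are ordinary characters, not a quantifier,
# so the class happily matches a literal '{'.
brace_is_literal = re.fullmatch(r'[:\d{2}]', '{') is not None
print(brace_is_literal)
```

Only regex 1 and the fixed regex 2 return '02 May 2015 19:54:36'. Regex 2 stops at '02 May 2015 19:5', and regex 3 runs to the end of the string.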

Regex101 Demo

Rahul

The title of your question relates to regex, but it seems that your question is really about how to remove the first and last word from a date string. In your case, I personally would not use regex. Instead, you could simply split the string on spaces and join the resulting list, leaving out the first and last elements:

In [1]: s = 'Sat 02 May 2015 19:54:36 +0530'

In [2]: ' '.join(s.split(' ')[1:-1])
Out[2]: '02 May 2015 19:54:36'

[1:-1] will give you all elements of a sequence (in this case a list of strings created by split()) from the second element, up to (but not including) the last element.
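
If it helps, the same steps can be spelled out one at a time, as in the sketch below. It uses split() with no argument, which is a small variation on the snippet above: it splits on any run of whitespace, so doubled spaces would not produce empty strings. The variable names are only for illustration.

```python
s = 'Sat 02 May 2015 19:54:36 +0530'

# Split into words; split() with no argument splits on runs of whitespace.
words = s.split()

# Slice from index 1 up to, but not including, the last element.
middle = words[1:-1]

# Put the remaining words back together with single spaces.
result = ' '.join(middle)

print(words)
print(middle)
print(result)
```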

Regex is not the "wrong" way to solve your problem, and mine is not "right". However, I find that, where applicable, string methods are often better suited for this kind of job, are easier to read, and are less error-prone. That has been my experience at least.

elethan
  • Thank you. I was just trying to use regex for this problem. – Sumit May 15 '17 at 03:09
  • @Sumit got it! I just wanted to put this out there just in case it was useful for you. I remember when I first learned regex, I was in awe of their power and wanted to use them for everything. In my case it took me a while before I even knew that there were other ways to do things, haha! – elethan May 15 '17 at 03:13
  • That's funny. Agree, regex is super powerful. – Sumit May 15 '17 at 04:00