
I have a CSV file that contains dates in various timezones, but before feeding that data to tests, I want to replace all the dates with a unified value.

The date column contains values like these:

    2019-01-01 00:00:00+05:30
    2018-12-31 18:30:00+00
    2018-02-02 00:00:00-04:00

I want to replace them with:

    2019-01-01 00:00:00+00
    2018-12-31 00:00:00+00
    2018-02-02 00:00:00+00

How do I write a regex that covers all possible timezone offsets?

I wrote:

    ([0-9]){4}(-:?)([0-9]){2}(-:?)([0-9]){2} ([0-9]){2}:([0-9]){2}:([0-9]){2}(+-?)([0-9]){2}:([0-9]){2}

but it fails when it encounters 2018-12-31 18:30:00+00. How can I handle this case?

Murtuza Z
  • Just trying to understand your question here - do you want to convert all timezones to UTC? or completely remove the timezones? – verisimilitude Aug 27 '19 at 05:43
  • @verisimilitude Make them UTC – Murtuza Z Aug 27 '19 at 05:45
  • I don't understand the connection between the expected output and the input, but in any case, regex alone is not the right tool to be doing this. You need a date/time API of some sort. – Tim Biegeleisen Aug 27 '19 at 05:47
  • @TimBiegeleisen why is an API needed? – T.Woody Aug 27 '19 at 05:50
  • @TimBiegeleisen I want to feed that unified data to tests so that I don't have to worry about different timezones, which the frontend handles anyway. These are backend tests, and to keep them simple I want to unify the dates and make them UTC using a regex. – Murtuza Z Aug 27 '19 at 05:55

3 Answers


Tim Biegeleisen is right: you should not be using regex for this, you should use the datetime API that Python provides. I have sourced my answer from an excellent post on this topic by jfs.

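The following is for Python 3.3+ (since you tagged your question with Python 3.0). Note that %z in strptime only accepts an offset written with a colon, such as +05:30, from Python 3.7 onward.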

import datetime

time_string = "2019-01-01 00:00:00+05:30"
# Parse an aware datetime instance from the string
dt = datetime.datetime.strptime(time_string, '%Y-%m-%d %H:%M:%S%z')
# Seconds since the Unix epoch; timestamp() honors the +05:30 offset
timestamp = dt.timestamp()
# Convert back to an aware datetime expressed in UTC
dt = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)
# Format and print it (prints 2018-12-31 18:30:00 UTC)
print(dt.strftime('%Y-%m-%d %H:%M:%S %Z'))
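A minimal sketch that runs all three sample values from the question through the same idea, assuming Python 3.7+ and assuming that "make them UTC" (from the comments) is the goal. strptime's %z rejects a bare two-digit offset such as +00, so the hypothetical helper to_utc first pads it to +00:00 and then uses astimezone() to convert. Note that this yields real UTC times (for example 2018-12-31 18:30:00+00 for the first value), not the midnight values listed in the question.

```python
import datetime
import re


def to_utc(time_string):
    """Hypothetical helper: parse one value from the date column and convert it to UTC."""
    # %z needs minutes in the offset, so pad a bare "+00" or "-04" at the end to "+00:00" / "-04:00"
    padded = re.sub(r'([+-]\d{2})$', r'\1:00', time_string)
    # Offsets written with a colon (e.g. "+05:30") require Python 3.7+
    dt = datetime.datetime.strptime(padded, '%Y-%m-%d %H:%M:%S%z')
    # astimezone() shifts the wall-clock time to the same instant in UTC
    return dt.astimezone(datetime.timezone.utc)


for s in ["2019-01-01 00:00:00+05:30", "2018-12-31 18:30:00+00", "2018-02-02 00:00:00-04:00"]:
    print(to_utc(s).strftime('%Y-%m-%d %H:%M:%S+00'))
```

Using astimezone() directly avoids the round trip through a float timestamp; both give the same instant.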

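For Python versions before 3.3 (3.2 at minimum, since that release added datetime.timezone and timedelta-by-timedelta division), for an aware datetime. Before 3.7, the offset must also be written without a colon, e.g. +0530: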

    import datetime

    time_string = "2019-01-01 00:00:00+05:30"
    # Parse an aware datetime instance from the string (see the note on %z above)
    dt = datetime.datetime.strptime(time_string, '%Y-%m-%d %H:%M:%S%z')
    # Seconds since the Unix epoch: subtracting two aware datetimes honors both offsets
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    timestamp = (dt - epoch) / datetime.timedelta(seconds=1)
    # Convert back to an aware datetime expressed in UTC
    dt = datetime.datetime.fromtimestamp(timestamp, datetime.timezone.utc)
    # Format and print it
    print(dt.strftime('%Y-%m-%d %H:%M:%S %Z'))
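As a sanity check on the epoch arithmetic above (assuming Python 3.3+, so that timestamp() exists to compare against), the manual subtraction gives the same number of seconds as the built-in method:

```python
import datetime

dt = datetime.datetime.strptime("2019-01-01 00:00:00+05:30", '%Y-%m-%d %H:%M:%S%z')
epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

# Subtracting aware datetimes compares the underlying UTC instants
manual = (dt - epoch) / datetime.timedelta(seconds=1)
# The Python 3.3+ shortcut for the same value
builtin = dt.timestamp()

print(manual, builtin)  # 1546281000.0 1546281000.0
```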

Terminology

An aware object is used to represent a specific moment in time that is not open to interpretation.

In our case, the timezone information is known.
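To make the distinction concrete, a small illustration (Python 3.7+ for the colon in the offset): an aware datetime reports its UTC offset, while a naive one has no tzinfo at all.

```python
import datetime

aware = datetime.datetime.strptime("2019-01-01 00:00:00+05:30", '%Y-%m-%d %H:%M:%S%z')
naive = datetime.datetime(2019, 1, 1, 0, 0, 0)

# The aware value carries its offset, so it identifies exactly one instant
print(aware.utcoffset())  # 5:30:00
# The naive value has no tzinfo; which instant it means is left to the reader
print(naive.tzinfo)  # None
```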

verisimilitude

You don't need a regex for this, since the change is straightforward. You can use the snippet below:

ts = ["2019-01-01 00:00:00+05:30", "2018-12-31 18:30:00+00", "2018-02-02 00:00:00-04:00"]
l = [x.split()[0] + " 00:00:00+00" for x in ts]

OR

l = [x[:11] + "00:00:00+00" for x in ts]
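Since the question is about a CSV file, here is one way to apply the split approach to a whole column with the csv module. The column names date and id are assumptions based on the question's wording, and io.StringIO stands in for the real input and output files.

```python
import csv
import io

# Hypothetical sample; in practice, open the real CSV file instead
sample = io.StringIO(
    "id,date\n"
    "1,2019-01-01 00:00:00+05:30\n"
    "2,2018-12-31 18:30:00+00\n"
    "3,2018-02-02 00:00:00-04:00\n"
)

reader = csv.DictReader(sample)
rows = []
for row in reader:
    # Keep the date part and overwrite the time and offset, as in the list comprehension above
    row["date"] = row["date"].split()[0] + " 00:00:00+00"
    rows.append(row)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```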

The best way to solve this problem is with Python's datetime module (strptime and strftime).

If you want to solve it with a regex, then, following the Python docs (https://docs.python.org/2/library/re.html), you can do something like this:

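import re

def dashrepl(matchobj):
    # Keep the date (group 1) and replace the time and offset
    return "{0} 00:00:00+00".format(matchobj.group(1))

# Raw string so that \d and \s reach the regex engine unchanged
k = r"(\d{4}(-\d{2}){2})\s(\d{2}:?){3}.[\d:]+"
ab = re.sub(k, dashrepl, "2019-01-01 00:00:00+05:30")
print(ab)  # 2019-01-01 00:00:00+00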
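A slightly stricter variant, sketched here as an assumption rather than tested against the real CSV. Instead of relying on "." to match the sign, it matches the sign explicitly with [+-] and makes the :MM part of the offset optional, which is what the question's pattern lacked for +00. (The question's (+-?) is also not valid in Python's re, because + has nothing to repeat; a character class [+-] is the usual way to match a sign.) unify is a hypothetical helper name.

```python
import re

# Date, a space, HH:MM:SS, then a sign and an offset whose ":MM" part is optional (covers "+00")
PATTERN = re.compile(r'(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2}[+-]\d{2}(?::?\d{2})?')


def unify(text):
    # Keep the date (group 1) and overwrite the time and offset
    return PATTERN.sub(r'\1 00:00:00+00', text)


for s in ["2019-01-01 00:00:00+05:30", "2018-12-31 18:30:00+00", "2018-02-02 00:00:00-04:00"]:
    print(unify(s))
```

Because the pattern is not anchored, unify can also be applied to a whole CSV line or to the file contents at once.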
Somil