I want to compute exact dates relative to now. I have an input dataframe like the one shown below, where the AGE column holds a time difference from the current time.

I want to get the output dataframe shown below, where each date is calculated as date = current date - difference (AGE value).

Is there a way to do this?

Input:

          AGE
0 '1 years 2 days ago'
1 '3 hours 4 mins ago'
2 '5 mins 6 secs ago'
     ...

Output:

        DATE
0 2020-07-29 20:00
1 2021-07-31 16:57
2 2021-07-31 19:55
mkrieger1
marsss
    Try converting your input to a [timedelta](https://docs.python.org/3/library/datetime.html#timedelta-objects) instance, then subtract that timedelta from the datetime.now() object. – CodeMonkey Jul 31 '21 at 12:03

4 Answers

2

Since your strings are quite simple, you could use dateparser instead of implementing your own parser. It converts such strings to datetime objects automatically.

You can then use strftime to print each datetime object in the format of your choice.

For example:

import dateparser
timestrings = ['1 years 2 days ago', '3 hours 4 mins ago', '5 mins 6 secs ago']

for timestring in timestrings:
    dt = dateparser.parse(timestring)
    print(dt.strftime("%Y-%m-%d %H:%M"))

Output:

2020-07-29 22:33
2021-07-31 19:29
2021-07-31 22:28
Fausto
  • That's a clever solution! But it has a problem: If you set `df = pd.DataFrame({'AGE': ["59 secs ago"] * 100})` and then look at `df.AGE.map(dateparser.parse)` you will notice that you get slightly different dates. That's due to the shifting base date (time in this case). I've skimmed over the dateparser docs and couldn't find a way to configure the base. Do you know a fix? (Maybe it doesn't matter in the use case the op is interested in.) – Timus Jul 31 '21 at 23:21
  • @Timus I think it's a feature: if you are dealing with relative time, usually it's relative to now. Anyway in the [docs](https://dateparser.readthedocs.io/en/latest/dateparser.html#dateparser.conf.Settings) there is a flag to do that. Just pass a datetime object to use as base time and the problem is solved. Try `df.AGE.apply(dateparser.parse, settings={"RELATIVE_BASE" : reference})` where reference is a datetime object. – Fausto Aug 01 '21 at 15:12
  • I get that it's mostly a feature, but perhaps not in this case :)) (`dateparser` is rather slow - understandably, because it checks for a lot more possibilities than the ones here -, so depending on the size of the data this could indeed be an issue.) Good to know that the behaviour can be modified. – Timus Aug 03 '21 at 11:13
1

The simplest way I can think of is to subtract your time difference from the current time using a timedelta (as described here).

This is a simple example:

from datetime import timedelta, datetime
t1 = timedelta(days = 1, hours=7, minutes=36, seconds=20)
print(datetime.now()-t1)

Unfortunately, things get more complicated when you add years: timedelta has no years argument, because a year does not have a fixed length (leap years, for example).

This Computerphile video by Tom Scott explains in detail the problems you may encounter when working with time.

This Stack Overflow question explains how this can still be achieved.
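As a rough illustration of the year problem, here is a minimal sketch using only the standard library. The helper name subtract_years and the fixed reference time are made up for this example, and mapping Feb 29 to Feb 28 in non-leap years is just one possible convention (Mar 1 would be another), so treat it as an assumption rather than the definitive way to do it.

```python
from datetime import datetime, timedelta


def subtract_years(moment: datetime, years: int) -> datetime:
    """Shift `moment` back by whole calendar years (hypothetical helper).

    Assumption: Feb 29 maps to Feb 28 when the target year is not a leap year.
    """
    try:
        return moment.replace(year=moment.year - years)
    except ValueError:
        # Only reachable for Feb 29 when the target year has no Feb 29.
        return moment.replace(year=moment.year - years, day=28)


# A fixed reference time keeps the example reproducible;
# in real use this would be datetime.now().
reference = datetime(2021, 7, 31, 20, 0)

# "1 years 2 days ago": shift the calendar year first, then subtract the rest.
result = subtract_years(reference, 1) - timedelta(days=2)
print(result.strftime("%Y-%m-%d %H:%M"))
```

The year is handled as a calendar shift, while days and smaller units go through timedelta, which only knows fixed-length durations.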

Geo
1

EDIT: Another version that includes months:

import datetime as dt
import re
import pandas as pd

units = ("sec", "min", "hour", "day", "week", "month", "year")
re_offset = re.compile("|".join(r"\d+\s*" + f"{unit}s?" for unit in units))
re_num_unit = re.compile(r"(\d+)\s*(\w+)")

def offset(s):
    keys = {
        "sec": "seconds", "secs": "seconds",
        "min": "minutes", "mins": "minutes",
        "hour": "hours", "day": "days", "week": "weeks",
        "month": "months", "year": "years"
    }

    values = {}
    for num_unit in re_offset.findall(s):
        num_str, unit = re_num_unit.match(num_unit).groups()
        unit = keys.get(unit, unit)
        values[unit] = int(num_str)

    return pd.DateOffset(**values)

df["DATE"] = (dt.datetime.now() - df.AGE.apply(offset)).dt.strftime("%Y-%m-%d %H:%M")

Or, shorter, if the strings are reliably well-formed:

def offset(lst):
    keys = {
        "sec": "seconds", "secs": "seconds",
        "min": "minutes", "mins": "minutes",
        "hour": "hours", "day": "days", "week": "weeks",
        "month": "months", "year": "years"
    }
    return pd.DateOffset(
        **{keys.get(unit, unit): int(num_str) for num_str, unit in lst}
    )

df["DATE"] = (dt.datetime.now()
              - df.AGE.str.findall(r"(\d+)\s*(\w+)")
                      .map(offset)).dt.strftime("%Y-%m-%d %H:%M")

Old version:

import re
import datetime as dt

units = ("sec", "min", "hour", "day", "week", "year")
re_timedelta = re.compile("|".join(r"\d+\s*" + f"{unit}s?" for unit in units))
re_num_unit = re.compile(r"(\d+)\s*(\w+)")

def ago(s, now):
    keys = {
        "sec": "seconds", "secs": "seconds",
        "min": "minutes", "mins": "minutes",
        "hour": "hours", "day": "days", "week": "weeks", "year": "years"
    }
    values = {"seconds": 0, "minutes": 0, "hours": 0, "days": 0, "weeks": 0}
    
    for num_unit in re_timedelta.findall(s):
        num_str, unit = re_num_unit.match(num_unit).groups()
        unit = keys.get(unit, unit)
        if unit == "years":
            years = int(num_str)
            if now.month == 2 and now.day == 29 and years%4:
                ago_date = dt.date(now.year - years, 3, 1)
            else:
                ago_date = dt.date(now.year - years, now.month, now.day)
            values["days"] += (now.date() - ago_date).days
        else:
            values[unit] += int(num_str)
    
    return (now - dt.timedelta(**values)).strftime("%Y-%m-%d %H:%M")

With that in place you can do the following (df being your dataframe):

now = dt.datetime.now()
df["DATE"] = df.AGE.apply(ago, args=[now])

Result:

                  AGE              DATE
0  1 years 2 days ago  2020-07-29 17:51
1  3 hours 4 mins ago  2021-07-31 14:47
2   5 mins 6 secs ago  2021-07-31 17:46
Timus
0

It is possible to do something like that: we can use the datetime module to get the current date and time and subtract the input from it. Here is the code:

from datetime import datetime

# Get the current year (this is a simpler version using only the year; the same idea applies to days, months, etc.)
currentYear = datetime.now().year

# Input the year to be subtracted
secondYear = int(input("What is the year to be subtracted? "))

# datetime.now().year is already an int, so no conversion is needed
# Subtract the two years
print(currentYear - secondYear)

I hope this makes the idea clear; I used a simpler example just to show how it's done. You can use the same technique for the other units (days, hours, and so on).
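To show how the same idea might carry over to the smaller units, here is a minimal standard-library sketch. The names UNIT_TO_KWARG, PATTERN and age_to_timedelta are made up for this example, the reference time is fixed so the run is reproducible, and only the spellings that appear in the question (secs, mins, hours, days, weeks) are recognised. Years and months are deliberately left out because timedelta does not support them.

```python
import re
from datetime import datetime, timedelta

# Hypothetical unit table for this sketch: maps the unit stems seen in the
# question to timedelta keyword arguments.
UNIT_TO_KWARG = {
    "sec": "seconds", "min": "minutes", "hour": "hours",
    "day": "days", "week": "weeks",
}
# Matches pairs such as "3 hours" or "6 secs"; anything else is ignored.
PATTERN = re.compile(r"(\d+)\s*(sec|min|hour|day|week)s?\b")


def age_to_timedelta(age: str) -> timedelta:
    """Sum every '<number> <unit>' pair found in `age` into one timedelta."""
    total = timedelta()
    for amount, unit in PATTERN.findall(age):
        total += timedelta(**{UNIT_TO_KWARG[unit]: int(amount)})
    return total


# Take the reference time once, so every row is measured from the same instant;
# in real use this would be datetime.now().
reference = datetime(2021, 7, 31, 20, 0)
for age in ["3 hours 4 mins ago", "5 mins 6 secs ago"]:
    print((reference - age_to_timedelta(age)).strftime("%Y-%m-%d %H:%M"))
```

With a dataframe, something like df.AGE.map(age_to_timedelta) should give a column of timedeltas that can be subtracted from the single reference time.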