Python: converting timestamp to date time not working

Question

I am requesting data from the api.etherscan.io website. For this, I require a free API key. I am getting information for the following wallet addresses 0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5, 0xc508dbe4866528db024fb126e0eb97595668c288. Below is the code I am using:

wallet_addresses = ['0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5', '0xc508dbe4866528db024fb126e0eb97595668c288']

page_number = 0
df_main = pd.DataFrame()
while True:
    for address in wallet_addresses:
        url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
        output = requests.get(url).text
        df_temp = pd.DataFrame(json.loads(output)['result'])
        df_temp['wallet_address'] = address
        df_main = df_main.append(df_temp)
        page_number += 1
        df_main['timeStamp'] = pd.to_datetime(df_main['timeStamp'], unit='s')
        if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime.date(2022, 1, 1):
            pass

Note that you need your own (free) ether_api.

What I want to do is get data from today's date, all the way back to 2022-01-01 which is what I am trying to achieve in the if statement.

However, the above gives me an error: ValueError: unit='s' not valid with non-numerical val='2022-09-19 18:14:47'

How can this be done? I've tried multiple methods to get pandas datetime to work, but all of them gave me errors.

If it is already a timestamp, then maybe a conversion is not needed and you can use the object directly to find the date? — Amadeus, Dec 02 '22 at 15:02
Yes, should be fine. I've already registered to get api, will take a look — DiMithras, Dec 02 '22 at 15:03
Also [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.date.html#pandas.Timestamp.date). you can extract the date by doing a `df_main['timeStamp'].date` — Amadeus, Dec 02 '22 at 15:05
@LucaZangari I've tried that but gives an error `AttributeError: 'Series' object has no attribute 'date'` — MathMan 99, Dec 02 '22 at 15:08
Have you considered using `datetime` lib? Apply `strptime(timestamp, '%Y-%m-%d %H:%M:%S')` to all values in the column to parse the timestamps — lotrus28, Dec 02 '22 at 15:13
just use `pd.to_datetime(df_main['timeStamp'])`. unit='s' is used if your input is Unix time as seconds since the epoch. If your input already is converted to datetime, remove the call to pd.to_datetime. — FObersteiner, Dec 02 '22 at 15:14
Yes, it is reproducible, I found the error. In a `for` loop on first iteration there is `1663611287` which is super fine to convert. But on next iteration there is `2022-09-19 18:14:47`, a date that does not need conversion. You need to supply your code with `if` statement checking for type of values you get in `df_main['timeStamp']` on each iteration. — DiMithras, Dec 02 '22 at 15:30
@MathMan99 actually, the api is fine, it only provides timestamps. What's happening here, you have `df_main = df_main.append(df_temp)`, that means you mix `timestamps` with casted `datetime`. that results in an error. You need to rework the logic on appending, so that when you cast `timestamps` to date, you do it only on `df_temp` and only than add it to `df_main` — DiMithras, Dec 02 '22 at 15:38

DiMithras · Accepted Answer · 2022-12-03T13:41:42.997

Here you go, it's working without an error:

page_number = 0
df_main = pd.DataFrame()
while True:
    for address in wallet_addresses:
        url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
        output = requests.get(url).text
        df_temp = pd.DataFrame(json.loads(output)['result'])
        df_temp['wallet_address'] = address
        page_number += 1
        df_temp['timeStamp'] = pd.to_datetime(df_temp['timeStamp'], unit='s')
        df_main = df_main.append(df_temp)
        if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime(2022, 1, 1).date():
            pass

Wrong append

So, what has happened here. As suggested in the first comment under question we acknowledged the type of first record in df_main with type(df_main['timeStamp'].iloc[0]). With IPython and Jupyter-Notebook one can look what is happening with df_main just after receiving an error with it being populated on the last for loop iteration that failed.
Otherwise if one uses PyCharm or any other IDE with a possibility to debug, the contents of df_main can be revealed via debug.

What we were missing, is that df_main = df_main.append(df_temp) is placed in a slightly wrong place. On first iteration it works well, pd.to_datetime(df_main['timeStamp'], unit='s') gets an str type with Linux epoch and gets converted to pandas._libs.tslibs.timestamps.Timestamp.

But on next iteration df_main['timeStamp'] already has the Timestamp type and it gets appended with str type, so we get a column with mixed type. E.g.:

type(df_main['timeStamp'].iloc[0]) == type(df_main['timeStamp'].iloc[-1])

This results with False. Hence when trying to convert Timestamp to Timestamp one gets an error featured in question.

To mitigate this we can place .append() below the conversion and do this conversion on df_temp instead of df_main, this way we will only append Timestamps to the resulting DataFrame and the code below with if clause will work fine.

As a side note

Another small change I've made was datetime.date(2022, 1, 1). This change was not needed, but the way one works with datetime depends on how this library was imported, so it's worth mentioning:

import datetime
datetime.date(2022, 1, 1)
datetime.datetime(2022, 1, 1).date()

from datetime import datetime
datetime(2022, 1, 1).date()

All the above is legit and will produce the same. On the first import module gets imported, on the second one type gets imported.

Alternative solution

Conversion to Timestamp takes time. If the API provides Linux epoch dates, why not use this date for comparison? Let's add this somewhere where you define wallet_addresses:

reference_date = "01/01/2021"
reference_date = int(time.mktime(datetime.datetime.strptime(reference_date, "%d/%m/%Y").timetuple()))

This will result in 1609448400. Other stack overflow question as reference.

This integer can now be compared with timestamps provided by the API. The only thing left is to cast str to int. We can have your code left intact with some minor changes at the end:

<< Your code without changes >>
        df_main['timeStamp'] = df_main['timeStamp'].astype(int)
        if min(df_main['timeStamp']) < reference_date:
            pass

To make a benchmark I've changed while True: to for _ in range(0,4): to limit the infinite cycle, results are as follows:

Initial solution took 11.6 s to complete
Alternative solution took 8.85 s to complete

It's 30% faster. Casting str to int takes less time than conversion to TimeStamps, I would call this a preferable solution.

Future warning

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

It makes sense to comply with this warning. df_main = df_main.append(df_temp) has to be changed to df_main = pd.concat([df_main, df_temp]).

As for current 1.5.0 version it's already deprecated. Time to upgrade!

The error is now coming from `datetime(2022,1,1)` saying `TypeError: module is not callable`. But the other error is solved. Did it work on your end? — MathMan 99, Dec 02 '22 at 15:51
Yeah, I changed it a bit as I was getting the error there . Change `datetime(2022, 1, 1).date()` back to `datetime.date(2022, 1, 1)` if it's working for you. Might be a different python / library version or the way we import `datetime`. I do with `from datetime import datetime`. — DiMithras, Dec 02 '22 at 15:55
I gave it a try. And, yes, it's a different way we import `datetime`. My bad, I should have mentioned it. — DiMithras, Dec 02 '22 at 15:58

Python: converting timestamp to date time not working

1 Answers1

Wrong append

As a side note

Alternative solution

Future warning