0

I am requesting data from the api.etherscan.io website. For this, I require a free API key. I am getting information for the following wallet addresses 0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5, 0xc508dbe4866528db024fb126e0eb97595668c288. Below is the code I am using:

wallet_addresses = ['0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5', '0xc508dbe4866528db024fb126e0eb97595668c288']

page_number = 0
df_main = pd.DataFrame()
while True:
    for address in wallet_addresses:
        url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
        output = requests.get(url).text
        df_temp = pd.DataFrame(json.loads(output)['result'])
        df_temp['wallet_address'] = address
        df_main = df_main.append(df_temp)
        page_number += 1
        df_main['timeStamp'] = pd.to_datetime(df_main['timeStamp'], unit='s')
        if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime.date(2022, 1, 1):
            pass

Note that you need your own (free) ether_api.

What I want to do is get data from today's date, all the way back to 2022-01-01 which is what I am trying to achieve in the if statement.

However, the above gives me an error: ValueError: unit='s' not valid with non-numerical val='2022-09-19 18:14:47'

How can this be done? I've tried multiple methods to get pandas datetime to work, but all of them gave me errors.

MathMan 99
  • 665
  • 1
  • 7
  • 19
  • What does `type(df_main['timeStamp'].iloc[0]` show? – DiMithras Dec 02 '22 at 14:56
  • It shows this: `pandas._libs.tslibs.timestamps.Timestamp` – MathMan 99 Dec 02 '22 at 15:00
  • If it is already a timestamp, then maybe a conversion is not needed and you can use the object directly to find the date? – Amadeus Dec 02 '22 at 15:02
  • Yes, should be fine. I've already registered to get api, will take a look – DiMithras Dec 02 '22 at 15:03
  • Also [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.date.html#pandas.Timestamp.date). you can extract the date by doing a `df_main['timeStamp'].date` – Amadeus Dec 02 '22 at 15:05
  • @LucaZangari I've tried that but gives an error `AttributeError: 'Series' object has no attribute 'date'` – MathMan 99 Dec 02 '22 at 15:08
  • Have you considered using `datetime` lib? Apply `strptime(timestamp, '%Y-%m-%d %H:%M:%S')` to all values in the column to parse the timestamps – lotrus28 Dec 02 '22 at 15:13
  • just use `pd.to_datetime(df_main['timeStamp'])`. unit='s' is used if your input is Unix time as seconds since the epoch. If your input already is converted to datetime, remove the call to pd.to_datetime. – FObersteiner Dec 02 '22 at 15:14
  • @lotrus28 that did not work either for me – MathMan 99 Dec 02 '22 at 15:17
  • @FObersteiner that also did not work – MathMan 99 Dec 02 '22 at 15:17
  • please verify that your example is reproducible -> [mre] – FObersteiner Dec 02 '22 at 15:19
  • Yes, it is reproducible, I found the error. In a `for` loop on first iteration there is `1663611287` which is super fine to convert. But on next iteration there is `2022-09-19 18:14:47`, a date that does not need conversion. You need to supply your code with `if` statement checking for type of values you get in `df_main['timeStamp']` on each iteration. – DiMithras Dec 02 '22 at 15:30
  • @DiMithras Alright, thank you. I'll try to do that! – MathMan 99 Dec 02 '22 at 15:33
  • 2
    @MathMan99 actually, the api is fine, it only provides timestamps. What's happening here, you have `df_main = df_main.append(df_temp)`, that means you mix `timestamps` with casted `datetime`. that results in an error. You need to rework the logic on appending, so that when you cast `timestamps` to date, you do it only on `df_temp` and only than add it to `df_main` – DiMithras Dec 02 '22 at 15:38

1 Answers1

1

Here you go, it's working without an error:

page_number = 0
df_main = pd.DataFrame()
while True:
    for address in wallet_addresses:
        url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
        output = requests.get(url).text
        df_temp = pd.DataFrame(json.loads(output)['result'])
        df_temp['wallet_address'] = address
        page_number += 1
        df_temp['timeStamp'] = pd.to_datetime(df_temp['timeStamp'], unit='s')
        df_main = df_main.append(df_temp)
        if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime(2022, 1, 1).date():
            pass

Wrong append

So, what has happened here. As suggested in the first comment under question we acknowledged the type of first record in df_main with type(df_main['timeStamp'].iloc[0]). With IPython and Jupyter-Notebook one can look what is happening with df_main just after receiving an error with it being populated on the last for loop iteration that failed.
Otherwise if one uses PyCharm or any other IDE with a possibility to debug, the contents of df_main can be revealed via debug.

What we were missing, is that df_main = df_main.append(df_temp) is placed in a slightly wrong place. On first iteration it works well, pd.to_datetime(df_main['timeStamp'], unit='s') gets an str type with Linux epoch and gets converted to pandas._libs.tslibs.timestamps.Timestamp.

But on next iteration df_main['timeStamp'] already has the Timestamp type and it gets appended with str type, so we get a column with mixed type. E.g.:

type(df_main['timeStamp'].iloc[0]) == type(df_main['timeStamp'].iloc[-1])

This results with False. Hence when trying to convert Timestamp to Timestamp one gets an error featured in question.

To mitigate this we can place .append() below the conversion and do this conversion on df_temp instead of df_main, this way we will only append Timestamps to the resulting DataFrame and the code below with if clause will work fine.

As a side note

Another small change I've made was datetime.date(2022, 1, 1). This change was not needed, but the way one works with datetime depends on how this library was imported, so it's worth mentioning:

import datetime
datetime.date(2022, 1, 1)
datetime.datetime(2022, 1, 1).date()

from datetime import datetime
datetime(2022, 1, 1).date()

All the above is legit and will produce the same. On the first import module gets imported, on the second one type gets imported.


Alternative solution

Conversion to Timestamp takes time. If the API provides Linux epoch dates, why not use this date for comparison? Let's add this somewhere where you define wallet_addresses:

reference_date = "01/01/2021"
reference_date = int(time.mktime(datetime.datetime.strptime(reference_date, "%d/%m/%Y").timetuple()))

This will result in 1609448400. Other stack overflow question as reference.

This integer can now be compared with timestamps provided by the API. The only thing left is to cast str to int. We can have your code left intact with some minor changes at the end:

<< Your code without changes >>
        df_main['timeStamp'] = df_main['timeStamp'].astype(int)
        if min(df_main['timeStamp']) < reference_date:
            pass

To make a benchmark I've changed while True: to for _ in range(0,4): to limit the infinite cycle, results are as follows:

  • Initial solution took 11.6 s to complete
  • Alternative solution took 8.85 s to complete

It's 30% faster. Casting str to int takes less time than conversion to TimeStamps, I would call this a preferable solution.

Future warning

FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.

It makes sense to comply with this warning. df_main = df_main.append(df_temp) has to be changed to df_main = pd.concat([df_main, df_temp]).

As for current 1.5.0 version it's already deprecated. Time to upgrade!

DiMithras
  • 605
  • 6
  • 14
  • The error is now coming from `datetime(2022,1,1)` saying `TypeError: module is not callable`. But the other error is solved. Did it work on your end? – MathMan 99 Dec 02 '22 at 15:51
  • 1
    Yeah, I changed it a bit as I was getting the error there . Change `datetime(2022, 1, 1).date()` back to `datetime.date(2022, 1, 1)` if it's working for you. Might be a different python / library version or the way we import `datetime`. I do with `from datetime import datetime`. – DiMithras Dec 02 '22 at 15:55
  • 1
    I gave it a try. And, yes, it's a different way we import `datetime`. My bad, I should have mentioned it. – DiMithras Dec 02 '22 at 15:58
  • 1
    @MathMan99 answer updated with an alternative solution. – DiMithras Dec 03 '22 at 13:21