Here you go, it's working without an error:
page_number = 0
df_main = pd.DataFrame()
while True:
for address in wallet_addresses:
url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
output = requests.get(url).text
df_temp = pd.DataFrame(json.loads(output)['result'])
df_temp['wallet_address'] = address
page_number += 1
df_temp['timeStamp'] = pd.to_datetime(df_temp['timeStamp'], unit='s')
df_main = df_main.append(df_temp)
if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime(2022, 1, 1).date():
pass
Wrong append
So, what has happened here. As suggested in the first comment under question we acknowledged the type
of first record in df_main
with type(df_main['timeStamp'].iloc[0])
. With IPython and Jupyter-Notebook one can look what is happening with df_main
just after receiving an error with it being populated on the last for
loop iteration that failed.
Otherwise if one uses PyCharm or any other IDE with a possibility to debug, the contents of df_main
can be revealed via debug.
What we were missing, is that df_main = df_main.append(df_temp)
is placed in a slightly wrong place. On first iteration it works well, pd.to_datetime(df_main['timeStamp'], unit='s')
gets an str
type with Linux epoch and gets converted to pandas._libs.tslibs.timestamps.Timestamp
.
But on next iteration df_main['timeStamp']
already has the Timestamp type and it gets appended with str
type, so we get a column with mixed type. E.g.:
type(df_main['timeStamp'].iloc[0]) == type(df_main['timeStamp'].iloc[-1])
This results with False. Hence when trying to convert Timestamp to Timestamp one gets an error featured in question.
To mitigate this we can place .append()
below the conversion and do this conversion on df_temp
instead of df_main
, this way we will only append Timestamps to the resulting DataFrame and the code below with if
clause will work fine.
As a side note
Another small change I've made was datetime.date(2022, 1, 1)
. This change was not needed, but the way one works with datetime
depends on how this library was imported, so it's worth mentioning:
import datetime
datetime.date(2022, 1, 1)
datetime.datetime(2022, 1, 1).date()
from datetime import datetime
datetime(2022, 1, 1).date()
All the above is legit and will produce the same. On the first import module
gets imported, on the second one type
gets imported.
Alternative solution
Conversion to Timestamp takes time. If the API provides Linux epoch dates, why not use this date for comparison? Let's add this somewhere where you define wallet_addresses
:
reference_date = "01/01/2021"
reference_date = int(time.mktime(datetime.datetime.strptime(reference_date, "%d/%m/%Y").timetuple()))
This will result in 1609448400. Other stack overflow question as reference.
This integer can now be compared with timestamps provided by the API. The only thing left is to cast str
to int
. We can have your code left intact with some minor changes at the end:
<< Your code without changes >>
df_main['timeStamp'] = df_main['timeStamp'].astype(int)
if min(df_main['timeStamp']) < reference_date:
pass
To make a benchmark I've changed while True:
to for _ in range(0,4):
to limit the infinite cycle, results are as follows:
- Initial solution took 11.6 s to complete
- Alternative solution took 8.85 s to complete
It's 30% faster. Casting str
to int
takes less time than conversion to TimeStamps, I would call this a preferable solution.
Future warning
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
It makes sense to comply with this warning. df_main = df_main.append(df_temp)
has to be changed to df_main = pd.concat([df_main, df_temp])
.
As for current 1.5.0 version it's already deprecated. Time to upgrade!