Q :
" ... is ( there ) other way to improve performance of code ...? "
A :
Yes, there are a few ways,
yet do not expect anything magical, as you have already reported that the API-provider throttles/blocks somewhat higher levels of concurrent API-calls from being served.
There still might be some positive effect from latency-masking tricks of a just-[CONCURRENT]
orchestration of several API-calls, as the End-to-End latencies are principally "long" : each call travels many times across the over-the-"network" horizons and also spends some remarkable server-side TAT-latency inside the translation-matching engines.
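A back-of-envelope illustration of why the latency-masking may pay off ( all numbers below are illustrative assumptions, not measurements ) :
# an illustrative back-of-envelope, all numbers assumed, not measured :
#
# E2E_per_call   ~ 300 [ms]     # network RTT-hops + server-side TAT-latency
# local_per_call ~   1 [ms]     # local-side Python work per row
#
# pure-[SERIAL]     : 220000 calls * 300 [ms]  ~  18.3 [hours] wall-clock
# just-[CONCURRENT] : with N calls kept in-flight, the E2E latencies overlap,
#                     so wall-clock ~ 18.3 [hours] / N
#                     ( only until the API-provider's throttling caps N )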
Details matter, a lot...
A performance-boosting code-template to start with
( avoiding 220k+ repeated local-side overheads' add-on costs ) :
import time
import pandas as pd

from deep_translator import GoogleTranslator as gXLTe

xltDF = pd.read_excel( r"latestdata.xlsx" )['column'].fillna( 'novalue' )
resDF = xltDF.copy( deep = True )

XLTer = gXLTe( source = 'auto',   # instantiate the translator-object just ONCE,
               target = 'english' #     not per-row, avoiding 220k+ re-instantiations
               )

PROC_ns_START = time.perf_counter_ns()
#________________________________________________________ CRITICAL SECTION: start
for i in range( len( xltDF ) ):
    resDF.iloc[i] = XLTer.translate( xltDF.iloc[i] )   # .iloc[] is an indexer,
                                                       #     not a callable
#________________________________________________________ CRITICAL SECTION: end
PROC_ns_END = time.perf_counter_ns()

resDF.to_csv( r"jobdone.csv",
              sep = ';'
              )
print( "Runtime was {0:} [ns]".format( PROC_ns_END - PROC_ns_START ) )
Tips for performance boosting :
- if the Google API-policy permits, we may increase the count of threads that participate in the CRITICAL SECTION,
- as the Python-interpreter threads live "inside" the same address-space and are still GIL-lock MUTEX-blocked ( yet the network-I/O waits do release the GIL, which is what makes the latency-masking work ), we may let all the just-[CONCURRENT] accesses share the same DataFrame-objects, best using non-overlapping, separate ( thread-private ) block-iterators over disjunct halves ( for a pair of threads ), disjunct thirds ( for 3 threads ) etc. -- a thread-launching sketch follows the worker function below,
- as the Google API-policy is limiting attempts of overly concurrent access to the API-service, you shall build-in some, even naive, robustness :
def thread_hosted_blockCRAWLer( i_start, i_end ):
    XLTer = gXLTe( source = 'auto',   # a thread-private translator instance,
                   target = 'english' #     created once per thread, not per row
                   )
    for i in range( i_start, i_end ):
        while True:
            try:
                resDF.iloc[i] = XLTer.translate( xltDF.iloc[i] )
                # SUCCEEDED
                break
            except Exception:
                # FAILED
                print( "EXC: _blockCRAWLer() on index ", i )
                time.sleep( ... )   # a back-off delay deliberately left for you
                # be careful here, not to get on API-provider's BLACK-LIST
                continue
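A minimal thread-launching sketch to drive such workers over disjunct blocks ( a plain standard-library threading illustration; the N_THREADS value and the names used here are illustrative assumptions, nothing mandated by the API ) :
import threading

N_THREADS = 2                                   # an illustrative choice only --
                                                #     tune it against API-policy
blockSIZE = len( xltDF ) // N_THREADS

threads = []
for ith in range( N_THREADS ):
    i_start = ith * blockSIZE                   # disjunct, thread-private blocks,
    i_end   = ( ( ith + 1 ) * blockSIZE         #     the last one taking the rest
                if ith < N_THREADS - 1
                else len( xltDF )
                )
    threads.append( threading.Thread( target = thread_hosted_blockCRAWLer,
                                      args   = ( i_start, i_end )
                                      ) )

for aThread in threads: aThread.start()         # launch all threads, then
for aThread in threads: aThread.join()          #     wait for all to finish
As the per-row work is dominated by network-I/O waits ( during which the GIL gets released ), even a pair of threads can mask a fair part of the End-to-End latency, while the disjunct blocks keep the threads from ever writing into the same DataFrame cell.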
- if you need more time-related details per thread, you may reuse the same perf_counter_ns() bracketing per thread ( a sketch follows ) :
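A minimal per-thread timing sketch, assuming we just wrap the block-crawler with the same perf_counter_ns() brackets as above ( the _timed name is purely illustrative ) :
import threading   # ( already imported in the launching sketch above )

def thread_hosted_blockCRAWLer_timed( i_start, i_end ):
    thread_ns_START = time.perf_counter_ns()      # a per-thread start-mark
    thread_hosted_blockCRAWLer( i_start, i_end )  # the very work itself
    thread_ns_END   = time.perf_counter_ns()      # a per-thread end-mark
    print( "THREAD[ {0:} ] spent {1:} [ns] on rows [ {2:} : {3:} )".format(
            threading.get_ident(),
            thread_ns_END - thread_ns_START,
            i_start,
            i_end
            ) )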
Do not hesitate to go tuning & tweaking - and anyway, keep us posted on how fast you managed to get, that's fair, isn't it?