I have the following Python function to extract unique object names from an AWS S3 bucket, and it passes my test runs in Python:
###############################
# aws_get_session_name_fuc.py #
###############################
def get_session_names(s3, BUCKET, session_path):
    session_names = []
    unique_session_names = []
    n = 2
    session_archives = s3.Bucket(BUCKET).objects.filter(Prefix = session_path)
    # Extract object names from path
    for i in session_archives:
        temp_str = i.key.split('/')
        session_names.append(temp_str[n])
    # obtain unique object names
    for i in session_names:
        if i not in unique_session_names:
            unique_session_names.append(i)
    return unique_session_names
'''
# Test
test_extract = get_session_names(s3, BUCKET, session_path)
test_extract
Out[19]:
['',
'testSession01',
'testSession02',
'testSession03']
'''
When I load the above script in R using reticulate::source_python("aws_get_session_name_fuc.py") and call the function, I get the following error:
> existing_session = get_session_names(s3, BUCKET, session_path)
Error in py_call_impl(callable, dots$args, dots$keywords) :
RecursionError: maximum recursion depth exceeded
I tried increasing the recursion limit in the Python script using sys.setrecursionlimit(), but it either triggers the same error when the value is not "large enough", or crashes the R session when the value is "too large".
So, I'm trying to understand:
- Why does the function pass in Python, but fail when referenced in R?
- Is there a better way to resolve this?
EDIT-1:
I managed to work around this problem by converting the function into a plain script, running it directly with reticulate::py_run_file("python/aws_get_session_name_script.py") to do the extraction and write the results to a file, and then loading that file back into R.
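For anyone wanting to try the same workaround, here is a minimal sketch of what such a script could look like. It is not the exact script I used: the `keys` list is a made-up stand-in for the real S3 listing so the example runs without AWS access, and the output file name `session_names.txt` is hypothetical.

```python
# Hypothetical sketch of a script-style version: everything runs at module
# level, and the result is written to a file instead of being returned to R.
import tempfile
from pathlib import Path

# Stand-in data (assumption). In the real script the keys would come from:
#   [obj.key for obj in s3.Bucket(BUCKET).objects.filter(Prefix=session_path)]
keys = [
    "data/sessions/",
    "data/sessions/testSession01/a.zip",
    "data/sessions/testSession01/b.zip",
    "data/sessions/testSession02/a.zip",
]

n = 2  # position of the session name in the split key, as in the function

# dict.fromkeys drops duplicates while keeping first-seen order
unique_session_names = list(dict.fromkeys(key.split("/")[n] for key in keys))

# Write one name per line; the file name and location are hypothetical
out_file = Path(tempfile.gettempdir()) / "session_names.txt"
out_file.write_text("\n".join(unique_session_names) + "\n")
```

On the R side, the file can then be read back with something like readLines(). Only plain text crosses the R/Python boundary this way, so no Python objects need to be passed to or returned from a function call.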