I'm trying to upload a set of pd.DataFrame objects as CSV files to a folder in Dropbox using the Dropbox Python SDK (v2). The files are not particularly big, but there are many of them. Using batches should reduce the number of API calls and comply with the developer recommendations outlined in the documentation:
"The idea is to group concurrent file uploads into batches, where files in each batch are uploaded in parallel via multiple API requests to maximize throughput, but the whole batch is committed in a single, asynchronous API call to allow Dropbox to coordinate the acquisition and release of namespace locks for all files in the batch as efficiently as possible."
Following several answers on SO (see the most relevant to my problem here) and this answer from the SDK maintainers on the Dropbox forum, I tried the following code:
commit_info = []
for df in list_pandas_df:
    df_raw_str = df.to_csv(index=False)
    upload_session = dbx.files_upload_session_start(df_raw_str.encode())
    commit_info.append(
        dropbox.files.CommitInfo(path="/path/to/db/folder.csv")
    )
dbx.files_upload_session_finish_batch(commit_info)
Nonetheless, when reading the files_upload_session_finish_batch docstring I noticed that the function only takes a list of CommitInfo as an argument (documentation), which is confusing, since the non-batch version (files_upload_session_finish) takes both a CommitInfo object with a path and a cursor object with data about the session.
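For context, this is the single-file flow as I understand it from the SDK examples (the path is a placeholder), where the cursor and the CommitInfo travel together in the finish call:

from dropbox.files import CommitInfo, UploadSessionCursor

data = df.to_csv(index=False).encode()
# close=True: the whole (small) file fits in the first request
session = dbx.files_upload_session_start(data, close=True)
cursor = UploadSessionCursor(session_id=session.session_id, offset=len(data))
# The non-batch finish pairs the session cursor with the commit path
dbx.files_upload_session_finish(b"", cursor, CommitInfo(path="/path/to/db/file.csv"))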
I'm fairly lost in the documentation, and even the source code isn't much help in understanding how a batch is meant to upload several small files (as opposed to chunking one large file). What am I missing here?
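Is the intended pattern something like the sketch below, where each entry pairs the session cursor with its CommitInfo via UploadSessionFinishArg? (The token and paths are placeholders; I haven't been able to confirm this from the docs.)

import dropbox
from dropbox.files import CommitInfo, UploadSessionCursor, UploadSessionFinishArg

dbx = dropbox.Dropbox("ACCESS_TOKEN")  # placeholder token

entries = []
for i, df in enumerate(list_pandas_df):
    data = df.to_csv(index=False).encode()
    # One upload session per file; close=True because each CSV fits in a single request
    session = dbx.files_upload_session_start(data, close=True)
    cursor = UploadSessionCursor(session_id=session.session_id, offset=len(data))
    commit = CommitInfo(path=f"/path/to/db/file_{i}.csv")  # placeholder, unique path per file
    entries.append(UploadSessionFinishArg(cursor=cursor, commit=commit))

# Commit the whole batch in a single asynchronous call
launch = dbx.files_upload_session_finish_batch(entries)

And if so, do I then need to poll dbx.files_upload_session_finish_batch_check with launch.get_async_job_id() until the batch job completes?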