I'm following the Databricks example for uploading a file to DBFS (in my case .csv):
import json
import requests
import base64
DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)
def dbfs_rpc(action, body):
""" A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
response = requests.post(
BASE_URL + action,
headers={'Authorization': 'Bearer %s' % TOKEN },
json=body
)
return response.json()
# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']
with open('/a/local/file') as f:
while True:
# A block can be at most 1MB
block = f.read(1 << 20)
if not block:
break
data = base64.standard_b64encode(block)
dbfs_rpc("add-block", {"handle": handle, "data": data})
# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})
When using the tutorial as is, I get an error:
Traceback (most recent call last):
File "db_api.py", line 65, in <module>
data = base64.standard_b64encode(block)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 95, in standard_b64encode
return b64encode(s)
File "C:\Miniconda3\envs\dash_p36\lib\base64.py", line 58, in b64encode
encoded = binascii.b2a_base64(s, newline=False)
TypeError: a bytes-like object is required, not 'str'
I tried doing with open('./sample.csv', 'rb') as f:
before passing the blocks to base64.standard_b64encode
but then getting another error:
TypeError: Object of type 'bytes' is not JSON serializable
This happens when the encoded block data is being sent into the API call.
I tried skipping encoding entirely and just passing the blocks into the post
call. In this case the file gets created in the DBFS but has 0 bytes size.
At this point I'm trying to make sense of it all. It doesn't want a string but it doesn't want bytes either. What am I doing wrong? Appreciate any help.