I'm building a Django app that lets users upload a CSV through a form with a FileField. Once the file is uploaded, I read it with Pandas' read_csv(filename) so I can process it with Pandas.
I've recently started learning the Dask library because the uploaded files can be larger than memory. Everything works fine with Pandas pd.read_csv(filename), but when I try to use Dask dd.read_csv(filename) I get the error "'InMemoryUploadedFile' object has no attribute 'startswith'".
I'm pretty new to Django, Pandas and Dask. I've searched extensively and can't find this error reported anywhere in connection with Dask.
Here is the code I'm trying to use (just the relevant bits, I hope).

Inside forms.py I have:

    class ImportFileForm(forms.Form):
        file_name = forms.FileField(label='Select a csv', validators=[validate_file_extension, file_size])

Inside views.py:

    import pandas as pd
    import codecs
    import dask.array as da
    import dask.dataframe as dd
    from dask.distributed import Client

    client = Client()

    def import_csv(request):
        if request.method == 'POST':
            form = ImportFileForm(request.POST, request.FILES)
            if form.is_valid():
                utf8_file = codecs.EncodedFile(request.FILES['file_name'].open(), "utf-8")
                # If I use this Pandas line it works and I can then use Pandas to process the file
                # df_in = pd.read_csv(utf8_file)
                # If I use this Dask line it does not work and produces the error
                df_in = dd.read_csv(utf8_file)
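
For context on the difference between the two calls: pd.read_csv accepts an open file object, whereas dd.read_csv takes a path (or URL/glob) string. Assuming that mismatch is the cause, one workaround I could try is to copy the upload to a temporary file and hand Dask the path instead. This is only a minimal sketch: save_upload_to_temp_csv is a hypothetical helper name (not part of Django or Dask), and it assumes the uploaded object exposes read(size) and is positioned at the start.

```python
import os
import tempfile


def save_upload_to_temp_csv(uploaded_file, chunk_size=64 * 1024):
    """Copy a file-like upload to a temporary .csv file and return its path.

    Hypothetical helper, not part of Django or Dask. It only assumes the
    object has a read(size) method that returns bytes.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as tmp:
        while True:
            chunk = uploaded_file.read(chunk_size)
            if not chunk:  # empty bytes means end of the upload
                break
            tmp.write(chunk)
    return path


# Intended use inside the view (assumes dask is installed):
#     path = save_upload_to_temp_csv(request.FILES['file_name'])
#     df_in = dd.read_csv(path)  # a plain str path, so .startswith() exists
#     ...                        # compute results with Dask
#     os.remove(path)            # clean up only after Dask has finished reading
```

Because Dask reads lazily, the temporary file would have to stay on disk until the computation has actually run, so the cleanup belongs after the final compute rather than right after read_csv.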
And here is the error output I'm getting:
AttributeError at /import_data/import_csv/
'InMemoryUploadedFile' object has no attribute 'startswith'
/home/username/projects/myproject/import_data/services.py in save_imported_doc
df_in = dd.read_csv(utf8_file) …
/home/username/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/csv.py in read
**kwargs …
/home/username/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/csv.py in read_pandas
**(storage_options or {}) …
/home/username/anaconda3/lib/python3.7/site-packages/dask/bytes/core.py in read_bytes
fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs) …
/home/username/anaconda3/lib/python3.7/site-packages/fsspec/core.py in get_fs_token_paths
path = cls._strip_protocol(urlpath) …
/home/username/anaconda3/lib/python3.7/site-packages/fsspec/implementations/local.py in _strip_protocol
if path.startswith("file://"): …
/home/username/anaconda3/lib/python3.7/codecs.py in __getattr__
return getattr(self.stream, name)
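
The last two frames seem to explain why the message names InMemoryUploadedFile rather than the codecs wrapper: fsspec calls path.startswith(...) on whatever it was given, and the object returned by codecs.EncodedFile forwards unknown attributes to the stream it wraps. The stdlib-only sketch below reproduces that behaviour without Django or Dask. FakeUpload is a hypothetical stand-in for Django's upload class, used purely for illustration.

```python
import codecs
import io


class FakeUpload(io.BytesIO):
    """Hypothetical stand-in for an uploaded file object (not a Django class)."""


# EncodedFile returns a wrapper whose __getattr__ delegates to the inner stream.
wrapped = codecs.EncodedFile(FakeUpload(b"a,b\n1,2\n"), "utf-8")

try:
    # This mirrors what fsspec does with the value passed to dd.read_csv.
    wrapped.startswith("file://")
except AttributeError as exc:
    message = str(exc)

print(message)  # the error names the wrapped class, not the codecs wrapper
```

If this reading is right, the problem is not the encoding wrapper itself but that a file object is being passed where a path string is expected.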