
I am trying to convert the text files in my Azure blob container from ANSI to UTF-8 encoding using Python, without downloading the files locally. I get an error when I try to import BlockBlobService in my Python code to work with Azure Blob Storage. I believe I have the correct Python modules installed, but some other module may be missing, or one of them may be the wrong version. The output of "pip list" on my VM is shown below. Any help would be appreciated.

pip list

Package              Version
azure-common         1.1.25
azure-core           1.4.0
azure-nspkg          3.0.2
azure-storage        0.36.0
azure-storage-blob   12.3.0
azure-storage-common 2.1.0
azure-storage-nspkg  3.1.0
bcrypt               3.1.7
certifi              2020.4.5.1
cffi                 1.14.0
chardet              3.0.4
cryptography         2.9
idna                 2.9
isodate              0.6.0
msrest               0.6.13
oauthlib             3.1.0
paramiko             2.7.1
pip                  20.0.2
pycparser            2.20
PyNaCl               1.3.0
python-dateutil      2.8.1
requests             2.23.0
requests-oauthlib    1.3.0
setuptools           41.2.0
six                  1.14.0
urllib3              1.25.8
wheel                0.34.2
sparc
  • `azure-storage-blob 12.3.0` doesn't use `BlockBlobService`; it uses `BlobServiceClient`. `BlockBlobService` belongs to the v2 SDK. Check the v12 doc: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python – George Chen Apr 20 '20 at 08:16
  • It looks like azure-storage-blob 12.3.0 is the latest version (Storage - Blobs: PyPI 12.3.0, docs 12.3.0, GitHub 12.3.0). https://azure.github.io/azure-sdk/releases/2020-04/python.html – sparc Apr 20 '20 at 08:44
  • Yes, just try the sample code in the link; it should work. – George Chen Apr 20 '20 at 08:45
  • I am already using 12.3.0, but I am getting the error mentioned in my question. – sparc Apr 20 '20 at 08:50
  • Check the doc: the v12 SDK uses `BlobServiceClient`. If you want to use `BlockBlobService`, you should use the v2 SDK; refer to this link: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python-legacy – George Chen Apr 20 '20 at 08:52
  • Thank you. I am trying to convert the files in the container to UTF-8. I am using the code below, but it is not working: create_blob_from_text(container_name, filename, file, encoding='utf-8'). Can you please help? – sparc Apr 20 '20 at 12:23
  • I have posted my answer; please check it. I hope this is what you want. – George Chen Apr 21 '20 at 05:52

2 Answers

azure-storage-blob 12.3.0 is the latest version, and it provides BlobServiceClient instead of BlockBlobService. If you want to use BlockBlobService, you must pin azure-storage-blob to version 2.1.0:

pip install azure-storage-blob==2.1.0

This would solve your problem.
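
To check which SDK generation is actually importable on a machine before choosing between the two client classes, a minimal sketch along these lines may help. It assumes, as the comments above describe, that the v12 package exposes `BlobServiceClient` and the legacy 2.x package exposes `BlockBlobService`, both under `azure.storage.blob`; `detect_blob_sdk` is a hypothetical helper name, not part of any SDK.

```python
import importlib


def detect_blob_sdk():
    """Return 'v12', 'legacy', or None depending on what azure.storage.blob exposes.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module("azure.storage.blob")
    except ImportError:
        # No usable azure.storage.blob package is installed.
        return None
    if hasattr(module, "BlobServiceClient"):
        return "v12"      # azure-storage-blob 12.x style client
    if hasattr(module, "BlockBlobService"):
        return "legacy"   # azure-storage-blob 2.x style client
    return None


if __name__ == "__main__":
    print(detect_blob_sdk())
```

If this reports `v12`, importing `BlockBlobService` is expected to fail, which would match the situation in the question.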

Aishwarya Patil

You cannot change the encoding of an existing blob in place. Since you mentioned create_blob_from_text, I assume your text file is not UTF-8 and you want to convert it to UTF-8 when you upload it.

First, note that if your text file is already UTF-8, you don't need to change anything: just upload it and it will still be UTF-8. If the file is not UTF-8, however, the upload will not convert it for you: the text is simply encoded to UTF-8 exactly as it was read, so you have to decode it with its original encoding first. Once you understand this, you will know how to upload your file to an Azure blob with UTF-8 encoding.
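
The point about decoding with the original encoding first can be illustrated without any Azure call. The sketch below uses only the standard library; `to_utf8` is a hypothetical helper name, and the sample string is just an example.

```python
def to_utf8(raw_bytes, source_encoding):
    """Decode bytes with their real encoding, then re-encode them as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")


original = "你好"
gbk_bytes = original.encode("gbk")      # what a GBK-encoded file contains

# Correct: decode with the real source encoding before encoding to UTF-8.
good = to_utf8(gbk_bytes, "gbk")

# Wrong: decoding with an unrelated encoding produces valid UTF-8 bytes,
# but they spell different (garbled) characters.
bad = to_utf8(gbk_bytes, "latin-1")

print(good.decode("utf-8") == original)
print(bad.decode("utf-8") == original)
```

Both results are valid UTF-8, which is why a wrongly decoded upload can look "converted" while the text itself is garbled.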

For example, below I upload a text file that is encoded as GBK.

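# Assumes block_blob_service is a BlockBlobService instance (azure-storage-blob 2.1.0).
# Open the file with its real encoding so txt is decoded correctly.
txt = open('D:/hello.txt', encoding='GBK').read()  # GBK text

charset = 'UTF-8'
block_blob_service.create_blob_from_text(container_name='test', blob_name='test-gbk.txt', text=txt, encoding=charset)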

The picture below shows the original GBK-encoded file on the left and, on the right, the file downloaded from the Azure blob, which is encoded as UTF-8.

[Image: original GBK file (left) and the UTF-8 file downloaded from the blob (right)]

Update: I read the text file into a BytesIO buffer and then upload it with the code below. You can ignore the latin-1 part.

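from io import BytesIO

# latin-1 maps every byte to one character, so encode('ISO-8859-1') restores the raw bytes.
text = open('E:/test.txt', encoding='latin-1').read()
# 'ANSI' resolves only on Windows; on other platforms name the code page explicitly.
buf = BytesIO(text.encode('ISO-8859-1').decode('ANSI').encode('UTF-8'))
block_blob_service.create_blob_from_stream(container_name='test', blob_name='test.txt', stream=buf)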

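
For readers on the v12 SDK, the same download, decode, re-encode, upload round trip can be sketched in memory, without writing a local file. This is only a sketch under assumptions: `reencode_blob` is a hypothetical helper, and the two arguments are assumed to behave like v12 `BlobClient` objects, meaning `download_blob().readall()` returns bytes and `upload_blob(data, overwrite=True)` stores them. It holds the whole blob in memory, so it suits small files only.

```python
def reencode_blob(source_blob, target_blob, source_encoding, target_encoding="utf-8"):
    """Copy text from one blob to another while changing its byte encoding.

    source_blob and target_blob are assumed to expose the v12 BlobClient
    methods download_blob() and upload_blob(); nothing is written to disk.
    Returns the number of bytes read from the source blob.
    """
    raw = source_blob.download_blob().readall()   # bytes in the source encoding
    text = raw.decode(source_encoding)            # fails loudly if the encoding is wrong
    target_blob.upload_blob(text.encode(target_encoding), overwrite=True)
    return len(raw)
```

With the real SDK, the clients could be obtained with something like `BlobServiceClient.from_connection_string(...).get_blob_client(container=..., blob=...)`. The source encoding has to be supplied by the caller; "ANSI" usually means a specific Windows code page such as `cp1252`, but that is an assumption to verify against the actual files.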

George Chen
  • Thank you George for your detailed answer. However, as I have 100 files of 4GB size each, using create_blob_from_text is taking too long to run. So, I tried blockblob_service.create_blob_from_stream(container_name, file_name, file, count=None, content_settings=ContentSettings(content_encoding='UTF-8')) but it is not converting to UTF-8. It is creating a new blob but with the same encoding as the source which is ANSI. Please can you help. – sparc Apr 21 '20 at 12:14
  • Check my updated answer, hope this could help you. If this could help you, you could accept it as the answer. – George Chen Apr 21 '20 at 13:59
  • Thank you for the answer. I am trying to do this without downloading the files locally. I followed your steps but I could not find any function to open the file from Blob storage directly. Is there any way to convert to UTF-8 without downloading the files to local system? – sparc Apr 23 '20 at 08:50
  • I downloaded it locally only to check whether the encoding was converted to UTF-8; the code above just uploads the file. – George Chen Apr 23 '20 at 08:55
  • You cannot change the encoding of an existing blob; you can only upload a file or stream and convert it to UTF-8 as you do so. Please check the code. – George Chen Apr 23 '20 at 08:57
  • Can I not create a new blob with UTF-8 encoding without downloading the files? I guess you can't convert to UTF-8 without opening the file, and I could not find a function to open a blob without downloading it. – sparc Apr 23 '20 at 09:03
  • What about `get_blob_to_stream` or `get_blob_to_text`? These download the blob to a stream or to text, so you don't have to store it locally before converting the encoding and re-uploading. – George Chen Apr 23 '20 at 09:11
  • Thank you George for your suggestion. Will it download huge files? I will try this and let you know. – sparc Apr 23 '20 at 09:29