I have a PDF that is already in Azure Blob Storage. I need to highlight a few lines in it and store the result as a new PDF, again in blob storage. I looked through the links below but couldn't find a solution. Here is the pseudocode:

import fitz


def edit_pdfs(path_to_pdf_from_blob, new_path_to_pdf_from_blob):

    ### READ pdf from blob storage
    doc = fitz.open(path_to_pdf_from_blob)

    ## EDIT doc (fitz.fitz.Document) - I already have working code to edit the doc , but won't put it here to avoid complexity


    ### WRITE pdf to blob storage
    doc.save(new_path_to_pdf_from_blob)

Answers already seen:

Access data within the blob storage without downloading
How can I read a text file from Azure blob storage directly without downloading it to a local file(using python)?
Azure Blobstore: How can I read a file without having to download the whole thing first?

newbie101
  • You could read the blob data as a stream instead of saving the file locally. – Gaurav Mantri Jun 17 '23 at 15:43
  • Thanks, Gaurav. Would you mind sharing a link that can help me do this? I have been through [this](https://stackoverflow.com/questions/49467961/python-script-to-use-data-from-azure-storage-blob-by-stream-and-update-blob-by), but the solution is not compatible with the latest version of Azure Storage because the method has been removed. – newbie101 Jun 17 '23 at 16:10
  • Please see this: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-download-python#download-to-a-stream. HTH. – Gaurav Mantri Jun 17 '23 at 16:34

1 Answer

I tried this in my environment and got the results below:

Initially, my container held a single PDF document named important.pdf.

You can use the code below to edit the PDF in memory, without saving it to a local file.

Code:

from io import BytesIO
import fitz
from azure.storage.blob import BlobServiceClient

connection_string = "your-connection-string"
blob_name = "important.pdf"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container="test", blob=blob_name)

# Download the PDF into memory as bytes (readall() replaces the deprecated content_as_bytes())
pdf_bytes = blob_client.download_blob().readall()
doc = fitz.open(stream=pdf_bytes, filetype="pdf")
page = doc[0]
rect = fitz.Rect(50, 50, 200, 200)
highlight = page.add_highlight_annot(rect)
# Regenerate the annotation's appearance (this uses the default highlight color)
highlight.update()

new_blob_name = "demo.pdf"
modified_pdf_stream = BytesIO()
doc.save(modified_pdf_stream)
modified_pdf_bytes = modified_pdf_stream.getvalue() 

# Get a BlobClient object for the new PDF file
new_blob_client = blob_service_client.get_blob_client(container="test", blob=new_blob_name)
new_blob_client.upload_blob(modified_pdf_bytes, overwrite=True)

# Optional: delete the original file (skip this step to keep both PDFs)
blob_client.delete_blob()

Output:

The container now holds demo.pdf, which carries the highlight annotation on its first page, and important.pdf has been deleted.
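The question asks about highlighting a few lines rather than a fixed rectangle, so here is a minimal sketch of one way to adapt the code above. It assumes the lines can be located by searching for their text with PyMuPDF's `page.search_for`. The names `highlight_phrases`, `azure_blob_io` and `copy_with_edit` are hypothetical helpers introduced for illustration, not part of either library. The third-party imports are deferred into the functions that need them, so the storage-agnostic part can be tried without Azure credentials.

```python
from typing import Callable, Iterable, Tuple


def highlight_phrases(pdf_bytes: bytes, phrases: Iterable[str]) -> bytes:
    """Highlight every occurrence of each phrase and return the new PDF bytes."""
    import fitz  # PyMuPDF, imported here so the rest of the module loads without it

    wanted = list(phrases)  # allow generators: we iterate once per page
    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    try:
        for page in doc:
            for phrase in wanted:
                # quads=True returns one quadrilateral per matched text run
                for quad in page.search_for(phrase, quads=True):
                    page.add_highlight_annot(quad)
        return doc.tobytes()  # serialize the edited PDF in memory
    finally:
        doc.close()


def azure_blob_io(
    connection_string: str, container: str, source_name: str, target_name: str
) -> Tuple[Callable[[], bytes], Callable[[bytes], None]]:
    """Build a reader for the source blob and a writer for the target blob."""
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    source = service.get_blob_client(container=container, blob=source_name)
    target = service.get_blob_client(container=container, blob=target_name)

    def read_bytes() -> bytes:
        return source.download_blob().readall()

    def write_bytes(data: bytes) -> None:
        target.upload_blob(data, overwrite=True)

    return read_bytes, write_bytes


def copy_with_edit(
    read_bytes: Callable[[], bytes],
    write_bytes: Callable[[bytes], None],
    edit: Callable[[bytes], bytes],
) -> int:
    """Read, edit and write entirely in memory; return the size of the result."""
    edited = edit(read_bytes())
    write_bytes(edited)
    return len(edited)
```

With the container and blob names from the answer, a call might look like `copy_with_edit(*azure_blob_io(connection_string, "test", "important.pdf", "demo.pdf"), lambda b: highlight_phrases(b, ["text to highlight"]))`, where the search phrase is a placeholder. Keeping the storage calls separate from the PDF edit also means the original blob is left in place unless you delete it explicitly.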

Venkatesan