I'm a beginner experimenting with the Document AI client library for Python, and I was trying to run the program below.

However, even after carefully following the instructions, I get no output and no error message. The terminal just shows a new prompt, as if I had never run anything.

This is my version of the code:


from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore

# TODO(developer): Uncomment these variables before running the sample.
project_id = "document-ai-testing-2"
location = "us"
file_path = "C:/Users/Tyron/OneDrive/Desktop/Software/Document AI/Test Documents/Winnie_the_Pooh_3_Pages.pdf"
processor_display_name = "jumpstart_ocr_processor_version_2"

def quickstart(
    project_id: str,
    location: str,
    file_path: str,
    processor_display_name: str = "My Processor",
):
    # You must set the `api_endpoint` if you use a location other than "us".
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the location, e.g.:
    # `projects/{project_id}/locations/{location}`
    parent = client.common_location_path(project_id, location)

    # Create a Processor
    processor = client.create_processor(
        parent=parent,
        processor=documentai.Processor(
            type_="OCR_PROCESSOR",  # Refer to https://cloud.google.com/document-ai/docs/create-processor for how to get available processor types
            display_name=processor_display_name,
        ),
    )

    # Print the processor information
    print(f"Processor Name: {processor.name}")

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Load binary data
    raw_document = documentai.RawDocument(
        content=image_content,
        mime_type="application/pdf",  # Refer to https://cloud.google.com/document-ai/docs/file-types for supported file types
    )

    # Configure the process request
    # `processor.name` is the full resource name of the processor, e.g.:
    # `projects/{project_id}/locations/{location}/processors/{processor_id}`
    request = documentai.ProcessRequest(name=processor.name, raw_document=raw_document)

    result = client.process_document(request=request)

    # For a full list of `Document` object attributes, reference this page:
    # https://cloud.google.com/document-ai/docs/reference/rest/v1/Document
    document = result.document

    # Read the text recognition output from the processor
    print("The document contains the following text:")
    print(document.text)

Please help. These are the steps I took:

  1. I created a virtual environment using VS Code.
  2. I installed gcloud in the environment.
  3. I authenticated.
  4. I navigated to the separate directory for the Python script: https://cloud.google.com/document-ai/docs/libraries#client-libraries-install-python
  5. I ran: python "The File.py"
  6. The Python icon briefly flashes when I press Enter.
  7. There is no output.

Note: At first I struggled to create the venv; I had to bypass the execution policy to do it.

Tyrone
  • Please post the actual code you are using. The code you are using will do nothing at all. Your code defines a function `quickstart(...)` that is never called. Maybe you misunderstand how to write Python? If that is the case, you need to add a line at the bottom to call the method `quickstart()` with the correct parameters. – John Hanley Jul 24 '23 at 23:17
  • Oh my mistake! I'm still an amateur when it comes to python. I never realized it was never called. I'll edit the question and add the actual code. Thanks for the help! – Tyrone Jul 25 '23 at 18:12
  • The answer from @holt-skinner will help you get started. – John Hanley Jul 25 '23 at 18:41

1 Answer

Following up on @john-hanley's comment: the code you posted will not output anything on its own. It only defines the function `quickstart`; it never calls it. To run the sample function, you can add this to the end of the file:

if __name__ == "__main__":
    project_id = "YOUR_PROJECT_ID"
    location = "YOUR_PROCESSOR_LOCATION"  # Format is "us" or "eu"
    file_path = "/path/to/local/pdf"
    processor_display_name = "YOUR_PROCESSOR_DISPLAY_NAME"  # Must be unique per project, e.g.: "My Processor"

    quickstart(
        project_id=project_id,
        location=location,
        file_path=file_path,
        processor_display_name=processor_display_name,
    )
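
To see the difference between defining and calling a function in isolation, here is a minimal sketch that does not use Document AI at all. The function `greet` is a hypothetical stand-in for `quickstart`, not part of any library:

```python
def greet(name: str) -> str:
    """Hypothetical stand-in for quickstart(): builds and prints a greeting."""
    message = f"Hello, {name}!"
    print(message)
    return message


# Defining greet() above produces no output by itself.
# Nothing is printed until the function is actually called:
if __name__ == "__main__":
    greet("Tyrone")
```

If you delete the last two lines, running the file prints nothing and exits silently, which matches the behaviour described in the question.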
Holt Skinner
  • Thank you, kind sir, I really appreciate your help. – Tyrone Jul 25 '23 at 18:17
  • Never mind, the snippet you provided really helped. It turns out that every time I pressed Ctrl+S after making changes, the Python file wasn't actually being updated. Silly rookie error... but I really appreciate the help, thanks!! – Tyrone Jul 25 '23 at 20:28
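
One possible way to catch the unsaved-file problem from the last comment is to have the script report which file is running and when it was last saved. This is only a sketch using the standard library; `describe_script` is a hypothetical helper name, not something from the Document AI samples:

```python
import os
import time


def describe_script(path: str) -> str:
    """Return the absolute path and last-modified time of a script file."""
    modified = time.strftime(
        "%Y-%m-%d %H:%M:%S", time.localtime(os.path.getmtime(path))
    )
    return f"Running {os.path.abspath(path)} (last saved {modified})"


if __name__ == "__main__":
    # __file__ is not set in some interactive sessions, so look it up safely.
    script = globals().get("__file__")
    if script:
        print(describe_script(script))
```

If the timestamp is older than your latest edit, the changes never reached the file on disk.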