I was given help with the splitlines() function, which worked perfectly on string output that wasn't separated by page numbers; see How to Create Spark or Pandas Dataframe from str output in Apache Spark on Databricks.
I am now using str_output = result.pages
as opposed to str_output = result.content
Now, when I execute
df_data = pd.DataFrame({'ColumnA':str_output.splitlines()})
df_data
I get the following error:
AttributeError: 'list' object has no attribute 'splitlines'
I think it's because of the way that I'm using the splitlines() function, but I'm not sure.
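For what it's worth, the error can be reproduced without Azure at all: splitlines() is a str method, and a Python list doesn't have it. A small sketch, where the `content` and `pages` values are made-up stand-ins (the real items in result.pages are, as far as I understand, page objects rather than strings):

```python
# Hypothetical stand-ins: a single string (like result.content)
# and a list (like result.pages).
content = "line one\nline two"
pages = ["page one text", "page two text"]

# Works: str has a splitlines() method.
print(content.splitlines())

# Fails: list has no splitlines() method.
try:
    pages.splitlines()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```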
Any help appreciated
I should show the full code, see below:
import pandas as pd
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

# field_list = ["result.content"]
# endpoint, key, container and container_url are assumed to be defined earlier (not shown)
document_analysis_client = DocumentAnalysisClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

for blob in container.list_blobs():
    blob_url = container_url + "/" + blob.name
    poller = document_analysis_client.begin_analyze_document_from_url(
        "prebuilt-read", blob_url)
    result = poller.result()
    print("Scanning " + blob.name + "...")
    print("document contains", result.content)
    # result.pages is a list, so the splitlines() call below raises the AttributeError
    myoutput = result.pages
    df_data = pd.DataFrame({'RAWTEXT':myoutput.splitlines()})
    df_data
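One possible way around the error, sketched under assumptions: if each item in result.pages exposes a `page_number` and a `lines` list whose entries carry their text in `.content` (that is how I understand the Form Recognizer SDK's page objects, but treat it as an assumption), the lines could be collected page by page instead of calling splitlines() on the list. The `pages_to_rows` helper and the `SimpleNamespace` stand-ins are hypothetical, there only so the snippet runs without an Azure call:

```python
from types import SimpleNamespace


def pages_to_rows(pages):
    """Flatten page objects into one dict per text line (hypothetical helper)."""
    rows = []
    for page in pages:
        for line in page.lines:
            rows.append({"PAGE": page.page_number, "RAWTEXT": line.content})
    return rows


# Stand-in for result.pages; the attribute names are assumed to mirror the SDK's page objects.
fake_pages = [
    SimpleNamespace(
        page_number=1,
        lines=[
            SimpleNamespace(content="PRELIMINARY REPORT"),
            SimpleNamespace(content="RAET HOLDING B.V."),
        ],
    ),
    SimpleNamespace(
        page_number=2,
        lines=[SimpleNamespace(content="CONTENTS")],
    ),
]

rows = pages_to_rows(fake_pages)
print(rows)
```

With the real result, something like pd.DataFrame(pages_to_rows(result.pages)) should then give one row per line of text, with its page number alongside.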
As requested, a sample of the data is as follows:
Scanning 05Jul11 Raet Prelim.pdf... document contains PRELIMINARY REPORT RAET HOLDING B.V. 5 JULY 2011 1 RæT CONTENTS 1 INVESTMENT PROPOSAL ............................................................................................................ 5 1.1 Background to business................................................................................................................ 5 1.2 Process ........................................................................................................................................ 6 1.2.1 Overview .............................................................................................................................. 6 1.2.2 Due Diligence ....................................................................................................................... 7 1.2.3 Banking / Financing .............................................................................................................. 8 1.2.4 Proposed Tactics / Recommendation .................................................................................... 8 1.3 Investment Overview .................................................................................................................... 9 1.3.1 Investment thesis .................................................................................................................. 9 1.3.2 Business Strengths ............................................................................................................... 9 1.3.3 Investment Case Returns .....................................................................................................11 1.4 Key judgment calls ......................................................................................................................12 1.5 Recommendation ........................................................................................................................18 2 MARKET AND BUSINESS