Questions tagged [amazon-textract]

Amazon Textract enables document text detection and analysis in applications. The Amazon Textract Text Detection API can detect text in a variety of documents including financial reports, medical records, and tax forms. For documents with structured data, you can use the Amazon Textract Document Analysis API to detect linked text, tables, option buttons (radio buttons), and check boxes.
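As a rough illustration of the Text Detection API described above, here is a minimal Python sketch using boto3. It assumes AWS credentials and a region are already configured, and that the document is an image stored in S3. The helper names `extract_lines` and `detect_lines_in_s3_image`, along with the sample response, are hypothetical and exist only for this example.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def detect_lines_in_s3_image(bucket: str, key: str) -> List[str]:
    """Call the synchronous Text Detection API on an image stored in S3.

    `bucket` and `key` are placeholders for your own S3 location.
    """
    import boto3  # imported lazily so extract_lines works without AWS set up

    client = boto3.client("textract")
    response = client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)


# Hand-written sample shaped like a DetectDocumentText response (not real output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 1234"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "WORD", "Text": "1234"},
    ]
}
print(extract_lines(sample_response))
```

Filtering on `BlockType` is one simple way to keep whole lines and skip the individual `WORD` blocks that the response also contains.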

Amazon Textract documentation

226 questions

2 answers

Unsupported Document format while using Amazon Textract

When I try to parse a PDF file accessed via Amazon S3, it gives me an error: Request has unsupported document format. I am using Amazon Textract with boto3…

python python-3.x amazon-textract

asked Jul 18 '19 at 07:08

Jung Thapa

2 answers

Amazon Textract vs Amazon Rekognition DetectText

How do I decide when to use Amazon Textract vs Amazon Rekognition's DetectText method? My use case is to take a picture on a mobile device, convert the image data into text, and store it in AWS…

amazon-web-services amazon-rekognition amazon-textract

asked May 06 '19 at 15:34

vaquar khan

5 answers

AWS Textract StartDocumentAnalysis function not publishing a message to the SNS Topic

I am working with AWS Textract and want to analyze a multipage document, so I have to use the async operations. I first called the startDocumentAnalysis function and got a JobId in return, but it needs to trigger a function that I have set…

amazon-web-services aws-lambda aws-sdk aws-sdk-nodejs amazon-textract

asked Jun 23 '19 at 23:53

gokublack

6 answers

How to use the Amazon Textract with PDF files

I can already use Textract with JPEG files, but I would like to use it with PDF files. I have the code below: import boto3 # Document documentName = "Path to document in JPEG" # Read document content with open(documentName, 'rb') as…

amazon-web-services ocr text-extraction amazon-textract

asked Nov 25 '19 at 18:46

ArthurS

1 answer

InvalidS3ObjectException: Unable to get object metadata from S3?

I am trying to use Amazon Textract to read multiple multi-page PDF files using the StartDocumentTextDetection method, as follows: client = boto3.client('textract') textract_bucket = s3.Bucket('my_textract_console-us-east-2') for…

python amazon-web-services amazon-s3 boto3 amazon-textract

asked Aug 31 '20 at 15:27

ocean800

2 answers

AWS Textract - UnsupportedDocumentException - PDF

I'm using boto3 (the AWS SDK for Python) to analyze a document (a PDF) to get the form key:value pairs. import boto3 def process_text_analysis(bucket, document): # Get the document from S3 s3_connection = boto3.resource('s3') s3_object =…

python amazon-web-services boto3 amazon-textract

asked Mar 03 '20 at 06:34

gmwill934

5 answers

Amazon Textract: I can't find the trp module

I want to use this Amazon Textract table script. The problem is that I have no idea what the trp module is or how to install it. I tried pip install trp, but when I try to run it I get this…

python python-3.x amazon-web-services amazon-textract

asked Aug 09 '19 at 17:17

Iakovos Belonias

2 answers

Using Textract, how do you extract tables from a pdf file and output it into a csv file via .py script?

I want to use Textract (via the AWS CLI) to extract tables from a PDF file (located in an S3 location) and export them to a CSV file. I have tried writing a .py script but am struggling to read from the file. Any suggestions for writing the .py script…

python amazon-web-services text-extraction amazon-textract

asked Oct 13 '20 at 17:18

Chris You

1 answer

Using Textract for OCR locally

I want to extract text from images using Python. (The Tesseract library does not work for me because it requires installation.) I have found the boto3 library and Textract, but I'm having trouble working with them. I'm still new to this. Can you tell me what I need…

python amazon-web-services amazon-textract

asked Sep 24 '20 at 10:57

taga

2 answers

How to retrieve tables that exist in a PDF using AWS Textract in Java

I found the article below, which does this in Python: https://docs.aws.amazon.com/textract/latest/dg/examples-export-table-csv.html I also used the article below to extract text: https://docs.aws.amazon.com/textract/latest/dg/detecting-document-text.html but above…

java amazon-web-services spring-boot amazon-textract

asked Apr 07 '20 at 18:30

Farhan

4 answers

AWS Textract InvalidParameterException

I have a .NET Core client application using Amazon Textract with S3, SNS, and SQS, as per the AWS document Detecting and Analyzing Text in Multipage Documents (https://docs.aws.amazon.com/textract/latest/dg/async.html). Created an AWS Role with…

amazon-web-services .net-core amazon-textract

asked Nov 30 '19 at 05:52

Nabeel

0 answers

URL.hostname is not implemented

I'm looking for some help with my Textract client project. I am trying to follow the AWS Textract documentation, but I am stuck at textractClient.send(). I am getting the error: URL.hostname is not implemented. I have followed the steps on AWS to…

javascript amazon-web-services react-native aws-sdk-js amazon-textract

asked Feb 01 '22 at 04:35

Jackc01999

1 answer

How to get the font style from an image with text?

I am using the Amazon Textract API, through AWS' Python API, to extract text from a document (pdf or jpg). I do get the text and coordinates of its bounding box, but I would also love to have the font type (only the major ones needed: Arial,…

python ocr image-recognition amazon-textract

asked May 15 '21 at 15:30

tyrex

0 answers

AccessDeniedException when calling AnalyzeDocument

When calling AnalyzeDocument I receive an Amazon.Textract.Model.AccessDeniedException: Additional information: User: arn:aws:iam::[number]:user/service is not authorized to perform: textract:AnalyzeDocument The user is in a group with the…

amazon-web-services amazon-textract

asked Jun 07 '19 at 01:18

Wolfgang Radl

3 answers

I am using AWS Textract StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. I want only lines to be returned, not the single words

I am creating an internal OCR tool using AWS Textract and Node.js to detect text in a scanned PDF, specifically with StartDocumentTextDetectionCommand and GetDocumentTextDetectionCommand. The result is currently returned as a list of block objects with the lines…

amazon-web-services ocr text-extraction amazon-textract

asked Aug 30 '22 at 13:10

Faris Ashhab
