I have Amazon sample code for running comprehend.start_topics_detection_job. Here is the code with the variables filled in for my job:

import re
import csv
import pytz
import boto3
import json

# https://docs.aws.amazon.com/code-samples/latest/catalog/python-comprehend-TopicModeling.py.html
# https://docs.aws.amazon.com/comprehend/latest/dg/API_InputDataConfig.html

# Set these values before running the program
input_s3_url = "s3://comprehend-topic-modelling-bucket/input_800_cleaned_articles/"
input_doc_format = "ONE_DOC_PER_LINE"
output_s3_url = "s3://comprehend-topic-modelling-bucket/output"
data_access_role_arn = "arn:aws:iam::372656143103:role/access-aws-services-from-sagemaker"
number_of_topics = 30

# Set up job configuration
input_data_config = {"S3Uri": input_s3_url, "InputFormat": input_doc_format}
output_data_config = {"S3Uri": output_s3_url}

# Begin a job to detect the topics in the document collection
comprehend = boto3.client('comprehend')
start_result = comprehend.start_topics_detection_job(
    NumberOfTopics=number_of_topics,
    InputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    DataAccessRoleArn=data_access_role_arn)

# Output the results
print('Start Topic Detection Job: ' + json.dumps(start_result))
job_id = start_result['JobId']
print(f'job_id: {job_id}')

# Retrieve and output information about the job
describe_result = comprehend.describe_topics_detection_job(JobId=job_id)
print('Describe Job: ' + json.dumps(describe_result))  # <=== LINE 36

# List and output information about current jobs
list_result = comprehend.list_topics_detection_jobs()
print('list_topics_detection_jobs_result: ' + json.dumps(list_result))

It's failing with the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-840a7ee043d4> in <module>()
     34 # Retrieve and output information about the job
     35 describe_result = comprehend.describe_topics_detection_job(JobId=job_id)
---> 36 print('Describe Job: ' + json.dumps(describe_result))
     37 
     38 # List and output information about current jobs

~/anaconda3/envs/python3/lib/python3.6/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

~/anaconda3/envs/python3/lib/python3.6/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

~/anaconda3/envs/python3/lib/python3.6/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

~/anaconda3/envs/python3/lib/python3.6/json/encoder.py in default(self, o)
    178         """
    179         raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180                         o.__class__.__name__)
    181 
    182     def encode(self, o):

TypeError: Object of type 'datetime' is not JSON serializable

It fails instantly, the second I press "run". It seems to me that the call to comprehend.start_topics_detection_job may be failing, leading to the error on line 36, print('Describe Job: ' + json.dumps(describe_result)).

What am I missing?

UPDATE

The same IAM role is being used for the notebook, as well as in the above code. Here are the permissions currently assigned to that IAM role:

[Image: screenshot of the IAM role's permissions]

John Rotenstein
VikR
  • Did you check the IAM roles that you are using? Most often the IAM role is missing permission to access comprehend, or the specific API call (start_topics_detection_job). – Guy Apr 25 '19 at 20:34
  • I was getting permission errors before, but I fixed that -- I **think** -- based on the advice in this SO post: https://stackoverflow.com/questions/55840023/iam-roles-for-sagemaker. The same IAM role is being used for the notebook, as well as in the above code, and it has IAMFullAccess and ComprehendFullAccess. Is there some other kind of access it could need? – VikR Apr 25 '19 at 22:00
  • If you are using a Jupyter notebook, you should split the code into multiple cells and execute them independently until you get it to run correctly. You can also use a debugger to see where it is failing. It seems that you are calling the describe command either with the wrong parameters or too quickly. You can also check the status of the job in the Comprehend console to see whether it started correctly. – Guy Apr 26 '19 at 15:45

1 Answer

It turns out that there was nothing wrong with the call to comprehend.describe_topics_detection_job. It was returning, in describe_result, a dict containing a datetime object, which json.dumps cannot serialize by default, so json.dumps(describe_result) raised the TypeError shown in the traceback.
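
One way around this is to give json.dumps a fallback for types it does not know, for example default=str. The following is only a minimal sketch: the describe_result dict is a hand-made stand-in for the real response (the job ID and timestamp are made up), and to_json is a hypothetical helper name, not part of boto3.

```python
import json
from datetime import datetime, timezone

# Hypothetical stand-in for the dict returned by
# comprehend.describe_topics_detection_job; the values are invented.
describe_result = {
    "TopicsDetectionJobProperties": {
        "JobId": "example-job-id",
        "JobStatus": "SUBMITTED",
        "SubmitTime": datetime(2019, 4, 25, 20, 0, 0, tzinfo=timezone.utc),
        "NumberOfTopics": 30,
    }
}


def to_json(response):
    """Serialize a response dict, converting unsupported types with str()."""
    # default= is called only for objects json cannot encode natively,
    # such as the datetime under "SubmitTime".
    return json.dumps(response, default=str)


serialized = to_json(describe_result)
print('Describe Job: ' + serialized)
```

The same change should apply to the json.dumps(list_result) call, since the job listings carry timestamps as well. If the timestamps need to be parsed back later, a default function that returns obj.isoformat() for datetime objects may be a better fit than str.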

VikR