
I am trying to filter the output of the boto3 IAM list_roles method. I would like to get the roles whose AssumeRolePolicyDocument names AWS as the Principal, and omit all roles whose Principal is a Service. This works with jq but not with pyjq.

import pyjq
import boto3
import re
import json

client = boto3.client("iam")
paginator = client.get_paginator('list_roles')
response_iterator = paginator.paginate()
for page in response_iterator:
    role_list = json.dumps(page['Roles'], indent=4, sort_keys=True, default=str)
    my_data = pyjq.all(".[].AssumeRolePolicyDocument.Statement[].Principal.AWS",role_list)
    print(my_data)

But this throws the following error:

Traceback (most recent call last):
  File "iam.py", line 16, in <module>
    my_data = pyjq.all(".[].AssumeRolePolicyDocument.Statement[].Principal.AWS",role_list)
  File "/home/ssm-user/.local/lib/python3.7/site-packages/pyjq.py", line 50, in all
    return compile(script, vars, library_paths).all(_get_value(value, url, opener))
  File "_pyjq.pyx", line 211, in _pyjq.Script.all
_pyjq.ScriptRuntimeError: Cannot iterate over string ("[\n    {\n...)

I would appreciate any suggestions on how to fix this issue.

  • Your error message and code do not match. (`.Statement[].Principal.AWS` is missing from the error message). I have never used `pyjq` myself, but can it handle strings or do you have to call it with a JSON object (not the serialized string)? Because the error message sounds like you are trying to iterate over a string, not an array. Or perhaps `Statement` is a serialized JSON array and not a real JSON array? Without seeing the content of `role_list` nobody will be able to help. Please take the [tour], read [ask], and then [edit] your question to provide a [mre]. – knittl Dec 28 '22 at 20:25
  • Also note that there's a stray `roles_list=[]` variable that's never used (and `roles_list` != `role_list`) – knittl Dec 28 '22 at 20:32
  • Your input should be the object itself, not in its serialized form. Just pass in `page['Roles']` in directly. – Jeff Mercado Dec 28 '22 at 22:24
  • I already tried passing page['Roles'] directly, but because it contains some datetime objects it throws a "TypeError: could not be converted to json" error – prashanth pallu Dec 29 '22 at 08:57

2 Answers


Use default=json_serial (instead of default=str), where json_serial is the helper function defined in the answers to How to overcome "datetime.datetime not JSON serializable"?

Other solutions are also given there. https://github.com/ijl/orjson looks promising.
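
For reference, a json_serial helper along the lines of the one in that linked question might look like the sketch below. The sample roles data and the to_plain_json name are made up for illustration. As the comments on the question point out, pyjq wants the parsed object rather than the serialized string, so the output of json.dumps is passed back through json.loads here.

```python
import json
from datetime import date, datetime


def json_serial(obj):
    """Fallback serializer for objects json.dumps cannot handle by default."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"Type {type(obj)} not serializable")


def to_plain_json(value):
    # Hypothetical helper: round-trip through JSON so that datetime values
    # become ISO 8601 strings and the result contains only plain Python types.
    return json.loads(json.dumps(value, default=json_serial))


# Made-up sample shaped like one entry of page['Roles'].
roles = [{"RoleName": "example-role", "CreateDate": datetime(2022, 12, 28, 20, 25)}]
plain_roles = to_plain_json(roles)
print(plain_roles)
```

The resulting plain_roles is a list of dicts and strings, which should be safe to hand to pyjq.all in place of the JSON string.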

peak

The code below gives the expected result. It uses jq's select to filter the roles.

This produces a list of IAM roles whose Principal is "AWS".

import json
import re

import boto3
import pyjq

iam_client = boto3.client("iam")
paginator = iam_client.get_paginator('list_roles')
response_iterator = paginator.paginate()
my_data = []
for page in response_iterator:
    # Serialize with default=str so datetime values become strings ...
    role_list = json.dumps(page['Roles'], indent=4, sort_keys=True, default=str)
    # ... then parse back into plain Python objects, which is what pyjq expects
    role_list_new = json.loads(role_list)
    my_data2 = pyjq.all('map(select(.AssumeRolePolicyDocument.Statement[].Principal.AWS != null))', role_list_new)
    # pyjq.all returns a list of results; map(...) yields one array, hence the nested loop
    for my_list in my_data2:
        for role in my_list:
            print(role['RoleName'])
            print(role['AssumeRolePolicyDocument'])
            print("\n")
            # Raw string so that \d is not treated as a string escape
            acc_nums = re.findall(r'\d{12}', str(role['AssumeRolePolicyDocument']['Statement']))
            # Build a fresh dict per role; reusing one dict would make every entry identical
            role_dict = {
                'RoleName': role['RoleName'],
                'account_mentioned': list(set(acc_nums)),
            }
            my_data.append(role_dict)

print(my_data)
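
If pyjq is not a hard requirement, the same selection can be sketched in plain Python. This is only an illustration under assumptions: the sample roles data and the function names has_aws_principal and summarize are invented here, and it assumes Statement is a list (or a single dict) and Principal is a dict or a string. Using any() also means a role is reported once even when several statements match.

```python
import re


def has_aws_principal(role):
    """Return True if any trust-policy statement names an AWS principal."""
    statements = role.get("AssumeRolePolicyDocument", {}).get("Statement", [])
    if isinstance(statements, dict):
        # Assumption: a lone statement may appear as a dict instead of a list.
        statements = [statements]
    return any(
        isinstance(s.get("Principal"), dict) and "AWS" in s["Principal"]
        for s in statements
    )


def summarize(roles):
    """Collect role names and the 12-digit account IDs found in their statements."""
    result = []
    for role in roles:
        if not has_aws_principal(role):
            continue
        statements = role["AssumeRolePolicyDocument"]["Statement"]
        account_ids = sorted(set(re.findall(r"\d{12}", str(statements))))
        result.append({"RoleName": role["RoleName"], "account_mentioned": account_ids})
    return result


# Made-up sample data shaped like page['Roles'].
roles = [
    {"RoleName": "cross-account", "AssumeRolePolicyDocument": {"Statement": [
        {"Effect": "Allow", "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
         "Action": "sts:AssumeRole"}]}},
    {"RoleName": "lambda-exec", "AssumeRolePolicyDocument": {"Statement": [
        {"Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"},
         "Action": "sts:AssumeRole"}]}},
]
summary = summarize(roles)
print(summary)
```

In the paginator loop, summarize(page['Roles']) could be called on each page directly, since this version never serializes the datetime fields.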