0

I have a table where I have a filed (string) with a full Swagger text on it. From this swagger I need to retrieve the elements under the scope object.

Example of the field

swagger: '2.0'
info:
  title: "Values"
  version: 2.4.0
  description: "Lorem iosum."
  license:
    name: Copyright
    url: 'https://lorem.com/'
  x-catalogation:
    ba: "Execution"
    bd: "Loans"
    bdd: "Account" 
schemes:
  - "https"
basePath: "/v2/values"
consumes:
  - "application/json"
produces:
  - "application/json"
securityDefinitions:
  JWTProfile:
    type: "oauth"
    description: 'OGT'
    flow: "implicit"
    scopes:
      valueid.read: "lorem"
      valuelist.read: "lorem"
      valuetran.read: "lorem"
      value: "lorem"
      vlimit: "lorem"
      valuestat.read: "lorem"
    authorizationUrl: "$(authorization-url)"
    x-tokenIntrospect:
      url: "$(sca-security-url)"
paths:...

The desire result of the query must be valueid.read valuelist.read valuetran.read value vlimit valuestat.read

I tried with this query select unnest (regexp_matches(api_spec, E'\"([a-z]+\\.[a-z]+)\"', 'g')) as scopes from apis

but I retrieve only this scopes valueid.read valuelist.read valuetran.read valuestat.read

I'm in a corner, someone can help me?

zubintro
  • 3
  • 2
  • 1
    Using regex for this seems like a doomed project. You want an actual YAML parser. – tripleee Mar 27 '23 at 08:51
  • 1
    Possible duplicate of https://stackoverflow.com/questions/22502798/postgres-query-serialized-yaml-field – tripleee Mar 27 '23 at 08:52
  • Assuming indentation is always the same and none of the values has children, you could try to extract values of `scope` by something like this: `select regexp_matches(api_spec, 'scopes:\n((?: [^\n]+\n)+)') from apis` – markalex Mar 27 '23 at 09:16
  • @tripleee thanks but I can't use external framework, library and/or other king of code but querys – zubintro Mar 27 '23 at 13:48
  • @markalex thanks but the regular expression that you mention recovers also the data among the "" the expected result must be something like this "valueid.read valuelist.read valuetran.read valuestat.read" – zubintro Mar 27 '23 at 13:52
  • Well, this is first step then. Wrap my query with outer one that will split lines by `\n` and extract words before `:` – markalex Mar 27 '23 at 13:55
  • You can probably come up with a two-pronged regex where the first pass needs to find the indentation level of the keys you want to extract and the second actually extracts them, whilst disregarding any other keys with the same indentation level. Again, doing this with a regex is like paddling a canoe with an elephant. – tripleee Mar 27 '23 at 14:42

1 Answers1

0

DISCLAIMER: I suggest treating this answer as a pure exercise: if you need to parse YAML, it is always better to use some programming language with libraries supporting YAML format parsing.


Assuming indentation oh scope's children is always the same and none of them has any children, you could extract names of scope child elements with following query:

select
    unnest (regexp_matches (t2.val2, '\S+(?=:)', 'g')) as children
from
    (
        select
            unnest (regexp_matches (api_spec, 'scopes:\n((?:      [^\n]+\n)+)')) as val2
        from
            apis
    ) as t2

Expected output for your example:

children
valueid.read
valuelist.read
valuetran.read
value
vlimit
valuestat.read

Demo here

markalex
  • 8,623
  • 2
  • 7
  • 32