I have the following script that gets the SERVICE_NAME from a tnsnames file if it is available and falls back to the SID if it is not. It seems to work fine, but it returns tuples that I am unable to parse:

#!/usr/bin/env python

import re

regexes = re.compile(r'SERVICE_NAME\s?=\s?(.+?)\)|SID\s?=\s?(.+?)\)')

with open('tnsnames.ora.test') as tns_file:
    for tnsname in tns_file:
        match = regexes.search(tnsname)

        if match:
            print(match.groups())

The script returns the following:

(None, 'db1')
('db2', None)
('db3', None)

But I only want the name of the db returned, not the None.

How can I strip the None from the output? I cannot use re.findall because some lines in the tnsnames file have both a service_name and a SID, and then I would get duplicates.

How can I parse the output of the match object so that the None is ignored?

zn553

2 Answers

You are using the .groups() method, which returns the value of every capturing group, including groups that did not take part in the match. Since the regex is an alternation with a capturing group in each branch, one of the two groups will always be None on a valid match.

A generic solution is to filter out the None value from the two-item tuple, and there are many ways to do that. One way is to concatenate the two values:

m = match.groups()
print(r'{}{}'.format(m[0] or '', m[1] or ''))

The m[x] or '' idiom is safe here because match.groups() can only contain strings or None.
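
As an alternative to concatenation, here is a minimal sketch that picks whichever group is not None. The sample lines and the helper name first_group are made up for illustration and only assume entries shaped like typical tnsnames.ora lines:

```python
import re

regexes = re.compile(r'SERVICE_NAME\s?=\s?(.+?)\)|SID\s?=\s?(.+?)\)')


def first_group(match):
    """Return the first capturing group that took part in the match."""
    return next(g for g in match.groups() if g is not None)


# Hypothetical sample lines, not taken from a real tnsnames.ora
sample_lines = [
    "    (CONNECT_DATA = (SID = db1))",
    "    (CONNECT_DATA = (SERVICE_NAME = db2))",
    "    (ADDRESS = (PROTOCOL = TCP)(HOST = host1)(PORT = 1521))",
]

names = []
for line in sample_lines:
    match = regexes.search(line)
    if match:
        names.append(first_group(match))

print(names)
```

Because search() stops at the first match on a line, a line that carries both a SERVICE_NAME and a SID still contributes only one name.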

Another solution is to re-write the pattern so that it contains just 1 capturing group.

It is easy to make the pattern contain a single group because the part that matches the value after the keyword, \s?=\s?(.+?)\), is duplicated in both alternatives. Only the keyword itself needs to stay inside the alternation:

r'(?:SERVICE_NAME|SID)\s*=\s*([^)\r\n]+)'
  ^^^^^^^^^^^^^^^^^^^^

See the regex demo and the regex graph.

[Figure: regex graph of the pattern above]

Details

  • (?:SERVICE_NAME|SID) - a non-capturing group that matches either SERVICE_NAME or SID
  • \s*=\s* - a = surrounded by zero or more whitespace characters
  • ([^)\r\n]+) - Group 1: one or more characters other than ), CR and LF (line breaks are excluded because the . in the original attempt does not match them either).
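
A sketch of the single-group pattern in use, assuming hypothetical sample lines. With only one capturing group, match.group(1) always holds the name, so no None filtering is needed:

```python
import re

pattern = re.compile(r'(?:SERVICE_NAME|SID)\s*=\s*([^)\r\n]+)')

# Hypothetical sample lines; the last one carries both keywords
sample_lines = [
    "(CONNECT_DATA = (SID = db1))",
    "(CONNECT_DATA = (SERVICE_NAME = db2))",
    "(CONNECT_DATA = (SERVICE_NAME = db3)(SID = db3sid))",
]

names = []
for line in sample_lines:
    # search() returns only the first match, so a line with both
    # SERVICE_NAME and SID yields a single name
    match = pattern.search(line)
    if match:
        names.append(match.group(1))

print(names)
```
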
Wiktor Stribiżew

If you want a single capturing group, so that you do not get 2 groups of which one is always None because of the alternation, you could move the alternation to the start of the pattern so that it only covers the two keywords, (?:SERVICE_NAME|SID), and make it a non-capturing group.

If neither keyword should match as part of a larger word, you could prepend a word boundary \b to the pattern.

(?:SERVICE_NAME|SID)\s?=\s?(.+?)\)

Explanation

  • (?:SERVICE_NAME|SID) Match either SERVICE_NAME or SID
  • \s?=\s? Match a = surrounded by an optional whitespace char
  • (.+?)\) Capture in group 1 one or more characters other than a newline, non-greedy, then match )
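
To illustrate the effect of the word boundary, here is a small sketch comparing the pattern with and without \b. The key ORACLE_SID is a made-up example of a longer word that ends in SID, not something taken from the question's file:

```python
import re

plain = re.compile(r'(?:SERVICE_NAME|SID)\s?=\s?(.+?)\)')
bounded = re.compile(r'\b(?:SERVICE_NAME|SID)\s?=\s?(.+?)\)')

# Hypothetical line where SID is only the tail of a longer word
tricky_line = "(ORACLE_SID = other)"
# A regular line where SID stands on its own
normal_line = "(CONNECT_DATA = (SID = db1))"

plain_match = plain.search(tricky_line)      # matches inside ORACLE_SID
bounded_match = bounded.search(tricky_line)  # no match: "_" is a word character
normal_match = bounded.search(normal_line)

print(plain_match.group(1))
print(bounded_match)
print(normal_match.group(1))
```
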

The fourth bird