import re
name = 'propane'
a = []
Alkane = re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)
if Alkane != a:
    print(Alkane)

As you can see, when the regular expression is given 'propane', the output contains two empty strings:

[('', '', 'prop', 'ane')]

For these types of inputs, I want to remove the empty strings from the output. I don't know what form this output is in, though; it doesn't look like a regular list.

Okeh

3 Answers


You can use str.split() and str.join() to remove empty strings from your output:

>>> import re
>>> name = 'propane'
>>> Alkane = re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)
>>> Alkane
[('', '', 'prop', 'ane')]
>>> [tuple(' '.join(x).split()) for x in Alkane]
[('prop', 'ane')]

Or using filter():

[tuple(filter(None, x)) for x in Alkane]
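
The two approaches are not always interchangeable: joining on a space and splitting again also strips whitespace inside non-empty groups, while filter() leaves each group untouched. A minimal sketch of the difference, assuming a hypothetical input '2 methylpropane' that does not appear in the question:

```python
import re

pattern = r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)'

# Hypothetical input: the first group captures the digit plus the trailing space.
matches = re.findall(pattern, '2 methylpropane')

via_split = [tuple(' '.join(m).split()) for m in matches]
via_filter = [tuple(filter(None, m)) for m in matches]

print(via_split)   # [('2', 'methyl', 'prop', 'ane')]
print(via_filter)  # [('2 ', 'methyl', 'prop', 'ane')]
```

If the captured text has to be preserved exactly, filter() is probably the safer choice.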
RoadRunner

You can use filter to remove empty strings:

import re
name = 'propane'
a = []
Alkane = list(map(lambda m: tuple(filter(bool, m)), re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)))
if Alkane != a:
    print(Alkane)

Or you can use a list comprehension with a generator expression passed to tuple():

import re
name = 'propane'
a = []
Alkane = [tuple(i for i in m if i) for m in re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)]
if Alkane != a:
    print(Alkane)

Both output:

[('prop', 'ane')]
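
For readers unfamiliar with comprehensions, the second method can be unrolled into explicit loops. This is only an illustrative sketch; the names `pattern`, `alkane`, and `kept` are introduced here and are not part of the code above:

```python
import re

name = 'propane'
pattern = r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)'

alkane = []
for m in re.findall(pattern, name):  # m is one tuple of groups per match
    kept = []
    for i in m:                      # i is one captured group (a string)
        if i:                        # an empty string is falsy, so it is skipped
            kept.append(i)
    alkane.append(tuple(kept))

print(alkane)  # [('prop', 'ane')]
```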
blhsing
  • Would you mind explaining how the second method works? Or even giving me a link to where I can learn to interpret this sort of logic? It is completely incomprehensible to me. – Okeh Jul 08 '18 at 03:39
  • You can read about list comprehension [here](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions). Basically since `re.findall` returns a list of tuples, we first iterate through the list, and then for each tuple in the list, we iterate through each item in the tuple to test if the item evaluates to True, which an empty string is not. – blhsing Jul 08 '18 at 03:43

The documentation for re.findall states that empty matches are included:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.
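
A small sketch of that behaviour, using simplified patterns chosen here purely for illustration (they are not the pattern from the question):

```python
import re

one_group = re.findall(r'(prop)ane', 'propane')          # single group
two_groups = re.findall(r'(prop)(ane)', 'propane')       # several groups
unmatched = re.findall(r'(\d+)?(prop)(ane)', 'propane')  # optional group that does not match

print(one_group)   # ['prop']
print(two_groups)  # [('prop', 'ane')]
print(unmatched)   # [('', 'prop', 'ane')]
print(type(unmatched), type(unmatched[0]))  # <class 'list'> <class 'tuple'>
```

So the result is an ordinary list whose items are tuples, and a group that did not take part in the match shows up as an empty string.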

This means you will need to filter out the empty strings yourself. Use the falsiness of the empty string for that.

import re
name = 'propane'
alkanes = re.findall(r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)', name)

alkanes = [tuple(comp for comp in a if comp) for a in alkanes]

print(alkanes) # [('prop', 'ane')]

Also, avoid using capitalized variable names as those are generally reserved for class names.
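
As a rough sketch of how this could look with conventional naming, the filtering can be wrapped in a small function. The helper name `find_alkane_parts` and the constant `ALKANE_PATTERN` are hypothetical, as are the inputs other than 'propane':

```python
import re

# Compiled once; same pattern as in the question, written as a raw string.
# UPPER_CASE is the usual convention for module-level constants.
ALKANE_PATTERN = re.compile(
    r'(\d+\W+)*(methyl|ethyl|propyl|butyl)*(meth|eth|prop|but|pent|hex)(ane)'
)

def find_alkane_parts(name):
    """Return one tuple of non-empty groups per match (hypothetical helper)."""
    return [tuple(part for part in groups if part)
            for groups in ALKANE_PATTERN.findall(name)]

print(find_alkane_parts('propane'))          # [('prop', 'ane')]
print(find_alkane_parts('2-methylpropane'))  # [('2-', 'methyl', 'prop', 'ane')]
print(find_alkane_parts('water'))            # []
```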

Olivier Melançon