14

While trying to answer the question "regex to split %ages and values in python", I noticed that I had to reorder the groups in the result of findall. For example:

import re

data = """34% passed 23% failed 46% deferred"""
result = {key: value for value, key in re.findall(r'(\w+)%\s(\w+)', data)}
print(result)
>>> {'failed': '23', 'passed': '34', 'deferred': '46'}

Here the result of the findall is:

>>> re.findall(r'(\w+)%\s(\w+)', data)
[('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]

Is there a way to change/specify the order of the groups that makes re.findall return:

[('passed', '34'), ('failed', '23'), ('deferred', '46')]

Just to clarify, the question is:

Is it possible to specify the order of, or reorder, the groups returned by the re.findall function?

The dictionary example above is only there to give a use case for changing the order (turning the key into the value and the value into the key).

Further clarification:

To handle groups in larger, more complicated regexes, you can name the groups, but those names are only accessible when you use re.search or re.match. From what I have read, findall uses fixed indices for the groups returned in the tuple. The question is whether anyone knows how those indices could be modified. This would make handling groups easier and more intuitive.

ashwinjv
  • 1
    It is **not** possible to alter the order of the groups returned by `findall`, but it is easy to re-order them after the fact as I showed in my second answer: http://stackoverflow.com/a/25629693/20789 – Dan Lenski Sep 02 '14 at 18:14
  • 1
    That's what I assumed, but I could not find documentation that states it. Hence my question here. – ashwinjv Sep 02 '14 at 18:15

3 Answers

22

Take 3, based on a further clarification of the OP's intent in this comment.

Ashwin is correct that findall does not preserve named capture groups (e.g. (?P<name>regex)): it returns only the captured strings, in pattern order. finditer to the rescue! It yields the individual match objects one by one, so named groups stay accessible. Simple example:

import re

data = """34% passed 23% failed 46% deferred"""
# Each m is a match object, so groups can be looked up by name
for m in re.finditer(r'(?P<percentage>\w+)%\s(?P<word>\w+)', data):
    print(m.group('percentage'), m.group('word'))
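
Building on the finditer example, here is a minimal sketch of how the groups could be pulled out in whatever order you like. It relies on the fact that a match object's group() accepts several group names at once and returns a tuple in the order requested. The variable names pattern, pairs, and records are only illustrative.

```python
import re

data = "34% passed 23% failed 46% deferred"
pattern = re.compile(r'(?P<percentage>\w+)%\s(?P<word>\w+)')

# group() with several names returns a tuple in the order requested,
# independent of where the groups appear in the pattern
pairs = [m.group('word', 'percentage') for m in pattern.finditer(data)]
print(pairs)

# groupdict() maps each group name to its captured text
records = [m.groupdict() for m in pattern.finditer(data)]
print(records[0])
```

This gives the list the question asks for without post-processing the findall tuples, at the cost of one comprehension.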
Dan Lenski
1

As you've identified in your second example, re.findall returns the groups in the original order.

The problem is that, in the Python versions current when this was written, the standard dict type does not preserve the order of keys. The manual for Python 2.x makes this explicit: https://docs.python.org/2/library/stdtypes.html#dict.items. The same holds for Python 3 up to 3.5; from Python 3.7 onward, dict is guaranteed to preserve insertion order.

What you should use instead is collections.OrderedDict:

import re
from collections import OrderedDict as odict

data = """34% passed 23% failed 46% deferred"""
# Pass (key, value) pairs straight to the constructor so insertion order is kept
result = odict((key, value) for value, key in re.findall(r'(\w+)%\s(\w+)', data))
print(result)
>>> OrderedDict([('passed', '34'), ('failed', '23'), ('deferred', '46')])

Notice that you must pass the (key, value) pairs straight to the constructor (odict((k, v) for k, v in ...)) rather than build a dict comprehension ({k: v for k, v in ...}) first. The comprehension creates a plain dict, and on Python versions where dict is unordered, converting it to an OrderedDict afterwards cannot recover the original order of the keys, which is of course what you are trying to preserve in the first place.
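
For readers on newer interpreters, a small sketch of the same idea with a plain dict. It assumes Python 3.7 or later, where dict insertion order is part of the language (see also the comment below); on older versions the OrderedDict approach above is still the one to use.

```python
import re

data = "34% passed 23% failed 46% deferred"

# Assumes Python 3.7+: a plain dict keeps keys in insertion order,
# so the comprehension follows the order of the findall matches
result = {key: value for value, key in re.findall(r'(\w+)%\s(\w+)', data)}
print(list(result.items()))
```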

Dan Lenski
  • I was wondering if I can specify or change the original order of the return for re.findall. The conversion to dict was just more of an example of when I want to re-order the groups. – ashwinjv Sep 02 '14 at 18:08
  • Your question does not make it clear at all what you are trying to reorder. Please edit it to clarify this. – Dan Lenski Sep 02 '14 at 18:09
  • 2
    **Update:** Python `dict` **does** preserve key ordering for newer versions of Python (**see also** [SPEC](https://mail.python.org/pipermail/python-dev/2017-December/151283.html) [SO Post](https://stackoverflow.com/a/39537308/42223) ) – dreftymac Apr 12 '19 at 21:04
1

Per the OP's comment on my first answer: If you are simply trying to reorder a list of 2-tuples like this:

[('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]

... to look like this, with individual elements reversed:

[('passed', '34'), ('failed', '23'), ('deferred', '46')]

There's an easy solution: use a list comprehension with the slicing syntax sequence[::-1] to reverse the order of the elements of the individual tuples:

a = [('34', 'passed'), ('23', 'failed'), ('46', 'deferred')]
# x[::-1] returns a reversed copy of each tuple
b = [x[::-1] for x in a]
print(b)
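
Reversing with [::-1] only covers the case where the whole tuple should be flipped. As a hedged sketch of a more general variant, operator.itemgetter can pick the groups in any explicit index order, which may be handier once a pattern has more than two groups. The names matches, reorder, and swapped are illustrative.

```python
import re
from operator import itemgetter

data = "34% passed 23% failed 46% deferred"
matches = re.findall(r'(\w+)%\s(\w+)', data)

# itemgetter(1, 0) builds a callable that returns (t[1], t[0]);
# with more groups, list the indices in whatever order is wanted
reorder = itemgetter(1, 0)
swapped = [reorder(t) for t in matches]
print(swapped)
```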
Dan Lenski
  • I know how to re-order tuples; the question is how to specify the order to re.findall. – ashwinjv Sep 02 '14 at 18:13
  • The order of **what** to `re.findall`? I'm showing you how to take the output of `re.findall` and alter it to have the order you said you wanted. – Dan Lenski Sep 02 '14 at 18:15
  • 1
    To handle groups in larger, more complicated regexes, you can name the groups, but those names are only accessible when you use re.search or re.match. From what I have read, findall uses fixed indices for the groups returned in the tuple. The question is whether anyone knows how those indices could be modified. This would make handling groups easier and more intuitive. – ashwinjv Sep 02 '14 at 18:19
  • The documentation here: https://docs.python.org/3.1/library/re.html#re.findall says you will get a list of tuples with the groups, but does not talk about the indices of the groups in that tuple. – ashwinjv Sep 02 '14 at 18:22
  • 1
    Ah, named groups are a separate issue (also not in your question). You are correct that `findall` returns only captured groups and ignores names; however you can simply use [`finditer`](https://docs.python.org/2/library/re.html#re.finditer) instead to return the match objects, by which you will be able to access named groups. – Dan Lenski Sep 02 '14 at 18:24
  • 1
    That, sir, was what I was looking for. If you can add to or modify your answer, I will accept it. Thanks – ashwinjv Sep 02 '14 at 18:27