
As a follow-up to this question, I have an expression such as ['(', '44', '+(', '3', '+', 'll', '))'], which was created with re.findall(r'\w+|\W+', item). However, this list of strings contains two errors: one is the '+(' and the other is the '))'.

Is there a Pythonic way to split just the operators, so that the list becomes ['(', '44', '+', '(', '3', '+', 'll', ')', ')']?

(keep the digits/letters together, separate the symbols)

Thanks


4 Answers


You want to split each run of non-alphanumeric characters into its individual characters.

I would turn each item into a one-element list if it is alphanumeric, or into a list of its characters if it is a sequence of symbols.

Then I'd flatten the result to get what you asked for:

import itertools

l = ['(', '44', '+(', '3', '+', 'll', '))']
new_l = list(itertools.chain.from_iterable([x] if x.isalnum() else list(x) for x in l))
print(new_l)

result:

['(', '44', '+', '(', '3', '+', 'll', ')', ')']
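
One caveat worth noting: str.isalnum() returns False for an item containing an underscore, whereas \w in the original regex matches underscores, so an item such as 'a_b' would be broken into single characters. Here is a minimal sketch that keeps such items whole, assuming that is the behaviour you want (split_symbols is a hypothetical helper name, not something from the answer above):

```python
import itertools
import re

def split_symbols(tokens):
    # Keep an item whole when it consists only of \w characters
    # (letters, digits, underscore); otherwise split it into characters.
    return list(itertools.chain.from_iterable(
        [t] if re.fullmatch(r'\w+', t) else list(t)
        for t in tokens
    ))

print(split_symbols(['(', '44', '+(', '3', '+', 'll', '))']))
print(split_symbols(['a_b', '+(']))
```

If underscores never occur in your items, the isalnum() version is equivalent and a little shorter.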

EDIT: you could actually answer both of your questions at once (adapting the regex answer to the original question) by not grouping symbols in the regex:

import re
lst = ['z+2-44', '4+55+((z+88))']
print([re.findall(r'\w+|\W', s) for s in lst])

(note the lack of + after \W) and you get directly:

[['z', '+', '2', '-', '44'], ['4', '+', '55', '+', '(', '(', 'z', '+', '88', ')', ')']]
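
Note that \W also matches whitespace, so if the input strings contain spaces, each space comes back as its own token. A small sketch of one way around that, assuming whitespace should simply be dropped (tokenize is a hypothetical name used only for illustration):

```python
import re

def tokenize(expr):
    # \w+ keeps runs of letters/digits together;
    # [^\w\s] matches exactly one character that is neither \w nor whitespace.
    return re.findall(r'\w+|[^\w\s]', expr)

print(tokenize('4 + 55 + ((z + 88))'))
```

For input without spaces this should give the same tokens as the \w+|\W pattern.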
Jean-François Fabre

Short solution using str.join() and re.split() functions:

import re
l = ['(', '44', '+(', '3', '+', 'll', '))']
new_list = [i for i in re.split(r'(\d+|[a-z]+|[^\w])', ''.join(l)) if i.strip()]

print(new_list)

The output:

['(', '44', '+', '(', '3', '+', 'll', ')', ')']
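
For context on why the filter is needed: re.split() returns the delimiters because the pattern is wrapped in a capturing group, and the (here empty) text between consecutive matches comes back as empty strings, which the if i.strip() condition discards. A short sketch to illustrate; the names joined, raw, and kept are only for this example:

```python
import re

joined = ''.join(['(', '44', '+(', '3', '+', 'll', '))'])  # '(44+(3+ll))'

# The capturing group makes re.split() include every match in the result.
raw = re.split(r'(\d+|[a-z]+|[^\w])', joined)

# Drop the empty strings that sit between adjacent matches.
kept = [piece for piece in raw if piece.strip()]

print(raw[:4])
print(kept)
```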
RomanPerekhrest

An alternative would be to change the regex so that non-alphanumeric characters are kept separate:

import re
lst = ['z+2-44', '4+(55+z)+88']
print([re.findall(r'\w+|\W', s) for s in lst])

# [['z', '+', '2', '-', '44'], ['4', '+', '(', '55', '+', 'z', ')', '+', '88']]
Thierry Lathuille

Try this:

import re
lst = ['z+2-44', '4+(55+z)+88']
print([re.findall(r'\w+|\W', s) for s in lst])

Maybe this helps others.

bob marti