I know a list comprehension can iterate over every combination (the Cartesian product) of two lists. For example:

letters = ['A', 'B', 'C']
numbers = [1,2,3]

def concat(letter, number):
    return letter + str(number)

These can be combined using:

combinations = [concat(letter, number) for letter in letters for number in numbers]

This has the same output as:

combinations = []
for letter in letters:
    for number in numbers:
        combinations.append(concat(letter, number))

Producing:

['A1', 'A2', 'A3', 'B1', 'B2', 'B3', 'C1', 'C2', 'C3']
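
As an aside, the same pairing can be written with `itertools.product`, which makes the "every combination" behaviour explicit. A minimal sketch, reusing the names from the example above:

```python
from itertools import product

letters = ['A', 'B', 'C']
numbers = [1, 2, 3]

def concat(letter, number):
    return letter + str(number)

# product() yields every (letter, number) pair, letters varying slowest,
# which matches the order of the two nested for clauses.
combinations = [concat(letter, number) for letter, number in product(letters, numbers)]
print(combinations)
```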

I'm trying to clean a defined set of characters from a list of strings. For instance:

unwanted = ['$', '@']
raw_lines = [
    'phra$se1',
    'phr@ase2'
]

clean_lines = []

for line in raw_lines:
    for char in unwanted:
        line = line.replace(char, '')
    clean_lines.append(line)

This outputs:

['phrase1', 'phrase2']

I want to refactor it using a list comprehension, but my attempt produces one result for every (character, line) pair, each with only a single character removed:

clean_lines = [line.replace(char, '') for char in unwanted for line in raw_lines]

This outputs:

['phrase1', 'phr@ase2', 'phra$se1', 'phrase2']

I understand why this happens; it's obvious after thinking about the letters and numbers example. The comprehension appends once per inner iteration, much like this loop (written here with the two loops in the opposite order):

clean_lines = []
for line in raw_lines:
    for char in unwanted:
        clean_lines.append(line.replace(char, ''))

This outputs the same four partially cleaned strings, in a different order:

['phrase1', 'phra$se1', 'phr@ase2', 'phrase2']

Is there a workaround for accessing the "outer loop" when using list comprehension?

nluizsoliveira
  • I don't think there is a way to access the outer loop in the way you're suggesting here. I would recommend using `str.translate()`: https://stackoverflow.com/questions/3939361/remove-specific-characters-from-a-string-in-python – Charmander35 Aug 23 '22 at 14:50
  • A simpler cleanup method: `mapping = dict.fromkeys(map(ord, unwanted))`, then `[line.translate(mapping) for line in raw_lines]` – Mechanic Pig Aug 23 '22 at 14:51
  • @MechanicPig: I'd recommend using `str.maketrans` for the purpose; it's designed to produce the correct structure for `str.translate` with less complication (and conceivably some optimizations, though right now I believe it's equivalent to what you did). `mapping = str.maketrans('', '', ''.join(unwanted))` – ShadowRanger Aug 23 '22 at 14:56
  • Thanks a lot for the answers, @Charmander35, Mechanic Pig and ShadowRanger. I simplified my problem (which was replacing substrings) in order to make a cleaner example, but I'm convinced it's not possible with comprehensions. Moreover, after reading this thread https://discuss.python.org/t/making-str-replace-accept-lists/4144/6 I'm convinced replace is problematic with substrings, therefore I accepted the regex answer – nluizsoliveira Aug 23 '22 at 15:24
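
The `str.maketrans` / `str.translate` approach suggested in the comments above can be sketched as follows, using the data from the question. It only applies when every unwanted item is a single character:

```python
unwanted = ['$', '@']
raw_lines = ['phra$se1', 'phr@ase2']

# The third argument of str.maketrans lists characters to delete;
# the first two (equal-length) arguments are left empty because
# no character is being mapped to another one.
table = str.maketrans('', '', ''.join(unwanted))

# One pass over each line, no nested loop needed.
clean_lines = [line.translate(table) for line in raw_lines]
print(clean_lines)
```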

2 Answers

You may find the re module helpful for this. For example:

import re
unwanted = ['$', '@']
raw_lines = [
    'phra$se1',
    'phr@ase2'
]
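# Build a character class such as [\$@]; inside [...] a "|" would be a
# literal pipe, so the escaped characters are joined with no separator.
expr = f'[{"".join(re.escape(c) for c in unwanted)}]'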
clean_lines = [re.sub(expr, '', line) for line in raw_lines]
print(clean_lines)

Output:

['phrase1', 'phrase2']
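
If the unwanted items are substrings rather than single characters, a character class no longer fits. A minimal sketch of that case, assuming hypothetical substrings and sample lines that are not part of the original question: join the escaped substrings with `|` (alternation, outside any character class), longest first so that a longer substring is preferred over a shorter one it contains.

```python
import re

# Hypothetical data for illustration only.
unwanted = ['$', '$$', '@']
raw_lines = ['phra$$se1', 'phr@ase2', 'p$hrase3']

# Alternation tries the alternatives left to right, so sort longest first.
ordered = sorted(unwanted, key=len, reverse=True)
pattern = re.compile('|'.join(re.escape(s) for s in ordered))

clean_lines = [pattern.sub('', line) for line in raw_lines]
print(clean_lines)
```

Because the substitution happens in a single pass, text produced by one removal is never rescanned, which avoids the ordering problems of chained `str.replace` calls.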
DarkKnight
  • You definitely want to `re.escape` all the entries in `unwanted`; otherwise, terrible things will happen if one of the unwanted characters happens to be a regex special character. – ShadowRanger Aug 23 '22 at 14:58
  • After reading https://discuss.python.org/t/making-str-replace-accept-lists/4144/2 and the issues with replacing substrings that may overlap, I'm convinced regex is the best solution, so I will mark yours as accepted – nluizsoliveira Aug 23 '22 at 15:12
  • @nluizsoliveira: For your specific case (all things being replaced are single characters), `str.maketrans`+`str.translate` might be a better solution (it's also one-pass, involves no imports, and you don't have to worry about escaping stuff), but yeah, `str.replace` has no place in solving this specific problem. – ShadowRanger Aug 23 '22 at 16:09
I would do it like this:

unwanted = ['$', '@']
raw_lines = [
    'phra$se1',
    'phr@ase2'
]
clean_lines = ["".join([ch for ch in line if ch not in unwanted]) 
               for line in raw_lines]
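
For comparison, the original nested loop (where each `replace` works on the result of the previous one) can also be folded into a comprehension with `functools.reduce`. This is only a sketch of one possible workaround, reusing the question's data; it is not necessarily more readable than the explicit loop:

```python
from functools import reduce

unwanted = ['$', '@']
raw_lines = ['phra$se1', 'phr@ase2']

# reduce() starts from `line` and applies one replace per unwanted
# character, feeding each result into the next call.
clean_lines = [
    reduce(lambda text, char: text.replace(char, ''), unwanted, line)
    for line in raw_lines
]
print(clean_lines)
```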
Robert Lujo
  • Thanks a lot for your answer! It works fine, but I'll mark the other one as accepted: after reading https://discuss.python.org/t/making-str-replace-accept-lists/4144/2 I'm convinced regex is the better choice if I have substrings instead of characters – nluizsoliveira Aug 23 '22 at 15:13