I have a list of lists of strings, and I am trying to remove all non-alphabetic characters.

I tried using isalpha():

data = [
    ['we', '\n\n', 'will', 'pray', 'and', 'hope', 'for', 'the', 'best'], 
    ['though', '10/3/2011', 'it', 'may', 'not', '\n\n', 'make', 'landfall', 'all', 'week', '2 000 €', 'if', 'it', 'follows', 'that', '•', 'track'],
    ['heavy', 'rains', 'capable', 'of', 'producing', 'life threatening', 'flash', '•', 'floods', '\n\n', 'are', 'possible'],
]

new_data = ''.join([i for i in data if i.isalpha()])

Expected output:

['we will pray and hope for the best', 
 'though it may not make landfall all week if it follows that track',
 'heavy rains capable of producing life threatening flash floods are possible']

My output:

AttributeError: 'list' object has no attribute 'isalpha'
Brad Solomon
ladybug

1 Answer

Since you have nested lists (with strings inside them), you need nested comprehensions: the outermost iterates over the sublists, the middle one over the strings in each sublist, and the innermost over the characters of each string. Join the strings that survive with ' ' to get the result you want:

data = [['we', '\n\n', 'will', 'pray', 'and', 'hope', 'for', 'the', 'best'], 
        ['though', '10/3/2011', 'it', 'may', 'not', '\n\n', 'make', 'landfall', 'all', 'week', '2 000 €', 'if', 'it', 'follows', 'that', '•', 'track'],
        ['heavy', 'rains', 'capable', 'of', 'producing', 'life threatening', 'flash', '•', 'floods', '\n\n', 'are', 'possible']]

new_data = [' '.join(i for i in sublist if all(j.isalpha() or j == ' ' for j in i)) for sublist in data]

print(new_data)

Output:

['we will pray and hope for the best',
 'though it may not make landfall all week if it follows that track',
 'heavy rains capable of producing life threatening flash floods are possible']
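
For readers who find the one-liner hard to follow, roughly the same filtering can be written with explicit loops. This is only a sketch: the names `is_clean` and `join_clean_words` are made up for illustration, and unlike the one-liner it also rejects empty strings.

```python
def is_clean(word):
    """Return True if word is non-empty and made only of letters and spaces."""
    # Hypothetical helper: all() on an empty string is True, so check for that too.
    return bool(word) and all(ch.isalpha() or ch == ' ' for ch in word)


def join_clean_words(rows):
    """Keep only the clean strings in each sublist and join them with spaces."""
    result = []
    for sublist in rows:  # each item here is a list, which has no .isalpha()
        kept = [word for word in sublist if is_clean(word)]
        result.append(' '.join(kept))
    return result


sample = [['we', '\n\n', 'will', 'pray'], ['2 000 €', 'if', 'it', '•', 'follows']]
print(join_clean_words(sample))
# ['we will pray', 'if it follows']
```

The comment in the loop is also why the original attempt raised AttributeError: iterating over data yields the inner lists, not the strings.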

As @RonaldAaronson pointed out, you may instead want to strip the non-alphabetic characters (other than spaces) out of each string, rather than discard every string that contains one. In that case, use this instead:

data = [['we', '\n\n', 'will', 'pray', 'and', 'hope', 'for', 'the', 'best.'], 
        ['though', '10/3/2011', 'it', 'may', 'not', '\n\n', 'make', 'landfall', 'all', 'week', '2 000 €', 'if', 'it', 'follows', 'that', '•', 'track?'],
        ['heavy', 'rains', 'capable', 'of', 'producing', 'life threatening', 'flash', '•', 'floods', '\n\n', 'are', 'possible!']]

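# Strip each cleaned string *before* testing it, so that a string such as
# '2 000 €' (which reduces to spaces only) is dropped instead of leaving a double space.
new_data = [
  ' '.join(x for x in (''.join(c for c in s if c.isalpha() or c == ' ').strip() for s in sl) if x) for sl in data
]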

print(new_data)

Output:

['we will pray and hope for the best',
 'though it may not make landfall all week if it follows that track',
 'heavy rains capable of producing life threatening flash floods are possible']
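
The same idea can be split into small named functions, which makes the order of operations easier to see: clean each string, trim it, and only then discard the ones that ended up empty. This is a sketch, with `strip_non_letters` and `clean_rows` as hypothetical names:

```python
def strip_non_letters(text):
    """Drop every character that is neither a letter nor a space, then trim."""
    return ''.join(ch for ch in text if ch.isalpha() or ch == ' ').strip()


def clean_rows(rows):
    """Clean each string, discard those that end up empty, and join the rest."""
    cleaned = []
    for sublist in rows:
        words = (strip_non_letters(s) for s in sublist)
        cleaned.append(' '.join(w for w in words if w))
    return cleaned


sample = [['the', 'best.', '\n\n'], ['week', '2 000 €', 'if', 'track?']]
print(clean_rows(sample))
# ['the best', 'week if track']
```

Trimming before the emptiness check matters for a string like '2 000 €', which reduces to nothing but spaces once the digits and the currency sign are removed.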
DjaouadNM
  • So, if in the first list `'best'` were `'best.'`, it appears you would remove the whole string. Is that really the desired result? – Booboo Sep 01 '19 at 22:37
  • @RonaldAaronson Nice thought, I guess this proves an ambiguity on the desired result, since the OP didn't give such an example, but I will include that. – DjaouadNM Sep 01 '19 at 22:38
  • The OP said, "Remove all non-alphabetic characters" not "Remove all strings that contain at least one non-alphabetic character" but then admittedly went on to provide a not-so-good example that certainly would be confusing. – Booboo Sep 01 '19 at 22:46