-1

How do I replace a word in a list with an empty string when it contains no alphabetical character, while leaving it untouched when it contains at least one alphabetical character, even if that word also contains special characters or digits?

Say I have the following list of words:

['python','abc123','@@','!!','12345abc#','hello@','141351351','123abc']

The desired output is:

['python','abc123','','','12345abc#','hello@','','123abc']

What I have tried is the following:

import re

data = ['python','abc123','@@','!!','12345abc#','hello@','141351351','123abc']
regex = re.compile('[^a-zA-Z0-9&._-]')
filtered = [regex.sub('', each_data) for each_data in data]

This results in:

['python', 'abc123', '12345abc', 'hello', '141351351', '123abc']

This strips all the special characters, which is not what I want, and I'm not sure how to fix it. I'm still trying to solve it with a regex; I also tried nltk but couldn't find an answer there either. Any hint or help would be appreciated.

Wiktor Stribiżew
user3646742

2 Answers

3

I am not sure I understand the text of your question, but the sample input-output you give can be handled as:

[item if re.search('(?i)[a-z]', item) else '' for item in your_list]

Your example:

your_list = ['python','abc123','@@','!!','12345abc#','hello@','141351351','123abc']

import re
[item if re.search('(?i)[a-z]', item) else '' for item in your_list]

# output:
# ['python', 'abc123', '', '', '12345abc#', 'hello@', '', '123abc']
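If the same check is needed in several places, the one-liner above could be wrapped in a small helper. This is only a sketch: the names `HAS_LETTER` and `blank_without_letters` are made up for illustration, and it assumes that only ASCII letters count as "alphabetical".

```python
import re

# Precompiled, case-insensitive test for at least one ASCII letter.
# (HAS_LETTER and blank_without_letters are hypothetical names.)
HAS_LETTER = re.compile(r'[a-z]', re.IGNORECASE)

def blank_without_letters(items):
    """Return a new list where items containing no ASCII letter become ''."""
    return [item if HAS_LETTER.search(item) else '' for item in items]

your_list = ['python','abc123','@@','!!','12345abc#','hello@','141351351','123abc']
result = blank_without_letters(your_list)
print(result)
```

Because the pattern is only used to test each item rather than to substitute characters, the items that do contain a letter come back unchanged, special characters included.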
tzot
1

You can replace any item in your list that does not contain a letter with an empty string using

["" if not any(c.isalpha() for c in x) else x for x in l]

With the re library, you can use a pattern like [^\W\d_] to match any Unicode letter (or [A-Za-z] to handle ASCII letters only):

import re
print( ["" if not re.search(r'[^\W\d_]', x) else x for x in  l] )

However, the non-regex approach already seems sufficient for your case.
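To illustrate the difference between the Unicode and the ASCII-only patterns, here is a small sketch. The `words` list and the variable names are invented for this example and are not part of the question's data.

```python
import re

# Hypothetical sample data mixing ASCII and non-ASCII letters.
words = ['naïve', 'данные', '日本語', '___', '42']

unicode_letter = re.compile(r'[^\W\d_]')  # any Unicode letter
ascii_letter = re.compile(r'[A-Za-z]')    # ASCII letters only

kept_unicode = [w if unicode_letter.search(w) else '' for w in words]
kept_ascii = [w if ascii_letter.search(w) else '' for w in words]
kept_isalpha = [w if any(c.isalpha() for c in w) else '' for w in words]

print(kept_unicode)  # => ['naïve', 'данные', '日本語', '', '']
print(kept_ascii)    # => ['naïve', '', '', '', '']
print(kept_isalpha)  # => ['naïve', 'данные', '日本語', '', '']
```

For this sample, the str.isalpha() version agrees with the [^\W\d_] pattern, while [A-Za-z] blanks out the words written entirely in non-ASCII letters.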

NOTE: "any alphabetical character followed by any kind of special character or number" can be matched with a [^\W\d_][\W\d_] ([A-Za-z][^A-Za-z] for ASCII only) pattern, a letter followed by a non-letter.
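As a rough check of the pattern in the note, the sketch below lists which items from the question contain a letter immediately followed by a non-letter. The names `letter_then_other` and `matches` are hypothetical.

```python
import re

l = ['python','abc123','@@','!!','12345abc#','hello@','141351351','123abc']

# A letter immediately followed by a non-letter (digit, underscore or symbol).
letter_then_other = re.compile(r'[^\W\d_][\W\d_]')

matches = [x for x in l if letter_then_other.search(x)]
print(matches)
# => ['abc123', '12345abc#', 'hello@']
```

Note that 'python' and '123abc' do not match, because no letter in them is followed by a non-letter. This stricter pattern would therefore not reproduce the desired output from the question, which is why the simpler "contains any letter" test is used in the demo.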

See a Python demo:

import re
l = ['python','abc123','@@','!!','12345abc#','hello@','141351351','123abc']
print( ["" if not re.search(r'[^\W\d_]', x) else x for x in  l] )
# => ['python', 'abc123', '', '', '12345abc#', 'hello@', '', '123abc']
print( ["" if not any(c.isalpha() for c in x) else x for x in  l] )
# => ['python', 'abc123', '', '', '12345abc#', 'hello@', '', '123abc']
Wiktor Stribiżew