Handling slightly modified duplicate strings when checking against a sorted list of strings.
I am checking strings that represent content from some files against a list of known strings. However, the same string sometimes appears with an asterisk (*) appended to the end, which results in slightly modified duplicates in the list.
Currently:
# This is a very minimal code example:
for _word in sorted(['Microsoft', 'Microsoft*']):
    print(_word)
Desired:
for _word in sorted(['Microsoft']):
    print(_word)
# But still be able to check for 'Microsoft*' without having duplicates in the list.
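One way to get from the current list to the desired one might be to normalise every entry before sorting, so that 'Microsoft' and 'Microsoft*' collapse into a single entry. This is only a minimal sketch: normalize and dedupe are names made up for illustration, and it assumes that only a single trailing asterisk should be ignored.

```python
def normalize(word):
    """Return word without one trailing asterisk, if present."""
    return word[:-1] if word.endswith('*') else word


def dedupe(words):
    """Return the sorted, unique words after normalisation."""
    # A set comprehension drops entries that become identical once normalised.
    return sorted({normalize(w) for w in words})


for _word in dedupe(['Microsoft', 'Microsoft*']):
    print(_word)
```

The same normalize helper could then be applied to each incoming string before the lookup, so 'Microsoft*' still matches even though the list only holds 'Microsoft'.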
Final solution:
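if __name__ == '__main__':
    # Known names, stored in lowercase and without a trailing asterisk.
    default_strings = sorted([
        'microsoft',
        'linux',
        'unix',
        'android'
    ])
    text = """
    Microsoft* is cool, but Linux is better.
    """
    # split() with no argument splits on any whitespace, so the newlines
    # around the text do not stay attached to the first and last tokens.
    tokens = text.split()
    for token in tokens:
        token = token.lower()
        # Drop a single trailing asterisk so 'microsoft*' matches 'microsoft'.
        if token.endswith('*'):
            token = token[:-1]
        if token in default_strings:
            print(token)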
EDIT: If there's a better way, please let me know. Thanks a lot to everyone that participated and responded.
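A possible variation, offered as a sketch rather than a definitive answer: keep the known names in a set, so membership checks do not scan a list, and strip surrounding punctuation from each token, so that tokens such as 'linux,' or 'unix.' also match. The names KNOWN and find_known are hypothetical. Note that this changes behaviour slightly: str.strip(string.punctuation) removes any run of leading or trailing punctuation, not just one trailing asterisk.

```python
import string

# Known names, lowercase and without any trailing asterisk (assumed convention).
KNOWN = frozenset({'microsoft', 'linux', 'unix', 'android'})


def find_known(text, known=KNOWN):
    """Return the known names found in text, in order of appearance."""
    found = []
    for token in text.split():
        # Lowercase, then strip surrounding punctuation (this includes '*').
        cleaned = token.lower().strip(string.punctuation)
        if cleaned in known:
            found.append(cleaned)
    return found


print(find_known("Microsoft* is cool, but Linux is better."))
```

If the sorted order matters for output, sorted(KNOWN) can still be used when printing the list itself; the set is only there for the lookup.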