
I want to get a list of all characters in a text file except for [A-Z], [0-9], '|', and '~'.

Appreciate your help.

cs95
rg1105
  • Have you tried something on your own? If so, please post it here. – Shrikant Shete Aug 03 '17 at 01:30
  • Edited to make a canonical title. Even if the question shows lack of effort, it should be of use to future readers. – cs95 Aug 03 '17 at 01:33
  • [This](https://stackoverflow.com/questions/2991901/regular-expression-any-character-that-is-not-a-letter-or-number) Stack Overflow question might help. – Shrikant Shete Aug 03 '17 at 01:33

1 Answer


Step 1: Read in your file and convert it to a set of chars.

with open('file.txt') as f:  # the context manager closes the file after reading
    charset = set(f.read())

Step 2: Join it back to a string with str.join for the next step.

chars = ''.join(charset)

Step 3: Using regex, remove every character you do not want by substituting it with '', then display the result.

import re
filtered_chars = re.sub(r'[A-Z0-9|~]', '', chars)  # raw string keeps the pattern literal

print(set(filtered_chars))
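The three steps above could also be folded into a single helper. The following is only a sketch of one possible variant, not a replacement for the approach above: the name `unexpected_chars` and the sample string are made up for illustration, and it tests each distinct character against the same character class instead of joining and substituting.

```python
import re

# Same character class as in Step 3: A-Z, 0-9, '|' and '~'.
ALLOWED = re.compile(r'[A-Z0-9|~]')

def unexpected_chars(text):
    """Return the distinct characters in text that are NOT A-Z, 0-9, '|' or '~'.

    Hypothetical helper name, used here only for illustration.
    """
    # set(text) deduplicates first, so each character is tested only once.
    return {ch for ch in set(text) if not ALLOWED.fullmatch(ch)}

# Made-up sample input; lowercase letters, the space and the newline should remain.
sample = "AB12|~ab c\n"
print(sorted(unexpected_chars(sample)))
```

To apply it to a file, read the contents first (for example inside `with open('file.txt') as f:`) and pass the resulting string to the helper. Note that whitespace such as spaces and newlines counts as a character here, just as it does in the regex version.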

Other references (similar to your use case but not quite):

  1. List of all unique characters in a string?

  2. How to get all unique characters in a textfile? unix/python

  3. Regular Expression: Any character that is NOT a letter or number

cs95