I am using Python to read a text file that contains the right single quotation mark: ’.

ord("’")
Out[46]: 8217
  1. How can I type ’? I found that I can only type '.
  2. How can I read ’ as '?
  3. My understanding of the difference between these two characters: ’ is a Unicode character and ' is an ASCII character. Is that right?

http://www.fileformat.info/info/unicode/char/2019/index.html

I'm reading the text file using the code below:

with open(text_path, 'r', encoding='utf-8') as f:
    transcript = f.read()
Lisa

1 Answer

You could write a custom encode function that converts each non-ASCII character to an ASCII character specified in a lookup table.

# -*- coding: utf-8 -*-
# Written for Python 3, where str is already Unicode.
import io


def encode_file(filepath, conversion_table=None):
    '''Replace non-ASCII chars with the ASCII equivalents given in conversion_table.'''
    if conversion_table is None:
        conversion_table = {}

    # read the file named by the filepath argument
    with io.open(filepath, "r", encoding="utf-8") as f:
        transcript = f.read()

    new_transcript = []
    for ch in transcript:
        if ord(ch) < 128:
            # keep the character unchanged if it is ASCII
            new_transcript.append(ch)
        else:
            # use the custom ASCII equivalent, or "?" if no conversion is found
            new_transcript.append(conversion_table.get(ch, "?"))
    return "".join(new_transcript)


text_path = "/path/to/file.txt"
conversion_table = {'ü': 'u', 'ô': 'o', 'é': 'e', 'į': 'i'}
print(encode_file(text_path, conversion_table))

For example, a file with the contents my ünicôdé strįng yields my unicode string.

So you could add '’':'\'' (or any other mapping) to conversion_table, and the function will make the replacement for you.
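If the ’ character is hard to type, it can also be written in Python source with the escape sequence "\u2019". The following is a minimal sketch, assuming Python 3; the names right_quote, sample and plain are made up for illustration. It also shows that ’ lies outside the ASCII range while ' lies inside it.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, written with an escape sequence
right_quote = "\u2019"

print(ord(right_quote))        # code point of the curly quote (0x2019)
print(ord(right_quote) < 128)  # False: outside the ASCII range 0-127
print(ord("'") < 128)          # True: the plain apostrophe is ASCII

# Hypothetical sample text: replace the curly quote with a plain apostrophe
sample = "it\u2019s"
plain = sample.replace(right_quote, "'")
print(plain)
```

For a single character like this, str.replace on the text returned by f.read() may be all that is needed.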

jackw11111
  • You could reduce the amount of code here by using `str.translate` and `s.encode('ascii', errors='replace')`. This [answer](https://stackoverflow.com/a/55727491/5320906) demonstrates using `str.translate`. – snakecharmerb Aug 02 '20 at 09:02
  • @snakecharmerb Thank you, that looks like it would be a much more Pythonic approach. – jackw11111 Aug 02 '20 at 09:23
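The approach suggested in the comments could look roughly like the sketch below, assuming Python 3. The names table and to_ascii are hypothetical, and the mapping simply reuses the characters from the answer plus the right single quotation mark.

```python
# str.maketrans accepts a dict whose keys are single characters
table = str.maketrans({"\u2019": "'", "ü": "u", "ô": "o", "é": "e", "į": "i"})


def to_ascii(text):
    """Apply the table, then turn any remaining non-ASCII character into '?'."""
    translated = text.translate(table)
    # errors="replace" substitutes "?" for characters ASCII cannot encode
    return translated.encode("ascii", errors="replace").decode("ascii")


print(to_ascii("my ünicôdé strįng"))
print(to_ascii("it\u2019s"))
```

The text read from the file with f.read() can be passed straight to to_ascii, so no character-by-character loop is needed.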