
Hi and many thanks in advance!

I'm working on a Python script that handles UTF-8 strings and replaces specific characters. To do so, I call msgText.replace(thePair[0], thePair[1]) while looping through a list that pairs each Unicode character with its desired replacement, as shown below.

theList = [
    ('\U0001F601', '1f601.png'),
    ('\U0001F602', '1f602.png'), ...
]

Up to here everything works fine. But now consider a csv file which contains the characters to be replaced, as shown below.

\U0001F601;1f601.png
\U0001F602;1f602.png
...

I failed miserably at reading the CSV data into the list because of the escape sequences. I read the data using the csv module like this:

with open('Data.csv', newline='', encoding='utf-8-sig') as theCSV:
    theList=[tuple(line) for line in csv.reader(theCSV, delimiter=';')]

This results in pairs like ('\\U0001F601', '1f601.png'), in which the escape sequence is not interpreted: the first element is the literal text backslash, U, 0001F601 rather than the character itself (note the double backslash). I tried several ways of modifying the string and of reading the CSV data, but could not solve the problem. How can I read the CSV data into pairs whose escape sequences are turned into the actual characters?

Pontis
  • See: http://stackoverflow.com/a/22601369/2896976 – Jessie Mar 14 '17 at 23:23
  • From the information you're giving, I am not sure why you'd want to go the detour via csv and not just find a generic function to turn any \U000XXXXX character into xxxxx.png ? – trs Mar 14 '17 at 23:28
  • @trs Unfortunately the pattern is not always the same. (The csv data contains several hundreds of lines.) – Pontis Mar 14 '17 at 23:34
  • @user2896976 I cannot figure out how to use `.encode().decode('unicode-escape')` for the list of tuples. – Pontis Mar 14 '17 at 23:38
  • If your data is always just a pair you can do: `theList=[(line[0].encode().decode('unicode-escape'), line[1]) for line in csv.reader(theCSV, delimiter=';') if line]` Which will encode the first element. I also added an `if` in there to skip blank lines – Jessie Mar 14 '17 at 23:41
  • @user2896976 Thanks a lot, works as desired! – Pontis Mar 15 '17 at 00:26
  • The issue is that the CSV file shown here **does not** contain the characters to be replaced. It contains values that start with a backslash, uppercase U etc. – Karl Knechtel Jan 09 '23 at 05:51

1 Answer


I'm adding the solution for reading csv data containing escape characters for the sake of completeness. Consider a file Data.csv defining the replacement pattern:

\U0001F601;1f601.png
\U0001F602;1f602.png

Short version (using list comprehensions):

import csv

# define replacement list (short version)
with open('Data.csv', newline='', encoding='utf-8-sig') as csvFile:
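    # decode the escape sequence in the first column; skip blank lines
    replList = [(line[0].encode().decode('unicode-escape'), line[1])
                for line in csv.reader(csvFile, delimiter=';') if line]
# no explicit close() needed: the with block closes the file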

Longer version (probably easier to understand):

import csv

# define replacement list (step by step)
replList=[]
with open('Data.csv', newline='', encoding='utf-8-sig') as csvFile:
    for line in csv.reader(csvFile, delimiter=';'):
        if line:  # skip blank lines
            replList.append((line[0].encode().decode('unicode-escape'), line[1]))
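
For illustration, here is a minimal end-to-end sketch of the same idea that runs without a file on disk. It assumes an in-memory stand-in for Data.csv (csvText) and a helper named loadReplacements; both names are hypothetical and only serve the example. It also assumes the first column contains only ASCII text such as \U0001F601, because .encode().decode('unicode-escape') would garble any non-ASCII characters already present in that field.

```python
import csv
import io

# Hypothetical stand-in for Data.csv: the backslashes here are literal text,
# exactly as they would appear in the file (blank line included on purpose).
csvText = "\\U0001F601;1f601.png\n\n\\U0001F602;1f602.png\n"


def loadReplacements(csvFile):
    """Return (character, replacement) pairs, decoding escapes in column one."""
    return [
        (line[0].encode().decode('unicode-escape'), line[1])
        for line in csv.reader(csvFile, delimiter=';')
        if line  # skip blank lines
    ]


replList = loadReplacements(io.StringIO(csvText, newline=''))

# Apply the replacements the same way the question does.
msgText = "Hi \U0001F601 and bye \U0001F602"
for thePair in replList:
    msgText = msgText.replace(thePair[0], thePair[1])

print(msgText)  # Hi 1f601.png and bye 1f602.png
```

With a real file, the io.StringIO object would simply be replaced by the handle returned from open('Data.csv', newline='', encoding='utf-8-sig').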
Pontis