How to read emoticons in a CSV file?

Question

I am trying to detect emoticons in my sentences and assign a sentiment value to each one. I found a list of emoticons with their sentiment values and copied it into a CSV file, with each emoticon's Unicode escape and its sentiment value, as shown below.

When I check whether the sentence contains an emoticon directly, as below, it works:

if "\U0001f914" in sentence:
    print("in")

But when I loop through the CSV file I created (emoticons and sentiment values) and check whether each emoticon exists in the sentence, it doesn't work.

Below is my code:

Method 1-

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')

    if senti_emoji_unicode in sentence:
        print("in")

Method 2-

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    senti_emoji_unicode = '"'+senti_emoji_unicode+'"'

    if senti_emoji_unicode in sentence:
        print("in")

Below is the updated full code, as per the answers:

import os

file_name_emoji = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'emoji sentiment.csv')
with open(file_name_emoji, 'r', encoding='utf-8') as fo_emoji:
    lines_emoji = fo_emoji.readlines()

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    # Turn the literal text "\UXXXXXXXX" into the actual character.
    emoji = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')
    score = float(senti_emoji_score)
    if emoji in sentence:
        print('--------------------------------------')

I am getting this error: 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape. I have seen many posts about this error that suggest adding 'r' or changing ''. But these fixes cannot be applied in my scenario, since I am reading a dynamic list. I tried the following to add the 'r' prefix, but the same error appears.

raw_s = "r'{0}'".format(senti_emoji_unicode)
raw = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')

Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Show where the intermediate results differ from what you expected. — Prune, Jun 29 '21 at 18:35
Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. Your posted code fails on input -- don't expect us to enter test data, or to build a test file. Instead, simply hard-code a test case that causes the problem. — Prune, Jun 29 '21 at 18:35

score 1 · Accepted Answer · answered Jun 29 '21 at 18:51

Each line of your data file is read as text and split into two multi-character strings. The escape code is not evaluated (it is still a literal backslash, a U, and hex digits), and the decimal value is a string, not a float. Both must be converted.

Reproducible example:

lines = r'''
\U0001f602,0.221
\U00002764,0.746
'''.strip().splitlines()

for line in lines:
    print(line)

sentence = 'hello ❤'

for line in lines:
    emoji_string,score_string = line.split(',')
    emoji = emoji_string.encode('ascii').decode('unicode_escape')
    score = float(score_string)
    print(emoji,score,emoji in sentence)

Output:

\U0001f602,0.221
\U00002764,0.746
😂 0.221 False
❤ 0.746 True
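
The same conversion can be sketched with the standard `csv` module instead of a manual `split(',')`. This is only a sketch under assumptions: `load_emoji_scores`, `csv_text`, and the in-memory `io.StringIO` stand-in for the file are illustrative names, and every row is assumed to hold a well-formed 8-digit `\U` escape followed by a score.

```python
import csv
import io

# Stand-in for the CSV file (assumption). With a real file, pass
# open(path, newline='', encoding='utf-8') instead of io.StringIO.
csv_text = "\\U0001f602,0.221\n\\U00002764,0.746\n"

def load_emoji_scores(handle):
    # Hypothetical helper: builds {emoji: score} from "escape,score" rows.
    scores = {}
    for row in csv.reader(handle):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        escape, score = row
        # Evaluate the literal escape text into the actual character.
        emoji = escape.strip().encode('ascii').decode('unicode_escape')
        scores[emoji] = float(score)
    return scores

emoji_scores = load_emoji_scores(io.StringIO(csv_text))

sentence = 'hello ❤'
found = {e: s for e, s in emoji_scores.items() if e in sentence}
print(found)
```

Building the dictionary once and then testing each key against the sentence keeps the file parsing separate from the sentiment lookup.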

answered Jun 29 '21 at 18:51

Mark Tolonen


Thanks Mark! But I am still getting the error 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape. The only difference is that I am reading from the CSV file. Please refer to the updated question. – Kate Fernando Jul 04 '21 at 05:39

@KateFernando some of your escape codes were invalid. \U escapes require 8 digits. – Mark Tolonen Jul 04 '21 at 07:15
without U escape, it should have 8 digits, right? – Kate Fernando Jul 04 '21 at 07:54
Thank you so much! The issue was the 8-digit problem. You saved me 3 days of wasted time. – Kate Fernando Jul 04 '21 at 08:28
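
If the file cannot be guaranteed to hold full 8-digit escapes, one possible workaround (a sketch, not part of the answer above) is to skip `unicode_escape` and parse the hex digits directly with `int(..., 16)` and `chr`. The helper name `to_emoji` and the set of accepted spellings are assumptions made for illustration.

```python
import re

# Assumed input spellings: "\U0001f602", "\U1f602", "U+1F602" or bare "1f602".
_CODE_POINT = re.compile(r'^(?:\\[Uu]|[Uu]\+)?([0-9A-Fa-f]{1,8})$')

def to_emoji(text):
    # Hypothetical helper: convert a code point written as text to a character.
    match = _CODE_POINT.match(text.strip())
    if match is None:
        raise ValueError(f'not a code point: {text!r}')
    # chr() itself raises ValueError for values above 0x10FFFF.
    return chr(int(match.group(1), 16))

print(to_emoji(r'\U1f602') == to_emoji(r'\U0001f602'))
```

Because the digits are parsed as a number, short and zero-padded forms map to the same character, which avoids the truncated-escape error for rows such as `\U1f602`.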
