
I am trying to detect emoticons in my sentences and assign each one a sentiment value. I found a list of emoticons with their sentiment values and copied it into a CSV file that holds each emoticon's Unicode value and sentiment value, as shown below.

[Screenshot of the CSV file: one emoticon Unicode value and its sentiment value per line]

When I check whether the sentence contains an emoticon directly, as below, it works:

if "\U0001f914" in sentence:
    print("in")

But when I loop through the lines of the CSV file (emoticons and sentiment values) and check whether each emoticon exists in the sentence, it doesn't work: no match is found.

Below is my code:

Method 1:

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')

    if senti_emoji_unicode in sentence:
        print("in")

Method 2:

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    senti_emoji_unicode = '"'+senti_emoji_unicode+'"'

    if senti_emoji_unicode in sentence:
        print("in")
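To see why neither method finds a match, it helps to compare a string literal typed into Python source with the text read from the file. This is a minimal sketch, assuming each CSV field holds the escape as plain text (a backslash, then U, then eight hex digits); the raw string below stands in for a field read from the file.

```python
# A literal written in Python source: the compiler turns the escape
# into the single emoji character U+1F914.
literal = "\U0001f914"

# Text read from a CSV file arrives as ten separate characters:
# a backslash, 'U', and eight hex digits. A raw string reproduces that here.
from_file = r"\U0001f914"

sentence = "hmm \U0001f914"

print(len(literal), len(from_file))  # 1 10
print(literal in sentence)           # True
print(from_file in sentence)         # False

# Method 2 only wraps the text in quote characters; the escape is still
# not evaluated, so it cannot match either.
quoted = '"' + from_file + '"'
print(quoted in sentence)            # False
```

The file text has to be converted into the actual character before the "in" check can succeed.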

Below is the updated full code, following the answer:

import os

file_name_emoji = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'emoji sentiment.csv')
with open(file_name_emoji, 'r', encoding='utf-8') as fo_emoji:
    lines_emoji = fo_emoji.readlines()

for line in lines_emoji:
    senti_emoji_unicode, senti_emoji_score = line.strip().split(',')
    # Turn the escape text (e.g. \U0001f602) into the actual character
    emoji = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')
    score = float(senti_emoji_score)

    if emoji in sentence:
        print('--------------------------------------')

I get the error 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape. I have seen many posts about this error that suggest adding 'r' or changing '', but those fixes cannot be applied in my scenario because the strings come from a dynamic list. I tried the following to apply 'r', but the same error appears:

raw_s = "r'{0}'".format(senti_emoji_unicode)
raw = senti_emoji_unicode.encode('utf-8').decode('unicode_escape')
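Two points can be checked in isolation. First, the r prefix only changes how Python parses a literal in source code, so formatting it into a string just adds the characters r and quotes. Second, "truncated \UXXXXXXXX escape" is raised when a field has fewer than eight hex digits after \U. The sketch below catches that error per line so the offending entry can be found. The second sample row is made up for illustration; nothing here claims that is what the actual file contains, and the try/except reporting is an added diagnostic, not part of the original code.

```python
# The r prefix is source-level syntax. Formatting it into a string
# only adds the characters r, ' and ' around the text.
value = r"\U0001f602"
wrapped = "r'{0}'".format(value)
print(wrapped)  # r'\U0001f602'

# Hypothetical sample rows: the first comes from the answer below; the second
# is made up and has only 7 hex digits after \U, which triggers
# "truncated \UXXXXXXXX escape".
rows = [r"\U0001f602,0.221", r"\U0001f91,0.5"]


def decode_escape(text):
    """Decode backslash escapes, as the answer below does."""
    return text.encode("ascii").decode("unicode_escape")


decoded = {}
for number, row in enumerate(rows, start=1):
    escape_text, score_text = row.strip().split(",")
    try:
        emoji = decode_escape(escape_text)
    except UnicodeDecodeError as err:
        # Report the offending line instead of stopping at the first bad entry.
        print(f"line {number}: cannot decode {escape_text!r} ({err.reason})")
        continue
    decoded[emoji] = float(score_text)

print(decoded)
```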
Kate Fernando
  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Show where the intermediate results differ from what you expected. – Prune Jun 29 '21 at 18:35
  • Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. Your posted code fails on input -- don't expect us to enter test data, or to build a test file. Instead, simply hard-code a test case that causes the problem. – Prune Jun 29 '21 at 18:35

1 Answer


Each line of your data file is read and split into two strings. The escape sequence in the first string is not evaluated, and the score in the second string is text, not a float. Both must be converted.

Reproducible example:

lines = r'''
\U0001f602,0.221
\U00002764,0.746
'''.strip().splitlines()

for line in lines:
    print(line)

sentence = 'hello ❤'

for line in lines:
    emoji_string,score_string = line.split(',')
    emoji = emoji_string.encode('ascii').decode('unicode_escape')
    score = float(score_string)
    print(emoji,score,emoji in sentence)

Output:

\U0001f602,0.221
\U00002764,0.746
😂 0.221 False
❤ 0.746 True
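A variation on the same conversion, sketched with the standard csv module and a dictionary that maps each emoji character to its score. The helper names load_emoji_scores and emojis_in are hypothetical, and io.StringIO stands in for the real file. For a file on disk, opening it with encoding='utf-8-sig' would also drop a byte-order mark if the file has one.

```python
import csv
import io

# Stand-in for the CSV file, using the two rows from the example above.
# "\\U" in a normal string literal is a backslash followed by U.
csv_text = "\\U0001f602,0.221\n\\U00002764,0.746\n"


def load_emoji_scores(handle):
    """Build {emoji character: score} from rows of 'escape,score'.
    (Hypothetical helper, not from the original post.)"""
    scores = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        escape_text, score_text = row
        emoji = escape_text.strip().encode("ascii").decode("unicode_escape")
        scores[emoji] = float(score_text)
    return scores


def emojis_in(sentence, scores):
    """Return (emoji, score) pairs for every known emoji found in sentence."""
    return [(emoji, score) for emoji, score in scores.items() if emoji in sentence]


scores = load_emoji_scores(io.StringIO(csv_text))
matches = emojis_in("hello \u2764", scores)
print(matches)
```

Converting once at load time means the per-sentence check is a plain substring test against already-decoded characters.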
Mark Tolonen