
I have trained a masked language model using my own dataset, which contains sentences with emojis (trained on 20,000 entries).

Now, when I make predictions, I want emojis in the output. However, most of the predicted tokens are words, so I think the emojis are ranked near the bottom of the list, as they must be less frequent tokens than the words.

So far, this is my output - you can see that one emoji has been predicted, but the rest of the predictions are words:

mask_filler("I am so good today, <mask>", top_k=5)

[{'score': 0.2953376770019531,
  'sequence': 'I am so good today, friend',
  'token': 72,
  'token_str': 'friend'},
 {'score': 0.18523386120796204,
  'sequence': 'I am so good today ',
  'token': 328,
  'token_str': ''},
 {'score': 0.1431082785129547,
  'sequence': 'I am so good today, mate',
  'token': 2901,
  'token_str': 'mate'},
 {'score': 0.13269349932670593,
  'sequence': 'I am so good today, father',
  'token': 4,
  'token_str': 'father'},
 {'score': 0.030341114848852158,
  'sequence': 'I am so good today, mother',
  'token': 44660,
  'token_str': 'mother'}]

Therefore, I was wondering whether there is any code or function that can filter the predictions so that only emojis appear in the output, removing any predicted tokens that are words.

I have got one emoji to show in the output, but I think the rest of the emojis are less frequent tokens, so they are not appearing at the top when I make predictions.

So, is it possible to filter out the word tokens in favour of only emojis?

Thanks.

1 Answer


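Yes, this is possible. I am only giving hints here.

Keep a prediction only if its token contains no letters or digits (here output is the list returned by mask_filler):

for pred in output:
    if not any(ch.isalnum() for ch in pred['token_str']):
        print(pred)

Alternatively, you can use a regex to build a pattern for emojis and keep only the tokens that match it. See "removing emojis from a string in Python", which might be helpful.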
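The regex hint could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name keep_emoji_predictions, the sample data, and the Unicode ranges in EMOJI_PATTERN are all assumptions (the ranges cover common emoji blocks but are not exhaustive). It only assumes that each prediction is a dict with 'score' and 'token_str' keys, as in the output shown in the question.

```python
import re

# Assumed emoji ranges: a rough subset of Unicode blocks, not an exhaustive list.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def keep_emoji_predictions(predictions, top_k=5):
    """Return up to top_k predictions whose token contains an emoji.

    `predictions` is a list of dicts with 'score' and 'token_str' keys,
    shaped like the fill-mask output shown in the question.
    """
    emoji_only = [p for p in predictions if EMOJI_PATTERN.search(p["token_str"])]
    # Highest score first, in case the input is not already sorted.
    emoji_only.sort(key=lambda p: p["score"], reverse=True)
    return emoji_only[:top_k]


# Hypothetical predictions, for illustration only.
sample = [
    {"score": 0.30, "token": 72, "token_str": "friend"},
    {"score": 0.19, "token": 328, "token_str": " 😊"},
    {"score": 0.14, "token": 2901, "token_str": "mate"},
    {"score": 0.02, "token": 51000, "token_str": "🎉"},
]

for prediction in keep_emoji_predictions(sample):
    print(prediction["token_str"].strip(), prediction["score"])
```

With the pipeline from the question, the idea would be to request many more candidates than needed and filter afterwards, for example keep_emoji_predictions(mask_filler("I am so good today, <mask>", top_k=1000)). The value 1000 is an arbitrary choice; it needs to be large enough for the lower-ranked emoji tokens to be included.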

  • Hi - thanks for the response. Though, I am using the Hugging Face library, so would this work with a trained model? –  Jul 25 '21 at 16:43