
I am doing sentiment analysis on tweets. Most of the tweets contain short forms of words, and I want to replace them with the original/full word.

Suppose the tweet is:

I was wid Ali.

I want to convert:

wid -> with

Similarly

wud -> would
u -> you
r -> are

I have 6000 tweets that contain lots of short words. How can I replace them? Is there a Python library available for this task, or a dictionary of short words available online?

I read the answer to the question "Replace apostrophe/short words in python", but it only provides a dictionary of apostrophe contractions.

Currently I am using NLTK, but this task does not seem to be possible with NLTK.

billal
  • I doubt there's a dictionary for this, as those words aren't standardized in any way. I'd write your own dictionary if I were you. – Jacob G. May 07 '18 at 18:38
  • [Merriam-Webster has a definition for wid.](https://www.merriam-webster.com/dictionary/wid) It is arguably a valid English word. [The same is true of wud.](https://www.merriam-webster.com/dictionary/wud) If you find a way to use context to tell when a word was typed incorrectly, particularly when the dictionary has an entry for that word, you should write a paper on it and publish it. I'm sure the NLP community would consider that an incredible achievement. – ChootsMagoots May 07 '18 at 18:43
  • @JacobG. Then it will take a lot of time. What smarter way do you suggest for this? – billal May 07 '18 at 18:44
  • @ChootsMagoots Okay, the above two have definitions, but as I said, there are a lot of other words too, like "U r". I want to replace U -> you, similarly r -> are, etc. – billal May 07 '18 at 18:51
  • SO is not great at design questions. You would do better thinking about what your requirements are and just writing some code. Example: assume that we don't want to check _all_ the words in a dataset, just those under a certain size. And we don't want to check them for perfect English, just "correct" some well-known customizations you store in a (Python) dictionary. If this is not good enough, then you need to take a step back and think about your design, requirements, APIs, etc. SO is not the place for that conversation. –  May 07 '18 at 18:51
  • I'm saying, it's very very difficult for a computer to tell a mistyping of a word from an actual word. This is just noise. I doubt you're going to get anywhere. – ChootsMagoots May 07 '18 at 18:55

1 Answer


The website https://www.noslang.com/search seems to have the necessary dictionary. You can send a request from your Python code and get back the translation.

Here is the working code:

import requests

# Markers that surround the translated text in the returned HTML
prefixStr = '<div class="translation-text">'
postfixStr = '</div'

slangText = 'I was wid Ali.'

# Submit the text to the site's translation form
r = requests.post('https://www.noslang.com/',
                  data={'action': 'translate', 'p': slangText,
                        'noswear': 'noswear', 'submit': 'Translate'})

# Extract the text between the two markers
startIndex = r.text.find(prefixStr) + len(prefixStr)
endIndex = startIndex + r.text[startIndex:].find(postfixStr)
print(r.text[startIndex:endIndex])
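
If you would rather not send 6000 requests to a website, the hand-written dictionary suggested in the comments can be applied locally. Here is a minimal sketch of that idea, not a complete solution: `SLANG_MAP` and `expand_short_words` are hypothetical names, and the mapping only holds the four examples from the question, so you would have to extend it yourself.

```python
import re

# Hypothetical mapping; extend it with the short forms seen in your own tweets.
SLANG_MAP = {
    "wid": "with",
    "wud": "would",
    "u": "you",
    "r": "are",
}

# Match whole words only, so the "u" inside "should" is left alone.
_WORD_RE = re.compile(r"\b\w+\b")


def expand_short_words(text, mapping=SLANG_MAP):
    """Replace each whole word found in mapping (case-insensitive) with its full form."""
    def _replace(match):
        word = match.group(0)
        # Words that are not in the mapping are returned unchanged.
        return mapping.get(word.lower(), word)
    return _WORD_RE.sub(_replace, text)


print(expand_short_words("I was wid Ali."))  # I was with Ali.
print(expand_short_words("U r late"))        # you are late
```

Note that this replaces every occurrence unconditionally, so it cannot tell a short form from a real word that happens to be spelled the same way, which is the limitation raised in the comments.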
Hezi Shahmoon