
I've looked at the corpus section of NLTK, but there doesn't seem to be a numbers corpus. I want to convert numbers written as words into digits. For example:

input: one thousand two hundred forty three output: 1243

input: second output: 2

input: five percent output: 0.05

jason

1 Answer


There isn't one. What you need to do is build on the answers to "Is there a way to convert number words to Integers?", or on some other implementation you find more useful or easier to work with.

To start, you'll need a regex to extract the strings of interest (i.e. one, two, ...), then replace them using the code from that question.
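As a rough illustration of the extraction step, here is a minimal sketch. The word list and the name `find_number_phrases` are my own assumptions, not part of NLTK or of the linked question, and the vocabulary is deliberately small (no ordinals, no "and", nothing above "million").

```python
import re

# Hypothetical vocabulary of number words; extend it to suit your text.
NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety",
    "hundred", "thousand", "million",
]

# Longest words first so "seventeen" is tried before "seven".
_WORD = "(?:" + "|".join(sorted(NUMBER_WORDS, key=len, reverse=True)) + ")"

# One number word, optionally followed by more, separated by spaces or hyphens.
_PHRASE = re.compile(rf"\b{_WORD}(?:[\s-]+{_WORD})*\b", re.IGNORECASE)


def find_number_phrases(text):
    """Return every run of consecutive number words found in text."""
    return _PHRASE.findall(text)
```

Each phrase it returns can then be handed to whatever word-to-number converter you settle on.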

The first example you've given will be the easiest of the three. The last example just needs the parsed number divided by 100, since the converter's output is an integer. The second one will be a little trickier, as you'll have to modify the code or possibly write a whole new function to handle ordinals.
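To make the three cases concrete, here is a minimal sketch of one way to handle them. It is not the code from the linked question; `words_to_number` and the lookup tables are hypothetical names of my own, and it assumes well-formed English input with numbers below one billion.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}
# Ordinals that are not simply the cardinal word plus "th".
IRREGULAR_ORDINALS = {"first": "one", "second": "two", "third": "three",
                      "fifth": "five", "eighth": "eight", "ninth": "nine",
                      "twelfth": "twelve"}


def _to_cardinal(token):
    """Map an ordinal word to its cardinal form; leave other words alone."""
    if token in IRREGULAR_ORDINALS:
        return IRREGULAR_ORDINALS[token]
    if token.endswith("ieth"):           # twentieth -> twenty
        return token[:-4] + "y"
    if token.endswith("th"):             # fourth -> four, hundredth -> hundred
        return token[:-2]
    return token


def words_to_number(text):
    """Convert number words to an int, or to a float if followed by 'percent'."""
    tokens = text.lower().replace("-", " ").split()
    percent = bool(tokens) and tokens[-1] == "percent"
    if percent:
        tokens = tokens[:-1]
    if not tokens:
        raise ValueError("no number words found")
    total = current = 0
    for token in tokens:
        word = _to_cardinal(token)
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {token!r}")
    value = total + current
    return value / 100 if percent else value
```

With this sketch, `words_to_number("one thousand two hundred forty three")` gives 1243, `words_to_number("second")` gives 2, and `words_to_number("five percent")` gives 0.05. Note that it has no way of telling the ordinal "second" from the unit of time, so you'd still need context to decide when to call it.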

AFAIK, there is no module that will parse the whole text for that.

Another possibility, as I looked further into this, is to use part-of-speech tagging: the CD (cardinal number) tag from the Penn Treebank tagset can help identify numbers. But you'll still need a function similar to the one mentioned above to convert them.

Leb