3

How do I train a model to find occurrences of a US state, when the set of labels is limited to the 50 states and we need a large amount of data (say 1000 rows) to train each label?

dwayneJohn
  • 1
    There is a lot of information in this answer (https://stackoverflow.com/a/59959188/8243797) about extracting cities from sentences. It more or less walks through the whole problem from start to end, which might help you. – SajanGohil May 11 '20 at 08:28

1 Answer

2

I think it depends on the task you're trying to solve. Do you only need to tell whether a given two-letter combination is a US state abbreviation? Would a simple set of names be enough? Or are you trying to build a simple NER (https://en.wikipedia.org/wiki/Named-entity_recognition) for state names? In that case you can also start with simple regex matching, and if you want to train a model later, you will have far more than 50 examples. Your dataset won't be just "do these two letters represent a state or not", but many sentences that either contain a state name somewhere or contain none at all.
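As a rough sketch of the two simple options above (set lookup and regex matching), something like the following could be a starting point. The names `STATE_NAMES`, `is_state_abbreviation` and `find_states` are made up for illustration, and the dictionary holds only a handful of states to keep the example short; a real list would cover all 50.

```python
import re

# Illustrative subset only (assumption): a real table would list all 50 states.
STATE_NAMES = {
    "California": "CA",
    "New York": "NY",
    "Oregon": "OR",
    "Texas": "TX",
    "Washington": "WA",
}
STATE_ABBREVIATIONS = set(STATE_NAMES.values())

# Full names: case-insensitive, longest first so multi-word names are tried first.
_names = sorted(STATE_NAMES, key=len, reverse=True)
NAME_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(n) for n in _names) + r")\b",
    re.IGNORECASE,
)
# Abbreviations: uppercase only, so ordinary words like "or" are not matched.
ABBR_PATTERN = re.compile(r"\b[A-Z]{2}\b")


def is_state_abbreviation(token):
    """Plain set lookup for a single two-letter token."""
    return token in STATE_ABBREVIATIONS


def find_states(text):
    """Return (start, end, matched_text) for each state mention, in text order."""
    spans = [(m.start(), m.end(), m.group()) for m in NAME_PATTERN.finditer(text)]
    spans += [
        (m.start(), m.end(), m.group())
        for m in ABBR_PATTERN.finditer(text)
        if m.group() in STATE_ABBREVIATIONS
    ]
    return sorted(spans)


if __name__ == "__main__":
    for span in find_states("She moved from Austin, TX to New York last year."):
        print(span)
```

The character spans returned by `find_states` could later serve as rough labels over whole sentences if you decide to train an NER model, though ambiguous cases (for example "Washington" the city, or "OR" in all-caps text) would still need manual review.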

Rayan Ral