
I'm very new to regex and I'm trying to find instances of any of the following:

", FL,"
" FL "
" FL,"
", FL "

where FL can be any US state abbreviation, in upper or lower case. (I know some of the above probably overlap.) This leads to the second part of my question: how do I write a regex that searches for all of the above patterns, for every US state abbreviation? (Why am I being asked to edit the question to be more specific? What's the point of making two posts when both are so closely related? You may answer one or both questions if you wish. @The Fourth Bird)

mrphysics

3 Answers


Check this expression out:

(?<!\w)(?:,\s+)?(?:A[LKZR]|C[AOT]|DE|FL|GA|HI|I[ADLN]|K[SY]|LA|M[EDAINSOT]|N[EVHJMYCD]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY]),?\s?(?!\w)


This expression is quite long, so let me explain it part by part:

 (?<!\w)           # Make sure the state is not preceded by a word character
 (?:,\s+)?         # Optionally match a leading comma followed by one or more whitespace characters
 (?:A[LKZR]        # Start of the alternation covering every US state. Use the -i flag to ignore case.
 |
  C[AOT]           # E.g. this will match CA/CO/CT
 |                 # or
  DE               # DE
 |                 # or
  FL               # etc...
 |
  GA
 |
  HI
 |
  I[ADLN]
 |
  K[SY]
 |
  LA
 |
  M[EDAINSOT]
 |
  N[EVHJMYCD]
 |
  O[HKR]
 |
  PA
 |
  RI
 |
  S[CD]
 |
  T[NX]
 |
  UT
 |
  V[AT]
 |
  W[AIVY])
 ,?               # Optionally match a trailing comma
 \s?              # Optionally match one trailing whitespace character (as in your examples)
 (?!\w)           # Make sure the state is not followed by a word character

Don't forget to set the -i flag, which makes the match case-insensitive.
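For anyone using Python (which the comments below suggest), here is a minimal sketch of the same pattern compiled with `re.VERBOSE`, so the part-by-part comments above can live inside the regex itself, with `re.IGNORECASE` standing in for the -i flag. `STATE_RE` and `find_states` are hypothetical names chosen for illustration. Two things the examples show: because `(?<!\w)` sits in front of the optional comma group, a comma that directly follows a word (as in "Tampa, FL") is not included in the match, and case-insensitive matching also picks up ordinary standalone words such as "in", "or", or "me".

```python
import re

# The answer's regex in verbose mode (re.VERBOSE), so each part can carry its
# own comment; re.IGNORECASE plays the role of the -i flag.
STATE_RE = re.compile(
    r"""
    (?<!\w)            # not preceded by a word character
    (?:,\s+)?          # optional comma plus one or more whitespace characters
    (?:A[LKZR]|C[AOT]|DE|FL|GA|HI|I[ADLN]|K[SY]|LA|M[EDAINSOT]
      |N[EVHJMYCD]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])
    ,?                 # optional trailing comma
    \s?                # optional single trailing whitespace character
    (?!\w)             # not followed by a word character
    """,
    re.VERBOSE | re.IGNORECASE,
)

def find_states(text):
    """Hypothetical helper: return every substring STATE_RE matches, in order."""
    return [m.group() for m in STATE_RE.finditer(text)]

print(find_states("Tampa, FL 33601"))  # ['FL']
print(find_states("New York, ny"))     # ['ny']
print(find_states("Sold in bulk"))     # ['in']  ("in" looks like IN, Indiana)
```

If the false positives on short words matter, one option is to drop `re.IGNORECASE` and accept only uppercase abbreviations.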

vs97
  • I have two questions/comments: (1) Doesn't `(?:,\s+)?` require at least one space following the comma? So it only matches `',FL'` without `(?:,\s+)?` matching the `,` (the group was optional). Which means you will also match `FL` in `AFLCIO` and that can't be desirable. (2) Which state's abbreviation is AE? Is this a territory? There are other "states" I don't recognize. – Booboo Sep 04 '19 at 22:51
  • @RonaldAaronson Thank you for your comment, fixed both points. – vs97 Sep 04 '19 at 23:08
  • @vs97 Thanks so much!! I'm trying to figure out how to set the -i flag. Can you teach me how? – mrphysics Sep 05 '19 at 18:50
  • @mrphysics If you are using re.compile, then e.g. - re.compile(r'*expression-here*', re.I (capital i)). Also this answer might be useful - https://stackoverflow.com/questions/500864/case-insensitive-regular-expression-without-re-compile – vs97 Sep 05 '19 at 19:19
  • @vs97 re.IGNORECASE doesn't work. I have: text = "...." match = re.search(r'expression', text, re.IGNORECASE) if match: text = text.replace(match.group(), '[State]') The change is not made if I have something like New York, ny – mrphysics Sep 05 '19 at 19:37
  • Figured it out. Just add (?i) at the very beginning of the regex code. – mrphysics Sep 05 '19 at 20:44
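A minimal sketch that ties this comment thread together, assuming Python's `re` module as used in the comments. Passing `re.IGNORECASE` and prefixing the pattern with `(?i)` give the same case-insensitive behaviour. `re.sub` also replaces every match in one call, whereas `re.search` followed by `text.replace(match.group(), ...)` only ever replaces the text of the first match that `re.search` finds. `STATES`, `with_flag`, `with_inline`, and `tag_states` are hypothetical names. In Python 3.11 and later, a global flag like `(?i)` must appear at the very start of the pattern, which is consistent with what mrphysics found.

```python
import re

# The answer's pattern as a plain string, with no flags embedded in it.
STATES = (
    r"(?<!\w)(?:,\s+)?"
    r"(?:A[LKZR]|C[AOT]|DE|FL|GA|HI|I[ADLN]|K[SY]|LA|M[EDAINSOT]|N[EVHJMYCD]"
    r"|O[HKR]|PA|RI|S[CD]|T[NX]|UT|V[AT]|W[AIVY])"
    r",?\s?(?!\w)"
)

# Two equivalent ways to make the match case-insensitive.
with_flag = re.compile(STATES, re.IGNORECASE)  # flag passed as an argument
with_inline = re.compile("(?i)" + STATES)      # inline flag at the very start

def tag_states(text):
    """Hypothetical helper: replace every state match with '[State]'."""
    # re.sub rewrites all matches in one pass; re.search only finds the first.
    return with_inline.sub("[State]", text)

print(tag_states("Albany, ny and Tampa, FL"))
# Albany, [State] and Tampa, [State]
```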

/, (PA|FL|...)/i

I'll leave the remaining 48 states as an exercise for the reader ;)
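Filling in the exercise, here is a sketch that assumes Python: instead of typing the alternation by hand, build it from a list of the 50 state abbreviations (DC and the territories are left out). `STATE_ABBRS` and `COMMA_STATE` are hypothetical names. The trailing `\b` is an addition that is not part of the pattern above; it keeps a word like "Illinois" after a comma from being read as "IL". Like the original, this only finds the ", XX" forms from the question, not " XX " without a comma.

```python
import re

# All 50 US state postal abbreviations (no DC or territories).
STATE_ABBRS = """AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA
WA WV WI WY""".split()

# Same shape as /, (PA|FL|...)/i: a comma, a space, then one abbreviation.
# The \b (an addition, not in the answer's pattern) stops "Illinois" from
# matching as "IL".
COMMA_STATE = re.compile(r", (" + "|".join(STATE_ABBRS) + r")\b", re.IGNORECASE)

# findall returns the captured abbreviation for each match.
print(COMMA_STATE.findall("Tampa, fl and Reno, NV"))  # ['fl', 'NV']
print(COMMA_STATE.findall("Springfield, Illinois"))   # []
```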

fringd
/(?: (?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])(?:,| ))|(?:, (?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|M[ADEINOST]|N[CDEHJMVY]|O[HKR]|P[AR]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])(?:,| ))/i
LTPCGO