My input data is UTF-8 encoded.

I'm applying a regular expression to the input to find everything before the comma.
However, my regex returns None, even though I can see the comma in the string.

What's wrong with it?
I tested if ',' in MyString, which works fine.

Here is my input data:

 ID            MyString
765427       Units G2 and G3, kings Drive
207162       Unit 5/165,Elizabeth Palace
47568        Unit 766 - 767 Gate 7,Jacks Way,
15498        Unit F, Himalayas Street,

As per my regex - re.search(r".*?,", s['MyString']),
I expect my output to be:

 ID            MyString
765427       Units G2 and G3,
207162       Unit 5/165,
47568        Unit 766 - 767 Gate 7,
15498        Unit F,

But what I am getting is:

 ID            MyString
765427       Units G2 and G3,
207162       None
47568        Unit 766 - 767 Gate 7,
15498        None

Please correct me if my understanding of the regex is wrong. Otherwise, what is the problem? I can't figure out what's wrong with this.

SherylHohman
ds_user
  • From your title, are you looking to just split your string on the first comma? If that is all you are trying to do, you can call [split](https://docs.python.org/3/library/stdtypes.html#str.split) on the string with ',' as the separator and pass `maxsplit` as the second argument, which limits how many times the string is split: `s.split(',', maxsplit=1)`. You will be left with a list, and you just need to take its first element. – idjaw Jun 13 '17 at 23:59
  • Ultimately, [this](https://stackoverflow.com/questions/30636248/split-a-string-only-by-first-space-in-python) answer applies, but you want to pass a comma instead of a space – idjaw Jun 14 '17 at 00:01
  • Thanks for your help – ds_user Jun 14 '17 at 00:12

1 Answer

As @idjaw suggested above, an easier way to accomplish this is to use the string's split() method:

my_string = 'Unit 5/165,Elizabeth Palace'
ans = my_string.split(',', 1)[0]  # maxsplit = 1: split only at the first comma
print(ans)  # the parenthesized form works in both Python 2 and Python 3

Result:
Unit 5/165

You could even leave off the maxsplit argument in this case, since only the first element of the result is used:

ans = my_string.split(',')[0]
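
To make the split approach concrete, here is a minimal sketch applied to the four sample rows from the question. The helper name `first_field` and the `rows` list are illustrative, not part of the original code, and the sketch assumes the values are plain Python strings. Note that split() drops the comma itself, whereas the expected output in the question keeps it.

```python
# Sample rows copied from the question; `rows` and `first_field` are
# hypothetical names used only for this illustration.
rows = [
    (765427, "Units G2 and G3, kings Drive"),
    (207162, "Unit 5/165,Elizabeth Palace"),
    (47568, "Unit 766 - 767 Gate 7,Jacks Way,"),
    (15498, "Unit F, Himalayas Street,"),
]


def first_field(text, sep=","):
    """Return the text before the first `sep`, or the whole string if `sep` is absent."""
    return text.split(sep, 1)[0]


# Map each ID to the part of its string before the first comma.
results = {row_id: first_field(text) for row_id, text in rows}

for row_id, value in results.items():
    print(row_id, value)
```

If the trailing comma is wanted, it can be appended after the split, for example `first_field(text) + ","`.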

Also, note that while it is not technically an error, it is considered best practice to reserve first-letter capitalization for class names rather than variable names. See "What is the naming convention in Python for variable and function names?" and the PEP 8 variable naming conventions.

Regex solution:
I noticed that in your example results, when there was a space following the comma (in the string to be analyzed), you got the expected result.
However, when there was no space following the comma, your regex returned "None".

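Try using the regex pattern (.*?,) rather than .*?, and read the captured text from the match object with .group(1). Keep in mind that the parentheses only add a capturing group; they do not change which strings the pattern matches.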
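
As a rough sketch of the regex route, the snippet below runs the grouped pattern over the four sample strings exactly as typed in the question. The names `PATTERN`, `before_comma`, and `samples` are illustrative, and the code assumes plain Python strings. re.search returns a match object (or None when nothing matches), so the text has to be read out with .group().

```python
import re

# Lazy match of everything up to and including the first comma.
PATTERN = re.compile(r"(.*?,)")


def before_comma(text):
    """Return the text up to and including the first comma, or None if there is no comma."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


# Sample strings copied from the question.
samples = [
    "Units G2 and G3, kings Drive",
    "Unit 5/165,Elizabeth Palace",
    "Unit 766 - 767 Gate 7,Jacks Way,",
    "Unit F, Himalayas Street,",
]

extracted = [before_comma(s) for s in samples]
for original, value in zip(samples, extracted):
    print(repr(original), "->", repr(value))
```

On these samples the pattern matches whether or not a space follows the comma. If None still appears for real rows, the cause may lie in the data itself (for example a different comma-like character or a non-string value); that is only a guess, since the question does not show the raw data.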

Here are a couple of online tools for debugging and testing regular expressions:
http://pythex.org/
https://regex101.com/
(has an option to generate the code for you, though it may be more verbose than necessary)

SherylHohman