
I'm trying to check the validity of comma-separated strings in Python. The strings may contain mistakes, such as more than one comma between items.

Here is a valid string:

foo = "a, b, c, d, e"

This string is valid because its items are separated by exactly one comma each, not by several commas or by spaces alone.

Here is an invalid string:

invalid = "a,, b, c,,,,d, e,,; f, g"

The invalid string is invalid because (1) it uses more than one comma and (2) it also uses a semicolon ;.

What would be the most effective way to check that the strings are valid?

My first attempt was to try something like:

def check_valid_string(input_string):
    if ",," in input_string or ";" in input_string:
        return "Not valid" ## or False
    else:
        return "Valid" ## or True

However, it's not clear that this would catch all possible invalid strings. It's also not clear to me that this approach is the most computationally efficient (i.e. quick).
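To make the first concern concrete, here is a small sketch that runs the same substring test on a few inputs that look malformed but contain neither `,,` nor `;`. The list `suspect_inputs` is made up for illustration, and whether each entry should really count as invalid depends on a specification that the question does not fully state.

```python
def check_valid_string(input_string):
    # Same logic as the attempt above, but returning booleans.
    return not (",," in input_string or ";" in input_string)

# Hypothetical inputs: arguably malformed, yet they contain
# neither ",," nor ";". Treating them as invalid is an assumption.
suspect_inputs = ["a, ,c", "a b", "a, b,", ",a, b"]

for s in suspect_inputs:
    # Each of these is reported as valid by the substring test.
    print(repr(s), check_valid_string(s))
```

Every entry passes the check, which suggests the test needs to describe what a valid string looks like rather than enumerate the known mistakes.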

EB2127
  • Please post the code that is giving you trouble. We expect a valid attempt before you post for help. "What is the best way?" is usually a wrapper around "give me code/design to solve this problem" -- which is off-topic for Stack Overflow. – Prune Sep 06 '20 at 00:18
  • @Prune Added code with an edit, and explained why this has given me pause. Thanks – EB2127 Sep 06 '20 at 00:27
  • Your code doesn't catch all cases, unless you're guaranteed that consecutive commas will also be contiguous. Otherwise, simply `a, ,c` will trip your algorithm. We need a *specification* of your valid strings, not just two examples. What about other punctuation? What about a *missing* comma, such as `a b`? – Prune Sep 06 '20 at 01:20
  • @Prune You are correct that missing commas or situations like `, ,` would also be invalid. I will try to edit this in the question above. – EB2127 Sep 06 '20 at 01:33
  • Yup. Unfortunately, this invalidates the two answers already given. The problem is that you are now within the realm of a simple grammar-based solution. You *can* do this with `regex`, but the expression has to work instead from the standpoint of that simple grammar. Note that you *also* need to specify this well enough to identify your legal tokens (the strings between the commas). This is tantamount to writing your grammar, and is most of the way to making an acceptable regex. – Prune Sep 06 '20 at 01:39
  • This might be a duplicate question then, though I need to better understand these answers: https://stackoverflow.com/questions/1396084/regex-for-comma-delimited-list – EB2127 Sep 06 '20 at 01:43
  • Right; that would be a good starting point. Depending on what your list elements might be, your request might be a duplicate. – Prune Sep 06 '20 at 02:01
  • @Prune I guess this isn't a duplicate, as this is specifically for a python string... – EB2127 Sep 06 '20 at 02:15
  • What differentiates your "python string" from any other comma-delimited string? How does the Python context make the generic answers not apply to your case? – Prune Sep 06 '20 at 02:22
  • @Prune I think it is just my confusion with the python `re` library. `re.match("(\d+)(,\s*\d+)*", input)` does not give the intended results – EB2127 Sep 06 '20 at 14:37
  • @Prune In any case, I added the answer below. – EB2127 Sep 06 '20 at 15:20

2 Answers


It appears the best way to accomplish this is with regex:

Here is a valid string:

valid = "a, b, c, foo, bar, dog, cat"

Here are various invalid strings:

## invalid1 is invalid as it contains repeated commas (`,,`) and a semicolon (`;`)
invalid1 = "a,, b, c,,,,d, e,,; f, g" 

## invalid2 is invalid as it contains `, ,`
invalid2 = "a b, ,c, d, e"

## invalid3 is invalid as it contains two items separated by a space only (`f g`)
invalid3 = "a, b, d, elephant, f g"

Here is the regex to check whether the string is valid:

import re
pattern = re.compile(r"^(\w+)(,\s*\w+)*$")

def check_valid(input_string):
    if pattern.match(input_string) is None:
        return "Invalid"
    else:
        return "Valid"

Here is the function in use:

>>> check_valid(invalid1)
'Invalid'
>>> check_valid(invalid2)
'Invalid'
>>> check_valid(invalid3)
'Invalid'
>>> check_valid(valid)
'Valid'
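The `^` and `$` anchors matter here. `re.match` only anchors at the start of the string, so an unanchored pattern can succeed on a valid prefix of an invalid string, which may explain the unexpected results mentioned in the comments. A minimal sketch of the same idea using `re.fullmatch` (available since Python 3.4) follows; the names `TOKEN_LIST` and `is_valid_list` are hypothetical.

```python
import re

# Same token grammar as above, written without ^ and $.
TOKEN_LIST = re.compile(r"\w+(,\s*\w+)*")

def is_valid_list(input_string):
    # fullmatch requires the entire string to match the pattern.
    return TOKEN_LIST.fullmatch(input_string) is not None

# match() is satisfied by the leading "a" and ignores the rest.
print(TOKEN_LIST.match("a,, b") is not None)   # True
print(is_valid_list("a,, b"))                  # False
print(is_valid_list("a, b, c, foo, bar"))      # True
```

One small difference from the anchored version: `$` also matches just before a trailing newline, so `^...$` with `match` accepts `"a, b\n"`, while `fullmatch` rejects it.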
EB2127

Here is one way to check whether the string is valid:

def is_valid(comma_sep_str):
  if ';' in comma_sep_str or ',,' in comma_sep_str:
    return 'Not valid'
  else:
    return 'Valid'

myString1 = "a,, b, c,,,,d, e,,; f, g"
myString2 = "a, b, c, d, e"

print(is_valid(myString1))
print(is_valid(myString2))

PS: This may not be the most efficient approach, but it does check whether the string is valid. Note that the invalid example above always contains at least one of these two: ",," or ";".
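For the extra cases raised in the question's comments (`a, ,c` and `a b`), a token-based check is one possible alternative to the substring test. This is a different technique, not the function above, and it assumes that a valid item is a non-empty run of letters and digits, which the question does not state explicitly. The name `is_valid_tokens` is hypothetical.

```python
def is_valid_tokens(comma_sep_str):
    # Split on commas, trim surrounding whitespace, then require every
    # piece to be a non-empty alphanumeric token (an assumed rule).
    tokens = [tok.strip() for tok in comma_sep_str.split(",")]
    return all(tok.isalnum() for tok in tokens)

print(is_valid_tokens("a, b, c, d, e"))             # True
print(is_valid_tokens("a,, b, c,,,,d, e,,; f, g"))  # False
print(is_valid_tokens("a, ,c"))                     # False
print(is_valid_tokens("a b"))                       # False
```

Because `"".isalnum()` is `False`, empty items produced by repeated, leading, or trailing commas are rejected without a separate check.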

emichester