1

In column 'a' I have values which are numbers separated by a comma (ranging from 1 to 35). e.g. '1,6,7,3,5,15,6,25,30' and '5,6,7,33' '1,6,29,15'

In a new column 'b', I want the value to say 'yes' whenever the value in column A is 5 or it's variations ,5 (comma 5) or 5, (5 comma). However I don't want values such as 15 or 25 included. Is there a way to include all combinations of 5 with a comma but not anything else?

df.loc[df['a'].str.contains(',5'), 'b'] = 'yes'
df.loc[df['a'].str.contains('5,'), 'b'] = 'yes'

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Sanch
  • 367
  • 2
  • 11
  • `^5$|^5,|,5,|,5$` seems to cover all the bases. https://regex101.com/r/NsqaP6/1 – MonkeyZeus Jan 14 '20 at 13:05
  • 1
    what if value contains both `5` and `15`? like `5,2,15,5` – RomanPerekhrest Jan 14 '20 at 13:08
  • @MonkeyZeus it nearly does, except for consecutive `5`s e.g. `5,5,5,5`. `(^|,)5(?=,|$)` gets those, too. – jofrev Jan 14 '20 at 13:30
  • 2
    @jofrev The goal is a boolean check for existence, not extract all occurrences. – MonkeyZeus Jan 14 '20 at 13:31
  • @MonkeyZeus My bad, you're right! Got hung up on the last sentence (*[...] all combinations of 5 [...]*). – jofrev Jan 14 '20 at 13:34
  • 1
    @jofrev No worries. As with any regex question posted on this site, very seldom is the poster aware of how their choice of words affects the solution presented. Most questions have a lot of incompatible puffery which makes the question unanswerable on the first try. At any rate, your regex would also work for the goal at hand. – MonkeyZeus Jan 14 '20 at 13:40
  • 1
    How about `\b5\b`? – Toto Jan 14 '20 at 14:07
  • This first code seems to have worked for me ```^5$|^5,|,5,|,5$ ``` . Sorry for the confusing question. Thank you for all your comments - I will go through each one to try and understand it all. – Sanch Jan 14 '20 at 14:27

2 Answers2

1

I would suggest converting your comma-separated string into an array (see here for how: How to convert a string to a list in Python?).

Then you can check if the search value (e.g. '5') exists in the array using in, e.g.:

if searchValue in arrayOfNumbers { return True }

(Or you could try a ternary operator, however that's done in Python)

1

I would suggest something like:

# your dataframe
df = pandas.DataFrame({'A': ['1,2,34,5,6', '32,2,4,67,5', '4,3,2,1,']}) 
df['B'] = df['A'].apply(lambda x : True if '5' in x.split(',') else False)

this will add a column B to your dataframe containing True if 5 is there and False otherwise.

           A    B
0   1,2,34,5,6  True
1   32,2,4,67,5 True
2   4,3,2,1,    False
Pierluigi
  • 1,048
  • 2
  • 9
  • 16
  • I think this will include 15 and 25, which I want to avoid – Sanch Jan 14 '20 at 14:36
  • 1
    if you try `True if '5' in ['15', '25'] else False` you'll see that the above solution is correct. The `split` does the trick of converting your string in a list of strings. – Pierluigi Jan 14 '20 at 14:46