3

I have a dataframe that has one column numbers. The column's data are strings of numbers separated by commas.

numbers
-------
1,3,4,5,17,30
5,6,18,37,41,42
1,2,5,14,19,20
1,5,13,20,29,31
1,9,10,11,14,17
2,9,13,25,30,35

How to get all the strings that contain numbers 1 & 5 only?

The desired output:

numbers
-------
1,3,4,5,17,30
1,2,5,14,19,20
1,5,13,20,29,31
HmH
  • 389
  • 3
  • 14

1 Answers1

3

You can create df by split and compare with eq with any for both condition. Last filter by boolean indexing:

df1 = df['numbers'].str.split(',', expand=True).astype(int)
df = df[df1.eq(1).any(1) & df1.eq(5).any(1)]
print (df)
           numbers
0    1,3,4,5,17,30
2   1,2,5,14,19,20
3  1,5,13,20,29,31

Another solution with contains for conditions:

a = df['numbers'].str.contains(',1,|,1$|^1,')
b = df['numbers'].str.contains(',5,|,5$|^5,')

df = df[a & b]
print (df)
           numbers
0    1,3,4,5,17,30
2   1,2,5,14,19,20
3  1,5,13,20,29,31
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • using contains can we filter the data using one regex – HmH Sep 06 '17 at 12:13
  • 1
    yes, if need `1` or `5`. but if need `1` and `5` I think not. – jezrael Sep 06 '17 at 12:16
  • what I meant was, can we write one regular expression that filter all strings that contain 1 and 5 at the same time. – HmH Sep 06 '17 at 12:23
  • `df1.eq(1).any(1) & df1.eq(5).any(1)` could be replaced with `df1.isin([1,5]).any(1)`, no? – IanS Sep 06 '17 at 12:31
  • @IanS - I think not, because isin is `1 or 5`. So get different output. – jezrael Sep 06 '17 at 12:32
  • 1
    @HmH - I think it is not possible check it [same time in regex](https://stackoverflow.com/a/3041326/2901002), because need `AND`, not `OR` – jezrael Sep 06 '17 at 12:35