I have a pandas DataFrame with a column in which each cell holds an array of comma-delimited strings.

I want to check if all elements in the array contain a specific string. For instance:

string = 'close'
array = ['close',',,,,,,,,,,,,close,,,,,,,,,sub,','sub']

That should be False, since the third element does not contain 'close', whereas

string = 'close'
array = ['close',',,,,,,,,,,,,close,,,,,,,,,sub,','sub, close']

should be True, since all elements contain the string.

It's worth mentioning that my DataFrame has 250+ columns and 3M+ rows, so I'm looking for whatever solution performs best.

Thank you!

Daniel Martinez
  • `all(string in el for el in array)`. However, you really shouldn't be storing arrays of strings in a DataFrame, it's not efficient. – user3483203 Dec 07 '18 at 18:12
  • For the dataframe you can use `apply`: `df["array"].apply(lambda array: all(string_ in el for el in array))`. I don't know if there's a faster (vectorized) way in this case. – pault Dec 07 '18 at 18:18
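
A minimal sketch of the two suggestions in the comments, run on the two example arrays from the question. The column name `array` and the variable `string_` follow pault's comment; the two-row DataFrame and the helper `all_contain` are made up here for illustration and are not from the original post.

```python
import pandas as pd

string_ = 'close'

# Hypothetical two-row frame built from the question's two example arrays.
df = pd.DataFrame({
    'array': [
        ['close', ',,,,,,,,,,,,close,,,,,,,,,sub,', 'sub'],
        ['close', ',,,,,,,,,,,,close,,,,,,,,,sub,', 'sub, close'],
    ]
})


def all_contain(array, needle):
    """Return True if every element of `array` contains `needle` as a substring."""
    return all(needle in el for el in array)


# Row-wise check with apply, as suggested in the comment above.
mask_apply = df['array'].apply(lambda array: all_contain(array, string_))

# Same check as a plain list comprehension over the column.
mask_listcomp = [all_contain(array, string_) for array in df['array']]

print(mask_apply.tolist())  # [False, True]
```

Note that `in` is a substring test, so an element such as `'closed'` would also count as containing `'close'`. Neither variant is vectorized; on a frame with millions of rows it is probably worth timing both on a sample before settling on one.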

0 Answers