I don't think this exact question has been answered yet, so here goes.
I have a Pandas data frame, and I want to select all rows that contain a string in column A or column B.
Say the dataframe looks like this:
d = {'id':["1", "2", "3", "4"],
'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
'description':["Horse epitome", "Cats bad but horses good", "Frog fancier", "Turkey tome, not about horses"],
'tags':["horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey"],
'date':["2019-01-01", "2019-10-01", "2018-08-14", "2016-11-29"]}
dataframe = pandas.DataFrame(d)
Which gives:
id title description tag date
1 "Horses are good" "Horse epitome" "horse, cat" 2019-01-01
2 "Cats are bad" "Cats bad" "horse, cat" 2019-10-01
3 "Frogs are nice" "Frog fancier, horses good" "horse, frog" 2018-08-14
4 "Turkey are best" "Turkey tome" "turkey, horse" 2016-11-29
Let's say I want to create a new dataframe containing rows with the string horse
(ignoring capitalisation) in the column title
OR the column description
, but not in the column tag
(or any other column).
The result should be (row 2 and 4 get dropped):
id title description tag date
1 "Horses are good" "Horse epitome" "horse, cat" 2019-01-01
3 "Frogs are nice" "Frog fancier, horses good" "horse, frog" 2018-08-14
I have seen a few answers for one column, such as something like:
dataframe[dataframe['title'].str.contains('horse')]
But I am not sure (1) how to add multiple columns to this statement and (2) how to modify it with something like string.lower()
to remove capitals in the column values for the string match.
Thanks in advance!