Most questions here are about finding a pattern-like string in a specific column and doing something with it. But what if I don't know which column it is in?
Link to Q/A for a specific column: Link
I am trying to compare two dataframes to make sure that they match, i.e. that no columns have been added and no rows deleted. One of the files acts as a template, in which a group stands for a value range.
An example:
import pandas as pd

# Template: 'groupN' placeholders stand for a value range
template = pd.DataFrame(
    {'Headline': ['Subheading', '', 'Animal', 'Tiger', 'Bird', 'Lion'],
     'Headline2': ['', 'Weight', 2017, 'group1', 'group2', 'group3'],
     'Headline3': ['', '', 2018, 'group1', 'group2', 'group3']
     })
# File to check: same layout, but with actual values in the group rows
testfile = pd.DataFrame(
    {'Headline': ['Subheading', '', 'Animal', 'Tiger', 'Bird', 'Lion'],
     'Headline2': ['', 'Weight', 2017, 150, 15, 201],
     'Headline3': ['', '', 2018, 152, 12, 198]
     })
     Headline  Headline2  Headline3
0  Subheading
1                 Weight
2      Animal       2017       2018
3       Tiger     group1     group1
4        Bird     group2     group2
5        Lion     group3     group3

     Headline  Headline2  Headline3
0  Subheading
1                 Weight
2      Animal       2017       2018
3       Tiger        150        152
4        Bird         15         12
5        Lion        201        198
If I run print((template == testfile).all().all()), the result is False.
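To make the mismatch visible, the same comparison can be done row by row. This is only a minimal sketch: it rebuilds the two frames from the example so that it runs on its own, and the name row_matches is purely illustrative.

```python
import pandas as pd

template = pd.DataFrame(
    {'Headline': ['Subheading', '', 'Animal', 'Tiger', 'Bird', 'Lion'],
     'Headline2': ['', 'Weight', 2017, 'group1', 'group2', 'group3'],
     'Headline3': ['', '', 2018, 'group1', 'group2', 'group3']})
testfile = pd.DataFrame(
    {'Headline': ['Subheading', '', 'Animal', 'Tiger', 'Bird', 'Lion'],
     'Headline2': ['', 'Weight', 2017, 150, 15, 201],
     'Headline3': ['', '', 2018, 152, 12, 198]})

# Compare cell by cell, then require every column in a row to match
row_matches = (template == testfile).all(axis=1)
print(row_matches.tolist())  # [True, True, True, False, False, False]
```

Rows 0 to 2 match, while rows 3 to 5 (the ones holding the group placeholders) do not.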
As a human, I know that rows 3 to 5 differ, so I want to exclude them from the comparison:
drop_r = [3, 4, 5]
template = template.drop(template.index[drop_r])
testfile = testfile.drop(testfile.index[drop_r])
After that, print((template == testfile).all().all()) gives True.
So how can I collect all row numbers into drop_r for the condition that a row contains group[n]? In other words, how do I find the rows in template where the substring 'group' appears in any of the columns?
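For reference, something along the following lines is the kind of result I am after. It is a rough sketch under a few assumptions: every cell is cast to a string first (the columns mix integers and strings), both frames use the default integer index so that labels and positions coincide, and has_group is just a name I made up for the intermediate mask.

```python
import pandas as pd

template = pd.DataFrame(
    {'Headline': ['Subheading', '', 'Animal', 'Tiger', 'Bird', 'Lion'],
     'Headline2': ['', 'Weight', 2017, 'group1', 'group2', 'group3'],
     'Headline3': ['', '', 2018, 'group1', 'group2', 'group3']})
testfile = pd.DataFrame(
    {'Headline': ['Subheading', '', 'Animal', 'Tiger', 'Bird', 'Lion'],
     'Headline2': ['', 'Weight', 2017, 150, 15, 201],
     'Headline3': ['', '', 2018, 152, 12, 198]})

# Cast every cell to str so .str.contains works on the mixed int/str columns,
# then flag rows where at least one column contains the literal text 'group'
has_group = (
    template.astype(str)
    .apply(lambda col: col.str.contains('group', regex=False))
    .any(axis=1)
)

# Index labels of the flagged rows (assumes the default 0..n-1 index)
drop_r = has_group[has_group].index.tolist()
print(drop_r)  # [3, 4, 5]

# Compare only the rows that are not group placeholders
print((template.drop(drop_r) == testfile.drop(drop_r)).all().all())  # True
```

Whether a column-by-column apply is the idiomatic way to search all columns at once is exactly what I am unsure about.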