I have data that looks like this:

import datetime

data = [(datetime.datetime(2021, 2, 10, 7, 49, 7, 118658), u'12.100.90.10', u'100.100.12.1', u'100.100.12.1', u'LT_DOWN'),
       (datetime.datetime(2021, 2, 10, 7, 49, 14, 312273), u'12.100.90.10', u'100.100.12.1', u'100.100.12.1', u'LT_UP'),
       (datetime.datetime(2021, 2, 10, 7, 49, 21, 535932), u'12.100.90.10', u'100.100.12.1', u'100.100.22.1', u'LT_UP'),
       (datetime.datetime(2021, 2, 10, 7, 50, 28, 264042), u'12.100.90.10', u'100.100.12.1', u'100.100.32.1', u'LT_DOWN'),
       (datetime.datetime(2021, 2, 10, 7, 50, 28, 725961), u'12.100.90.10', u'100.100.12.1', u'100.100.32.1', u'PL_DOWN'),
       (datetime.datetime(2021, 2, 10, 7, 50, 32, 450853), u'10.100.80.10', u'10.55.10.1', u'100.100.12.1', u'PL_LOW'),
       (datetime.datetime(2021, 2, 10, 7, 51, 32, 450853), u'10.10.80.10', u'10.55.10.1', u'100.100.12.1', u'MA_HIGH'),
       (datetime.datetime(2021, 2, 10, 7, 52, 34, 264042), u'10.10.80.10', u'10.55.10.1', u'10.55.10.1', u'PL_DOWN'),
]

This is how it looks once loaded into pandas:

import pandas as pd

df = pd.DataFrame(data)
df.columns = ["date", "start", "end", "end2", "type"]
# drop duplicate rows
df = df.drop_duplicates()

                        date         start           end          end2     type
0 2021-02-10 07:49:07.118658  12.100.90.10  100.100.12.1  100.100.12.1  LT_DOWN
1 2021-02-10 07:49:14.312273  12.100.90.10  100.100.12.1  100.100.12.1    LT_UP
2 2021-02-10 07:49:21.535932  12.100.90.10  100.100.12.1  100.100.22.1    LT_UP
3 2021-02-10 07:50:28.264042  12.100.90.10  100.100.12.1  100.100.32.1  LT_DOWN
4 2021-02-10 07:50:28.725961  12.100.90.10  100.100.12.1  100.100.32.1  PL_DOWN
5 2021-02-10 07:50:32.450853  10.100.80.10    10.55.10.1  100.100.12.1   PL_LOW
6 2021-02-10 07:51:32.450853   10.10.80.10    10.55.10.1  100.100.12.1  MA_HIGH
7 2021-02-10 07:52:34.264042   10.10.80.10    10.55.10.1    10.55.10.1  PL_DOWN

Now I want to select only the rows where the end and end2 columns contain the same value, so my output would be:

                        date         start           end          end2     type
0 2021-02-10 07:49:07.118658  12.100.90.10  100.100.12.1  100.100.12.1  LT_DOWN
1 2021-02-10 07:49:14.312273  12.100.90.10  100.100.12.1  100.100.12.1    LT_UP
7 2021-02-10 07:52:34.264042   10.10.80.10    10.55.10.1    10.55.10.1  PL_DOWN

According to the Stack Overflow question "Get rows that have the same value across its columns in pandas", I could do the following to check for identical values across all columns:

df[df.apply(pd.Series.nunique, axis=1) == 1]

But in my case I want the check limited to certain columns only.

How do I do this?

Souvik Ray

2 Answers

Just use boolean masking: compare the two columns and index the DataFrame with the result.

df[df.end == df.end2]
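
To see what the mask does, here is a minimal self-contained sketch. It assumes a trimmed three-row version of the question's data (the trimming is only for brevity), and the names mask and matching are illustrative, not from the original post.

```python
import datetime

import pandas as pd

# Trimmed sample from the question: end == end2 in the first and last rows only.
data = [
    (datetime.datetime(2021, 2, 10, 7, 49, 7, 118658), "12.100.90.10", "100.100.12.1", "100.100.12.1", "LT_DOWN"),
    (datetime.datetime(2021, 2, 10, 7, 49, 21, 535932), "12.100.90.10", "100.100.12.1", "100.100.22.1", "LT_UP"),
    (datetime.datetime(2021, 2, 10, 7, 52, 34, 264042), "10.10.80.10", "10.55.10.1", "10.55.10.1", "PL_DOWN"),
]
df = pd.DataFrame(data, columns=["date", "start", "end", "end2", "type"])

# Element-wise comparison yields a boolean Series aligned with df's index.
mask = df["end"] == df["end2"]

# Indexing with the mask keeps only the rows where it is True.
matching = df[mask]
print(matching[["end", "end2", "type"]])
```

Bracket access (df["end"]) is used here instead of df.end because attribute access stops working if a column name collides with an existing DataFrame attribute or method. The filtered rows keep their original index labels; call reset_index(drop=True) if a fresh 0..n-1 index is wanted.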
Robert Axe

df = df.loc[df['end'] == df['end2']]
Ciaran O Brien
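
If the check ever needs to cover more than two columns, the nunique idea from the question can probably be restricted to a column subset. The sketch below assumes a hypothetical third column, end3, and made-up values purely for illustration; it is one possible approach, not something taken from the answers above.

```python
import pandas as pd

# Hypothetical frame: end3 and all values here are invented for the example.
df = pd.DataFrame({
    "start": ["a", "b", "c"],
    "end":   ["x", "x", "y"],
    "end2":  ["x", "x", "z"],
    "end3":  ["x", "w", "y"],
})

cols = ["end", "end2", "end3"]

# Option 1: count distinct values per row, but only within the chosen columns.
same_nunique = df[df[cols].nunique(axis=1) == 1]

# Option 2: compare every chosen column against the first one, row by row,
# and keep rows where all comparisons are True.
same_eq = df[df[cols].eq(df[cols[0]], axis=0).all(axis=1)]

print(same_nunique)
print(same_eq)
```

The two options can differ when values are missing: nunique ignores NaN by default, while eq treats a comparison with NaN as False. For exactly two columns, the plain == mask is simpler.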