-1

I have a dataframe:

import pandas as pd
data={'ip.src':['x.x.x.x','y.y.y.y','z.z.z.z'],
'ip.dst':['a.a.a.a','b.b.b.b','c.c.c.c'],
'src_country':['china','US','china'],
'dst_country':['pakistan','china','india']
}

Data=pd.DataFrame(data)

I want to keep a value in the ip.src and ip.dst columns only where the matching country column is china: if src_country is china, keep the value in ip.src, and if dst_country is china, keep the value in ip.dst. Is there any way to do it?

Talha Tayyab

4 Answers

1

Something like this?

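import numpy as np

# Keep rows where either country is china; .copy() avoids SettingWithCopyWarning below.
Data = Data[(Data['src_country'] == 'china') | (Data['dst_country'] == 'china')].copy()

# Blank out every country value that is not china.
# On pandas >= 2.1, DataFrame.map replaces the deprecated applymap.
Data[['src_country', 'dst_country']] = Data[['src_country', 'dst_country']].applymap(lambda x: np.nan if x != 'china' else x)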

Data
    ip.src   ip.dst src_country dst_country
0  x.x.x.x  a.a.a.a       china         NaN
1  y.y.y.y  b.b.b.b         NaN       china
2  z.z.z.z  c.c.c.c       china         NaN
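
The same masking can also be sketched without an element-wise lambda, using DataFrame.where on the two country columns. This is only an illustration of the idea above; the `cols` variable is a name introduced here, not part of the original answer.

```python
import pandas as pd

Data = pd.DataFrame({
    'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
    'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
    'src_country': ['china', 'US', 'china'],
    'dst_country': ['pakistan', 'china', 'india'],
})

cols = ['src_country', 'dst_country']  # hypothetical helper name

# Keep rows where at least one of the country columns is china.
Data = Data[(Data[cols] == 'china').any(axis=1)].copy()

# where() keeps values where the condition holds and puts NaN elsewhere.
Data[cols] = Data[cols].where(Data[cols] == 'china')
```

Note that this, like the code above, masks the country columns rather than ip.src/ip.dst.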
Mark
1

Use DataFrame.loc to modify the ip.src/ip.dst columns:

Data['ip.src'] = Data.loc[Data['src_country'] == 'china', 'ip.src']
Data['ip.dst'] = Data.loc[Data['dst_country'] == 'china', 'ip.dst']

print (Data)
    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
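
The reason this works is index alignment: .loc returns only the matching rows, and assigning that shorter Series back to the column fills the rows it does not cover with NaN. A minimal sketch on a cut-down frame (the names `df` and `subset` are introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
    'src_country': ['china', 'US', 'china'],
})

# Boolean mask + column label: only rows 0 and 2 come back.
subset = df.loc[df['src_country'] == 'china', 'ip.src']

# Assignment aligns on the index, so row 1 (no match) becomes NaN.
df['ip.src'] = subset
```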

Or:

m = Data[['src_country','dst_country']] == 'china'
Data[['ip.src', 'ip.dst']] = Data[['ip.src', 'ip.dst']].where(m.to_numpy())
print (Data)
    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
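
A note on why .to_numpy() is there: when the condition passed to where() is a DataFrame, pandas aligns it by labels, and the mask's columns (src_country, dst_country) do not match ip.src/ip.dst. Converting to a NumPy array makes the comparison positional. The sketch below contrasts the two, and also shows relabelling the mask with set_axis as an alternative; the variable names are assumptions made for illustration.

```python
import pandas as pd

Data = pd.DataFrame({
    'ip.src': ['x.x.x.x', 'y.y.y.y', 'z.z.z.z'],
    'ip.dst': ['a.a.a.a', 'b.b.b.b', 'c.c.c.c'],
    'src_country': ['china', 'US', 'china'],
    'dst_country': ['pakistan', 'china', 'india'],
})

ip_cols = ['ip.src', 'ip.dst']
m = Data[['src_country', 'dst_country']] == 'china'

# Label-based: no mask column is named ip.src or ip.dst, so nothing is kept.
by_label = Data[ip_cols].where(m)

# Positional: a plain array has no labels, so column 0 masks column 0, etc.
by_position = Data[ip_cols].where(m.to_numpy())

# Alternative: rename the mask columns so label alignment lines up.
by_relabel = Data[ip_cols].where(m.set_axis(ip_cols, axis=1))
```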
jezrael
1
Data['ip.src'] = Data['ip.src'][(Data['src_country'] == 'china')]
Data['ip.dst'] = Data['ip.dst'][(Data['dst_country'] == 'china')]

Output:

    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india
Talha Tayyab
0

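Another possible solution, using np.where to keep the IP where the matching country is china and put NaN everywhere else:

import numpy as np

# np.where(condition, value_if_true, value_if_false)
Data[['ip.src', 'ip.dst']] = np.where(
    Data[['src_country', 'dst_country']].eq('china'),
    Data[['ip.src', 'ip.dst']], np.nan)

Output:

    ip.src   ip.dst src_country dst_country
0  x.x.x.x      NaN       china    pakistan
1      NaN  b.b.b.b          US       china
2  z.z.z.z      NaN       china       india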
PaulS