1

I need to remove some rows from a DataFrame like this:

import pandas as pd
import numpy as np

input_ = pd.DataFrame()
input_['ID'] = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
input_['ST'] = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
input_['V'] = [np.nan, np.nan, 1, 1, np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, np.nan]

And finish with a DataFrame like this one:

output_ = pd.DataFrame()
output_['ID'] = [2, 3, 4, 2, 3, 4, 2, 3, 4]
output_['ST'] = [1, 1, 1, 2, 2, 2, 3, 3, 3]
output_['V'] = [np.nan, 1, 1, 1, np.nan, 1, np.nan, np.nan, np.nan]

Here I removed the rows with ID == 1 because, in those rows, column V is NaN (np.isnan(V)) for ALL values of column ST. How should I select which rows to drop from a pandas DataFrame under these two conditions?

Talenel
  • Does this answer your question? [Drop rows on multiple conditions in pandas dataframe](https://stackoverflow.com/questions/52456874/drop-rows-on-multiple-conditions-in-pandas-dataframe) – Rishabh Kumar Mar 06 '21 at 02:53
  • following up the approach on the link above you can try something like this: `input_.drop(input_[(input_['ID']==1) & (input_['V'].isna())].index)` – Rishabh Kumar Mar 06 '21 at 02:57
  • @RishabhKumar you seem to misunderstand the question. OP wants to drop `ID=1` because all `V` with that `ID` are `nan`. – Quang Hoang Mar 06 '21 at 02:58
  • @QuangHoang okay got it, it gave the same output, so didn't read the question carefully – Rishabh Kumar Mar 06 '21 at 03:04
  • I'm a bit confused because he says 'with this two conditions' and a different data set might not get the same results for 1 or 2 conditions. – David Moreau Mar 06 '21 at 03:08

3 Answers

0

Use groupby().transform('any') to check whether each ID group contains at least one non-NaN value in V, then keep only the rows of those groups:

valids = input_.V.notna().groupby(input_.ID).transform('any')

output = input_[valids]

Output:

    ID  ST    V
1    2   1  NaN
2    3   1  1.0
3    4   1  1.0
5    2   2  1.0
6    3   2  NaN
7    4   2  1.0
9    2   3  NaN
10   3   3  NaN
11   4   3  NaN
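
For reference, here is a self-contained sketch of the same idea. It assumes the question's data with np.nan in place of the undefined NaN/Nan names. The groupby().filter() line at the end is an alternative added purely for comparison and is not part of the approach above.

```python
import numpy as np
import pandas as pd

# Data from the question; np.nan stands in for the undefined NaN/Nan names.
input_ = pd.DataFrame({
    "ID": [1, 2, 3, 4] * 3,
    "ST": [1] * 4 + [2] * 4 + [3] * 4,
    "V": [np.nan, np.nan, 1, 1, np.nan, 1, np.nan, 1,
          np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask, aligned with input_: True on every row whose ID group
# has at least one non-NaN value in V.
valids = input_["V"].notna().groupby(input_["ID"]).transform("any")
output = input_[valids]

# Alternative for comparison: filter() keeps whole groups for which the
# callable returns True.
output_alt = input_.groupby("ID").filter(lambda g: g["V"].notna().any())

print(output)
```

The transform version builds a reusable row-level mask, which is usually faster on large frames than calling a Python lambda once per group with filter().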
Quang Hoang
0

Try this:

input_ = input_[(input_['ID']!=1) & input_['V'].notnull()]

I'm not sure I fully understand your question: did you want to filter on ID == 1 specifically, or did you only do that to get rid of the NaN values? If you don't actually want to filter on ID == 1, just do:

input_ = input_[input_['V'].notnull()]

Output for both:

   ID  ST    V
2   3   1  1.0
3   4   1  1.0
5   2   2  1.0
7   4   2  1.0
David Moreau
0
import numpy as np
import pandas as pd

input_ = pd.DataFrame()
input_['ID'] = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
input_['ST'] = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
# Use np.nan for missing values rather than the strings 'NaN'/'Nan'
input_['V'] = [np.nan, np.nan, 1, 1, np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, np.nan]

# Work on a copy so input_ itself stays unchanged
input_1 = input_.copy()
print(input_1)

# Drop the rows at index labels 0, 4 and 8 (the rows with ID == 1)
input_1.drop([0, 4, 8], inplace=True)
print(input_1)