Drop rows from dataset

Question

I have a dataset that looks like this:

   Attribute:Value  Support
0            VDM:1        9
1            VDM:2        2
2            VDM:3        0
3            VDM:4        0
4            VDM:5        1
5            MDM:1        2
6            MDM:2        6
7            MDM:3        0
8            MDM:4        3
9            MDM:5        1
10            OM:1        2
11            OM:2        6
12            OM:3        0
13            OM:4        3
14            OM:5        1

Here I want to delete the rows where Support is less than or equal to 4 and the value in the Attribute:Value pair is 1, 2 or 3. After removing the rows, the dataset should look like this:

   Attribute:Value  Support
0            VDM:1        9
1            VDM:4        0
2            VDM:5        1
3            MDM:2        6
4            MDM:4        3
5            MDM:5        1
6             OM:2        6
7             OM:4        3
8             OM:5        1

The value part will contain only 1, 2, 3, 4, 5.

Possible duplicate of [Deleting DataFrame row in Pandas based on column value](https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value) — PV8, Jun 11 '19 at 06:55

jezrael · Accepted Answer · 2019-06-11T07:06:57.610

Use boolean indexing to remove the rows. Because boolean indexing selects the rows to keep, the conditions must be inverted: | (OR) is used instead of & (AND), the first mask is inverted with ~, and the second condition uses Series.gt (>) as the inverse of <=.

The value after the : can be obtained with Series.str.split or Series.str.extract:

mask = ~df['Attribute:Value'].str.split(':').str[1].isin(['1','2','3']) | df['Support'].gt(4)

Because the question states that the value part contains only 1, 2, 3, 4, 5, it is also possible to use:

mask = (df['Attribute:Value'].str.extract(r':(\d+)', expand=False).astype(int).gt(3) |
        df['Support'].gt(4))

df1 = df[mask]
print(df1)

   Attribute:Value  Support
0            VDM:1        9
3            VDM:4        0
4            VDM:5        1
6            MDM:2        6
8            MDM:4        3
9            MDM:5        1
11            OM:2        6
13            OM:4        3
14            OM:5        1
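
The inversion described in this answer is De Morgan's law: "not (A and B)" is the same as "(not A) or (not B)". A minimal self-contained sketch that rebuilds the sample data from the question and checks that both forms select the same rows; the names value, drop_mask and keep_mask are illustrative and not part of the answer:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Attribute:Value': [f'{name}:{i}'
                        for name in ('VDM', 'MDM', 'OM')
                        for i in range(1, 6)],
    'Support': [9, 2, 0, 0, 1, 2, 6, 0, 3, 1, 2, 6, 0, 3, 1],
})

# Part after the colon, still as a string ('1' .. '5').
value = df['Attribute:Value'].str.split(':').str[1]

# Rows to drop: value is 1, 2 or 3 AND Support <= 4.
drop_mask = value.isin(['1', '2', '3']) & df['Support'].le(4)

# Rows to keep: each condition inverted and joined with OR.
keep_mask = ~value.isin(['1', '2', '3']) | df['Support'].gt(4)

# De Morgan: not (A and B) == (not A) or (not B)
assert (~drop_mask).equals(keep_mask)

df1 = df[keep_mask]
print(df1)
```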

alternative solution is impressive :) — Mohamed Thasin ah, Jun 11 '19 at 07:06

score 1 · Answer 2 · answered Jun 11 '19 at 06:59

I think you are looking for this:

s = df['Attribute:Value'].str.split(':').str[-1].astype(int)
df = df[(df['Support'] > 4) | (s > 3)]

Output:

   Attribute:Value  Support
0            VDM:1        9
3            VDM:4        0
4            VDM:5        1
6            MDM:2        6
8            MDM:4        3
9            MDM:5        1
11            OM:2        6
13            OM:4        3
14            OM:5        1

Explanation:

Split the attribute and the value.
Keep the rows where the value is greater than 3 or Support is greater than 4.
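
The two steps above could be wrapped in a small function so the thresholds are not hard-coded. This is only a sketch: drop_low_support, max_support and max_value are hypothetical names introduced here, and the sample frame is a shortened version of the data in the question.

```python
import pandas as pd

def drop_low_support(df, max_support=4, max_value=3):
    """Drop rows whose value is <= max_value and whose Support is <= max_support.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Step 1: split 'VDM:1' into ['VDM', '1'] and take the last piece as an int.
    value = df['Attribute:Value'].str.split(':').str[-1].astype(int)
    # Step 2: keep a row if Support is above the threshold or the value is.
    keep = (df['Support'] > max_support) | (value > max_value)
    return df[keep]

sample = pd.DataFrame({
    'Attribute:Value': ['VDM:1', 'VDM:2', 'VDM:4', 'MDM:2'],
    'Support': [9, 2, 0, 6],
})
result = drop_low_support(sample)
print(result)
```

Returning a new DataFrame instead of reassigning df leaves the original data available for comparison.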

score 1 · Answer 3 · answered Jun 11 '19 at 06:59

You can use:

df[~(df['Attribute:Value'].str.split(':').str[1].isin(['1', '2', '3']) & df['Support'].le(4))]

   Attribute:Value  Support
0            VDM:1        9
3            VDM:4        0
4            VDM:5        1
6            MDM:2        6
8            MDM:4        3
9            MDM:5        1
11            OM:2        6
13            OM:4        3
14            OM:5        1
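
The output above keeps the original row labels (0, 3, 4, ...), whereas the expected result in the question is numbered 0 to 8. Assuming that renumbering is wanted, a sketch using DataFrame.reset_index on the same filter might look like this; the names to_drop and result are illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Attribute:Value': [f'{name}:{i}'
                        for name in ('VDM', 'MDM', 'OM')
                        for i in range(1, 6)],
    'Support': [9, 2, 0, 0, 1, 2, 6, 0, 3, 1, 2, 6, 0, 3, 1],
})

# Rows to drop: value is 1, 2 or 3 AND Support <= 4.
value = df['Attribute:Value'].str.split(':').str[1]
to_drop = value.isin(['1', '2', '3']) & df['Support'].le(4)

# drop=True discards the old labels instead of keeping them as a new column.
result = df[~to_drop].reset_index(drop=True)
print(result)
```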
