Split Dataframe into two based on null and blocked values

Question

I have a dataframe and want to split it into two based on conditions on multiple columns.

df should contain every row that has no null in any of these columns and has Status Yes. All remaining rows should go into df_null.

df (loaded from vehicle.csv):

Status  Country City     Year 
Yes     USA     New York 2001
Yes     Canada           2001
Yes     France  Paris    
No              Rio      1843
No      Germany Berlin   2008
Yes                      2004

# df_null has all the rows with a null in any of the three columns
df_null = df[~df[['Country', 'City', 'Year']].notnull().all(1)]

# df has the rows with no nulls in the three columns...
df = df[df[['Country', 'City', 'Year']].notnull().all(1)]

# ...and Status == 'Yes'
df = df.loc[df['Status'] == 'Yes']

result = pd.concat([df, df_null])

The Germany row is not in the result dataframe: it has no nulls, so it is not in df_null, and it is then removed from df by the Status = Yes filter.
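A small diagnostic makes the gap visible. The null filter and the not-null filter are complements, but the extra Status filter removes rows that then belong to neither frame. This is a sketch that assumes blank cells in vehicle.csv load as NaN; the data is typed in by hand with None instead of being read from the file.

```python
import pandas as pd

# Hand-typed copy of the sample data; None stands in for a blank cell (assumption)
original = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)
cols = ["Country", "City", "Year"]

# The question's two-step filtering
df_null = original[~original[cols].notnull().all(axis=1)]
df = original[original[cols].notnull().all(axis=1)]
df = df.loc[df["Status"] == "Yes"]

# Index labels that ended up in neither frame
missing = original.index.difference(df.index.union(df_null.index))
print(original.loc[missing])  # only the Germany row
```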

OtherBarry · Answer 1 · 2021-02-12T04:16:33.457

You can do this by building a single boolean mask that combines both conditions and then using it and its inverse:

import pandas as pd

# Sample data; None marks a missing value
df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)

# Mask: no nulls in the three columns AND Status == "Yes"
valid_rows = (df[["Country", "City", "Year"]].notnull().all(1)) & (df["Status"] == "Yes")

df_null = df[~valid_rows]  # Filter by inverse of mask
df = df[valid_rows]  # Filter by mask
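An equivalent formulation swaps the explicit notnull mask for DataFrame.dropna(subset=...): keep the complete Status == "Yes" rows, then send everything else to df_null by dropping the kept index labels from the original. This sketch assumes the index labels are unique; the assertion checks that every row lands in exactly one frame.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)
cols = ["Country", "City", "Year"]

# Rows with no nulls in cols, then only those with Status == "Yes"
complete = df.dropna(subset=cols)
df_ok = complete[complete["Status"] == "Yes"]
# Everything else: drop the kept labels (requires a unique index)
df_null = df.drop(index=df_ok.index)

# The two frames partition the original rows
assert len(df_ok) + len(df_null) == len(df)
print(df_ok.index.tolist())    # [0]
print(df_null.index.tolist())  # [1, 2, 3, 4, 5]
```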

This gives df as:

	Status	Country	City	Year
0	Yes	USA	New York	2001

And df_null as:

	Status	Country	City	Year
1	Yes	Canada		2001
2	Yes	France	Paris	nan
3	No		Rio	1843
4	No	Germany	Berlin	2008
5	Yes			2004

Sorry, I missed the .all(1) method. It should work fine now. — OtherBarry, Feb 12 '21 at 04:06

score 0 · Answer 2 · answered Feb 12 '21 at 00:01


Is this what you are looking for?

# Import pandas and numpy
import pandas as pd
import numpy as np

# Initialize a list of lists; '' marks a missing value
data = [
    ['Yes', 'USA', 'New York', 2001],
    ['Yes', 'Canada', '', 2001],
    ['Yes', 'France', 'Paris', ''],
    ['No', '', 'Rio', 1843],
    ['No', 'Germany', 'Berlin', 2008],
    ['Yes', '', '', 2004],
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=["Status", "Country", "City", "Year"])

# Treat empty strings as missing, then keep complete rows with Status == 'Yes'
df_new = df.replace('', np.nan)
df_new = df_new[df_new.Status == 'Yes'].dropna()
# Every row whose index label is not in df_new goes to df_null
df_null = df[~df.index.isin(df_new.index)]
# Print the two dataframes
print(df_new)
print(df_null)
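The replace('', np.nan) step matters because notnull() treats an empty string as a real value, not as missing. A quick check:

```python
import numpy as np
import pandas as pd

s = pd.Series(["Paris", "", None])
# '' counts as present; only None is considered null
print(s.notnull().tolist())                      # [True, True, False]
# After replacing '' with NaN, both blanks count as null
print(s.replace("", np.nan).notnull().tolist())  # [True, False, False]
```

When the data comes from pd.read_csv, empty fields are already parsed as NaN by default, so the replacement is mainly needed for hand-built data like the example above.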


A DUBEY


Not exactly. I want to label the rows according to the filter they were split on, because I have multiple columns with null values that are relevant. – james Feb 12 '21 at 00:13
Could you provide the desired output? That would be really helpful for solving this. – A DUBEY Feb 12 '21 at 01:37
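If the goal is to record why each row was split off, one option is to add a label column before splitting. This is a sketch, not a confirmed requirement from the thread: the column name reason and the labelling rule are hypothetical. Each row lists its null columns plus a failed Status check, and complete Status == "Yes" rows get "ok".

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Yes", "USA", "New York", 2001],
        ["Yes", "Canada", None, 2001],
        ["Yes", "France", "Paris", None],
        ["No", None, "Rio", 1843],
        ["No", "Germany", "Berlin", 2008],
        ["Yes", None, None, 2004],
    ],
    columns=["Status", "Country", "City", "Year"],
)
cols = ["Country", "City", "Year"]

def reason(row):
    # Hypothetical rule: name each null column, then a failed Status check
    problems = [c for c in cols if pd.isnull(row[c])]
    if row["Status"] != "Yes":
        problems.append("Status")
    return ", ".join(problems) if problems else "ok"

# Label every row, then split on the label
df["reason"] = df.apply(reason, axis=1)
df_ok = df[df["reason"] == "ok"]
df_null = df[df["reason"] != "ok"]
print(df[["Status", "reason"]])
```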

score 0 · Answer 3 · answered Jul 14 '23 at 06:16

If you only need to split the dataframe on null values in a single column, you can simply use the code below:

DF_null = processed_records_DF[processed_records_DF['ColumnName'].isnull()]

DF_not_null = processed_records_DF[processed_records_DF['ColumnName'].notnull()]
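The same idea extends to the question's multi-column case by reducing the per-column null test with any(axis=1). A sketch using a few rows from the question's data (the Status condition is left out here, as in the answer above):

```python
import pandas as pd

df = pd.DataFrame(
    [["USA", "New York"], ["Canada", None], [None, "Rio"]],
    columns=["Country", "City"],
)
cols = ["Country", "City"]

# A row goes to the null frame if any of the listed columns is null
has_null = df[cols].isnull().any(axis=1)
df_null = df[has_null]
df_not_null = df[~has_null]
print(df_not_null.index.tolist())  # [0]
print(df_null.index.tolist())      # [1, 2]
```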
