New pd DataFrame from DataFrame with limited modalities

Question

I am trying to create two smaller dataframes from a single one (the "main DataFrame"). The first dataframe should consist of all the columns of the main DataFrame for which the number of modalities (distinct values) is no more than, say, 2. The other dataframe should have all the remaining columns.

I tried several things that didn't work. In my latest attempt, I computed the number of unique values per column, sorted them, and selected only the columns that I could see matched the criterion of two modalities (11 columns in total).

Try 1:

new_df = df.iloc[df.columns[df.nunique[0] <=2]]
TypeError: 'method' object is not subscriptable

Try 2:

new_df = df.loc[df.nunique.sort_values()[:11])]

But it returned all the columns without screening.

Could anyone help me solve this?

my tries do not appear as code when I type into the code brackets :/ — MlleBlinie, Jul 16 '23 at 20:28
Please show data for a [reproducible example](https://stackoverflow.com/q/20109391/1422451). *Modalities* is not too clear a term without illustrative data. Are you attempting to check uniqueness across *every* column? — Parfait, Jul 16 '23 at 20:53

score 0 · Answer 1 · answered Jul 16 '23 at 21:25

IIUC, the first dataframe should include all columns of the original dataframe that have at most 2 unique values. You could use a boolean mask to do so:

import pandas as pd
import numpy as np

main = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col2': ['thing1', 'thing2', 'thing3', 'thing4', 'thing5', 'thing6'],
    'col3': [11, 11, 33, 44, 2, 3],
    'col4': [11, 11, 11, 11, 56, 77]
})

mask = (main.nunique() <= 2)

Use the mask to create two separate DFs:

# containing columns with <= 2 unique values:
df1 = main.loc[:, mask]
# containing all the remaining columns:
df2 = main.loc[:, ~mask]
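
For completeness, here is a minimal sketch that wraps the same idea in a function and checks the result on the sample data above. The helper name `split_by_nunique` and its `max_unique` parameter are made up for illustration and are not part of pandas. Regarding the `TypeError` in Try 1: `df.nunique` without parentheses is the method object itself, so it has to be called (`df.nunique()`) before it can be compared or indexed. Also note that a single indexer passed to `.loc` or `.iloc` selects rows, which is why the column mask goes in the second position (`.loc[:, mask]`).

```python
import pandas as pd

# Same illustrative data as in the answer above.
main = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col2': ['thing1', 'thing2', 'thing3', 'thing4', 'thing5', 'thing6'],
    'col3': [11, 11, 33, 44, 2, 3],
    'col4': [11, 11, 11, 11, 56, 77],
})


def split_by_nunique(df, max_unique=2):
    """Hypothetical helper: split columns by their number of distinct values."""
    counts = df.nunique()        # call the method; NaN is ignored by default
    mask = counts <= max_unique  # boolean Series indexed by column name
    # First frame: columns with at most max_unique values; second: the rest.
    return df.loc[:, mask], df.loc[:, ~mask]


counts = main.nunique()
low, high = split_by_nunique(main)

print({col: int(n) for col, n in counts.items()})
# {'col1': 2, 'col2': 6, 'col3': 5, 'col4': 3}
print(list(low.columns))   # ['col1']
print(list(high.columns))  # ['col2', 'col3', 'col4']
```

With this sample data only `col1` has two or fewer distinct values, so it is the only column in the first frame. If missing values should count as a modality of their own, `df.nunique(dropna=False)` can be used instead.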
