I am trying to split a single DataFrame (the "main DataFrame") into two smaller ones. The first should contain all columns of the main DataFrame whose number of modalities (unique values) is at most, say, 2. The second should contain all the remaining columns.

I tried several things that didn't work. In my last attempt, I computed the number of unique values per column, sorted them, and then selected only the columns I could see matched the criterion of two modalities (11 columns in total).

Try 1:

new_df = df.iloc[df.columns[df.nunique[0] <=2]]
TypeError: 'method' object is not subscriptable

Try 2:

new_df = df.loc[df.nunique.sort_values()[:11])]

But it returned all the columns without screening.

Could anyone help me solve this?

Amira Bedhiafi

1 Answer

IIUC, the first dataframe should include all columns of the original dataframe that have at most 2 unique values. You could use a boolean mask to do so:

import pandas as pd

main = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col2': ['thing1', 'thing2', 'thing3', 'thing4', 'thing5', 'thing6'],
    'col3': [11, 11, 33, 44, 2, 3],
    'col4': [11, 11, 11, 11, 56, 77]
})

# Boolean Series indexed by column name:
# True where the column has at most 2 unique values
mask = main.nunique() <= 2
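
To see what the mask actually holds, it can help to print the per-column counts first. A minimal check, rebuilding the same example frame so the snippet runs on its own (the name `counts` is only illustrative):

```python
import pandas as pd

# Same example data as above
main = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col2': ['thing1', 'thing2', 'thing3', 'thing4', 'thing5', 'thing6'],
    'col3': [11, 11, 33, 44, 2, 3],
    'col4': [11, 11, 11, 11, 56, 77]
})

# nunique() must be called; it returns a Series of
# distinct-value counts, indexed by column name
counts = main.nunique()
print(counts)

# Comparing the Series with a threshold gives a boolean Series
mask = counts <= 2
print(mask)
```

With this data only `col1` has 2 or fewer distinct values, so it is the only column the mask marks as `True`.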

Use the mask to create two separate DFs:

# containing columns with <= 2 unique values:
df1 = main.loc[:, mask]
# containing all the remaining columns:
df2 = main.loc[:, ~mask]
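
As for the attempts in the question: `df.nunique` without parentheses is the method itself, which is why indexing it raised the TypeError. In addition, `.iloc` expects integer positions (or a boolean array) rather than column labels, and `.loc` with a single indexer selects rows, so columns need `df.loc[:, ...]` or plain `df[...]`. A rough sketch of a label-based version of both tries, assuming the same example frame (`few_cols`, `rest_df`, `n` and `fewest` are illustrative names, not from the question):

```python
import pandas as pd

# Same example data as in the answer
df = pd.DataFrame({
    'col1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'col2': ['thing1', 'thing2', 'thing3', 'thing4', 'thing5', 'thing6'],
    'col3': [11, 11, 33, 44, 2, 3],
    'col4': [11, 11, 11, 11, 56, 77]
})

# Call nunique() to get the number of distinct values per column
counts = df.nunique()

# Try 1, reworked: keep the labels of columns with at most 2 distinct values
few_cols = counts[counts <= 2].index
new_df = df[few_cols]
rest_df = df.drop(columns=few_cols)

# Try 2, reworked: take the n columns with the fewest distinct values
# (n = 2 here only for illustration; the question used 11)
n = 2
fewest = counts.sort_values().index[:n]
fewest_df = df.loc[:, fewest]
```

The threshold-based version is usually the safer of the two, since it does not depend on counting the matching columns by eye.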
dramarama