1

I have a DataFrame with a duplicated column named Weather, as seen in this picture of the DataFrame. One of the two copies contains NaN values, and that is the one I want to remove from the DataFrame. I tried this:

data_cleaned4.drop('Weather', axis=1)

It dropped both columns, as it should. I then tried to pass a condition to the drop method, but that raises an error:

data_cleaned4.drop(data_cleaned4['Weather'].isnull().sum() > 0, axis=1)

Can anyone tell me how to remove this column? Note that the second-to-last column contains the NaN values, not the last one.

  • https://stackoverflow.com/questions/14984119/python-pandas-remove-duplicate-columns Try this one. – Amit Nikhade Jan 09 '21 at 05:45
  • I tried `pandas.read_image` but it just came back as "has no attribute". Could you post the dataframe as code until those slackers implement it? – tdelaney Jan 09 '21 at 06:25

3 Answers

1

A general solution: (df.isnull().any(axis=0).values) flags the columns that contain any NaN values, and df.columns.duplicated(keep=False) marks every copy of a duplicated column name as True. Combining the two with & gives the columns to drop, and negating that mask with ~ selects the columns you want to retain.

General Solution:

df.loc[:, ~((df.isnull().any(axis=0).values) & df.columns.duplicated(keep=False))]

Input

    A   B   C   C   A
0   1   1   1   3.0 NaN
1   1   1   1   2.0 1.0
2   2   3   4   NaN 2.0
3   1   1   1   4.0 1.0

Output

    A   B   C
0   1   1   1
1   1   1   1
2   2   3   4
3   1   1   1
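
As a rough, self-contained sketch of the general solution above, the example input can be rebuilt and the mask computed step by step. The variable names has_nan, is_dup and to_drop are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example input; duplicate labels are allowed when passed via `columns`.
df = pd.DataFrame(
    [[1, 1, 1, 3.0, np.nan],
     [1, 1, 1, 2.0, 1.0],
     [2, 3, 4, np.nan, 2.0],
     [1, 1, 1, 4.0, 1.0]],
    columns=["A", "B", "C", "C", "A"],
)

# One flag per column position: does the column contain any NaN?
has_nan = df.isnull().any(axis=0).to_numpy()
# True for every copy of a label that occurs more than once.
is_dup = df.columns.duplicated(keep=False)
# Drop only columns that are both duplicated and contain NaN.
to_drop = has_nan & is_dup

result = df.loc[:, ~to_drop]
print(result.columns.tolist())
```

Working with a plain boolean array rather than labels matters here, because selecting by the label 'C' or 'A' would address both copies at once.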

Just for column C:

df.loc[:, ~(df.columns.duplicated(keep=False) & (df.isnull().any(axis=0).values)
            & (df.columns == 'C'))]

Input

    A   B   C   C   A
0   1   1   1   3.0 NaN
1   1   1   1   2.0 1.0
2   2   3   4   NaN 2.0
3   1   1   1   4.0 1.0

Output

    A   B   C   A
0   1   1   1   NaN
1   1   1   1   1.0
2   2   3   4   2.0
3   1   1   1   1.0
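
The single-column variant could be wrapped in a small helper so the column name is a parameter. This is only a sketch: drop_nan_copies is a hypothetical name, not a pandas function, and it assumes at most one copy of the column contains NaN (if every copy has NaN, all of them are dropped).

```python
import numpy as np
import pandas as pd

def drop_nan_copies(df, name):
    """Hypothetical helper: drop the copies of column `name` that contain NaN,
    but only when `name` occurs more than once in df.columns."""
    has_nan = df.isnull().any(axis=0).to_numpy()   # per-position NaN flag
    is_dup = df.columns.duplicated(keep=False)     # every copy of a repeated label
    is_name = df.columns == name                   # positions carrying this label
    return df.loc[:, ~(has_nan & is_dup & is_name)]

df = pd.DataFrame(
    [[1, 1, 1, 3.0, np.nan],
     [1, 1, 1, 2.0, 1.0],
     [2, 3, 4, np.nan, 2.0],
     [1, 1, 1, 4.0, 1.0]],
    columns=["A", "B", "C", "C", "A"],
)

out = drop_nan_copies(df, "C")
print(out.columns.tolist())
```

In the question's case the call would presumably be drop_nan_copies(data_cleaned4, 'Weather').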
ggaurav
  • This one has successfully worked. Thanks for the solution. –  Jan 11 '21 at 06:05
  • Welcome! @Malik As it worked for your problem you may consider accepting, upvoting the answer – ggaurav Jan 11 '21 at 06:15
  • I have already upvoted your answer but it shows votes cast by those with less than 15 reputation are counted but not added as public. Do upvote my Question please it may increase my reputation. –  Jan 11 '21 at 19:23
  • Done. I guess accepting the answer will still work – ggaurav Jan 11 '21 at 19:33
0

Because the column names are duplicated, dropping by label removes both copies, so select by position instead. The code below checks whether the last column contains NaN and keeps every column except the affected one.

# Check the last column (one of the duplicated 'Weather' pair) by position.
checkone = data_cleaned4.iloc[:, -1].isna().any()

# Position to drop: the last column if it has NaN, otherwise the second-to-last.
n_cols = data_cleaned4.shape[1]
i = n_cols - 1 if checkone else n_cols - 2

# Keep every column except position i; drop() by label would remove both duplicates.
data_cleaned4 = data_cleaned4.iloc[:, [j for j in range(n_cols) if j != i]]
MaxYarmolinsky
0

Without a testable sample, and assuming you don't have NaNs anywhere else in your DataFrame, this should work:

df = df.dropna(axis=1)
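
To illustrate why the "no NaNs anywhere else" assumption matters, here is a small sketch on made-up data. The Temp column and all values are invented for illustration; only the duplicated Weather label comes from the question.

```python
import numpy as np
import pandas as pd

# Invented sample: 'Temp' also has a NaN, and the first 'Weather' copy is the incomplete one.
df = pd.DataFrame(
    [[20.0, np.nan, "Sunny"],
     [np.nan, "Rain", "Rain"],
     [22.5, np.nan, "Cloudy"]],
    columns=["Temp", "Weather", "Weather"],
)

# dropna(axis=1) removes every column containing a NaN, duplicated or not.
cleaned = df.dropna(axis=1)
print(cleaned.columns.tolist())  # ['Weather']
```

The NaN-filled Weather copy is gone, but so is Temp, so this one-liner is only safe when the duplicate is the sole column with missing values.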

Kenan