I have a column in my data that looks like this:

Column
'Star Wars: Episode I The Phantom Menace'
'Star Wars: Episode I The Phantom Menace'
NaN
'Star Wars: Episode I The Phantom Menace'
NaN
....

I want to convert this string column into a boolean column, i.e. True for a real value and False for NaN.

I tried to classify the value with the following command:

import numpy as np
star_wars[column] = star_wars[column].map(lambda x: True if (x != np.nan) else False)
star_wars[column].value_counts()

It reported every row as True, whether the row held a real value or NaN, which should not be the case.

I also tried to get the result through truthy/falsy values:

import numpy as np
star_wars[column] = star_wars[column].map(lambda x: True if (x) else False)
star_wars[column].value_counts()

But interestingly, when I hard-code the mapping:

true_false = {
    "Star Wars: Episode I  The Phantom Menace": True,
    np.nan: False,
}

star_wars[column] = star_wars[column].map(true_false)

Then it works.

What is wrong with my solution? Or is there any documentation I should refer to regarding this issue? Thank you in advance for your help!

Pak Hang Leung

1 Answer

You don't need map at all.
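
As for why both map attempts marked every row as True, here is a minimal sketch, assuming np.nan behaves as an ordinary IEEE 754 float NaN. NaN never compares equal to anything, itself included, so x != np.nan is True for every x. NaN is also a non-zero float, so it is truthy just like a non-empty string. The sample list and variable names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values: one real string and one missing value.
values = ["Star Wars: Episode I The Phantom Menace", np.nan]

# NaN is not equal to anything, itself included, so != is True for both.
neq_results = [x != np.nan for x in values]

# NaN is a non-zero float, so it is truthy, like a non-empty string.
truthy_results = [bool(x) for x in values]

# pd.notna tests for missing values explicitly.
notna_results = [bool(pd.notna(x)) for x in values]

print(neq_results)
print(truthy_results)
print(notna_results)
```

Only the explicit missing-value check separates the two cases, which is why notna() is the safer tool here.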

You can simply use df.Column.notna(), like this:

This is the df:

In [504]: df 
Out[504]: 
                                    Column
0  Star Wars: Episode I The Phantom Menace
1  Star Wars: Episode I The Phantom Menace
2                                      NaN
3  Star Wars: Episode I The Phantom Menace
4                                      NaN

In [506]: df = df.Column.notna()     
In [507]: df  
Out[507]: 
0     True
1     True
2    False
3     True
4    False
Name: Column, dtype: bool
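
Applied to the setup in the question, it might look like the sketch below. The frame is rebuilt by hand, and the name "Column" is only a stand-in for whatever star_wars[column] refers to in the real data. Assigning the result back to the column keeps the rest of the DataFrame instead of replacing it with a Series.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the real data; "Column" is a placeholder name.
title = "Star Wars: Episode I The Phantom Menace"
star_wars = pd.DataFrame({"Column": [title, title, np.nan, title, np.nan]})
column = "Column"

# Overwrite only this column with its not-missing mask.
star_wars[column] = star_wars[column].notna()

print(star_wars[column].value_counts())
```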
Mayank Porwal