I'm trying to convert the missing data in an xls file to NaN values in a DataFrame.

New list=energy.where(energy['Energy Supply']>=0)

I got:

An error saying the operator >= can't be used between a string and an integer.

My data type is numeric apart from the missing data.

atline
Sarbol Dipo

1 Answer

You need to use .loc for indexing:

energy.loc[energy['Energy Supply']>=0,:]

The : that selects all columns is optional. The following should also work:

energy.loc[energy['Energy Supply']>=0]

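The above will not include any missing values. Note that while the column still contains strings, the >= comparison itself raises the same error, so the strings have to be detected or replaced first.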

To detect strings, use:

energy['Energy Supply'].apply(lambda x: False if isinstance(x,str) else x>=0)
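A minimal sketch of that mask in use. The sample frame, the Country column, and the '...' placeholder for missing data are assumptions for illustration, not taken from the question's file:

```python
import pandas as pd

# Hypothetical sample data: '...' stands in for whatever string
# marks missing values in the real spreadsheet.
energy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Energy Supply": [321, "...", 0, 102],
})

# False for strings, otherwise the result of the numeric comparison.
mask = energy["Energy Supply"].apply(
    lambda x: False if isinstance(x, str) else x >= 0
)

# Keep only the rows whose value is numeric and non-negative.
numeric_rows = energy.loc[mask]
```

Because strings are mapped to False before any comparison happens, the comparison never sees a string and the error from the question does not occur.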

To replace all strings with NaN:

energy.loc[energy['Energy Supply'].apply(lambda x: isinstance(x, str)), 'Energy Supply'] = numpy.nan
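As an alternative to the isinstance check (a different technique, not the one above), pandas.to_numeric with errors="coerce" turns every value that cannot be parsed as a number into NaN. A minimal sketch, again assuming a hypothetical '...' placeholder and made-up sample values:

```python
import pandas as pd

# Hypothetical sample data; '...' is an assumed missing-value marker.
energy = pd.DataFrame({"Energy Supply": [321, "...", 0, 102]})

# Coerce non-numeric entries to NaN; the column becomes float64.
energy["Energy Supply"] = pd.to_numeric(
    energy["Energy Supply"], errors="coerce"
)

# The comparison now works, and NaN rows are dropped by the filter
# because NaN >= 0 evaluates to False.
non_negative = energy.loc[energy["Energy Supply"] >= 0]
```

If the placeholder string is known in advance, passing it to the na_values argument of pandas.read_excel when loading the file should achieve the same result at read time.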

Also, Python has no New keyword, so it should be dropped from the assignment.

Another point: list is the name of a built-in type in Python (not a reserved keyword), so using it as a variable name shadows the built-in and should be avoided. Use lst or mylist etc.
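A short sketch of why the name matters, using only the standard library. The variable names here are illustrative:

```python
import keyword

# list is not a reserved word, so Python will let you assign to it...
is_keyword = keyword.iskeyword("list")

# ...but a neutral name keeps the built-in constructor usable.
lst = [1, 2, 3]
copy_of_lst = list(lst)  # still works because list was not reassigned
```

Had the result been stored in a variable called list, the later call list(lst) would fail, since the name would then refer to the data instead of the type.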

rnso