1

I have a dataset which contains an undesirable string (which indicates a measurement was not able to be made). I want to change these unwanted strings to 'NaN's when the text file of data is read by pandas as the presence of a string is turning the data type of an otherwise int column to string. If there is a better process to this please let me know. When I try and use methods I have found on another question (Change data type of columns in Pandas) they break at the point the unwanted string is found or else they seem to skip over the string altogether.

Code

import pandas as pd 
data = {
    'ID': [1,2,3,4],
    'V': [6.6,2.01,'tND - 7777',7.01],
    'A': [33,31,'tND - 88881',35]    
    } 
df = pd.DataFrame(data, columns = ['ID','V','A'])

print(df)
df.astype({"V": int})
print(df)
# returns ValueError: invalid literal for int() with base 10: 'tND - 7777'

pd.to_numeric(df['V'], errors = 'coerce')
pd.to_numeric(df['A'], errors = 'coerce')
print(df)
# returns original array, unwanted strings still in place

Unwanted strings

'tND - 7777','tND - 88881'

Desired outcome Data in the dataframe columns which are ints (I am assuming a NaN is considered an int, I just need to plot the data once the strings are no longer present).

Windy71
  • 851
  • 1
  • 9
  • 30
  • 1
    `pd.to_numeric` returns the series; it has no effect on the original underlying data. You need to assign the column back. – Karl Knechtel Sep 15 '20 at 08:02

1 Answers1

3

Assign output back:

df['V'] = pd.to_numeric(df['V'], errors = 'coerce')
df['A'] = pd.to_numeric(df['A'], errors = 'coerce')

Another idea is use:

df[['V','A']] = df[['V','A']].apply(pd.to_numeric, errors = 'coerce')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    Smashing my head into the desk, how did I not see that! Thank you jezrael, will accept as soon as it lets me. – Windy71 Sep 15 '20 at 07:46