I am trying to cast the string values in a dataframe series to floats. I tried both the to_numeric() and astype() functions, but when I check the type of a series element afterwards, it is still a string. I am using the NYC SAT Scores dataset.

Here is my code:

    avg_crit_read["SAT Critical Reading Avg. Score"].astype('float')
    type(avg_crit_read["SAT Critical Reading Avg. Score"][0])

Another issue I encountered with this dataset is that some columns that are supposed to contain numbers contain letters instead. For example, "SAT Critical Reading Avg. Score" has the value 's' alongside values such as 279 and 300. I plan to replace these letters with the mean score of the other columns. If anyone has experience with this dataset, it would be useful to know what the value 's' means.

Len
  • You need to assign it back: `avg_crit_read["SAT Critical Reading Avg. Score"] = avg_crit_read["SAT Critical Reading Avg. Score"].astype('float')`. `astype` returns a new object rather than modifying the column in place. – ALollz Apr 10 '18 at 16:15
  • For your second question, it really depends on what your raw data looks like. If you just want to ignore rows with the value 's', you can use `pandas.to_numeric` with the `errors='coerce'` argument, which sets them to `NaN`. You can then fill them later with the mean of other columns. If you also have strings with commas, you may want to do a string replace to remove those commas first and then convert to float. If you post some formatted raw data with all of the bad cases and your expected output, I'm sure someone will help. – ALollz Apr 10 '18 at 16:22
  • @ALollz you should make your first comment an answer rather than answering in a comment. – Silenced Temporarily Apr 10 '18 at 16:52
  • Possible duplicate of [Change data type of columns in Pandas](https://stackoverflow.com/questions/15891038/change-data-type-of-columns-in-pandas) – ALollz Apr 10 '18 at 17:30
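
The suggestions in the comments above can be put together roughly as follows. This is only a sketch: the sample values (including the comma-formatted one) are made up for illustration and are not taken from the real dataset, and filling with the column's own mean is just one possible choice for the missing scores.

```python
import pandas as pd

col = "SAT Critical Reading Avg. Score"

# Hypothetical stand-in for the real data: numeric strings, an 's'
# placeholder, and one made-up value with a thousands separator.
avg_crit_read = pd.DataFrame({col: ["279", "300", "s", "1,200"]})

# Strip commas first (harmless if there are none), then convert.
# errors="coerce" turns anything non-numeric, such as 's', into NaN.
cleaned = avg_crit_read[col].str.replace(",", "", regex=False)
scores = pd.to_numeric(cleaned, errors="coerce")

# Assign the result back; neither to_numeric nor astype changes the
# original column in place.
avg_crit_read[col] = scores

# Assumption: replace the missing scores with the mean of the valid ones.
avg_crit_read[col] = avg_crit_read[col].fillna(avg_crit_read[col].mean())

print(avg_crit_read[col].dtype)
print(type(avg_crit_read[col][0]))
```

With plain `astype('float')` the conversion would raise a `ValueError` on the 's' entries, which is why `to_numeric` with `errors="coerce"` is used here instead.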

0 Answers