Comma containing values being taken as NaN in python read_csv

Question

I wrote the following code for reading a csv file which looks somewhat like this:

"Device","Parent Device","Sensor","Location","Time","Value","Units","Status"
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:15:00 AM","927.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:20:00 AM","940.3","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:25:00 AM","917.5","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:30:00 AM","1,106.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:35:00 AM","1,075.1","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:40:00 AM","1,078.7","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:45:00 AM","1,018.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:50:00 AM","1,017.3","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:55:00 AM","1,036.7","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 9:00:00 AM","995.0","kW",""

As you can see some values in the "Value" section have a (,) and are being read as NaN. I tried the following code:

import pandas as pd


def get_data(filename):

    #ISO-8859-1
    df=pd.read_csv(filename,thousands=",",sep=",",encoding='ISO-8859-1')

    df["Time"]=pd.to_datetime(df["Time"])
    df["Value"]=pd.to_numeric(df["Value"],errors='coerce')
    df=df.set_index(df["Time"])

    df=df[df["Time"]>="2015-05-01"]
    df=df[df["Time"]<"2016-05-01"]
    df=df[["Value"]]
    df["Value"]=pd.to_numeric(df["Value"],errors='coerce')
    # Set a conditional interpolate in the future based on dataset size and 
    #null values detected
    df=df.interpolate()
    return df

I am not sure if the error is due to the encoding format (I tried the default 'utf-8' too) or something with the way my csv file is formatted. Can anyone point out the mistake or alternate approach. Thanks!

What version of pandas are you using? `pd.read_csv(.., thousands=',')` seems to work correctly for me on 0.18.1 — joris, Jun 24 '16 at 18:15
sys.version output: 3.5.1 |Anaconda 4.0.0 (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)] — Ayush, Jun 24 '16 at 18:19
Can you show the output of `pd.__version__`? (but I suppose at least 0.17 given your anaconda version) — joris, Jun 24 '16 at 18:22
Does this answer your question? [Get pandas.read\_csv to read empty values as empty string instead of nan](https://stackoverflow.com/questions/10867028/get-pandas-read-csv-to-read-empty-values-as-empty-string-instead-of-nan) — dank8, Mar 03 '23 at 02:49

Comma containing values being taken as NaN in python read_csv

0 Answers0