I have a series of .csv files that I'm reading with pandas.read_csv. Out of many columns, I read only two (the 2nd and 15th).
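import glob
import pandas as pd
datafiles = glob.glob(mypath)
for dfile in datafiles:
    # Row 6 (0-indexed) holds the column names; read only two columns.
    data = pd.read_csv(dfile, header=6, usecols=['Reading', 'Value'])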
The CSV looks like this, with a few lines of header at the top. Every once in a while pandas reads one of these numbers as NaN. Excel has no trouble reading these values, and visually inspecting the file I don't see what causes the problem. Specifically, in this case, for the row indexed as 265 in this file (263 in the data frame), the 'Value' column reads NaN when it should be ~27.4.
>>> data['Value'][264]
nan
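To see what text is actually sitting in that cell before any float conversion happens, one option is to read the column as raw strings with the standard csv module and print each cell with repr, which makes stray spaces, tabs, or control characters visible. This is only a minimal sketch: the helper name raw_cells and the sample text are made up for illustration, and skipping header_row lines is my assumption about how header=6 lines up with the file.

```python
import csv
import io

def raw_cells(handle, column, header_row=6):
    """Return the unparsed strings of one column (hypothetical helper)."""
    # Skip the preamble lines that come before the header row.
    for _ in range(header_row):
        next(handle)
    reader = csv.DictReader(handle)
    return [row[column] for row in reader]

# Made-up sample: two preamble lines, a header, then two data rows.
sample = "junk\nmore junk\nReading,Value\n1,27.4\n2, 27.4\t\n"
cells = raw_cells(io.StringIO(sample, newline=""), "Value", header_row=2)
for i, cell in enumerate(cells):
    print(i, repr(cell))
```

For a real file, the handle would be something like open(dfile, newline=""), so that the csv module sees the line endings untranslated.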
This problem is consistent and doesn't change with the number of files I read. In many of the files, the problem is not present. In the rest, only one seemingly random number is read as NaN, in either of the two columns. I've tried changing from the automatic float64 to np.float128 using dtype, but this doesn't fix it. Any ideas on how to fix this?
Update: A grep search shows that the newline character is \M with only 4 exceptions: lines at the beginning of every file, before the header. On further inspection, this specific point [264] is treated differently in the failing files: in 5/12 files, it's fine; in 2/12 files it's read out as 27.0; in 3/12 it's read out as nan; and in 2/12 files it's read out as 2.0. One of the files (one that reads out a 27.0) is available for download here
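The same line-ending check can be done from Python by counting the terminators in the raw bytes. This is a rough sketch, assuming the \M above refers to a carriage return (often displayed as ^M); the function name count_line_endings and the sample bytes are made up for illustration.

```python
import re
from collections import Counter

def count_line_endings(raw: bytes) -> Counter:
    """Count \\r\\n, lone \\r, and lone \\n terminators (hypothetical helper)."""
    # Try \r\n first so a Windows ending is not counted as a separate \r and \n.
    return Counter(m.group() for m in re.finditer(rb"\r\n|\r|\n", raw))

# Made-up sample: two \n-terminated preamble lines, then \r-terminated rows.
sample = b"preamble\nmore preamble\nReading,Value\r1,27.4\r2,27.5\r"
counts = count_line_endings(sample)
print(counts)
```

For a real file, the bytes would come from opening it in binary mode ("rb") so that Python does not translate the line endings before they are counted.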