
I wrote a Python script on Windows 8.1 using the Sublime Text editor and just tried to run it from the terminal on OS X Yosemite, but I get an error.

My error occurs when parsing the first line of a .CSV file. Below is the relevant slice of the code. Here `lines` is an array where each element is one line of the file, read as a string. We split each string by the desired delimiter, and we skip the first line because it holds the header information (the else condition). For the last index in the for loop, i = numlines - 1 = the number of lines in the file - 2. We only add one to the value of i because the last line in the file is blank.

for i in range(numlines):
    if i == numlines-1: 
        dataF = lines[i+1].split(',')
    else:
        dataF = lines[i+1].split(',') 
    dataF1 = list(dataF[3])
    del(dataF1[len(dataF1)-1])
    del(dataF1[len(dataF1)-1])
    del(dataF1[0])
    f[i] = ''.join(dataF1)
return f

All the lines in the CSV file look like this (with the exception of the header line):

"08/06/2015","19:00:00","1","410"

So it saves each line into an array where each element corresponds to one of the 4 comma-separated values in that line of the CSV file. Then we take the element at index 3 of the array, "410", and create a list that should look like

['"','4','1','0','"','\n']

(and it does when run from windows) but it instead looks like

['"','4','1','0','"','\r','\n']

and so when I join this list back into a string with the above code I get 410" instead of 410.

My question is: where did the '\r' come from? It does not appear when the script is run on a Windows machine. At first I thought it was the text encoding, so I saved the CSV file as UTF-8; that didn't work. I tried changing the tab size from 4 to 8 spaces; that didn't work either. I'm running out of ideas now. Any help would be greatly appreciated.

Thanks

Manos Nikolaidis
Cauchy
  • Possible duplicate of [Difference between \n and \r?](http://stackoverflow.com/questions/1761051/difference-between-n-and-r) – Peter Wood Dec 22 '15 at 17:58
  • In my experience, files that were generated on a Windows machine and loaded on a *nix machine have line endings of "\r\n" instead of "\n". No idea why though. You can get around it by removing the "\r" before doing the split. – tblznbits Dec 22 '15 at 17:59
  • Open the file in text mode and it will make sure you only get `\n` across platforms. You're probably opening the file in binary mode `'b'` or similar. – Peter Wood Dec 22 '15 at 18:00
  • The issue is that I am trying to make the script OS independent, so I should be able to run it regardless of whether I am on Windows or a Linux/Unix-based system. – Cauchy Dec 22 '15 at 18:10
  • According to the Python documentation, open("filename.txt",'r') should open the file in text mode, accommodating any newline characters, and not in binary mode. I am just confused as to why the '\r' appears on a Unix-based machine but not on Windows. – Cauchy Dec 22 '15 at 18:12

1 Answer


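The "\r" is a carriage return, and it is part of a line separator: Windows ends lines with "\r\n", while Unix-like systems such as OS X use just "\n". Different platforms have different line separators, and a file written on Windows keeps its "\r\n" endings when it is moved to another system.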

A simple fix: if you read a line from a file yourself, then line.rstrip() will remove the whitespace from the line end.
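As a minimal sketch of that simple fix, here is the sample line from the question with both kinds of line ending. The helper name `fourth_field` is made up for illustration and is not from the original script.

```python
def fourth_field(line):
    """Return the 4th comma-separated value, without quotes or line ending."""
    # rstrip() with no arguments removes trailing whitespace,
    # which includes both '\r' and '\n'
    fields = line.rstrip().split(',')
    # strip('"') drops the surrounding double quotes
    return fields[3].strip('"')

windows_line = '"08/06/2015","19:00:00","1","410"\r\n'
unix_line = '"08/06/2015","19:00:00","1","410"\n'

print(fourth_field(windows_line))
print(fourth_field(unix_line))
```

Because the quotes and the line ending are stripped by content rather than by position, the result no longer depends on how many characters the line ending has.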

A proper fix: use Python's standard CSV reader. It properly handles quoted strings and commas inside quoted fields, and it copes with both "\n" and "\r\n" line endings.

Also, when working with long lists, it helps to stop thinking about them as index-addressed 'arrays' and use the 'stream' or 'sequential reading' metaphor.

So the typical way of handling a CSV file is something like:

import csv

with open('myfile.csv') as f:
  reader = csv.reader(f)
  # We assume that the file has 3 columns; adjust to taste
  for (first_field, second_field, third_field) in reader:
    # do something with the field values of the current line here
    print(third_field)
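For the four-column data in the question, the same idea might look like the sketch below. It assumes Python 3, and it uses an in-memory string with Windows-style line endings as a stand-in for the real file. The header names and the second data row are invented for illustration, and `read_values` is a hypothetical helper.

```python
import csv
import io

# Hypothetical stand-in for the CSV file, with "\r\n" line endings
sample = (
    '"Date","Time","Id","Value"\r\n'
    '"08/06/2015","19:00:00","1","410"\r\n'
    '"08/06/2015","19:15:00","1","415"\r\n'
)

def read_values(stream):
    """Return the 4th column of every data row as a list of strings."""
    reader = csv.reader(stream)
    next(reader)  # skip the header row
    # 'if row' ignores blank lines, which csv.reader yields as empty lists
    return [row[3] for row in reader if row]

# newline='' leaves the line endings untouched so the csv module can handle them
values = read_values(io.StringIO(sample, newline=''))
print(values)
```

With a real file under Python 3, the equivalent would be `open('myfile.csv', newline='')`, which is what the csv module documentation recommends.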
9000