When I write processed CSV to stdout with csv.writer, ^M characters are appended to every line of the output. Where are they coming from?

writer = csv.writer(sys.stdout, delimiter=output_delimiter, quotechar=quotechar)
for row in csv.reader(open(args[0],"U"), delimiter=delimiter, quotechar=quotechar):
   writer.writerow(row)

How I invoke the command:

./csvcut -d ',' -q \" -f 2,4,5,6,7,8,9,10 data/listings.csv > data/extracted.csv

The generated file (data/extracted.csv) is:

 name,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price^M
 COZICOMFORT LONG TERM STAY ROOM 2,Francesca,North Region,Woodlands,1.44255,103.7958,Private room,83^M
 Pleasant Room along Bukit Timah,Sujatha,Central Region,Bukit Timah,1.33235,103.78521,Private room,81^M
 COZICOMFORT,Francesca,North Region,Woodlands,1.44246,103.79667,Private room,69^M
 Ensuite Room (Room 1 & 2) near EXPO,Belinda,East Region,Tampines,1.34541,103.95712,Private room,206^M
 B&B  Room 1 near Airport & EXPO,Belinda,East Region,Tampines,1.34567,103.95963,Private room,94^M
 Room 2-near Airport & EXPO,Belinda,East Region,Tampines,1.34702,103.96103,Private room,104^M
 3rd level Jumbo room 5 near EXPO,Belinda,East Region,Tampines,1.34348,103.96337,Private room,208^M

The input file (data/listings.csv) is:

 1024986,Super Host Apartment,5643415,Martin,Central Region,River Valley,1.29349,103.83837,Entire home/apt,140,2,145,2019-08-22,2.04,1,230
 1060046,S$950/mth spacious room for short/long term lease,5748910,Sarah,West Region,Bukit Panjang,1.38123,103.76874,Private room,49,3,15,2019-06-01,0.20,2,131
 1078804,Cozy Studio Room,4602014,F,North-East Region,Hougang,1.36764,103.90228,Private room,31,30,60,2018-09-28,0.78,3,225
 1131162,Cozy Room at Bedok Reservoir,6205166,Lydia,East Region,Bedok,1.33729,103.9298,Private room,72,2,1,2017-04-01,0.03,1,360
v78

1 Answer

^M is how text editors and terminal tools display a carriage return (\r). Line endings differ between operating systems: *nix uses \n (line feed), while Windows uses \r\n (carriage return followed by line feed).

The \r comes from the csv module itself. Its default dialect (excel) terminates every row with \r\n regardless of the platform, so the file you redirect stdout into has a \r before each newline, and your editor shows it as ^M.
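The terminator used by csv.writer can be inspected directly. A minimal sketch, assuming Python 3 and the standard csv module; the variable names are only illustrative:

```python
import csv

# csv.writer uses the "excel" dialect unless told otherwise.
default_dialect = csv.get_dialect("excel")

# The dialect carries the string that is written at the end of every row.
terminator = default_dialect.lineterminator
print(repr(terminator))
```

The repr() call makes the control characters visible instead of letting the terminal interpret them.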

From what I see, you have these options:

  1. Write to the file directly in binary mode. (like here)
  2. Strip the \r from the output after the fact, either with dos2unix on the file or with a string replace() in the script.
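As a rough illustration of the second option, and of one further possibility not listed above (passing lineterminator to csv.writer), here is a minimal sketch. It assumes Python 3; the helper to_csv_text, the sample rows, and the in-memory buffer standing in for sys.stdout are all hypothetical and not part of the original script:

```python
import csv
import io

# Sample rows standing in for the parsed input (values taken from the question).
rows = [
    ["name", "host_name", "price"],
    ["COZICOMFORT", "Francesca", "69"],
]

def to_csv_text(rows, lineterminator="\r\n"):
    """Write rows to an in-memory buffer and return the raw text."""
    buf = io.StringIO(newline="")  # newline="" leaves line endings untouched
    csv.writer(buf, lineterminator=lineterminator).writerows(rows)
    return buf.getvalue()

# What csv.writer produces by default: every row ends in \r\n.
crlf_text = to_csv_text(rows)

# Option 2: strip the carriage returns with a plain string replacement.
stripped_text = crlf_text.replace("\r\n", "\n")

# Alternative: ask the writer for \n so no \r is written in the first place.
lf_text = to_csv_text(rows, lineterminator="\n")

print(repr(crlf_text))
print(repr(stripped_text))
print(repr(lf_text))
```

In the script from the question, the same idea would amount to adding lineterminator="\n" to the existing csv.writer(sys.stdout, ...) call, which avoids a separate clean-up pass over the output file.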
S.Au.Ra.B.H