25

I want to convert a simple tab-delimited text file into a CSV file. If I read the text file into a string and call string.split('\n'), I get a list in which each item is a string with '\t' between the columns. I thought I could just replace each '\t' with a comma, but I can't get the list items to behave like strings so that I can call string.replace on them. Here is the start of my code, which still needs a way to parse the tab ('\t'):

import csv
import sys

txt_file = r"mytxt.txt"
csv_file = r"mycsv.csv"

in_txt = open(txt_file, "r")
out_csv = csv.writer(open(csv_file, 'wb'))

file_string = in_txt.read()

file_list = file_string.split('\n')

for row in file_list:
    out_csv.writerow(row)
wilbev

3 Answers

48

csv supports tab delimited files. Supply the delimiter argument to reader:

import csv

txt_file = r"mytxt.txt"
csv_file = r"mycsv.csv"

# use 'with' if the program isn't going to immediately terminate
# so you don't leave files open
# the 'b' is necessary on Windows
# it prevents \x1a, Ctrl-z, from ending the stream prematurely
# and also stops Python converting to / from different line terminators
# On other platforms, it has no effect
in_txt = csv.reader(open(txt_file, "rb"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'wb'))

out_csv.writerows(in_txt)
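
For readers on Python 3, where the csv module expects text-mode files opened with newline='' instead of 'rb'/'wb', the same conversion can be sketched as below. This is only a minimal sketch: the function name tsv_to_csv and the UTF-8 encoding are assumptions for illustration, not part of the original answer.

```python
import csv


def tsv_to_csv(txt_path, csv_path):
    """Copy a tab-delimited file to a comma-delimited one; return the row count.

    Assumption: both files are UTF-8 text. Change `encoding` if yours differ.
    """
    # newline="" hands line-ending handling to the csv module (Python 3 idiom).
    with open(txt_path, newline="", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # default delimiter is a comma
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count
```

Calling tsv_to_csv("mytxt.txt", "mycsv.csv") would convert the file from the question. Because the with block closes both files on exit, the output is flushed even when the code runs inside an IDE or an interactive session.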
agf
  • 1
    -1 You are presuming that the OP is on Python 2.x; in that case the input file should be opened with 'rb' mode. Also not ensuring that at least the output file is closed, preferably both files. – John Machin Apr 19 '12 at 01:34
  • 3
    bikeshedding. Both files are closed as soon as the script terminates. Which is.. immediately. +1. – ch3ka Apr 19 '12 at 01:37
  • 1
    @JohnMachin I didn't presume anything. I changed as little as possible to show how to convert a file. `with` isn't necessary if the program is going to terminate immediately -- the file will be closed. I added a comment to indicate care should be taken if it is a long running program. – agf Apr 19 '12 at 01:37
  • It's nothing to do with preserving line endings. Read the docs for csv.reader for both Python 2.7 and 3.2. See also http://stackoverflow.com/questions/5180555/python-2-and-3-csv-reader – John Machin Apr 19 '12 at 01:53
  • "The only difference between the two modes is how newlines are handled on Windows.": Wrong; see my answer. What is "the encoding of the file is ASCII-compatible" supposed to mean? In any case, the sample data file in my answer is encoded in ASCII. – John Machin Apr 19 '12 at 03:28
  • @JohnMachin OK, I didn't see any reference to the Ctrl-z issue anywhere, thanks for pointing it out. Assuming the data represents text, any encoding that uses the same bytes as ASCII for `\r` and `\n` will be unaffected by the newline transformation (other than the character changing). The Ctrl-z issue is obviously totally different. – agf Apr 19 '12 at 03:32
  • "other than the character changing": wrong, look again: the `\r` is deleted – John Machin Apr 19 '12 at 03:53
  • @JohnMachin That's what I meant. I realized it wasn't stated clearly (character instead of line terminator) but it was after the edit deadline. I think it's stated adequately in the code comments in my answer. – agf Apr 19 '12 at 04:18
  • @agf Thanks for the answer. However, I'm trying your exact code and all I'm getting is a blank csv file. I tried making multiple tab delimited txt files and same result, a blank csv. Could this code be missing something? – wilbev Apr 19 '12 at 04:29
  • @wilbev I just tested it; it worked fine. Did you maybe miss the last row, `out_csv.writerows(in_txt)`? – agf Apr 19 '12 at 04:39
  • @agf: "On other platforms, it has no effect": Wrong, "classic" Mac uses/used `\r` as the line separator. How about you delete that and say instead something like: "rb/wb should be used for portability in all Python 2.x csv read/write code"? – John Machin Apr 19 '12 at 04:45
  • @wilbev: are you sure the output file has been closed (example: you run agf's code at the interactive prompt, and inspect the output file from another window)? – John Machin Apr 19 '12 at 04:48
  • @JohnMachin But Python only does the conversion on Windows, according to the docs. Is that not correct? Leaving for the night, will update tomorrow if necessary. – agf Apr 19 '12 at 04:51
  • @agf I got it to work by closing my IDE and just running it from py file. Must have the output file not closing when running in IDE. Thanks – wilbev Apr 19 '12 at 04:56
  • @agf: "Python only does the conversion on Windows, according to the docs" ... (1) according to what docs? (2) so what, the point is **portability** i.e. one can write code (by using 'rb' and 'wb') that will work on any platform without needing to remember what happens on what platform – John Machin Apr 19 '12 at 09:47
  • @JohnMachin You're certainly right, that's why I put in the `b` in the code. I saw it two places yesterday; the one I see now is the [reading and writing files](http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files) section of the tutorial. – agf Apr 19 '12 at 17:45
  • @agf: Sigh. I don't see the word "only" anywhere in those docs. – John Machin Apr 19 '12 at 22:03
  • I lost my header names while writing into csv. How can this be resolved? – Curious Apr 20 '20 at 07:12
  • @Curious Call `writeheader()` if the writer knows about your headers; that method is on `csv.DictWriter`, not on the plain `csv.writer`. Check out the docs for the csv module; the header names can be read in from the input file or provided as `fieldnames` when initializing the writer. – agf Apr 21 '20 at 15:58
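
On the header question: with a plain csv.reader and csv.writer the header line is just the first row, so it is copied like any other row. If the headers need to be handled by name, one possible approach is csv.DictReader plus csv.DictWriter. The sketch below is for Python 3; the function name and the UTF-8 encoding are assumptions for illustration, and it assumes a non-empty input whose rows never have more fields than the header.

```python
import csv


def tsv_to_csv_with_header(txt_path, csv_path):
    """Convert tab-delimited to CSV, writing the header row explicitly.

    Assumptions: UTF-8 text, a non-empty file whose first line holds the
    column names, and no data row longer than the header.
    """
    with open(txt_path, newline="", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        # reader.fieldnames is taken from the first line of the input.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(reader)
```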
1

Why you should always use 'rb' mode when reading files with the csv module:

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

What's in the sample file: any old rubbish, including control characters obtained by extracting blobs or whatever from a database, or injudicious use of the CHAR function in Excel formulas, or ...

>>> open('demo.txt', 'rb').read()
'h1\t"h2a\nh2b"\th3\r\nx1\t"x2a\r\nx2b"\tx3\r\ny1\ty2a\x1ay2b\ty3\r\n'

Python follows CP/M, MS-DOS, and Windows when it reads files in text mode: \r\n is recognised as the line separator and is served up as \n, and \x1a aka Ctrl-Z is recognised as an END-OF-FILE marker.

>>> open('demo.txt', 'r').read()
'h1\t"h2a\nh2b"\th3\nx1\t"x2a\nx2b"\tx3\ny1\ty2a' # WHOOPS

csv with a file opened with 'rb' works as expected:

>>> import csv
>>> list(csv.reader(open('demo.txt', 'rb'), delimiter='\t'))
[['h1', 'h2a\nh2b', 'h3'], ['x1', 'x2a\r\nx2b', 'x3'], ['y1', 'y2a\x1ay2b', 'y3']]

but text mode doesn't:

>>> list(csv.reader(open('demo.txt', 'r'), delimiter='\t'))
[['h1', 'h2a\nh2b', 'h3'], ['x1', 'x2a\nx2b', 'x3'], ['y1', 'y2a']]
>>>
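
The session above is Python 2 on Windows. For comparison, here is a rough Python 3 sketch of the same experiment, where the documented counterpart of 'rb' is text mode with newline=''. The helper names read_rows and demo are hypothetical, and the temporary file stands in for demo.txt. As far as I know, Python 3's text layer does not treat \x1a as end-of-file, so only the line-ending translation differs between the two reads.

```python
import csv
import os
import tempfile

# Same bytes as the demo.txt shown above.
DEMO_BYTES = (b'h1\t"h2a\nh2b"\th3\r\n'
              b'x1\t"x2a\r\nx2b"\tx3\r\n'
              b'y1\ty2a\x1ay2b\ty3\r\n')


def read_rows(path, **open_kwargs):
    """Parse a tab-delimited file, passing extra keyword args to open()."""
    with open(path, encoding="ascii", **open_kwargs) as f:
        return list(csv.reader(f, delimiter="\t"))


def demo():
    """Return (rows read with newline='', rows read with default newlines)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "wb") as f:
            f.write(DEMO_BYTES)
        # newline="" leaves \r\n untouched, so embedded line breaks survive.
        untranslated = read_rows(path, newline="")
        # Default text mode translates \r\n to \n before csv sees the data.
        translated = read_rows(path)
    return untranslated, translated
```

With newline='' the second row keeps its embedded '\r\n'; with the default setting it is read back as '\n', which mirrors the difference between the 'rb' and 'r' results above.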
John Machin
  • Do you have a python.org reference for the Ctrl-z behavior? I don't see any mention of it. – agf Apr 19 '12 at 03:38
  • 1
    @agf: No. It's a consequence of CPython 2.X delegating responsibility for deciding what to do to the `C` `stdio` library of the target compiler. – John Machin Apr 19 '12 at 04:37
1

This is how I do it:

import csv

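# txtfile and csvfile are the input and output paths
with open(txtfile, 'r') as infile, open(csvfile, 'w') as outfile:
    # drop only the line terminator so empty leading/trailing fields survive
    stripped = (line.rstrip('\r\n') for line in infile)
    # split on tabs, since the input is tab-delimited; skip blank lines
    lines = (line.split('\t') for line in stripped if line)
    writer = csv.writer(outfile)
    writer.writerows(lines)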
iun1x