I would like to use Python to read and write files of the following format:

#h -F, field1 field2 field3
a,b,c
d,e,f
# some comments
g,h,i

This file closely resembles a typical CSV, except for the following:

  1. The header line starts with #h
  2. The second element of the header line is a tag to denote the delimiter
  3. The remaining elements of the header are field names (always separated by a single space)
  4. Comment lines always start with # and can be scattered throughout the file

Is there any way I can use csv.DictReader() and csv.DictWriter() to read and write these files?

Dave

2 Answers

You can parse the first line separately to find the delimiter and fieldnames:

    firstline = next(f).split()
    delimiter = firstline[1][-1]
    fields = firstline[2:]
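
If you want the header parsing to fail loudly on malformed input, it can be wrapped in a small function. This is only a sketch: parse_header is a hypothetical helper name, and it assumes the tag is always -F followed by exactly one non-whitespace delimiter character (a space or tab delimiter would not survive split()).

```python
def parse_header(line):
    """Split a '#h -F<delim> name1 name2 ...' header into (delimiter, fieldnames)."""
    parts = line.split()
    # Expect at least: '#h', the '-F<delim>' tag, and one field name.
    if len(parts) < 3 or parts[0] != '#h' or not parts[1].startswith('-F'):
        raise ValueError('not a valid header line: %r' % line)
    delimiter = parts[1][2:]  # whatever follows '-F'
    if len(delimiter) != 1:
        raise ValueError('expected a single-character delimiter, got %r' % delimiter)
    return delimiter, parts[2:]


delimiter, fields = parse_header('#h -F, field1 field2 field3\n')
```

Here delimiter is ',' and fields is the list of the three field names, which can be passed straight to csv.DictReader or csv.DictWriter.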

Note that csv.DictReader can take any iterable as its first argument. So to skip the comments, you can wrap f in an iterator (skip_comments) which yields only non-comment lines:

import csv
def skip_comments(iterable):
    for line in iterable:
        if not line.startswith('#'):
            yield line

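with open('data.csv', newline='') as f:   # Python 3: text mode, newline='' as the csv docs recommend
    firstline = next(f).split()
    delimiter = firstline[1][-1]          # last character of the '-F,' tag
    fields = firstline[2:]                # remaining tokens are the field names
    for line in csv.DictReader(skip_comments(f),
                               delimiter=delimiter, fieldnames=fields):
        print(line)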

On the data you posted, this yields the following (key order may differ between Python versions):

{'field2': 'b', 'field3': 'c', 'field1': 'a'}
{'field2': 'e', 'field3': 'f', 'field1': 'd'}
{'field2': 'h', 'field3': 'i', 'field1': 'g'}

To write a file in this format, you could use a header helper function:

def header(delimiter,fields):
    return '#h -F{d} {f}\n'.format(d = delimiter, f=' '.join(fields))

with open('data.csv', newline='') as f:
    with open('output.csv', 'w', newline='') as g:   # Python 3: text mode for csv
        firstline = next(f).split()
        delimiter = firstline[1][-1]
        fields = firstline[2:]
        writer = csv.DictWriter(g, delimiter=delimiter, fieldnames=fields)
        g.write(header(delimiter, fields))
        for row in csv.DictReader(skip_comments(f),
                                   delimiter=delimiter, fieldnames=fields):
            writer.writerow(row)            # CSV data row
            g.write('# comment\n')          # illustrative comment line after each row

Note that you can write to output.csv using either g.write (for header or comment lines) or writer.writerow (for CSV data rows); both write to the same file object.
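
One detail worth checking when mixing the two kinds of writes: csv writers end rows with '\r\n' by default, while hand-written header and comment lines usually end with '\n'. A minimal Python 3 sketch that keeps the line endings consistent is shown here; write_quasi_csv is a hypothetical helper name, and io.StringIO stands in for a real output file.

```python
import csv
import io


def write_quasi_csv(out, delimiter, fields, rows):
    # The header line bypasses the csv module entirely.
    out.write('#h -F{d} {f}\n'.format(d=delimiter, f=' '.join(fields)))
    # lineterminator='\n' makes data rows end the same way as the
    # hand-written header; the csv module's default is '\r\n'.
    writer = csv.DictWriter(out, fieldnames=fields, delimiter=delimiter,
                            lineterminator='\n')
    writer.writerows(rows)


buf = io.StringIO()  # stand-in for open('output.csv', 'w', newline='')
rows = [{'field1': 'a', 'field2': 'b', 'field3': 'c'},
        {'field1': 'd', 'field2': 'e', 'field3': 'f'}]
write_quasi_csv(buf, ',', ['field1', 'field2', 'field3'], rows)
text = buf.getvalue()
```

After this runs, text holds the header line followed by a,b,c and d,e,f, each terminated by a single '\n'.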

unutbu
  • Nice. Now suppose I want to write to a file using this quasi-CSV format (i.e. using the four peculiarities mentioned in the question). How would I use csv.DictWriter to do that? – Dave Feb 07 '12 at 16:30

Assume the input file is opened as input. First, read in the header:

header = input.readline()

Parse out the delimiter and field names and use them to construct a DictReader. Now, instead of input, feed the reader the generator expression

(ln for ln in input if not ln.startswith('#'))

to skip the comments.
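
Put together, the steps above might look like the following Python 3 sketch. The names infile, data_lines and records are illustrative (infile is used instead of input to avoid shadowing the built-in), and io.StringIO stands in for the opened file.

```python
import csv
import io

# Stand-in for the opened input file, holding the sample from the question.
infile = io.StringIO(
    '#h -F, field1 field2 field3\n'
    'a,b,c\n'
    'd,e,f\n'
    '# some comments\n'
    'g,h,i\n'
)

# Read and parse the header line.
header = infile.readline().split()
delimiter = header[1][-1]   # ',' from the '-F,' tag
fieldnames = header[2:]

# Lazily filter out comment lines before the reader sees them.
data_lines = (ln for ln in infile if not ln.startswith('#'))
reader = csv.DictReader(data_lines, fieldnames=fieldnames, delimiter=delimiter)
records = [dict(row) for row in reader]
```

Because the generator expression is lazy, the file is still read one line at a time; records ends up with one dict per data row and the comment line is skipped.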

Fred Foo