I received a CSV file in a bad format (I have no control over the application that generates it).

The CSV header and the first data line look like this:

"Start Time"
"End Time"
"Service"

"255/06:06:54","255/06:54:42","S2 AVAIL"

This is the code I use to read the CSV:

import csv
import os
import sys
rootPath = os.path.abspath(os.path.join(os.path.dirname( __file__ ), '..'))
inputFile = open(rootPath + '\\input\\' + sys.argv[1], 'rt')
sys.path.append(rootPath + '\\common')
for row in csv.reader(inputFile, dialect='excel'):
    if row:  # skip empty rows
        print(row)

This is the output I receive:

['"Start Time"']
['End Time']
['Service']
['255/06:06:54', '255/06:54:42', 'S2 AVAIL']

The first problem is the strange character at the start of the first row (maybe an encoding option is missing?). The header is also wrong, so I cannot use DictReader on this format, which would be useful for the edits I have to make to the CSV.

I could write a new CSV with a correctly formatted header, that is not a problem, but I do not know how to skip the first 3 lines of the CSV. Or can I read the file in the format it arrives in?

This is the output I wish to obtain with csv.reader:

['Start Time', 'End Time', 'Service']
['255/06:06:54', '255/06:54:42', 'S2 AVAIL']

or with csv.DictReader:

OrderedDict([('Start Time', '255/06:06:54'), ('End Time', '255/06:54:42'), ('Service', 'S2 AVAIL')])
AtomiX84
  • The given file does not look like valid CSV - why are there line breaks within the header? – Nico Haase Sep 12 '18 at 14:02
  • The strange characters at the start are the Byte Order Mark (BOM). One of the comments on https://stackoverflow.com/questions/40310042/python-read-csv-bom-embedded-into-the-first-key has an example for handling a file with a BOM at the start. This should solve the first of your problems. – c3st7n Sep 12 '18 at 14:03
  • `inputFile` is not defined anywhere in your sample code? – Tomalak Sep 12 '18 at 14:16
  • I really don't know why they use line breaks in the header. What I was thinking of doing is rewriting the CSV and working with my own version, with no newlines in the header and no BOM; anyway, I'll try to read it as per the link you suggested @c3st7n – AtomiX84 Sep 12 '18 at 14:19
  • I forgot to paste it @Tomalak – AtomiX84 Sep 12 '18 at 14:20
  • The file has been created by a Windows application and saved in UTF-8 encoding with a Byte Order Mark (BOM). Open the file in Python using `encoding='utf-8-sig'` (sig for "signature", as the byte order mark is sometimes also called). Compare https://stackoverflow.com/a/49150749/18771 – Tomalak Sep 12 '18 at 14:36
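
The `encoding='utf-8-sig'` suggestion from the comments can be illustrated with a minimal sketch. The sample bytes and the helper name `first_row` are made up for illustration; they only mimic the header lines shown in the question.

```python
import codecs
import csv
import io

# Hypothetical sample mimicking the file in the question: UTF-8 with a BOM.
raw = codecs.BOM_UTF8 + b'"Start Time"\r\n"End Time"\r\n"Service"\r\n'

def first_row(data, encoding):
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline='')
    return next(csv.reader(text, dialect='excel'))

# Plain 'utf-8' keeps the BOM, so the first field no longer starts with a
# quote character and the quotes end up inside the value.
with_bom = first_row(raw, 'utf-8')

# 'utf-8-sig' strips the BOM while decoding, so the quoting works as expected.
without_bom = first_row(raw, 'utf-8-sig')

print(with_bom)
print(without_bom)
```

For a real file, the equivalent would be passing `encoding='utf-8-sig'` (and `newline=''`) to `open()`.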

1 Answer

In the end I chose to rewrite the CSV in a correct format and then work with that file. In the solution I implemented, the BOM is also dropped from the new CSV. In any case, the link suggested to me about the BOM contains the fix for that problem!

Here is the code of my solution:

import csv
import os
import sys

rootPath = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
sys.path.append(rootPath + '\\common')
from function import *

inputPath = rootPath + '\\input\\' + sys.argv[1]
outputPath = rootPath + '\\input\\formatted.csv'
with open(inputPath, 'r') as inputFile, open(outputPath, 'w', newline='') as outputFile:
    writeFile = csv.writer(outputFile)
    # Write the header on a single line
    writeFile.writerow(['StartTime', 'EndTime', 'Service'])
    # Skip the first 3 lines (the broken header, including the BOM),
    # then copy the non-empty data rows
    for row in csv.reader(inputFile.readlines()[3:], dialect='excel'):
        if row:
            writeFile.writerow(row)
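
An alternative that avoids the intermediate file might look like the sketch below. It is not the code used above, only an illustration: the helper name `read_records` and its `header_fields` parameter are hypothetical, and it assumes that each of the first rows holds exactly one column name, as in the sample from the question.

```python
import codecs
import csv
import io

def read_records(lines, header_fields=3):
    """Merge the one-name-per-line header rows, then return one dict per data row."""
    reader = csv.reader(lines, dialect='excel')
    header = []
    for row in reader:
        header.extend(row)  # empty rows add nothing
        if len(header) >= header_fields:
            break
    # The same reader continues after the header rows; skip empty rows.
    return [dict(zip(header, row)) for row in reader if row]

# Hypothetical sample mimicking the file in the question (UTF-8 with a BOM).
raw = codecs.BOM_UTF8 + (
    b'"Start Time"\r\n"End Time"\r\n"Service"\r\n\r\n'
    b'"255/06:06:54","255/06:54:42","S2 AVAIL"\r\n'
)
# Decoding with 'utf-8-sig' drops the BOM before the csv module sees the text.
records = read_records(io.StringIO(raw.decode('utf-8-sig'), newline=''))
print(records)
```

With a real file, the `io.StringIO(...)` argument would be replaced by a file object opened with `encoding='utf-8-sig'` and `newline=''`.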
AtomiX84