
I'm trying to parse my CSV file using Python. Each row has four elements separated by commas. Each element is a string, but it may contain commas as well. If an element contains a comma, that element is double-quoted. The following examples show both cases, with and without quotes:

http://data.europa.eu/esco/skill/CTC_43028,"use data extraction, transformation and loading tools","ETL|extract, transform, load","<div>Integrate information from multiple applications, created and maintained by various organisations, into one consistent and transparent data structure.</div>"
http://data.europa.eu/esco/skill/SCG.TS.1.4.m.2,support company plan,follow industry guidelines|follow organisation's vision|monitor policy implementation|support company mission,<div>Act within one&#39;s work role to advance the goals and vision of the organisation.</div>

What I want is to split each row into four elements. I've tried Python's split function without success. I suppose I'll have to use a regular expression, but I'm not familiar with them. Could you please help? Many thanks.
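For reference, here is a minimal sketch of why `str.split` fails on the first sample row and how the standard `csv` module handles it. The row is copied verbatim from above. Passing it to `csv.reader` as a one-item list instead of a file is only a convenience for the example.

```python
import csv

# First sample row from the question, copied verbatim.
line = ('http://data.europa.eu/esco/skill/CTC_43028,'
        '"use data extraction, transformation and loading tools",'
        '"ETL|extract, transform, load",'
        '"<div>Integrate information from multiple applications, created and '
        'maintained by various organisations, into one consistent and '
        'transparent data structure.</div>"')

# Naive approach: splits on every comma, including those inside quotes.
naive = line.split(',')
print(len(naive))  # 9 pieces instead of 4

# csv.reader accepts any iterable of strings, so a one-item list works here.
fields = next(csv.reader([line]))
print(len(fields))  # 4
print(fields[1])    # use data extraction, transformation and loading tools
```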

lenhhoxung

1 Answer


The csv module is what you want:

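import csv

# newline='' lets the csv module handle line endings inside quoted fields
with open('file.csv', newline='') as f:
    r = csv.reader(f)
    for row in r:
        print(row)  # each row is a list of strings, one per field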

['http...', 'use data ...', 'ETL|ext ...', '<div>Integrate ...']
['http:...', 'support ...', 'follow ...', '<div>Act ...']

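',' is the default delimiter and '"' the default quotechar, so csv.reader() splits on commas but keeps commas inside double-quoted fields as part of the field.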
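As a sketch of the same idea applied to both sample rows, assuming the file looks exactly like the two lines in the question: `io.StringIO` stands in for the real file, the default delimiter and quotechar are passed explicitly only to show what they are, and the four names `uri`, `label`, `alt_labels` and `description` are hypothetical labels chosen for illustration.

```python
import csv
import io

# Both sample rows from the question; io.StringIO stands in for the real
# file so the example is self-contained.
data = (
    'http://data.europa.eu/esco/skill/CTC_43028,'
    '"use data extraction, transformation and loading tools",'
    '"ETL|extract, transform, load",'
    '"<div>Integrate information from multiple applications, created and '
    'maintained by various organisations, into one consistent and '
    'transparent data structure.</div>"\n'
    'http://data.europa.eu/esco/skill/SCG.TS.1.4.m.2,support company plan,'
    "follow industry guidelines|follow organisation's vision|"
    'monitor policy implementation|support company mission,'
    '<div>Act within one&#39;s work role to advance the goals and vision '
    'of the organisation.</div>\n'
)

rows = []
# Same behaviour as csv.reader(f); the defaults are spelled out for clarity.
reader = csv.reader(io.StringIO(data), delimiter=',', quotechar='"')
for uri, label, alt_labels, description in reader:  # hypothetical names
    rows.append((uri, label, alt_labels, description))
    print(label)
```

Unpacking each row into four names raises `ValueError` if a row ever has a different number of fields, which doubles as a cheap sanity check on the file.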

user2390182
  • But each part is now quoted, and I need to get the string inside the single quotes. Regex again? – lenhhoxung Feb 12 '17 at 16:01
  • No, they are not quoted. The single quotes are just part of a string's `repr()` representation, which `list`'s `str()` uses to display them. If you were to print the strings in each row individually, you would see no quotes. – user2390182 Feb 12 '17 at 16:02
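A small sketch of the point made in the comment above, using a hand-built row whose values come from the second sample line (nothing is read from a file here):

```python
# A row as returned by csv.reader: a plain list of plain strings.
row = ['http://data.europa.eu/esco/skill/SCG.TS.1.4.m.2', 'support company plan']

# print() on a list shows each element via repr(), so quotes appear.
print(row)
# print() on a single string shows its contents, with no quotes.
print(row[1])        # support company plan
# repr() is where the quotes come from.
print(repr(row[1]))  # 'support company plan'
# The data itself contains no quote character.
print(row[1].startswith("'"))  # False
```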