I'm parsing a CSV file using Python.

The CSV file looks like:

value1,value2,value3(a,b,c)

The Python code:

import csv

with open(file_path, 'rb') as this_file:
    reader = csv.reader(this_file, delimiter=',')
    for row in reader:
        print row

The CSV reader interprets this as:

"value1","value2","value3(a","b","c)"

What is the best way to stop Python breaking value3(a,b,c) into three values?

Thanks.

miken32
Matt
  • Create the CSV file correctly, either with the csv module or in Excel or OpenOffice. If you do this, the CSV writer will properly escape nested commas. – Joran Beasley Jul 29 '13 at 21:30
  • So how do you want it to interpret it? As inconvenient as it may be, you can just write something yourself and use split(). That's what I'd do if there's nothing you can do about properly formatting the csv file. – Aleksander Lidtke Jul 29 '13 at 21:30
  • Your CSV file is *badly* misformatted. Put quotes around `value3` to make it a valid CSV value. – Martijn Pieters Jul 29 '13 at 21:32
  • Joran - I don't have control over the CSV generation. @aleksander-lidtke - I'd like it as val1, val2, val3(a,b,c). I was hoping to avoid split. Thanks for the replies! – Matt Jul 29 '13 at 21:33
  • And remove the spaces after the commas. In short, that's NOT a CSV file at all. You may need to write your own parser. – Lennart Regebro Jul 29 '13 at 21:35
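To illustrate the quoting advice in the comments above, here is a minimal sketch (written for Python 3, unlike the Python 2 code in the question). The `sample` string is a hypothetical, correctly quoted version of the line from the question, not the actual file:

```python
import csv
import io

# Hypothetical sample: same data as in the question, but with the third
# field quoted so that its embedded commas are protected.
sample = 'value1,value2,"value3(a,b,c)"\n'

# csv.reader accepts any iterable of lines; StringIO stands in for a file.
rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # [['value1', 'value2', 'value3(a,b,c)']]
```

This only helps if whoever generates the file can be persuaded to quote the field; it does nothing for a file that is already written without quotes.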

1 Answer

Here's code that deals with the given example. It counts open parentheses, so pieces that fall inside ( ... ) are glued back onto the entry they belong to:

a = 'value1, value2, value3(a, b, c)'
result = []
depth = 0  # number of parentheses currently open
for ent in a.split(', '):
    if depth > 0:
        # Still inside ( ... ): glue this piece back onto the previous entry.
        result[-1] += ',' + ent
    else:
        result.append(ent)
    # Update the depth so entries after the closing ) are kept separate.
    depth += ent.count('(') - ent.count(')')
# result is now ['value1', 'value2', 'value3(a,b,c)']

Because the depth drops back to zero at the closing parenthesis, this also copes with "normal" entries that come after the parenthesized one, e.g.

a = 'value1, value2, value3(a, b, c), value4'

Hope this helps. Apologies, I can't think of any way to use the in-built csv parser since your file is not, in fact, a "proper" csv...
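If the real file has no spaces after the commas (as in the sample line in the question), a character-by-character scan may be easier than splitting and rejoining. The following is only a sketch under that assumption, written for Python 3; `split_outside_parens` is a made-up helper name, and it does not handle quoted fields or unbalanced parentheses in any clever way:

```python
def split_outside_parens(line, delimiter=','):
    """Split line on delimiter, ignoring delimiters inside parentheses."""
    fields = []
    current = []  # characters of the field being built
    depth = 0     # number of '(' currently open
    for ch in line:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        if ch == delimiter and depth == 0:
            # A delimiter at top level ends the current field.
            fields.append(''.join(current))
            current = []
        else:
            current.append(ch)
    fields.append(''.join(current))  # last field has no trailing delimiter
    return fields


print(split_outside_parens('value1,value2,value3(a,b,c)'))
# ['value1', 'value2', 'value3(a,b,c)']
```

To use it on a file, one could loop over the open file and call `split_outside_parens(line.rstrip('\n'))` for each line instead of going through `csv.reader`.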

Aleksander Lidtke
  • Thanks for your detailed reply. I ended up using something similar to this. Value3 was a known string, so I was able to check for that and then use list[:n] and list[n:] to obtain the desired output. Thanks again! – Matt Jul 29 '13 at 22:03
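
The approach described in the comment above might look roughly like the sketch below (Python 3). The prefix `'value3('` and the helper name `rejoin_from_known_field` are assumptions for illustration, and the sketch assumes the comma-containing field is the last one on the line:

```python
import csv
import io

KNOWN_PREFIX = 'value3('  # assumed marker for the field that contains commas


def rejoin_from_known_field(row, prefix=KNOWN_PREFIX):
    """Merge everything from the field starting with prefix into one value."""
    for n, field in enumerate(row):
        if field.startswith(prefix):
            # row[:n] holds the ordinary fields; row[n:] holds the pieces
            # that csv.reader split apart.
            return row[:n] + [','.join(row[n:])]
    return row  # prefix not found: leave the row unchanged


sample = 'value1,value2,value3(a,b,c)\n'  # the line from the question
fixed_rows = [rejoin_from_known_field(row)
              for row in csv.reader(io.StringIO(sample))]
print(fixed_rows)  # [['value1', 'value2', 'value3(a,b,c)']]
```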