Best way to ignore subsequent commas

Question

I have a text file that I'm reading as a CSV. It has two fields per row; however, the second field may itself contain one or more commas. Let's pretend that this is the data:

group-a,cats
group-b,dogs
group-c,snakes, turtles, lizards
group-d,fish, eels
group-e,people

I'm trying to have the txt file generated so that quotes are put around each field, but if that's not possible, what's the best way to reliably parse this so that commas after the first comma (first field never has commas) are effectively ignored?

For what it's worth, I'm using python3.

score 2 · Accepted Answer · answered Apr 18 '18 at 16:16


You can pass an optional parameter maxsplit to str.split(), so you can split your lines on only the first comma:

with open("myfile.csv") as f:
    myData = [line.strip().split(",", 1) for line in f]

print(myData)
#[['group-a', 'cats'],
# ['group-b', 'dogs'],
# ['group-c', 'snakes, turtles, lizards'],
# ['group-d', 'fish, eels'],
# ['group-e', 'people']]
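As a self-contained sketch of the same idea, the snippet below runs the split over in-memory sample lines instead of `myfile.csv`. The helper name `split_on_first_comma` and the blank-line skipping are assumptions added for illustration; they are not part of the original answer.

```python
def split_on_first_comma(lines):
    """Split each non-empty line into [first_field, rest] on the first comma only."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines (an added assumption, e.g. a trailing empty line)
        rows.append(line.split(",", 1))  # maxsplit=1: later commas stay in the second field
    return rows


# Hypothetical sample data mirroring the question's file contents.
sample = [
    "group-a,cats\n",
    "group-c,snakes, turtles, lizards\n",
    "\n",
]
rows = split_on_first_comma(sample)
print(rows)
# [['group-a', 'cats'], ['group-c', 'snakes, turtles, lizards']]
```

Note that a line with no comma at all would come back as a one-element list, so callers that unpack two fields may want to check the length first.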

pault

This is perfect, but can you confirm or clarify what I'm seeing with the strip method in this example? Is it just there to remove the newline character from each line, which is treated as whitespace? (I get the same result if I use `line.strip('\n')` or if I call it with no argument.) – amoreno Apr 18 '18 at 17:25

Yes, `strip()` removes any whitespace, including newline characters, from the start and end of the string. It may not be necessary here, but it probably won't hurt. I have basically always called `strip()` like this when reading a file; it's almost reflexive. Here's a good post with more info: https://stackoverflow.com/a/12330535/5858851 – pault Apr 18 '18 at 17:46
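To make the difference concrete, here is a small sketch comparing the variants mentioned in these comments. The sample string, with extra spaces around the data, is made up for illustration; with the question's data, which has no stray spaces, all three give the same result.

```python
# Hypothetical line with leading/trailing spaces plus the newline read from a file.
line = "  group-d,fish, eels \n"

stripped_all = line.strip()        # removes all leading/trailing whitespace
stripped_nl = line.strip("\n")     # removes only newline characters at either end
rstripped_nl = line.rstrip("\n")   # removes only trailing newline characters

print(repr(stripped_all))   # 'group-d,fish, eels'
print(repr(stripped_nl))    # '  group-d,fish, eels '
print(repr(rstripped_nl))   # '  group-d,fish, eels '
```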

Austin · Answer 2 · 2018-04-18T16:26:20.753

0

Ignoring all commas after first comma:

How about a simple slicing?

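with open("myfile.csv") as f:
    for line in f:
        line = line.rstrip("\n")  # avoid printing a blank line after each row
        k = line.find(',')
        # Keep everything up to and including the first comma, then drop later commas.
        print(line[:k+1] + line[k+1:].replace(',', ''))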

Demo:

s = 'group-c,snakes, turtles, lizards'
k = s.find(',')
print(s[:k+1] + s[k+1:].replace(',', ''))

# group-c,snakes turtles lizards

edited Apr 18 '18 at 16:26

answered Apr 18 '18 at 16:17

wolfrevokcats · Answer 3 · 2018-04-18T16:26:12.837

-1

As easy as:

import re
with open('in.txt') as f:
    for line in f:
        print(re.sub(r'^([^,]+),(.*)', r'"\1","\2"', line.strip("\n")))

"group-a","cats"
"group-b","dogs"
"group-c","snakes, turtles, lizards"
"group-d","fish, eels"
"group-e","people"
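If the goal is to regenerate the file with quotes around each field, one alternative to a regex is the standard `csv` module, which also escapes any double quotes embedded in the data. This is only a sketch under assumptions: the sample lines are in memory rather than read from `in.txt`, and the variable names are made up for illustration.

```python
import csv
import io

# Hypothetical sample lines standing in for the input file.
lines = ["group-c,snakes, turtles, lizards\n", "group-e,people\n"]

out = io.StringIO()
# QUOTE_ALL wraps every field in double quotes; lineterminator="\n" avoids the default "\r\n".
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
for line in lines:
    # Split on the first comma only, then let csv handle the quoting.
    writer.writerow(line.rstrip("\n").split(",", 1))

quoted = out.getvalue()
print(quoted, end="")
# "group-c","snakes, turtles, lizards"
# "group-e","people"

# Reading the quoted text back with csv.reader recovers the two fields per row.
rows = list(csv.reader(io.StringIO(quoted)))
```

Once the file is quoted this way, it can be parsed with `csv.reader` directly, without any special handling for the extra commas.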

edited Apr 18 '18 at 16:26

answered Apr 18 '18 at 16:09

Thanks @pault, fixed the answer. Didn't think it would be important for such a small demo. (Put `open` inside `with`, since it takes care of resource deallocation) – wolfrevokcats Apr 18 '18 at 16:27
