
I have a text file that I'm reading as a CSV. It has two fields per row; however, the second field may contain one or more commas. Let's pretend that this is the data:

group-a,cats
group-b,dogs
group-c,snakes, turtles, lizards
group-d,fish, eels
group-e,people

I'm trying to have the text file generated with quotes around each field. If that's not possible, what's the best way to reliably parse it so that any commas after the first comma are effectively ignored (the first field never contains commas)?

For what it's worth, I'm using Python 3.

amoreno

3 Answers

You can pass the optional maxsplit argument to str.split() so that each line is split only on the first comma:

with open("myfile.csv") as f:
    myData = [line.strip().split(",", 1) for line in f]

print(myData)
#[['group-a', 'cats'],
# ['group-b', 'dogs'],
# ['group-c', 'snakes, turtles, lizards'],
# ['group-d', 'fish, eels'],
# ['group-e', 'people']]
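Since the question says the file is being read as a CSV, here is a sketch of a different technique (not the split(",", 1) approach above): let csv.reader split on every comma, then glue everything after the first field back together. It assumes unquoted data like the sample; read_two_fields is a hypothetical helper name and io.StringIO stands in for the real file. With an actual file you would pass open("myfile.csv", newline=""), as the csv docs recommend.

```python
import csv
import io

# Sample data from the question; io.StringIO stands in for open("myfile.csv", newline="")
SAMPLE = """group-a,cats
group-b,dogs
group-c,snakes, turtles, lizards
group-d,fish, eels
group-e,people
"""


def read_two_fields(f):
    """Return [first_field, rest] pairs, re-joining any extra CSV columns (hypothetical helper)."""
    rows = []
    for row in csv.reader(f):
        if not row:  # csv.reader yields [] for a blank line
            continue
        # csv.reader splits on every comma, so re-join everything after the first field.
        # The default skipinitialspace=False keeps the spaces after the commas,
        # so for unquoted data like this the original text is restored.
        rows.append([row[0], ",".join(row[1:])])
    return rows


my_data = read_two_fields(io.StringIO(SAMPLE))
print(my_data)
```

This is mainly useful if you already have a csv.reader in place. Otherwise the str.split version does the same job with less machinery.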
pault
  • This is perfect, but could you confirm or clarify what I'm finding about the use of the strip method in this example? Is it just there to remove the newline character from each line, which somehow counts as whitespace? (I get the same result if I use `line.strip('\n')` or if I leave it blank.) – amoreno Apr 18 '18 at 17:25
  • Yes, `strip()` removes any whitespace, including newline characters, from the start and end of the string. It may not be necessary, but it probably won't hurt. I have basically always called `strip()` like this when reading a file; it's almost reflexive. Here's a good post with more info: https://stackoverflow.com/a/12330535/5858851 – pault Apr 18 '18 at 17:46
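A quick way to see the difference the comments describe is to compare the variants on a line that also ends with a space before the newline (the trailing space is made up for illustration; the question's data has none, which is why all variants gave the same result there).

```python
# A line as it comes out of iterating over a text file: the newline is still attached
line = "group-a,cats \n"

print("\n".isspace())           # True: a newline counts as whitespace
print(repr(line.strip()))       # 'group-a,cats'  (all leading/trailing whitespace removed)
print(repr(line.strip("\n")))   # 'group-a,cats '  (only newlines removed, from both ends)
print(repr(line.rstrip("\n")))  # 'group-a,cats '  (only newlines removed, right end only)
```

If the whitespace around a field might be meaningful, rstrip("\n") is the most conservative choice.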

Ignoring all commas after the first comma:

How about simple slicing? This keeps the first comma and deletes the rest:

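with open("myfile.csv") as f:
    for line in f:
        line = line.rstrip("\n")  # drop the newline so print() doesn't add a blank line
        k = line.find(',')        # index of the first comma, or -1 if there is none
        # keep everything up to and including the first comma, strip commas from the rest
        print(line[:k+1] + line[k+1:].replace(',', ''))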

Demo:

s = 'group-c,snakes, turtles, lizards'
k = s.find(',')
print(s[:k+1] + s[k+1:].replace(',', ''))

# group-c,snakes turtles lizards
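The same result can be had without index arithmetic by swapping in str.partition, which splits at the first comma only. This is a sketch; drop_extra_commas is a hypothetical name. Note that, like the slicing above, it deletes the commas inside the second field rather than preserving them.

```python
def drop_extra_commas(line):
    """Keep the first comma and delete every comma after it (hypothetical helper)."""
    # partition() splits at the first comma only; if there is no comma it
    # returns (line, "", ""), so the line comes back unchanged.
    head, sep, tail = line.partition(",")
    return head + sep + tail.replace(",", "")


print(drop_extra_commas("group-c,snakes, turtles, lizards"))  # group-c,snakes turtles lizards
```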
Austin

As easy as a single regex substitution that quotes both fields:

import re

with open('in.txt') as f:
    for line in f:
        # group 1 = everything before the first comma, group 2 = the rest of the line
        print(re.sub(r'^([^,]+),(.*)', r'"\1","\2"', line.strip("\n")))

"group-a","cats"
"group-b","dogs"
"group-c","snakes, turtles, lizards"
"group-d","fish, eels"
"group-e","people"
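If the goal is to generate the quoted file the question asks about, the standard csv module can do the quoting instead of a regex. One difference: the regex copies any embedded double quote through unescaped, while csv.writer doubles it. The following is a sketch; the third input line is made up to show that escaping, and io.StringIO stands in for open("out.csv", "w", newline="").

```python
import csv
import io

# The third line is a made-up example containing double quotes
lines = ["group-a,cats", "group-c,snakes, turtles, lizards", 'group-x,a "quoted" word']

out = io.StringIO()  # stands in for open("out.csv", "w", newline="")
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
for line in lines:
    # split on the first comma only, then let csv handle quoting and escaping
    writer.writerow(line.split(",", 1))

print(out.getvalue(), end="")
```

A file written this way can be read back with a plain csv.reader, which returns exactly two fields per row.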
wolfrevokcats
  • Thanks @pault, I fixed the answer. I didn't think it would matter for such a small demo. (I put `open` inside `with`, since `with` takes care of closing the file.) – wolfrevokcats Apr 18 '18 at 16:27