How can I read a csv file without using any external import (e.g. csv or pandas) and turn it into a list of lists? Here's the code I worked out so far:

m = []
for line in myfile:
    m.append(line.split(','))

This for loop works fine in general, but if one of the fields in the csv contains a ',' the line is wrongly split at that point.

So, for example, if one of the lines I have in the csv is:

12,"This is a single entry, even if there's a coma",0.23

The corresponding element of the list is the following:

['12', '"This is a single entry', ' even if there\'s a coma"', '0.23\n']

While I would like to obtain:

['12', '"This is a single entry, even if there\'s a coma"', '0.23']
Robb1
  • That's why you need to use a library, it knows how to parse quoted fields and escape sequences. Don't try to do it yourself, it's too difficult. – Barmar Feb 04 '21 at 16:50
  • @Barmar I have a feeling I can solve this with regular expressions, but I am really terrible at it. I still want to try first without imports! – Robb1 Feb 04 '21 at 16:51
  • Isn't using regular expressions a violation of your constraint regarding "no imports?" –  Feb 04 '21 at 16:53
  • No, I don't think you can do it with regular expressions, at least not very easily. – Barmar Feb 04 '21 at 16:54
  • @Robb1 look at [this](https://stackoverflow.com/questions/18144431/regex-to-split-a-csv) answer for regex. Use the csv library for anything in production. – sarartur Feb 04 '21 at 16:54
  • @JustinEzequiel you are right, I didn't know I had to `import re`! Thank you both – Robb1 Feb 04 '21 at 16:54
  • The regular expression is quite complicated, so not for someone who is "really terrible at it". And I don't think it handles escaped quotes. – Barmar Feb 04 '21 at 16:56
  • Consider also that I don't have to include all possible scenarios handled by the `csv` library. I just want to fix that `split()` issue – Robb1 Feb 04 '21 at 16:58
  • I think you should edit your question and clarify which cases need to be handled and whether / what `import`s are allowed. – martineau Feb 04 '21 at 18:03

2 Answers

I would avoid trying to use a regular expression. Instead, process the text one character at a time, keeping track of whether you are currently inside a quoted field. Also, the quote characters are normally not included as part of a field.

A quick example approach would be the following:

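def split_row(row, quote_char='"', delim=','):
    """Split one CSV row into fields, keeping delimiters that sit inside quotes."""
    in_quote = False
    fields = []
    field = []

    # Drop the line terminator so it does not end up in the last field
    for c in row.rstrip('\r\n'):
        if c == quote_char:
            # Toggle the quoted state; the quote itself is not kept in the field
            in_quote = not in_quote
        elif c == delim and not in_quote:
            # An unquoted delimiter ends the current field
            fields.append(''.join(field))
            field = []
        else:
            # Ordinary character, or a delimiter inside quotes
            field.append(c)

    # Always emit the last field, so a trailing delimiter yields an empty field
    fields.append(''.join(field))

    return fields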
    
    
fields = split_row('''12,"This is a single entry, even if there's a coma",0.23''')
print(len(fields), fields)

Which would display:

3 ['12', "This is a single entry, even if there's a coma", '0.23']

The csv library does a far better job of this, though. This script does not handle any special cases beyond your test string, such as escaped quotes or quoted fields that span several lines.
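
As one illustration of such a special case, here is a minimal sketch that also treats a doubled quote ("") inside a quoted field as a literal quote, which is a common csv convention, and then applies the splitter to every line to build the list of lists asked for in the question. The names `split_row_escaped` and `read_rows` are hypothetical, and the list `sample` is only a stand-in for an open file object; quoted fields that span several lines are still not handled.

```python
def split_row_escaped(row, quote_char='"', delim=','):
    """Split one CSV row; a doubled quote inside quotes is a literal quote."""
    fields, field = [], []
    in_quote = False
    row = row.rstrip('\r\n')  # drop the line terminator
    i = 0
    while i < len(row):
        c = row[i]
        if c == quote_char:
            if in_quote and row[i + 1:i + 2] == quote_char:
                # Doubled quote inside a quoted field: keep one literal quote
                field.append(quote_char)
                i += 1  # skip the second quote of the pair
            else:
                # Opening or closing quote: toggle state, do not keep it
                in_quote = not in_quote
        elif c == delim and not in_quote:
            # An unquoted delimiter ends the current field
            fields.append(''.join(field))
            field = []
        else:
            field.append(c)
        i += 1
    fields.append(''.join(field))
    return fields


def read_rows(lines):
    """Turn an iterable of text lines (e.g. an open file) into a list of lists."""
    return [split_row_escaped(line) for line in lines]


# Stand-in for `myfile`; any iterable of lines works the same way
sample = [
    '12,"This is a single entry, even if there\'s a coma",0.23\n',
    '13,"She said ""hi"", then left",0.5\n',
]
rows = read_rows(sample)
print(rows)
```

With a real file, `read_rows(myfile)` would be called on the open file object instead of `sample`.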

Martin Evans

Here is my go at it:

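line = '12, "This is a single entry, more bits in here ,even if there is a coma",0.23 , 12, "This is a single entry, even if there is a coma", 0.23\n'

# Naive split: quoted fields that contain commas are broken apart here
line_split = line.replace('\n', '').split(',')

# Indices of the pieces that open or close a quoted field (odd number of quotes)
quote_loc = [idx for idx, piece in enumerate(line_split) if piece.count('"') % 2 == 1]
# Work from the end so that deletions do not shift indices still to be processed
quote_loc.reverse()

assert len(quote_loc) % 2 == 0, "unbalanced quotes in line"

# Each (m, n) pair holds the closing and opening piece of one quoted field
for m, n in zip(quote_loc[::2], quote_loc[1::2]):
    line_split[n] = ','.join(line_split[n:m+1])
    del line_split[n+1:m+1]

print(line_split)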
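
The same split-then-merge idea can also be sketched as a reusable function. This is a forward-pass variant rather than the reversed-index loop above, and the name `split_keep_quotes` is hypothetical. It assumes that a piece with an odd number of quote characters opens or closes a quoted field, and it keeps the quote characters in the result, which is what the question's desired output shows.

```python
def split_keep_quotes(line, delim=','):
    """Split on delim, then re-join the pieces that belong to one quoted field."""
    pieces = line.rstrip('\r\n').split(delim)
    merged = []
    buffer = None  # pieces of a quoted field that is still open, if any
    for piece in pieces:
        # An odd number of quotes means this piece opens or closes a quoted field
        odd = piece.count('"') % 2 == 1
        if buffer is None:
            if odd:
                buffer = [piece]       # quoted field starts here
            else:
                merged.append(piece)   # complete field on its own
        else:
            buffer.append(piece)
            if odd:
                # Closing piece found: restore the delimiters removed by split()
                merged.append(delim.join(buffer))
                buffer = None
    if buffer is not None:
        raise ValueError("unbalanced quotes in line")
    return merged


row = split_keep_quotes('12,"This is a single entry, even if there\'s a coma",0.23\n')
print(row)
```

Like the loop above, this does not interpret escaped quotes or strip whitespace around fields.
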
Maxwell Redacted