13

I need to import a CSV file in Python on Windows. My file is delimited by ';' and has strings with non-English symbols and commas (',').

I've read posts:

Importing a CSV file into a sqlite3 database table using Python

Python import csv to list

When I run:

with open('d:/trade/test.csv', 'r') as f1:
    reader1 = csv.reader(f1)
    your_list1 = list(reader1)

I get an issue: comma is changed to '-' symbol.

When I try:

df = pandas.read_csv(csvfile)

I got errors:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 2.

Please help. I would prefer to use pandas as the code is shorter without listing all field names from the CSV file.

I understand that temporarily replacing the commas would work as a workaround. Still, I would like to solve this with pandas parameters.

Alex Martian

5 Answers

17

Pandas solution: use read_csv with the regex separator [;,]. You need to add engine='python', because otherwise you get this warning:

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.

import pandas as pd
import io

temp=u"""a;b;c
1;1,8
1;2,1
1;3,6
1;4,3
1;5,7
"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep="[;,]", engine='python')
print(df)

   a  b  c
0  1  1  8
1  1  2  1
2  1  3  6
3  1  4  3
4  1  5  7
jezrael
2

Pandas documentation says for parameters:

pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

sep : str, default ‘,’

    Delimiter to use. If sep is None, will try to automatically determine this.

Pandas did not parse my ;-delimited file because the default value of sep is ',', not None (the value that requests automatic detection). Setting sep=';' fixed the issue.
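A minimal sketch of that fix, using a made-up in-memory sample instead of the real file (the column names and values are assumptions for illustration). It also passes decimal=',', an existing read_csv parameter, on the assumption that commas in numeric columns are decimal marks; drop it if that is not the case for your data.

```python
import io

import pandas as pd

# Hypothetical sample: ';' separates fields, while ',' appears inside
# text values and as the decimal mark in the price column.
sample = (
    "name;city;price\n"
    "Müller;Köln, Germany;1,5\n"
    "Łukasz;Kraków, Poland;2,25\n"
)

# sep=';' leaves commas inside fields untouched;
# decimal=',' makes pandas read "1,5" as the float 1.5.
df = pd.read_csv(io.StringIO(sample), sep=";", decimal=",")
print(df)
```

For a real file, replace io.StringIO(sample) with the path; an encoding argument may also be needed, depending on how the file was saved.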

Alex Martian
1

Unless your CSV file is broken, you can let the csv module guess the format.

import csv

with open('d:/trade/test.csv', 'r') as f1:
    dialect = csv.Sniffer().sniff(f1.read(1024))
    f1.seek(0)
    r = csv.reader(f1, dialect=dialect)
    for row in r:
        print(row)
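If the delimiter is already known, as in the question, a simpler variant is to pass it to csv.reader directly rather than rely on the sniffer's guess. The sketch below uses a made-up in-memory sample, not the asker's file.

```python
import csv
import io

# Hypothetical sample with ';' as delimiter and commas inside a field.
sample = "name;city\nMüller;Köln, Germany\nŁukasz;Kraków, Poland\n"

# delimiter=';' tells the reader to split on semicolons only,
# so the commas stay part of the field values.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows)
```

With a real file, the csv documentation recommends opening it with newline=''; the right encoding argument depends on the file.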
totoro
0

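Try specifying the encoding; you will need to find out which encoding the file you are reading uses.

I have used ASCII for this example, but yours may differ. A file containing non-English characters cannot be plain ASCII, so it will need something else, such as UTF-8 or a Windows code page.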

df = pd.read_csv(fname, encoding='ascii')
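One rough way to narrow down the encoding is to try a few candidates on the raw bytes and keep the first that decodes cleanly. The helper name first_decodable, the candidate list, and the sample bytes below are all assumptions for illustration, not part of pandas or of the original answer.

```python
def first_decodable(raw, candidates):
    """Return the first encoding in candidates that decodes raw without error."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    return None


# Hypothetical file content, saved with a Windows Cyrillic code page.
raw = "имя;город\nАлекс;Москва, РФ\n".encode("cp1251")

# Strict encodings go first: ascii and utf-8 reject these bytes.
enc = first_decodable(raw, ["ascii", "utf-8", "cp1251"])
print(enc)
```

The resulting name can then be passed as encoding= to read_csv. Single-byte encodings such as latin-1 accept almost any byte sequence, so a successful decode is only a hint; check that the decoded text looks right.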
Stephen Rauch
0

To avoid the warning below in your code,

ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'

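pass the arguments to read_csv by keyword. The second positional parameter of read_csv is sep, so an encoding passed positionally is taken as the separator; because it is longer than one character, it is interpreted as a regex, which triggers the fallback. Compare the two cases: the first produces the warning, the second does not.

CODE THAT THROWS THE WARNING:

selEncoding = "ISO-8859-1"

dfCovid19DS = pd.read_csv(dsSrcPath, selEncoding)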

CODE WITHOUT WARNING:

selEncoding = "ISO-8859-1"

dfCovid19DS = pd.read_csv(dsSrcPath, encoding = selEncoding)
Ashish
  • Credit where due: This is essentially the same answer as @Santosh-Pathak gave two years ago ([reference](https://stackoverflow.com/a/53459498/3025856)). – Jeremy Caney Jun 15 '20 at 22:24