How to handle multiple delimiters in the pd.read_csv function in python?

Question

I have a CSV file with delimiter as ';'.

The below code only handles the CSV with ';' as the delimiter.

dataset = pd.read_csv('/Data/formatted1.csv',header=0, delimiter = ';', encoding = 'unicode_escape')

But, if the CSV has the delimiter as ',' or 'tab' or '\n' then it should handle it dynamically. Is there a way to handle all of them dynamically?

I have tried the below code but it doesn't work.

df = pd.read_csv('file.csv', delim_whitespace=True)

E. Zeytinci · Accepted Answer · 2020-01-30T13:35:18.090

1

IMHO, you can use regex,

import pandas as pd

df = pd.read_csv(PATH, sep=r'[,|;\t"]+(?=\S)', engine='python')

Here are some examples,

from io import StringIO
import pandas as pd

s1 = """a|b|c
1|2|3
"""

s2 = """a;b;c
1;2;3
"""

print(pd.read_csv(StringIO(s1), sep=r'[,|;\t"]+(?=\S)', engine='python'))
print()
print(pd.read_csv(StringIO(s2), sep=r'[,|;\t"]+(?=\S)', engine='python'))

   a  b  c
0  1  2  3

   a  b  c
0  1  2  3

edited Jan 30 '20 at 13:35

answered Jan 30 '20 at 10:09

E. Zeytinci

2,642
1
20
37

Thank you so much for your answer. I tried your code. But, I am getting the below error "Expected 17 fields in line 58, saw 27. Error could possibly be due to quotes being ignored when a multi-char delimiter is used." – P H Jan 30 '20 at 12:15
Refer this https://stackoverflow.com/questions/15242746/handling-variable-number-of-columns-with-pandas-python it resolves your error. – chimbu Jan 30 '20 at 12:55
@E.Zeytinci Awesome. The new answer works perfectly. Thank you so much! – P H Jan 30 '20 at 15:05
Hi @PH if this or any answer has solved your question please consider [accepting it](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. – E. Zeytinci Feb 14 '20 at 21:01
1

@E.Zeytinci I have done it :) I didn't know about this option. – P H Feb 18 '20 at 12:44

chimbu · Answer 2 · 2020-01-30T12:58:05.677

0

Hope this helps. You can use either Delimiter or Sep. Refer Python Documentation.

\s - Matches any whitespace character; this is equivalent to the set [ \t\n\r\f\v].

import pandas as pd
my_cols = [str(i) for i in range(45)] # create some row names
dataset = pd.read_csv('/Data/formatted1.csv', sep="\s+|;|,", names=my_cols, header=None, engine="python")

Thanks to @Tai

edited Jan 30 '20 at 12:58

answered Jan 30 '20 at 12:52

chimbu

104
1
11

Thank you @chimbu. The below answer from E. Zeytinci worked for me. – P H Jan 30 '20 at 15:09

How to handle multiple delimiters in the pd.read_csv function in python?

2 Answers2