0

I have a CSV file with delimiter as ';'.

The below code only handles the CSV with ';' as the delimiter.

dataset = pd.read_csv('/Data/formatted1.csv',header=0, delimiter = ';', encoding = 'unicode_escape')

But, if the CSV has the delimiter as ',' or 'tab' or '\n' then it should handle it dynamically. Is there a way to handle all of them dynamically?

I have tried the below code but it doesn't work.

df = pd.read_csv('file.csv', delim_whitespace=True)

P H
  • 294
  • 1
  • 3
  • 16

2 Answers2

1

IMHO, you can use regex,

import pandas as pd

df = pd.read_csv(PATH, sep=r'[,|;\t"]+(?=\S)', engine='python')

Here are some examples,

from io import StringIO
import pandas as pd

s1 = """a|b|c
1|2|3
"""

s2 = """a;b;c
1;2;3
"""

print(pd.read_csv(StringIO(s1), sep=r'[,|;\t"]+(?=\S)', engine='python'))
print()
print(pd.read_csv(StringIO(s2), sep=r'[,|;\t"]+(?=\S)', engine='python'))

   a  b  c
0  1  2  3

   a  b  c
0  1  2  3
E. Zeytinci
  • 2,642
  • 1
  • 20
  • 37
  • Thank you so much for your answer. I tried your code. But, I am getting the below error "Expected 17 fields in line 58, saw 27. Error could possibly be due to quotes being ignored when a multi-char delimiter is used." – P H Jan 30 '20 at 12:15
  • Refer this https://stackoverflow.com/questions/15242746/handling-variable-number-of-columns-with-pandas-python it resolves your error. – chimbu Jan 30 '20 at 12:55
  • @E.Zeytinci Awesome. The new answer works perfectly. Thank you so much! – P H Jan 30 '20 at 15:05
  • Hi @PH if this or any answer has solved your question please consider [accepting it](https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. – E. Zeytinci Feb 14 '20 at 21:01
  • 1
    @E.Zeytinci I have done it :) I didn't know about this option. – P H Feb 18 '20 at 12:44
0

Hope this helps. You can use either Delimiter or Sep. Refer Python Documentation.

\s - Matches any whitespace character; this is equivalent to the set [ \t\n\r\f\v].

import pandas as pd
my_cols = [str(i) for i in range(45)] # create some row names
dataset = pd.read_csv('/Data/formatted1.csv', sep="\s+|;|,", names=my_cols, header=None, engine="python")

Thanks to @Tai

chimbu
  • 104
  • 1
  • 11