I need to read various CSV files without knowing any details about them beforehand. Some of them are separated by commas, while others are separated by semicolons. I understand that I can pass the delimiter/separator as a parameter, as in the following examples:

import pandas as pd

data = pd.read_csv(file_path, encoding='utf-8', sep=',')
# or
data = pd.read_csv(file_path, encoding='utf-8', sep=';')

import csv

# newline='' is what the csv module documentation recommends for file objects
with open(file_path, newline='', encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',')
    # or
    reader = csv.reader(f, delimiter=';')
    # ...

However, I haven't found a straightforward way to find out which delimiter I should use. For the moment I have settled on parsing the header and counting the number of commas and semicolons to decide between them, but that feels like a workaround. Is there another way to identify the separator in a CSV file?
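For reference, the header-counting workaround amounts to roughly the following sketch. The function name `guess_delimiter_from_header` and the candidate list are only illustrative, and a tie (for example a single-column file) falls back to the first candidate.

```python
def guess_delimiter_from_header(header, candidates=(",", ";")):
    """Return the candidate that occurs most often in the header line.

    Illustrative helper only: on a tie, max() keeps the first candidate.
    """
    return max(candidates, key=header.count)


# Hypothetical header lines standing in for the first line of each file.
print(guess_delimiter_from_header("name;age;city"))
print(guess_delimiter_from_header("name,age,city"))
```

This breaks down when a quoted header field itself contains one of the candidate characters, which is part of why it feels like a workaround.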

Thank you for the help!!

Victor
  • Are you using pandas or the csv library in the final output? – bherbruck Sep 08 '20 at 19:06
  • Check this answer: https://stackoverflow.com/questions/3952132/how-do-you-dynamically-identify-unknown-delimiters-in-a-data-file – IoaTzimas Sep 08 '20 at 19:08
  • Maybe you can use the `csv.Sniffer` class? https://docs.python.org/3.8/library/csv.html#csv.Sniffer The documentation has an example of how to use it. – Andrej Kesely Sep 08 '20 at 19:08
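
Following the `csv.Sniffer` suggestion above, a minimal sketch could look like the one below. The helper names `detect_delimiter` and `read_rows`, the 4096-character sample size, and the in-memory sample data are assumptions made for illustration. Restricting the candidates to `,` and `;` matches the two cases in the question.

```python
import csv
import io


def detect_delimiter(sample, candidates=",;"):
    """Return the delimiter csv.Sniffer guesses for a text sample.

    csv.Sniffer.sniff raises csv.Error if it cannot settle on a delimiter.
    """
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


def read_rows(f, sample_size=4096):
    """Read all rows from an open text file, sniffing the delimiter first."""
    delimiter = detect_delimiter(f.read(sample_size))
    f.seek(0)  # rewind so the reader starts at the header again
    return list(csv.reader(f, delimiter=delimiter))


# Hypothetical in-memory data standing in for two unknown files.
semicolon_file = io.StringIO("name;age;city\nAlice;30;Paris\nBob;25;Berlin\n")
comma_file = io.StringIO("name,age,city\nAlice,30,Paris\nBob,25,Berlin\n")

print(read_rows(semicolon_file)[0])
print(read_rows(comma_file)[0])
```

With a real file, the same function can be called inside `with open(file_path, newline='', encoding='utf-8') as f:`. For pandas, the `read_csv` documentation describes a related option: passing `sep=None` together with `engine='python'` lets the Python parsing engine detect the separator using `csv.Sniffer`.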
