I recently had a similar problem. I ended up using a different method, but I did explore the Sniffer class from the csv module in the standard library.
I haven't used this in production, only to work out file formats while testing and prototyping, so use at your own risk!
From the documentation:
"Sniffs" the format of a CSV file (i.e. delimiter, quotechar). Returns a Dialect object.
You can return the Dialect object and then pass dialect.delimiter to the sep argument of pd.read_csv.
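As a minimal sketch of that idea, the snippet below returns the whole Dialect instead of only the delimiter. The helper name `sniff_dialect` and the in-memory sample are made up for illustration, and it reads the data with the standard library's `csv.reader` so that it runs without pandas; with pandas you would pass `dialect.delimiter` as `sep` instead.

```python
import csv
import io

def sniff_dialect(sample):
    """Return the Dialect that csv.Sniffer guesses for a text sample.

    Hypothetical helper, not part of the csv module.
    """
    return csv.Sniffer().sniff(sample)

# Illustrative sample: a header plus four pipe-separated rows.
sample = "cola|colb|col\nA|B|C\nE|F|G\nA|B|C\nE|F|G\n"
dialect = sniff_dialect(sample)

# The dialect can be handed straight to csv.reader ...
rows = list(csv.reader(io.StringIO(sample), dialect))
# ... or dialect.delimiter can be passed as sep to pd.read_csv.
print(repr(dialect.delimiter), rows[0])
```

Keeping the Dialect around also gives you the other guessed settings, such as `dialect.quotechar`, if you need them later.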
'text_a.csv'
cola|colb|col
A|B|C
E|F|G
A|B|C
E|F|G
'text_b.csv'
cola\tcolb\tcol
A\tB\tC
E\tF\tG
A\tB\tC
E\tF\tG
A\tB\tC
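from csv import Sniffer
from itertools import islice

sniffer = Sniffer()

def detect_delim(file, num_rows, sniffer):
    # Read up to num_rows lines; islice simply stops early if the file is shorter.
    with open(file, 'r', newline='') as f:
        sample = ''.join(islice(f, num_rows))
    # Sniff the rows together: a single line gives the Sniffer very little to go on.
    dialect = sniffer.sniff(sample)
    # Returning only the delimiter here; return dialect if you need the full Dialect object.
    return dialect.delimiter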
detect_delim(file='text_a.csv',num_rows=5,sniffer=sniffer)
'|'
detect_delim(file='text_b.csv',num_rows=5,sniffer=sniffer)
'\t'
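One caveat: `Sniffer.sniff` raises `csv.Error` when it cannot settle on a delimiter (an empty or single-column file, for example). A rough sketch of a more defensive variant follows. The name `detect_delim_safe`, the candidate list, and the comma fallback are my own assumptions, not something the csv module prescribes.

```python
import csv
from itertools import islice

def detect_delim_safe(path, num_rows=5, candidates=",;\t|", default=","):
    """Guess a file's delimiter, falling back to `default` if sniffing fails.

    Hypothetical helper; the candidates and default are illustrative choices.
    """
    with open(path, "r", newline="") as f:
        sample = "".join(islice(f, num_rows))
    try:
        # Restricting the candidates keeps letters or digits from being picked.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when no consistent delimiter can be determined.
        return default
```

Whether a silent fallback is right depends on your data; for prototyping it saves a try/except at every call site, but in a pipeline you may prefer to let the error surface.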