8

how to detect list separator in users machine with Python?

CSV file needs to be created on users machine and the list separator must be detected automatically (so that excel can read the CSV file).

I've found out that Excel takes CSV elements separator from "Regional Options -> Numbers -> List separator". locale module in Python is used to detect cultural settings, but it (locale.localeconv) does not contain list separator. Opening CSV writer with dialect='excel' does not help. Any idea how to get the correct separator?

EDIT

The following code seems to work (but can't accept any upvotes as the solution is not mine)

import locale

langlocale = locale.getdefaultlocale()[0]
locale.setlocale(locale.LC_ALL, langlocale)
dp = locale.localeconv()['decimal_point']
delimiter = ','
if dp == ',':
    delimiter = ';'
bitman
  • 547
  • 2
  • 6
  • 23
  • You don't need the call to `getdefaultlocale` --- see my answer. – John Machin Oct 13 '11 at 20:49
  • As an aside, Excel has this all wrong. With CSV, commas are always the field delimiter, regardless of locale; otherwise it wouldn't be called CSV! The proper way to read and write CSV containing commas in fields is to simply enclose the fields in double quotes. Any CSV reader/writer doing otherwise is faulty. – Beejor Aug 27 '18 at 22:32

3 Answers3

4

Provided the idea to read the list separator symbol from the Windows registry.

Provided the code to access Windows Registry values.

Using the _winreg package, the Windows list separator value can be retrieved from the registry as follows:

from _winreg import *

def getListSeparator():
    '''Retrieves the Windows list separator character from the registry'''
    aReg = ConnectRegistry(None, HKEY_CURRENT_USER)
    aKey = OpenKey(aReg, r"Control Panel\International")
    val = QueryValueEx(aKey, "sList")[0]
    return val

print getListSeparator()
4

Write an XLS file with xlwt.

Take 2: Use the locale module and some heuristics:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, '') # set to user's locale, not "C"
'English_Australia.1252'
>>> dec_pt_chr = locale.localeconv()['decimal_point']
>>> if dec_pt_chr == ",":
...     list_delimiter = ";"
... else:
...     list_delimiter = ","
...
>>> print repr(dec_pt_chr), repr(list_delimiter)
'.' ','
>>> locale.setlocale(locale.LC_ALL, 'French_France.1252')
'French_France.1252'
>>> dec_pt_chr = locale.localeconv()['decimal_point']
>>> if dec_pt_chr == ",":
...     list_delimiter = ";"
... else:
...     list_delimiter = ","
...
>>> print repr(dec_pt_chr), repr(list_delimiter)
',' ';'
>>>
John Machin
  • 81,303
  • 11
  • 141
  • 189
  • This logic concludes that only if the comma is used as decimal point then excel uses the semicolon as csv separator. However this is not correct in every case. In German speaking Switzerland for example, we use the point ('.') as decimal point but the semicolon as csv separator. This is probably 'inherited' due to proximity to Germany where the comma is used as decimal point. Sadly I did not find any alternative to your proposed solution that would work on all systems and in all locales. – Svenito Jul 22 '22 at 08:13
2

I use sniff to autodetect it:

import csv

dialect = csv.Sniffer().sniff(file.readline())
file.seek(0)
file.readline()
file.seek(0)
fieldnames=( 'assignatura', 'professor', 'grup',  )
reader = csv.DictReader(file, fieldnames=fieldnames, dialect=dialect )

for row in reader:

        codiFranja = unicode(row['assignatura'],'iso-8859-1')
        ...

EDITED:

If you are planning to create a csv from python and read it from excel, then you need to create file with locale delimiter. But python csv module don't use locale: http://mail.python.org/pipermail/csv/2003-May/000507.html

Andreas> This could be really simple to implement using the locale module. But I took a short look at the locale module and it seems like there is no way to get the list separator sign (probably it's not POSIX complaint).

A workaround may be to write delimiters on a configuration file on each python user client instalation based on locale desktop computer and read this delimiters when python make csv file.

Also you can write vbscript code that creates and excel file and exports it to csv then look for delimiter each time that you need to create your python csv file.

Perhaps the most elegant solution is to use schema.ini: http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx . You can generate the csv file and, in the same foleder, schema.ini file.

dani herrera
  • 48,760
  • 8
  • 117
  • 177
  • csv does not exist in advance. It must be created in users machine and the separator in csv file (probably either . or ;) must be based on regional settings in users machine (not in my pc) – bitman Oct 13 '11 at 08:31
  • I understand. You want to create csv file from python with user machine cvs delimited configuration. You need this behavior in order to file can be read from excel. Is this? – dani herrera Oct 13 '11 at 09:01
  • 2
    Wikipedia says that change .csv to .txt may be a solucion: Microsoft Excel will open .csv files, but depending on the system's regional settings, it may expect a semicolon as a separator instead of a comma, since in some languages the comma is used as the decimal separator. Also, many regional versions of Excel will not be able to deal with Unicode in CSV. One simple solution when encountering such difficulties is to change the filename extension from .csv to .txt; then opening the file from an already running Excel with the "Open" command. (http://en.wikipedia.org/wiki/Comma-separated_values) – dani herrera Oct 13 '11 at 09:09
  • .txt seems to be a workaround, but preferably I cannot ask user to go through the text import wizard (especially as the new csv files will be created often). – bitman Oct 13 '11 at 09:22
  • ok, here my second workaround. Use a schema.ini file to overwrite regional setting csv delimited: http://msdn.microsoft.com/en-us/library/windows/desktop/ms709353(v=vs.85).aspx. Python csv module don't use locale: http://mail.python.org/pipermail/csv/2003-May/000507.html – dani herrera Oct 13 '11 at 09:28