This script makes a dbf file that looks good in a dbf viewer, but OpenOffice Calc defaults to the Western Europe (DOS/OS2-850 International) character set, and when the file is opened the non-ASCII characters are incorrect. When the Western Europe (Windows-1252/WinLatin 1) character set is used instead, it looks fine. MS Access 2000 also interprets it as DOS 850. Am I missing a setting or parameter somewhere?

import sys, os.path
import dbf as dbfpp
print 'stdin:', sys.stdin.encoding, 'stdout:', sys.stdout.encoding # cp437 both if run in win cmd.exe, None, utf-8 in pywin

def main(): 
    flddef = 'SITE_CODE N(19,6);HW_CODE C(13);PROGRAM C(10);SITE_NAME C(85);START_DATE D;DESCRIPT M;ACRES N(19,6);'
    dbffile = 'X:/Import/ny/esrd/esrd2015test_ny/outdbf/test.dbf'
    if os.path.isfile(dbffile):
        os.remove(dbffile)
    dbfmake = dbfpp.Table(
        dbffile, 
        flddef, 
        codepage='cp1252',
        on_disk=True,
        )
    dbfmake.open()
    print dbfmake.codepage # cp1252 (Windows ANSI)
    descvalinit = 'The site was on Cayuga Lake ¼ mile from shore (these non-gremlins): µ©®æ§. A gremlin phrase:“only perch”.'
    print 'descvalinit in list:', [descvalinit] # ['The site was on Cayuga Lake \xbc mile from shore (these non-gremlins): \xb5\xa9\xae\xe6\xa7. A gremlin phrase:\x93only perch\x94.']
    descval = unicode(descvalinit, '1252')
    print 'descval in list:', [descval] # [u'The site was on Cayuga Lake \xbc mile from shore (these non-gremlins): \xb5\xa9\xae\xe6\xa7. A gremlin phrase:\u201conly perch\u201d.']
    datum = (33567.000000, '100000B ', 'HW', 'Fishing spot', None, descval, 18)
    print 'datum:', datum # (33567.0, '100000B ', 'HW', 'Fishing spot', None, u'The site was on Cayuga Lake \xbc mile from shore (these non-gremlins): \xb5\xa9\xae\xe6\xa7. A gremlin phrase:\u201conly perch\u201d.', 18)
    dbfmake.append(datum)
    dbfmake.close()

if __name__ == "__main__":
    main()

Using python 2.7.3 32 bit on Windows 7 x64

1 Answer

dbf files have the encoding specified in a meta field contained inside the dbf file itself. Well-written programs should look at that to determine the encoding.

However, according to this answer in GIS there are some programs that look in a separate .cpg or .cst file to get the encoding.

I found (but can't refind) a forum post from 2008 discussing Calc's failure to correctly decode dbf contents.

So at this point I would try creating a test.cpg file with the single line cp1252 and see if that works (and try test.cst if it doesn't).

Ethan Furman
  • Neither file changed the behavior for Calc or Access. BTW I mentioned Calc because I thought it demonstrated an encoding issue. I really care about getting the data into MS Access. – Dave Jorgensen Oct 26 '15 at 20:59
  • You might want to convert to csv and import that; check out [my answer here](http://stackoverflow.com/a/32772924/208880). – Ethan Furman Oct 26 '15 at 21:02
  • The data starts as csv, is slightly edited programmatically and loaded into a postgres table, so I could probably get it into Access from there. Was hoping to have one dbf that a legacy GIS program and Access could both read. – Dave Jorgensen Oct 26 '15 at 21:29