18

GEDCOM is a standard for exchanging genealogical data.

I've found parsers written in

but none so far written in Python. The closest I've come is the file libgedcom.py from the GRAMPS project, but it depends so heavily on other GRAMPS modules that I can't use it on its own.

I just want a simple standalone GEDCOM parser library written in Python. Does this exist?

BioGeek

6 Answers

10

A few years ago I wrote a simplistic GEDCOM to XML translator in Python as part of a larger project. I found that dealing with the GEDCOM data in an XML format was much easier (especially when the next step involved XSLT).

I don't have the code online at the moment, so I've pasted the module into this message. This works for me; no guarantees. Hope this helps though.

import codecs, os, re, sys
from xml.sax.saxutils import escape

fn = sys.argv[1]

ged = codecs.open(fn, encoding="cp437")
xml = codecs.open(fn+".xml", "w", "utf8")
xml.write("""<?xml version="1.0"?>\n""")
xml.write("<gedcom>")
sub = []
for s in ged:
    s = s.strip()
    m = re.match(r"(\d+) (@(\w+)@ )?(\w+)( (.*))?", s)
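    if m is None:
        print("Error: unmatched line: %s" % s)
        continue  # skip unparseable lines instead of crashing on m.group()
    level = int(m.group(1))
    id = m.group(3)
    tag = m.group(4)
    data = m.group(6)
    # close any elements that are deeper than the current line's level
    while len(sub) > level:
        xml.write("</%s>\n" % (sub[-1]))
        sub.pop()
    if level != len(sub):
        print("Error: unexpected level: %s" % s)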
    sub += [tag]
    if id is not None:
        xml.write("<%s id=\"%s\">" % (tag, id))
    else:
        xml.write("<%s>" % (tag))
    if data is not None:
        m = re.match(r"@(\w+)@", data)
        if m:
            xml.write(m.group(1))
        elif tag == "NAME":
            m = re.match(r"(.*?)/(.*?)/$", data)
            if m:
                xml.write("<forename>%s</forename><surname>%s</surname>" % (escape(m.group(1).strip()), escape(m.group(2))))
            else:
                xml.write(escape(data))
        elif tag == "DATE":
            m = re.match(r"(((\d+)?\s+)?(\w+)?\s+)?(\d{3,})", data)
            if m:
                if m.group(3) is not None:
                    xml.write("<day>%s</day><month>%s</month><year>%s</year>" % (m.group(3), m.group(4), m.group(5)))
                elif m.group(4) is not None:
                    xml.write("<month>%s</month><year>%s</year>" % (m.group(4), m.group(5)))
                else:
                    xml.write("<year>%s</year>" % m.group(5))
            else:
                xml.write(escape(data))
        else:
            xml.write(escape(data))
while len(sub) > 0:
    xml.write("</%s>" % sub[-1])
    sub.pop()
xml.write("</gedcom>\n")
ged.close()
xml.close()
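
To illustrate why the XML form is convenient to work with, here is a minimal sketch that reads the translator's output back with the standard library's `xml.etree.ElementTree`. The sample XML is hand-written to match the shape the script above emits for a one-person file, and `list_individuals` is a hypothetical helper name, not part of the original module.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like the translator's output for a tiny
# GEDCOM file (an assumed input, not taken from a real export).
SAMPLE_XML = """<?xml version="1.0"?>
<gedcom><INDI id="I1"><NAME><forename>John</forename><surname>Smith</surname></NAME>
<BIRT><DATE><day>1</day><month>JAN</month><year>1900</year></DATE></BIRT></INDI></gedcom>
"""


def list_individuals(xml_text):
    """Return (id, forename, surname, birth_year) for each top-level INDI.

    Hypothetical helper for illustration; missing fields come back as
    an empty string (names) or None (birth year).
    """
    root = ET.fromstring(xml_text)
    people = []
    for indi in root.findall("INDI"):
        forename = indi.findtext("NAME/forename", default="")
        surname = indi.findtext("NAME/surname", default="")
        birth_year = indi.findtext("BIRT/DATE/year")  # None if absent
        people.append((indi.get("id"), forename, surname, birth_year))
    return people


if __name__ == "__main__":
    for person in list_individuals(SAMPLE_XML):
        print(person)
```

For a real file you would pass the contents of the generated `.xml` file instead of `SAMPLE_XML`.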
Greg Hewgill
7

I've taken the code from mwhite's answer, extended it a bit (OK, more than just a bit) and posted it on GitHub: http://github.com/dijxtra/simplepyged. Suggestions about what else to add are welcome :-)

dijxtra
5

I know this thread is pretty old, but it came up in my searches, along with this project: https://github.com/madprime/python-gedcom/

The source is squeaky clean and very functional.

iLoveTux
2

A general-purpose GEDCOM parser in Python is linked from http://ilab.cs.byu.edu/cs460/2006w/assignments/program1.html

mwhite
1

You could use the SWIG tool to wrap an existing C library through Python's native language interface. You'll have to make calls against the C API from within Python, but the rest of your code can be pure Python.

It may sound a bit daunting, but once you get things set up, using the two together won't be bad. There may be some quirks depending on how the C library was written, but you'd have to deal with some no matter which option you used.

Dana the Sane
0

Another basic parser for the GEDCOM 5.5 format: https://github.com/rootsdev/python-gedcom-parser

BioGeek
  • Please don't post answers on obviously off-topic questions! [See: **Should one advise on off topic questions?**](http://meta.stackoverflow.com/q/276572/1768232) Off-topic questions can be closed and deleted, which could nullify your contribution. Here, the question is asking for an off-site resource and is on its way to closure. – Kyll Mar 07 '16 at 12:08