6

I'm looking for information on how to read GnuCash files using python. I have read about this python-gnucash which provides Python bindings to the GnuCash library, but it takes a lot of work at the moment (e.g. dependencies, headers, etc.). The instructions are tailored for the Linux environment, and a rather old GnuCash version (2.0.x). I am running GnuCash 2.2.9. Though I can operate the Linux command line, I am running GnuCash on Windows XP.

My main objective is to read (no plans to write yet) my GnuCash files so that I can create my own visual dynamic reports using matplotlib and wxpython. I'm not yet in the mood to learn Scheme.

I hope someone can point me to a good start on this. As far as I know about GnuCash and Python, I think someone probably knows solutions of the following types:

  1. More recently updated documentation aside from this one from the GnuCash wiki
  2. Some workaround, like exporting to a certain file format for which there is a more mature Python library that can read it.

You guys might have better suggestions in addition to those mentioned.

Kit
  • 30,365
  • 39
  • 105
  • 149

6 Answers6

8

I published piecash, a python interface to SQL saved GnuCash books that uses SQLAlchemy as basis (https://github.com/sdementen/piecash).

With it you can easily access all the information contained in a book.

For instance, to iterate over all accounts in the book:

from piecash import open_book

# open a book
with open_book("some_book.gnucash", open_if_lock=True) as mybook:
    # iterate over all accounts of the book
    for account in mybook.accounts:
        print(account)

or to iterate over all the splits in the "Asset" account:

# open the book
with open_book("some_book.gnucash", open_if_lock=True) as mybook:
    # retrieve the account by its fullname
    asset = mybook.accounts(fullname="Asset")
    # iterate over all its splits
    for split in asset.splits:
        print(split)

Recent versions also allows to extract the split information directly to pandas DataFrames for easy plotting/analysis with

from piecash import open_book

# open a book
with open_book("some_book.gnucash", open_if_lock=True) as mybook:
    # extract all split information to a pandas DataFrame
    df = mybook.splits_df()

    # print for account "Asset" some information on the splits
    print(df.loc[df["account.fullname"] == "Asset",
                 ["transaction.post_date", "value"]])
sdementen
  • 412
  • 6
  • 8
  • 1
    That is not the basis for a downvote. Consider that the user is new to SO. @sdementen, take note of Jason's comments and provide details/examples of how the data can be extracted using your library. – crafter Mar 04 '15 at 07:13
  • So far, this seems to be the only option available on Windows. And a very good one, at that. – Alen Siljak Nov 24 '17 at 15:47
6

GNUCash 2.4 is out.

Can export to SQL so it's very much easier than parsing XML.

Sqlite, MySQL and PostgreSQL are supported (how cool is that!)

Chop Suey
  • 61
  • 1
  • 1
  • 1
    Will you please tell me how to activate `export to sql` function? I can't find it in the menu, only xml export (File->Export->Export Accounts). I'm using Gnucash 2.4.12. – Dennis Golomazov Jun 21 '13 at 19:25
  • 1
    I'm running gnucash 2.6 as packaged by Ubuntu 14.04. I didn't see the export to SQL option, but I noticed that the gnucash package recommended libdbd-{mysql,pgsql,sqlite} packages which I didn't have installed. Adding those made the export options appear in the interface. So there's a compiled executable with Python templating, and perl data bindings. Wierd. – mc0e Jun 05 '15 at 14:38
  • 2
    `sudo apt-get install libdbd-sqlite3` in Ubuntu to get mysql features in GnuCash 2.6. Then open your book in GnuCash and select "File" -> "Save As ..." and select "mysql" as the data format. – Dave Sep 27 '16 at 07:30
4

Are you talking about the data files? From there wiki, it looks like they are just compressed XML files. WIth Python, you can decompress them with the gzip module and then parse them with any of the available XML parsers.

ElementTree Example

>>> import xml.etree.cElementTree as ET
>>> xmlStr = '''<?xml version="1.0" encoding="UTF-8" ?>
<painting>
<img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
<caption>This is Raphael's "Foligno" Madonna, painted in
     <date>1511</date>?<date>1512</date>.
</caption>
</painting>
'''
>>> tree = ET.fromstring(xmlStr)  #use parse or iterparse to read direct from file path
>>> tree.getchildren()
[<Element 'img' at 0x115efc0>, <Element 'caption' at 0x1173090>]
>>> tree.getchildren()[1].text
'This is Raphael\'s "Foligno" Madonna, painted in\n    '
>>> tree.getchildren()[0].get('src')
'madonna.jpg'
Mark
  • 106,305
  • 20
  • 172
  • 230
  • +1 Thanks! That looks good for a start. I have managed to decompress it with the `gzip` module. I tried the first XML parser I saw with an example (`Expat`), but unfortunately, I couldn't parse out the tags and contents. Can you recommend which XML parser I should use, or at least get started with? – Kit Aug 05 '10 at 12:09
  • 1
    @Kit, my favorite in the standard library is cElementTree (http://docs.python.org/library/xml.etree.elementtree.html). Make sure to use the cElementTree instead of ElementTree (the former is written in C and later is pure python) for extra speed. See edits above for a little quick start. – Mark Aug 05 '10 at 13:29
  • This is great. I'm still quite stumped at dealing with namespaces in the `{URI}tag` format. Anyway, that would be a topic for another question. Thanks for your help :) – Kit Aug 06 '10 at 00:52
  • 1
    I just found out that `lxml.etree` does a better job at [handling namespaces](http://stackoverflow.com/questions/3428792/xml-and-python-get-the-namespaces-declared-in-root-element/3428820#3428820). – Kit Aug 07 '10 at 14:05
1

As Chop Suey said, GnuCash 2.4 has its own database format. If you still want to use the XML files, you can use the following script to convert from XML to a database, and then write your reports on that (e.g. gnucashconvert filename.gnucash sqlite3:////home/username/export.sqlite):

#!/usr/bin/env python

import os
import gnucash

def convert_gnucash(src_uri, target_uri):
    """Converts gnucash databases at the given uris from src to target"""
    session = gnucash.Session(src_uri)
    try:
        new_session = gnucash.Session(target_uri, is_new=True)
        try:
            new_session.swap_data(session)
            new_session.save()
        finally:
            new_session.end()
            new_session.destroy()
    finally:
        session.end()
        session.destroy()

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 2:
        src_uri, target_uri = sys.argv[1], sys.argv[2]
        src_uri = ("xml://%s" % os.path.abspath(src_uri) if "://" not in src_uri else src_uri)
        target_uri = ("xml://%s" % os.path.abspath(target_uri) if "://" not in target_uri else target_uri)
        convert_gnucash(src_uri, target_uri)
    else:
        print >>sys.stderr, "Syntax %s src target" % (sys.argv[0])
David Fraser
  • 6,475
  • 1
  • 40
  • 56
1

I just published some python code that can read and interpret the sqlite3 file format used in gnucash 2.6 and higher:

https://github.com/MatzeB/pygnucash

MatzeBraun
  • 420
  • 4
  • 8
1

I've taken the sqlite approach in a django app I've written to do a similar thing (although for budgeting). See https://github.com/evandavey/OpenBudget/blob/master/openbudgetapp/management/commands/gnucash-import.py for the code.

In terms of the data itself, I've used the pandas library to handle its time series nature.

Evan Davey
  • 430
  • 1
  • 3
  • 12