3

I have been given an ambiguous task of automating a data extraction from various Visual FoxPro tables.

There are several pairs of .DBF and .CDX files. With the Python dbf package, I seem to be able to work with them. I have two files, an ABC.DBF and an ABC.CDX. I can load the table file using,

>>> import dbf
>>> table = dbf.Table('ABC.DBF')
>>> print(table[3])
  0 - table_key : '\x00\x00\x04'
  1 - field_1   : -1
  2 - field_2   : 0
  3 - field_3   : 34
  4 - field_ 4  : 2
  ...

>>>

It's my understanding that .cdx files are indexes. I suspect that corresponds to the table_key field. According to the author, dbf can read indexes:

I can read IDX files, but not update them. My day job changed and dbf files are not a large part of the new one. – Ethan Furman May 26 '16 at 21:05

Reading is all I need to do. I see that four classes exist, Idx, Index, IndexFile, and IndexLocation. These seem like good candidates.

The Idx class reads in a table and filename, which is promising.

>>> index = dbf.Idx(table, 'ABC.CDX')

I'm not sure how to make use of this object, though. I see that it has some generators, backward and forward, but when I try to use them I get an error

>>> print(list(index.forward()))
dbf.NotFoundError: 'Record 67305477 is not in table ABC.DBF'

How does one associate the .cdx index file to the .dbf table?

Lorem Ipsum
  • 4,020
  • 4
  • 41
  • 67
  • 1
    .idx and .cdx are not the same, and keeps multiple index "tags" within. When the .cdx has the same name as the .dbf file, then it is called "structural index" and automatically updated, used when needed. If your dbf is not doing the things low level then you wouldn't really need to do something special. Otherwise maybe you can find a library that connects via OleDb (also check Dabo dev, the developers are from VFP community). – Cetin Basoz May 31 '18 at 07:30

1 Answers1

3

.idx and .cdx are not the same, and dbf cannot currently read .cdx files.

If you need the table to be sorted, you can create an in-memory index:

my_index = table.create_index(key=lambda r: r.table_key)

You can also create a full-fledged function:

def active(rec):
    # do not show deleted records
    if is_deleted(rec):
        return DoNotIndex
    return rec.table_key

my_index = table.create_index(active)

and then loop through the index instead of the table:

for record in my_index:
    ...
Ethan Furman
  • 63,992
  • 20
  • 159
  • 237
  • what a hell is active variable where does it come from? – joash Jun 13 '21 at 01:25
  • @joash: In the code above, `active` is not a variable, it is the name of the function (that was just defined) that `table.create_index` will use to filter the records. – Ethan Furman Jun 14 '21 at 18:50