1

I am trying to convert a specific DBF file into a CSV format, but my script can't convert a specific value. I'm using the ydbf package in Python.

In total I have 598 DBF files to be converted, where I standardly used this code:

with ydbf.open(filename, encoding=encoding) as dbf:
   for record in dbf:
      records[index] = record
      index += 1
   df = pd.DataFrame.from_dict(records, orient='index')
df.to_csv(filename[:-4] + ".csv", index=False)

For 6 files there are some errors remaining, 5 out of those six files were solved using the code below, but the last file keeps returning the value error. I tried several encoding methods: latin, cp1250, cp1251, cp1252 and ascii, but they all return the same error:

Error occured (ValueError: invalid literal for int() with base 10: b'RE') while reading rec #0.

I also tried:

dbf = DBF(file)
dataResult = pd.DataFrame(iter(dbf))

But this returns the same error.

A sample of my dbf file:

*  Á '                    ALUPROF    C                  XB         N
                  YB         N                    LAAG       N                  VOLGNR     N!                  
 EK_BDA20-114   3.0000  20.0000RE   100 EK_BDA20-114   1.5000  19.5981RE   200 EK_BDA20-114   0.4019  18.5000RE   300 EK_BDA20-114   0.0000  17.0000RE   400 EK_BDA20-114   0.0000   0.0000RE   500 EK_BDA20-114 114.0000   0.0000RE   600 EK_BDA20-114 114.0000  17.0000RE   700 EK_BDA20-114 113.5981  18.5000RE   800 EK_BDA20-114 112.5000  19.5981RE 
try:
    df = pd.DataFrame()
    for file in dbf3_file_errors['filename']:
        with ydbf.open(file, encoding="ascii") as dbf:
            for record in dbf:
                updated_record = []
                for value in record:
                    try:
                        float_value = float(value)
                        updated_record.append(float_value)
                    except ValueError:
                        try:
                            int_value = int(value)
                            updated_record.append(int_value)
                        except ValueError:
                            try:
                                str_value = str(value)
                                updated_record.append(str_value)
                            except:
                                if value == 'RE':
                                    updated_record.append(np.nan)
                df = df.append(pd.DataFrame([updated_record]), ignore_index=True)
                break
    df.to_csv(file[:-4] + ".csv", index=False)
except Exception as e:
    print("Error occured:", e)
Luuk_148
  • 11
  • 3
  • Which line of this code is the error message coming from? (Show the traceback) – mkrieger1 Feb 12 '23 at 15:14
  • It's returned from the final exception line. Which I think means that the either float, integer or string conversion didn't work. – Luuk_148 Feb 12 '23 at 15:21
  • @mkrieger1 traceback: ValueError: invalid literal for int() with base 10: b'RE' – Luuk_148 Feb 12 '23 at 15:32
  • Why do you try to convert the string "RE" to an integer? Or why did you think that this won't fail? – mkrieger1 Feb 12 '23 at 15:52
  • @mkrieger1, this part is only error handling code. And the value (20.0000RE) in the dbf file should normally be converted without any error. So I try a float first, then an integer and then a string. But I did expect an error in the first block, but assumed that it can be converted to a string. But that doesn't work either. – Luuk_148 Feb 12 '23 at 15:59
  • Might want to see if the accepted answer [here](https://stackoverflow.com/questions/41898561/pandas-transform-a-dbf-table-into-a-dataframe) works for you – Jon Clements Feb 12 '23 at 16:21
  • On a side note... if you actually want integers were integers, then you need to do the int conversion attempt before the float... as anything that can convert to an int can be a float... so you'll never have ints in the current ordering – Jon Clements Feb 12 '23 at 16:22
  • The docs also say: *Each record is a dict, which keys are names of fields.* - so you might just be able to leave the type inference to pandas and just do: `df = pd.DataFrame(dbf)` – Jon Clements Feb 12 '23 at 16:24
  • @JonClements, thanks for your suggestions, I modified the question to provide a bit more background. I tried most methods that were available in the link you've send me, but for some reason it still doesn't work. – Luuk_148 Feb 12 '23 at 16:41

1 Answers1

0

The primary problem is that your dbf table has a numeric field with the characters "RE" in them.

The secondary problem is due to ydbf converting the values in the record before giving it to you -- so your entire chain of try/excepts is useless.

One possible solution is to edit the file in binary mode and change the REs to 0s.

Another possible solution is to use my dbf library with a custom numeric function to handle the invalid data:

import dbf

NaN = float('nan')

def fix_integer(bytes, fielddef, *ignore):
    try:
        return int(bytes)
    except ValueError:
        return NaN


table = dbf.Table(
        'path/to/table.dbf',
        field_data_types={'bad_field_name_here': fix_integer},
        )

dbf.export(table)  # which defaults to dbf.export(table, header=True, dialect='dbf')
Ethan Furman
  • 63,992
  • 20
  • 159
  • 237
  • Thanks for the suggestion, valid comment about the try/except chain. I installed the dbf libary. But Table doesn't seem to be an attribute? – Luuk_148 Feb 13 '23 at 20:49
  • @Luuk_148: The module is `dbf`. `Table` is a class inside the `dbf` module. Side note: you could return `0` or `None` or whatever works best for your data (instead of `NaN`). – Ethan Furman Feb 13 '23 at 21:33