Encoding errors when reading .dat file (DBISAM table) into Pandas data frame

Question

Thanks in advance for any assistance y'all can offer.

I'm attempting to create a Pandas data frame from a .dat file (DBISAM table) generated by the Retail Edge POS software. My question is similar to this one, and using the code from it I was able to get a result, where my other efforts to load the data failed entirely.

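import pandas as pd

# fname is the path to the .dat file
with open(fname, "rb") as f:  # binary mode
    data = pd.DataFrame(
        # split each line on whitespace and decode every chunk as Latin-1
        [e.decode("latin-1") if e != b'\xa0' else None for e in l.strip().split()]
        for l in f
    )
print(data.shape)
print(data.ndim)
print(data.head())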

The results:
DF shape: (31626, 115)
DF dimensions: 2
Sample returned data: 0 É9☺ ♠¾Y#dË@=qÒã¼dÐ☺

In the Database System Utility I use to query store data, this table should have 27 columns and 39,310 rows, as of my latest check.

I used Chardet to try to determine the correct encoding, and it identifies the file as Windows-1254. When I swap that in for Latin-1, I get an error: 'charmap' codec can't decode byte 0x8e in position 11: character maps to <undefined>

Similarly, when I swap in UTF-8 encoding: 'utf-8' codec can't decode byte 0xc9 in position 0: invalid continuation byte
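
The three outcomes above are what you would expect if the file holds binary data rather than encoded text. Here is a minimal sketch of how each codec treats arbitrary bytes, using a made-up byte string (an assumption for illustration, not the real table contents). Latin-1 maps all 256 byte values to some character, so it never raises; cp1254 and UTF-8 reject bytes they have no mapping for.

```python
# Hypothetical sample bytes standing in for raw binary table data.
sample = b"\xc9\x39\x01\x8e\xa0"


def try_decode(raw, encoding):
    """Return the decoded text, or the error reason if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# Try the three encodings mentioned in the question on the same bytes.
results = {enc: try_decode(sample, enc) for enc in ("latin-1", "cp1254", "utf-8")}
for enc, outcome in results.items():
    print(enc, "->", repr(outcome))
```

So a successful Latin-1 decode does not show that the data is Latin-1 text. It only shows that Latin-1 accepts any byte.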

I have worked comfortably with Pandas on CSV and txt files, but I feel way out of my depth here. I've also tried using the StringIO and BytesIO methods, but haven't managed to retrieve the data in a meaningful form. This is my first step toward visualizing inventory and sales data for a farmer-owned grocery store, so I'm not bringing professional IT/coding abilities to the table. I'm grateful for any suggestions.

`.dat` doesn't imply a particular structure. If you can't get Pandas to connect to the database directly then you may need to find a tool that can convert the file to a format that Pandas can handle, such as csv or xlsx. Also worth checking whether the database can provide extracts in these formats. - snakecharmerb, Dec 13 '21 at 08:17
Thank you. The nuance was lost on me and I was wasting hours on the wrong problem. Exported the data as txt and now able to do the analysis I wanted. Thanks for the help! - sccx, Dec 13 '21 at 17:08
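
For anyone landing here with the same problem, a minimal sketch of loading such a text export into Pandas. The tab delimiter, column names, sample rows, and file name below are all assumptions for illustration; the actual layout depends on the export options chosen in the database utility.

```python
import io

import pandas as pd

# Stand-in for the exported text file (hypothetical columns and rows).
# In practice, pass the file path instead, e.g.
# pd.read_csv("inventory_export.txt", sep="\t")  # hypothetical file name
exported = io.StringIO(
    "ItemNumber\tDescription\tQtyOnHand\n"
    "1001\tCarrots 1lb\t24\n"
    "1002\tWhole milk 1gal\t12\n"
)

# sep="\t" is an assumption: adjust it to match the delimiter of the export.
data = pd.read_csv(exported, sep="\t")

print(data.shape)  # compare with the row/column counts the utility reports
print(data.dtypes)
print(data.head())
```

Comparing `data.shape` with the counts shown in the database utility (27 columns and 39,310 rows in the question) is a quick check that the export was parsed as intended.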
