Python Convert to CSV with encoding type

Question

Someone helped me write a program that converts PDF files to CSV, but it doesn't specify an encoding. Here is the code:

import os
import glob
import tabula

path="/Users/username/Downloads/"
for filepath in glob.glob(path+'*.pdf'):
    name=os.path.basename(filepath)
    tabula.convert_into(input_path=filepath, 
                        output_path=path+name+".csv",
                        pages="all")

How can I get the CSV files written with a specific encoding, such as utf-8 or cp1252?

Thanks for helping

Error I'm getting

PDFs are binary files. You can't expect to be able to decode them with any text encoding, because they're not strictly text. — Brian61354270, Jan 25 '23 at 02:10

Lahcen YAMOUN · Answer 1 · 2023-01-25T02:21:27.127


You can use the chardet library to detect the encoding of the file generated by tabula, and then use pandas to convert it to the encoding you want.

import chardet
import pandas as pd

for filepath in glob.glob(path+'*.csv'):  # glob and path come from the question's code
    with open(filepath, 'rb') as f:
        result = chardet.detect(f.read())
    df = pd.read_csv(filepath,encoding=result['encoding'])
    df.to_csv(filepath,index=False,encoding='utf-8')
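If only the encoding needs to change, the CSV does not have to be parsed at all, so pandas parsing errors cannot occur. The following is a minimal sketch using only the standard library. It assumes the source encoding is already known, for example from the `chardet.detect` call above (which can return `None` for the encoding when detection fails). `reencode_file` is a hypothetical helper name, not part of tabula, chardet, or pandas.

```python
import os
import tempfile


def reencode_file(filepath, src_encoding, dst_encoding="utf-8", errors="strict"):
    """Rewrite a text file in place with a different encoding.

    Hypothetical helper: src_encoding must be supplied by the caller,
    for example from chardet.detect(...)["encoding"].
    """
    # newline="" leaves the file's original line endings untouched.
    with open(filepath, "r", encoding=src_encoding, errors=errors, newline="") as f:
        text = f.read()

    # Write to a temporary file first so a failure cannot truncate the original.
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=dst_encoding, newline="") as f:
            f.write(text)
        os.replace(tmp_path, filepath)
    except Exception:
        os.remove(tmp_path)
        raise
```

Inside the loop above this could be called as `reencode_file(filepath, result['encoding'])`. Passing `errors='replace'` substitutes a placeholder character for bytes that cannot be decoded instead of raising an exception.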

edited Jan 25 '23 at 02:21

answered Jan 25 '23 at 02:07


Thanks for your answer, this seems promising, but I'm getting a ParserError on line 7: `df = pd.read_csv(filepath,encoding=result['encoding'])` – Kenny Jan 25 '23 at 02:21
Can you give me the exact error, please? – Lahcen YAMOUN Jan 25 '23 at 02:22
I just uploaded it. – Kenny Jan 25 '23 at 02:32
Try this: `df = pd.read_csv(filepath,encoding=result['encoding'], encoding_errors='replace')` – Lahcen YAMOUN Jan 25 '23 at 02:47
I received the error `TypeError: read_csv() got an unexpected keyword argument 'errors'`. When I run your updated version there are no errors, but nothing seems to be updated. – Kenny Jan 25 '23 at 02:51
It's `encoding_errors` and not `errors` – Lahcen YAMOUN Jan 25 '23 at 02:53
I received another error with the original code, and with the edited version I get the same result: nothing seems to be updated. – Kenny Jan 25 '23 at 02:57
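
A loop that finishes without errors yet changes nothing can mean that `glob.glob` matched no files, so the loop body never ran. A pattern with no wildcard matches only a file with exactly that name. The check below is a sketch of that behavior. The temporary directory and the file names in it are made up for the demonstration.

```python
import glob
import os
import tempfile

# Build a throwaway directory holding two CSV files (hypothetical names).
folder = tempfile.mkdtemp()
for filename in ("a.pdf.csv", "b.pdf.csv"):
    with open(os.path.join(folder, filename), "w", encoding="utf-8") as f:
        f.write("x,y\n1,2\n")

prefix = folder + os.sep

# No wildcard: only a file literally called "name.csv" would match.
literal_matches = glob.glob(prefix + "name.csv")

# Wildcard: every file whose name ends in .csv matches.
wildcard_matches = sorted(glob.glob(prefix + "*.csv"))

print(len(literal_matches))  # 0, so a for loop over this list does nothing
print([os.path.basename(m) for m in wildcard_matches])
```

Printing the result of `glob.glob(...)` before the loop is a quick way to confirm which files will actually be processed.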
