I have a folder with 158 PDF files and want to extract the tabular data from each one. Here is what I have done so far.
Importing modules
from itertools import chain
import pandas as pd
from tabula import read_pdf
Reading the data files
data_A = read_pdf('D:\\Code\\Scraping\\DMKQ\\A.pdf', pages='all', encoding='latin1')  # returns a list of DataFrames, one per detected table
data_B = read_pdf('D:\\Code\\Scraping\\DMKQ\\B.pdf', pages='all', encoding='latin1')
# Build one DataFrame per file and print its row count.
data_A_c = chain.from_iterable(t.values for t in data_A)
headers = data_A[0].columns
df_A = pd.DataFrame(data_A_c, columns=headers)
df_A.set_index('Name', inplace=True)
print(len(df_A.index))
data_B_c = chain.from_iterable(t.values for t in data_B)
headers = data_B[0].columns
df_B = pd.DataFrame(data_B_c, columns=headers)
df_B.set_index('Name', inplace=True)
print(len(df_B.index))
At the moment I have to copy this block and change the file name for every new file, which is time-consuming and, with 158 files in the folder, practically impossible. Does anybody know how to run this entire process more efficiently?
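Here is a rough sketch of the kind of loop I have in mind, in case it helps clarify what I'm after. The Path-based glob, the dataframes dict, and the assumption that every PDF has the same layout (including a 'Name' column) are all mine, so this may well not be the right approach:

from itertools import chain
from pathlib import Path
import pandas as pd
from tabula import read_pdf

folder = Path('D:\\Code\\Scraping\\DMKQ')  # assumption: all 158 PDFs live in this one folder
dataframes = {}                            # hypothetical container: one DataFrame per file, keyed by file name

for pdf_path in sorted(folder.glob('*.pdf')):
    # Same steps as above, just parameterised by the file path.
    tables = read_pdf(str(pdf_path), pages='all', encoding='latin1')
    rows = chain.from_iterable(t.values for t in tables)
    df = pd.DataFrame(rows, columns=tables[0].columns)
    df.set_index('Name', inplace=True)     # assumes every file has a 'Name' column
    dataframes[pdf_path.stem] = df
    print(pdf_path.name, len(df.index))

Would something along these lines be the right way to go, or is there a more idiomatic pattern for batch-processing a folder of PDFs?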