Reading Tables from PDF and converting them into Pandas Dataframe

Question

I am trying to extract tabular data from a PDF and store it as a data frame, but the tabular data does not come out in a proper format.

Below is the data frame I am getting:

But I want the data frame in the format below.

How can I write generalised code to do this?

Kindly provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) — Anurag Dabas, Jun 10 '21 at 17:57
Please include any relevant information [as text directly into your question](https://stackoverflow.com/editing-help), do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/q/285551/15497888) — Henry Ecker, Jun 10 '21 at 17:58
If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). — Henry Ecker, Jun 10 '21 at 17:58

score 0 · Answer 1 · answered Jun 10 '21 at 18:22

Rename your columns with:

df.columns = ['Colour', 'Size', 'Base Size', 'Value', 'Base Amount', 'Absolute', 'Approx']

And drop the first two rows with:

df.drop([0, 1], inplace=True)
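The two steps above can be sketched end to end as follows. The `raw` frame is a hypothetical stand-in for the extracted table (the question's actual data was posted as an image and is not available), and the helper name `apply_known_header` is made up for illustration. It assumes the first two rows hold header fragments rather than data.

```python
import pandas as pd

# Hypothetical stand-in for the extracted table: the first two rows
# contain pieces of the header, the remaining rows contain data.
raw = pd.DataFrame([
    ["Colour", "Size", "Base", "Value", "Base", "Absolute", "Approx"],
    [None, None, "Size", None, "Amount", None, None],
    ["Red", "S", "10", "1.5", "100", "yes", "no"],
    ["Blue", "M", "12", "2.5", "200", "no", "yes"],
])


def apply_known_header(df, names, header_rows=2):
    """Return a copy of df with `names` as columns and the leading
    `header_rows` rows removed (illustrative helper, not a pandas API)."""
    out = df.copy()
    out.columns = names
    # Drop the leading rows by label, then renumber the index from 0.
    out = out.drop(index=out.index[:header_rows]).reset_index(drop=True)
    return out


names = ['Colour', 'Size', 'Base Size', 'Value', 'Base Amount', 'Absolute', 'Approx']
clean = apply_known_header(raw, names)
print(clean)
```

Working on a copy instead of using `inplace=True` keeps the raw extraction around, which is handy if the number of header rows turns out to differ between tables.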

Ollie in PGH

There's a deleted answer that is a duplicate of this. The comment from OP on that answer was "this will not work in case I don't know the column names already." – Henry Ecker Jun 10 '21 at 18:24
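When the column names are not known in advance, one possible approach is to build them from the header rows themselves. The sketch below assumes that the header is spread over the first `header_rows` rows of the extracted frame and that empty header cells come back as missing values. Neither assumption is confirmed by the question. The function name `promote_header_rows` and the sample data are hypothetical.

```python
import pandas as pd


def promote_header_rows(df, header_rows=2):
    """Join the non-empty cells of the first `header_rows` rows, column by
    column, and use the result as column names (illustrative helper)."""
    header = df.iloc[:header_rows]
    names = []
    for i in range(header.shape[1]):
        # Keep only cells that are present and non-blank in this column.
        parts = [str(v).strip() for v in header.iloc[:, i] if pd.notna(v)]
        names.append(" ".join(p for p in parts if p))
    # Everything below the header rows is treated as data.
    body = df.iloc[header_rows:].reset_index(drop=True)
    body.columns = names
    return body


# Hypothetical extracted table with a header split across two rows.
raw = pd.DataFrame([
    ["Colour", "Size", "Base", "Value", "Base", "Absolute", "Approx"],
    [None, None, "Size", None, "Amount", None, None],
    ["Red", "S", "10", "1.5", "100", "yes", "no"],
])

clean = promote_header_rows(raw)
print(list(clean.columns))
```

The number of header rows still has to be chosen per document. If it varies between PDFs, it would need to be detected separately, for example by checking where the first row of numeric-looking values starts.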
