
I am trying to extract tabular data from a PDF and store it as a DataFrame, but the table does not come out in the proper format.

Below is the DataFrame I am getting:

[Image: the DataFrame as currently extracted]

But I want the DataFrame in the format below:

[Image: the desired DataFrame layout]

How can I write generalised code to do this?

  • Kindly provide a [minimal-reproducible-example](https://stackoverflow.com/help/minimal-reproducible-example) – Anurag Dabas Jun 10 '21 at 17:57
  • Please include any relevant information [as text directly into your question](https://stackoverflow.com/editing-help), do not link or embed external images of source code or data. Images make it difficult to efficiently assist you as they cannot be copied and offer poor usability as they cannot be searched. See: [Why not upload images of code/errors when asking a question?](https://meta.stackoverflow.com/q/285551/15497888) – Henry Ecker Jun 10 '21 at 17:58
  • If you need assistance formatting a small sample of your DataFrame as a copyable piece of code for SO see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker Jun 10 '21 at 17:58

1 Answer


Rename your columns with:

df.columns = ['Colour', 'Size', 'Base Size', 'Value', 'Base Amount', 'Absolute', 'Approx']

And drop the first two rows with:

df.drop([0, 1], inplace=True)
Ollie in PGH
  • There's a deleted answer that is a duplicate of this. The comment from OP on that answer was "this will not work in case I don't know the column names already." – Henry Ecker Jun 10 '21 at 18:24
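
If the column names are not known in advance, one possible approach is to build them from the header rows that ended up inside the data. This is only a sketch under assumptions: the question's images are not available, so the sample frame below is made up, and it assumes the header text is split across the first two rows (the same rows the answer above drops). The helper name `promote_header_rows` and its `n_header_rows` parameter are hypothetical, not part of pandas.

```python
import pandas as pd


def promote_header_rows(df: pd.DataFrame, n_header_rows: int = 2) -> pd.DataFrame:
    """Join the first n_header_rows rows into column names, then drop those rows."""
    # Treat missing header cells as empty strings so they can be skipped.
    header = df.iloc[:n_header_rows].fillna("").astype(str)
    # For each column, join its non-empty header pieces with a space.
    names = [
        " ".join(part.strip() for part in header[col] if part.strip())
        for col in header.columns
    ]
    # Keep only the data rows and renumber them from 0.
    body = df.iloc[n_header_rows:].reset_index(drop=True)
    body.columns = names
    return body


# Hypothetical sample: a header split over two rows, as PDF extraction often produces.
raw = pd.DataFrame(
    [
        ["Colour", "Size", "Base", "Value"],
        [None, None, "Size", None],
        ["Red", "10", "5", "100"],
        ["Blue", "12", "6", "120"],
    ]
)

clean = promote_header_rows(raw, n_header_rows=2)
print(clean)
```

The number of header rows still has to be supplied or detected per document, so this generalises over the column names but not over every possible table layout.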