I want to write this data to a CSV file so that I can loop over many companies in my web-scraping code.

I got this code with help from Stack Overflow, and I want to export the printed output below to Excel or CSV, with or without the "₨ 149 Each" column.

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.zaubacorp.com/documents/KAKDA/U01122MP1985PTC002857'
res = requests.get(url)
soup = bs(res.content,'lxml')
headers = [header.text for header in soup.select('h3.pull-left')]
tables = pd.read_html(url)
items = zip(headers,tables)
for header, table in items:
    print(header)
    print(table)


Certificates
         Date                         Title   ₨ 149 Each
0  2006-04-24  Certificate of Incorporation  Add to Cart
1  2006-04-24  Certificate of Incorporation  Add to Cart
Other Documents Attachment
         Date Title   ₨ 149 Each
0  2006-04-24   AOA  Add to Cart
1  2006-04-24   AOA  Add to Cart
2  2006-04-24   MOA  Add to Cart
3  2006-04-24   MOA  Add to Cart
Annual Returns and balance sheet Eform
         Date                    Title   ₨ 149 Each
0  2006-04-24  Annual Return 2002_2003  Add to Cart
1  2006-04-24  Annual Return 2003_2004  Add to Cart


1 Answer

It's not clear exactly what output you expect, but once you combine the DataFrames into one, you can write the result to CSV with pandas.

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.zaubacorp.com/documents/KAKDA/U01122MP1985PTC002857'
res = requests.get(url)
soup = bs(res.content,'lxml')
headers = [header.text for header in soup.select('h3.pull-left')]
tables = pd.read_html(url)

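# Tag each table's rows with its section title (e.g. "Certificates")
# so the rows stay identifiable once the tables are combined
tables = [table.assign(Section=header.strip()) for header, table in zip(headers, tables)]

# Stack the tables into one DataFrame with a fresh 0..n-1 index
df = pd.concat(tables, ignore_index=True)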


df.to_csv('path/to/filename.csv', index=False)
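
If you also want to drop the "₨ 149 Each" column and keep track of which company each row came from, something along these lines might work. This is only a sketch: `combine_sections`, the `Company` and `Section` column names, and the sample rows (copied from the output in the question) are my own assumptions, and it expects you to have already fetched `headers` and `tables` for a page as above.

```python
import pandas as pd


def combine_sections(headers, tables, company, drop_price=True):
    """Stack one page's tables into a single DataFrame.

    `combine_sections` is a hypothetical helper, not part of pandas.
    Each row is labelled with the company and the section it came from.
    """
    frames = []
    for header, table in zip(headers, tables):
        frame = table.copy()
        if drop_price:
            # Loosely match the "₨ 149 Each" column by the word "Each"
            keep = [c for c in frame.columns if "Each" not in str(c)]
            frame = frame.loc[:, keep]
        frame.insert(0, "Section", header.strip())
        frame.insert(0, "Company", company)
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Sample data standing in for the scraped tables (taken from the question)
certs = pd.DataFrame({
    "Date": ["2006-04-24", "2006-04-24"],
    "Title": ["Certificate of Incorporation"] * 2,
    "₨ 149 Each": ["Add to Cart"] * 2,
})
others = pd.DataFrame({
    "Date": ["2006-04-24"] * 4,
    "Title": ["AOA", "AOA", "MOA", "MOA"],
    "₨ 149 Each": ["Add to Cart"] * 4,
})

combined = combine_sections(
    ["Certificates", "Other Documents Attachment"],
    [certs, others],
    company="KAKDA",
)

# Print the CSV text; pass a file path to to_csv() to write a file instead
print(combined.to_csv(index=False))
```

To cover many companies, you could call this once per URL inside a loop, collect the returned DataFrames in a list, `pd.concat` them, and call `to_csv` once at the end.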
chitown88