0

I got a python script that scrapes domain names. I just parsed the json response because it renders html code. I used pandas to read the html and get the body which is the html content. I tried to print it and then i got the correct values. Now that i got it, I want to save every result in a mysql database. How could I achieve it?

Here is my script

mydb = mysql.connector.connect(
   host="localhost",
   user="root",
   passwd="",
   database='domainscrape'
)

mycursor = mydb.cursor()
print(mydb)


pageNumber = 0
while True:
    driver.implicitly_wait(3)
    driver.get('https://reversewhois.domaintools.com/?ajax=mReverseWhois&call=ajaxGetPreviewPage&q=%5B%5B%5B%22whois%22%2C%222%22%2C%22VerifiedID%40SG-Mandatory%22%5D%5D%5D&o='+str(pageNumber))
    time.sleep(3)
    pre = driver.find_element_by_tag_name("pre").text
    data = json.loads(pre)
    if data['body']:
        table = data['body']
        tables = pd.read_html(table,skiprows=1)
        df = tables[-1]
        print(df.to_string(index=False))
        pageNumber += 1
        continue
    else:
        break

I got a result like this

  0vh-cl0ud.sg  2017-10-12                                 KEY-SYSTEMS GMBH
  0vh-cloud.sg  2017-10-12                                 KEY-SYSTEMS GMBH
  0vhcloud.sg   2017-10-12                                 KEY-SYSTEMS GMB

Tried saving it to an csv file and i got good results

df.to_csv('Domains.csv', mode='a', sep=',',index=False)

but i dont want to import csv to mysql. I just want to directly insert the rows in an existing mysql table in it.

How could i format it like 0vh-cl0ud.sg is the domain 2017-10-12 is the date and KEY-SYSTEMS GMBH is the company? I dit not include the header since in every iteration it prints the header and i dont want it.

draw134
  • 1,053
  • 4
  • 35
  • 84
  • 1
    What I understand is that you have a dataframe and you want to either create a new table or write (update) to an existing table in a mysql db. If that's the case, I would strongly suggest you to trim down the problem to focus on this aspect only. Problems that are reproducible and focus on a specific aspect, get answered quicker and often more people try answering them. – CypherX Mar 30 '20 at 07:01
  • okay got it sir @CypherX – draw134 Mar 30 '20 at 07:02
  • Take a look at this: https://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database – CypherX Mar 30 '20 at 07:05
  • Probable **DUPLICATE** of [this](https://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database). – CypherX Mar 30 '20 at 07:09

1 Answers1

1

It should be something like this:

mycursor = mydb.cursor()
mycursor.execute("INSERT INTO table_name(domain, date, company) VALUES ('0vh-cl0ud.sg', '2017-10-12', 'KEY-SYSTEMS GMBH'))"

This piece should be put in the loop after the data is scraped. Please go through the aforementioned links in comments to have a better understanding of the process.

halfer
  • 19,824
  • 17
  • 99
  • 186
TulakHord
  • 422
  • 7
  • 15
  • but how can i get the values for it? like an index – draw134 Mar 30 '20 at 08:42
  • 1
    create a table with the columns domain, date and company. create another column 'index' with auto_increment properties. This way you can keep adding companies to the table. I hope i got the right understanding of the 'values' you were asking about – TulakHord Mar 30 '20 at 08:51