I am trying to fix the following error but have not found a solution. Can anyone help me with this? The code sometimes runs to completion, but at other times it fails with the error shown below. Here is the code:

import requests
from bs4 import BeautifulSoup
import mysql.connector

mydb = mysql.connector.connect(host="localhost", user="root",passwd="", database="python_db")
mycursor = mydb.cursor()
#url="https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN=U01224KA1980PLC003802"
#query1 = "INSERT INTO csr_details(average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES()"
mycursor.execute("SELECT cin_no FROM tn_cin WHERE csr_status=0")
urls=mycursor.fetchall()
#print(urls)

def convertTuple(tup):
    # Join the pieces of a value into a single string
    return ''.join(tup)

for url in urls:
    # Named `cin` rather than `str`, which would shadow the built-in type
    cin = convertTuple(url[0])
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}
    csr_link = 'https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN='
    link = csr_link+cin
    #print(link)
    response=requests.get(link, headers=headers)
    #print(response.status_code)
    bs=BeautifulSoup(response.text,"html.parser")
    div_table=bs.find('div', id = 'colfy4')
    if div_table is not None:
        # find_all returns a (possibly empty) list, never None
        fy_table = div_table.find_all('table', id = 'employee_data')
        for tr in fy_table:
            td=tr.find_all('td')
            # Four cells are read below, so require at least four
            if len(td)>=4:
                rows=[i.text for i in td]
                row1=rows[0]
                row2=rows[1]
                row3=rows[2]
                row4=rows[3]
                mycursor.execute("INSERT INTO csr_details(cin_no,average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES(%s,%s,%s,%s,%s)",(cin,row1,row2,row3,row4))
                # Mark this CIN as processed
                status_update="UPDATE tn_cin SET csr_status=%s WHERE cin_no=%s"
                data = ('1',cin)
                mycursor.execute(status_update,data)
                mydb.commit()

I get the following error when running the above code:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

AzyCrw4282

1 Answer

The error

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

usually originates on the server side. It means the server closed the connection before sending any response, so no HTTP status code (not even a 5xx) ever reaches the client.

I believe it's likely caused by this line

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}

Some servers have trouble with certain header values. You could try sending a minimal header instead:

response=requests.get(link, headers={"User-Agent":"Mozilla/5.0"})

and see if that solves your problem.

See this answer for user-agents for a variety of browsers.
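
Since the error only shows up some of the time, it may also be worth retrying the failed request after a short pause instead of letting the whole loop crash. Below is a minimal sketch of that idea, not a definitive fix. The helper name with_retries and its parameters (retry_on, attempts, base_delay, sleep) are made up for illustration and are not part of requests.

```python
import time


def with_retries(fetch, retry_on=(OSError,), attempts=4, base_delay=1.0,
                 sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay, 2*base_delay, 4*base_delay, ...
    seconds between tries (exponential backoff).
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

In the question's loop, the call could then look like response = with_retries(lambda: requests.get(link, headers=headers, timeout=10), retry_on=(requests.exceptions.ConnectionError,)). The timeout of 10 seconds is an arbitrary choice here. If the server is dropping connections because requests arrive too quickly, adding a time.sleep between iterations of the loop may help as well.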

AzyCrw4282
  • Thanks for the help. But this has not solved the error. Still the same error is showing – K Sreejith Puranik Jun 11 '20 at 05:01
  • You mentioned that it works sometimes. Are you able to notice anything different in your running environment between the times it works and the times it doesn't? You may also want to debug your code line by line to locate the exact point of failure. – AzyCrw4282 Jun 11 '20 at 16:22