I am building a simple scraper that connects to an Azure SQL server and executes a SQL statement based on the data collected. I have built my pipeline using pymssql; however, I get the following error:

pymssql._pymssql.OperationalError: (2714, b"There is already an object named 'idP' in the database.DB-Lib error message 20018, severity 16:\nGeneral SQL Server error: Check messages from the SQL Server\n")

I think what happens is that, since Scrapy processes items one at a time, the pipeline tries to create the table again for every item. How can I create the table once and keep updating it?

Here's my scraper:

import scrapy
from sqlAzure.items import SqlazureItem

class AzureSpider(scrapy.Spider):
    name = 'azure'
    #allowed_domains = ['x']
    start_urls = ['https://books.toscrape.com']

    def parse(self, response):
        container = response.xpath("//ol[@class = 'row']")
        for data in container:
            for ab in data.xpath('.//p[@class = "price_color"]//text()').getall():
                items = SqlazureItem()
                items['price'] = int(ab.split('£')[-1].split('.')[0])
                yield items

Pipelines:

from itemadapter import ItemAdapter
import pymssql

class SqlazurePipeline:
    def __init__(self):
        self.conn  = pymssql.connect("xxx.database.windows.net", "xxx", "xxx", "xxx")
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        self.cursor.execute("CREATE TABLE idP (price int)",
        (item['price']))
        self.conn.commit()

        return item
    Easiest approach is to only create the table if it doesn't exist, similar to this https://stackoverflow.com/questions/6520999/create-table-if-not-exists-equivalent-in-sql-server. But you'd be better off restructuring your code to only run the create table once – Nick.Mc Apr 04 '22 at 22:24
  • 1
    Looks like you meant to `INSERT` not `CREATE` – Charlieface Apr 04 '22 at 23:49
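
Putting the two comments together, here is a minimal sketch of a fixed pipeline (not an accepted answer, just an illustration): guard the `CREATE TABLE` with an existence check in `open_spider`, which Scrapy calls once when the spider starts, and run a parameterized `INSERT` per item in `process_item`. The table and column names (`idP`, `price`) and the `"xxx"` connection placeholders come from the question; T-SQL has no `CREATE TABLE IF NOT EXISTS`, so the sketch checks `OBJECT_ID` instead.

```python
# Run CREATE TABLE only if the table doesn't exist yet (T-SQL has no
# CREATE TABLE IF NOT EXISTS, so check OBJECT_ID instead).
CREATE_SQL = (
    "IF OBJECT_ID('dbo.idP', 'U') IS NULL "
    "CREATE TABLE dbo.idP (price int)"
)
# Parameterized INSERT -- this is what runs once per scraped item.
INSERT_SQL = "INSERT INTO dbo.idP (price) VALUES (%d)"

class SqlazurePipeline:
    def open_spider(self, spider):
        import pymssql  # third-party driver, as in the question
        # Connect and create the table once, when the spider starts.
        self.conn = pymssql.connect("xxx.database.windows.net", "xxx", "xxx", "xxx")
        self.cursor = self.conn.cursor()
        self.cursor.execute(CREATE_SQL)
        self.conn.commit()

    def process_item(self, item, spider):
        # Insert one row per item instead of re-creating the table.
        self.cursor.execute(INSERT_SQL, (item['price'],))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Close the connection once, when the spider finishes.
        self.conn.close()
```

This keeps a single connection for the whole crawl; `%d` is pymssql's numeric parameter placeholder, so the item value is passed as a parameter rather than interpolated into the SQL string.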

0 Answers