First I check the website and collect all the job links. I need help checking MySQL: if a link is already in the table, it should not be inserted again; if it does not exist yet, it should be inserted.

created_at = time.strftime("%Y/%d/%m/ %H:%M:%S")
afdelings = 'it-support'

url = 'www.careerjet.dk/sog/jobs?s=L%C3%A6rling&l=Danmark'
r  = requests.get("http://" +url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
side1 = "http://www.careerjet.dk/"
cur = connect.cursor()

for link in soup.select('.title > a'):
  linkfrom = side1 + (link.get('href'))
  f = string.split(linkfrom, '\n')
  for line in f:
    if ("""SELECT count(*) from jobtest WHERE link = %s""", (line)) == 0:
      cur.execute("""INSERT INTO jobtest (afdeling, dato, link) VALUES (%s, %s, %s)""", (afdelings, created_at, line))

with connect:
  connect.commit()

connect.close()

Any help is deeply appreciated.

user1979911

1 Answer


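You need to execute the SELECT first. In your code, `("SELECT ...", (line)) == 0` only builds a tuple (the query string plus the link) and compares that tuple with 0. The query never runs, the comparison is always False, and so nothing is ever inserted. Run the query with `cur.execute()`, read the count from the result, and only then compare it with 0.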

Something like this:

import time

import requests
from bs4 import BeautifulSoup

# `connect` is the open MySQLdb connection from the question (its setup is not shown)
created_at = time.strftime("%Y/%d/%m/ %H:%M:%S")
afdelings = 'it-support'

url = 'www.careerjet.dk/sog/jobs?s=L%C3%A6rling&l=Danmark'
r = requests.get("http://" + url)
data = r.text
soup = BeautifulSoup(data, "html.parser")
side1 = "http://www.careerjet.dk/"
cur = connect.cursor()

for link in soup.select('.title > a'):
    linkfrom = side1 + link.get('href')
    for line in linkfrom.split('\n'):

        #-------ADDED CODE
        # Actually run the SELECT; pass the parameter as a one-element tuple, hence (line,)
        cur.execute("""SELECT count(*) FROM jobtest WHERE link = %s""", (line,))
        # fetchone() returns one row such as (0L,); the count is its first column
        count = cur.fetchone()[0]
        #-------END ADDED CODE

        if count == 0:
            cur.execute("""INSERT INTO jobtest (afdeling, dato, link) VALUES (%s, %s, %s)""",
                        (afdelings, created_at, line))

# Commit once after the loop so all new links are saved, then close the connection
connect.commit()
connect.close()
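To try the SELECT-then-INSERT logic without a MySQL server or a network request, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQLdb. It is an illustration under assumptions, not the answerer's code: `sqlite3` uses `?` placeholders where MySQLdb uses `%s`, the `jobtest` columns are assumed to be plain text, and `insert_new_links` and the example.com links are hypothetical.

```python
import sqlite3


def insert_new_links(conn, links, afdeling, dato):
    """Hypothetical helper: insert each link into jobtest unless it is already there.

    Returns the list of links that were actually inserted.
    """
    cur = conn.cursor()
    inserted = []
    for link in links:
        # Run the SELECT and unpack the single value from the single result row.
        cur.execute("SELECT count(*) FROM jobtest WHERE link = ?", (link,))
        (count,) = cur.fetchone()
        if count == 0:
            cur.execute(
                "INSERT INTO jobtest (afdeling, dato, link) VALUES (?, ?, ?)",
                (afdeling, dato, link),
            )
            inserted.append(link)
    conn.commit()
    return inserted


if __name__ == "__main__":
    # In-memory database with an assumed, simplified jobtest table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobtest (afdeling TEXT, dato TEXT, link TEXT)")
    first = insert_new_links(
        conn, ["http://example.com/a", "http://example.com/b"],
        "it-support", "2015-10-09 08:00:00")
    second = insert_new_links(
        conn, ["http://example.com/b", "http://example.com/c"],
        "it-support", "2015-10-09 09:00:00")
    print(first)   # ['http://example.com/a', 'http://example.com/b']
    print(second)  # ['http://example.com/c']
```

The key line is `(count,) = cur.fetchone()`: the query returns one row holding one value, so the count has to be taken out of the row before it can be compared with `0`. If you can add a UNIQUE index on `link`, MySQL can enforce the rule itself with `INSERT IGNORE`, which also avoids a race between the SELECT and the INSERT when two scrapers run at the same time.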
ji-ruh
  • It came up with "NameError: name 'cursor' is not defined." Then I tried "cur.fetchall(data_tmp)" and got the error "fetchall() takes exactly 1 argument (2 given)". Thanks for your help, by the way. Any idea what is going on? – user1979911 Oct 09 '15 at 06:43
  • OK, I changed it to data_tmp = cur.execute("""SELECT count(*) from jobtest WHERE link = %s""", (line)), but still no data is inserted into MySQL. – user1979911 Oct 09 '15 at 06:48
  • Maybe that link was already inserted, in which case skipping it is what you asked for. What is in the `data_tmp` variable? – ji-ruh Oct 09 '15 at 08:07
  • When I print "%s" % data_tmp, I get (0L,) – user1979911 Oct 09 '15 at 08:11
  • Change `data_tmp = cursor.fetchall()` to `data_tmp = cur.fetchall()` – ji-ruh Oct 09 '15 at 08:14
  • Already did that, but then I also had to change data_tmp = """SELECT count(*) from jobtest WHERE link = %s""", (line) to data_tmp = cur.execute("""SELECT count(*) from jobtest WHERE link = %s""", (line)) – user1979911 Oct 09 '15 at 08:26
  • Is there any error showing? Please print cur.mogrify() – ji-ruh Oct 09 '15 at 08:29
  • It doesn't show any errors. Where exactly should I put it? I get: AttributeError: 'Cursor' object has no attribute 'mogrify' – user1979911 Oct 09 '15 at 08:35
  • Add this `print cur.mogrify()` after `cur.fetchall()` function and after `cur.execute` – ji-ruh Oct 09 '15 at 08:46
  • `data_tmp = cur.execute("""SELECT count(*) from jobtest WHERE link = %s""", (line))`, `print cur.mogrify()`, `data_tmp = cur.fetchall()`, `print cur.mogrify()`, like this? I still get: AttributeError: 'Cursor' object has no attribute 'mogrify'. Do I need to add a library for this? – user1979911 Oct 09 '15 at 08:52
  • Could the problem be that there isn't anything in MySQL yet for it to check against? – user1979911 Oct 09 '15 at 09:02
  • Oops, I thought you were using Postgres. See this reference for how to view the executed command: http://stackoverflow.com/questions/5266430/how-to-see-the-real-sql-query-in-python-cursor-execute – ji-ruh Oct 09 '15 at 09:07
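The `(0L,)` output explains why nothing was inserted: `fetchall()` returns a sequence of rows (here one row holding the count), and that sequence is never equal to `0`. Because `"%s" % data_tmp` unwraps one level of a tuple, the printed `(0L,)` suggests `data_tmp` actually held `((0L,),)`. The comparison has to be made on the first column of the first row (`data_tmp[0][0] == 0`, or `cur.fetchone()[0] == 0`). The sketch below reproduces this with Python's built-in `sqlite3` as an assumed stand-in for MySQLdb (`?` placeholders instead of `%s`, and `fetchall()` returns a list rather than a tuple); it also uses `sqlite3`'s `set_trace_callback` as that module's way of seeing the SQL that ran, since MySQLdb cursors have no `mogrify()`. The example link is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobtest (afdeling TEXT, dato TEXT, link TEXT)")

# Collect every SQL statement SQLite actually executes from here on.
executed = []
conn.set_trace_callback(executed.append)

cur = conn.cursor()
cur.execute("SELECT count(*) FROM jobtest WHERE link = ?", ("http://example.com/a",))
rows = cur.fetchall()

print(rows)             # [(0,)]  a list of rows, not a number
print(rows == 0)        # False   why a direct comparison with 0 never matches
print(rows[0][0] == 0)  # True    first column of the first row is the count
print(executed[-1])     # the SELECT statement as SQLite ran it
```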