0

I am fairly new to programming and I've been trying to implement this code in Python, using PyCharm. I'm running the code via a remote server, using PyCharm on my local computer. It was written by a former colleague, and has been giving a lot of encoding issues since we updated the packages like MySQL and the Python interpreter to 3.8. The MySQL version is 8.0, but this is an update. That was not the version installed originally when the code was written.

This is the full error that I am getting:

findBestMatch *** WARNING: FoundCity[i] = {'Country': 'Austria', 'Page': 'Contact', 'Confidence': 10, 'Mentions': 1} WriteToDB Problem <class 'MySQLdb._exceptions.ProgrammingError'> Traceback (most recent call last): File "/remotepath/TextMining/NER/FindLocationStoreSQL.py", line 399, in WriteToDB(c.title(), cn, idProject, 10, "Contact", "v1", cursor, db, database_country) File "/remotepath/TextMining/NER/FindLocationStoreSQL.py", line 286, in WriteToDB cursor.execute(sql,values) File "/home/localhost/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute res = self._query(query) File "/home/localhost/.local/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query db.query(q) File "/home/localhost/.local/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query _mysql.connection.query(self, query) MySQLdb._exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'City text mined, Country from datasource'',Confidence='10',FoundWhere=''Contact'' at line 1") Process finished with exit code -1

The code sample of the function it is trying to run is below:

def findBestMatch(FoundCity,FoundCountry,database_country):
pair_candidates = []
for i in range(0,5):
    for j in range(0,5):
        if len(FoundCity)>i and len(FoundCountry)>j:
            city_i = FoundCity[i].get('City')
            country_j = FoundCountry[j].get('Country')
            if city_i != None and country_j != None:
                #sql = "SELECT City,Country_CountryName,Longitude,Latitude FROM Semanticon.City where city like '{0}' and Country_CountryName like '{1}' and Population>0 order by Population desc".format(FoundCity[i]['City'].encode('utf-8'),FoundCountry[j]['Country'].encode('utf-8'))
                sql = "SELECT City,Country_CountryName,Longitude,Latitude FROM Semanticon.City where city like '{0}' and Country_CountryName like '{1}' and Population>0 order by Population desc".format(
                    FoundCity[i]['City'], FoundCountry[j]['Country'])
              
                try:
                   cursor.execute(sql)
                except:
                    db = MySQLdb.connect(host, username, password, database, charset='utf8')
                    db.set_character_set("utf8")
                    cursor = db.cursor()
                    cursor.execute(sql)
                resul = cursor.fetchall()
                if len(resul)>0:
                    pair_candidates.append({"City":FoundCity[i]['City'],"Country":FoundCountry[j]['Country'],"Score":(FoundCity[i]["Confidence"]+FoundCountry[j]["Confidence"]+0.5*(FoundCity[i]["Mentions"]+FoundCountry[j]["Mentions"]))})
                    #return FoundCity[i]['City'],FoundCountry[j]['Country'],FoundCity[i]['Confidence']
            else:
                if city_i == None:
                    print("*** WARNING: FoundCity[i] = ", FoundCity[i])
                else:
                    print("*** WARNING: FoundCountry[j] = ", FoundCountry[j])

I had to take the encoding out, hence the commented out 'sql' line. The encoding was causing problems and adding an extra 'b' to the string to be read from the database.

The 'WriteToDB' function that it's complaining about is below:

def WriteToDB(City,Country,ProjectId,Confidence,Location,Version,cursor,db,database_country):
sql = None
if database_country!="":
    if database_country == Country:
        if City != "":
            #sql = "UPDATE  ProjectLocation SET City='{0}',DataTrace='{1}',Confidence={2},FoundWhere='{3}' WHERE Projects_idProjects={4} and Country='{5}';".format(City," City text mined, Country from datasource",Confidence,Location,ProjectId,original_database_cntry)
            sql = "UPDATE  ProjectLocation SET City='%s',DataTrace='%s',Confidence='%s',FoundWhere='%s' WHERE Projects_idProjects='%s' and Country='%s';"
            values = (City, " City text mined, Country from datasource", Confidence, Location, ProjectId,
                original_database_cntry)
    if database_country!=Country:
        if Country.encode('utf-8') in database_country:
            #sql = "UPDATE  ProjectLocation SET City='{0}',DataTrace='{1}',Confidence={2},FoundWhere='{3}' WHERE Projects_idProjects={4} and Country='{5}';".format(
               # City, " City text mined, Country from datasource", Confidence, Location, ProjectId,
                #original_database_cntry)

            sql = "UPDATE  ProjectLocation SET City='%s',DataTrace='%s',Confidence='%s',FoundWhere='%s' WHERE Projects_idProjects='%s' and Country='%s';"
            values = (City, " City text mined, Country from datasource", Confidence, Location, ProjectId,
                original_database_cntry)
        else:
            print("Country conflict in project:"+str(ProjectId))
else:
    #sql = "Insert into ProjectLocation (Type,City,Country,Projects_idProjects,Original_idProjects,IsLocationFromDataset,Confidence,FoundWhere,Version,DataTrace)" \
     # "Values ('{0}','{1}','{2}',{3},{4},0,{5},'{6}','{7}','{8}')".format("Main",City,Country,ProjectId,ProjectId,Confidence,Location,Version,"Both minded from text v0.1")

    sql = "Insert into ProjectLocation (Type,City,Country,Projects_idProjects,Original_idProjects,IsLocationFromDataset,Confidence,FoundWhere,Version,DataTrace)" \
          "Values ('%s','%s','%s','%s',0,'%s','%s','%s','%s','%s')"
    values = ("Main", City, Country, ProjectId,ProjectId, Confidence, Location, Version, "Both minded from text v0.1")
if sql!=None:
    cursor.execute(sql,values)
    db.commit()

I commented out the SQL queries as shown and tried to bind them instead, because it was giving a lot of encoding errors. I'm not sure how to get rid of this error and not end up with the encoding errors yet again.

Can someone help?

UPDATE to the question.

I reverted back all the sql queries, and used the queries with the encoding now (previously commented out) and I am getting 'b's in the output.

Any suggestions on how to properly encode these SQL queries so the binary encoding does not come out as b's in the output?

Here is a sample of the output:

ProjectID,ProjectName,FoundCity,FoundCountry,DatabaseCity,DatabaseCountry,Confidence,FoundWhere,Website 2542, Migrantour Country,,b'',b'',b'',10,Contact,link 3938,GeoSmartCity,,,b'',b'',10,Contact,link

Phoenix
  • 35
  • 7
  • What were the encoding errors that you were getting? Is the data ASCII or some sort of unicode variant? – Ben Nov 26 '20 at 22:14
  • @Ben Hi, I was able to get the code to run without the encoding errors, but I keep getting 'b's in the output. The data is ASCII. Here is a sample of the output: – Phoenix Nov 30 '20 at 16:55
  • @Ben Hi, I was able to get the code to run without the encoding errors, but I keep getting 'b's in the output. The data is ASCII. Here is a sample of the output: _ProjectID,ProjectName,FoundCity,FoundCountry,DatabaseCity,DatabaseCountry,Confidence,FoundWhere,Website 2542, Migrantour Country,,b'',b'',b'',10,Contact,http://www.mygrantour.org/ 3938,GeoSmartCity,,,b'',b'',10,Contact,http://www.geosmartcity.eu/ 4461,Museomix community,,b'',b'',b'',10,Contact,http://community.museomix.org/ 4632,Patients like me,,,b'',b'',10,Contact,https://www.patientslikeme.com_ – Phoenix Nov 30 '20 at 16:56
  • 1
    I believe those b characters indicate that the data is actually in bytes, rather than ASCII, format. When you say 'output', is that from a particular print statement? Is it shown in your code sample? This link might help: https://stackoverflow.com/questions/14010551/how-to-convert-between-bytes-and-strings-in-python-3?answertab=votes#tab-top – Ben Nov 30 '20 at 17:41
  • 1
    Thanks @Ben. I guess I should have given a little explanation on what the code is doing. It takes crawled webpages , analyses them, and using NER identifies locations. These locations are then mapped to a database to return the City, or Country, etc. So there is a two way entry and retrieval from the database, as updates to the database are also made. I'm guessing this is where the encoding and decoding problem originate from. I will try the suggestions on the link. Thanks! – Phoenix Dec 01 '20 at 16:44
  • Thanks for the info, @Phoenix. BTW, do you know if the b characters are stored in the db or are they being produced by Python? A quick way to check would be to query the DB directly and see if they appear in the output. If they're stored in the DB, the problem may be in the scraping, rather than the reporting. Best of luck! – Ben Dec 01 '20 at 18:33

0 Answers0