I'm working on a program where I want to compare addresses I read from a csv file to a postgres Database. (Its a Plugin for QGis) I can successfully establish a connection and also read data from the database as long as I send queries without my own parameters.
So what I do: I read a csv file and store it in a list. Then i select an output file. Next I click a button, that on click, should compare the entries in the csv file to entries in my database. If an entry from the csv file (postal code, town, address) has the exact same properties in the database, I write it into a list "Successful Matches", if one doesnt match, I write it into a List "Error List)
My problem now occurs, when i execute a statement with my own parameters. The Sql Error Message I get back says:
Invalid Byte-Sequence for Encoding UTF8: 0xdf 0x65
I think, that the error is in the first list I fill from the csv file. My addresses have special characters like öäüß...
Here is the code that is used:
This Method writes the succesfully matched addresses to a file, the failed ones to a lineEdit
def write_output_file(self):
compare_input_with_database()
try:
with open(self.outputfile, 'wb') as csvfile:
writer = csv.writer(csvfile, delimiter=';', quoting=csv.QUOTE_MINIMAL)
for row in geocoded_list:
writer.writerow(row)
if len(error_list) > 0:
self.writefailedaddresses()
raiseInformation("Es konnten nicht alle Adressen geocodiert werden!")
else:
raiseInformation("Adressen erfolgreich geocodiert!")
except csv.Error:
raiseException("Fehler beim schreiben der Datei")
This method, compares a row entry from the list/csvfile to the database.
def compare_input_with_database():
dbcursor = database_connection.open_connection()
for row in addressList:
entry = str(row[0])
addresssplit = entry.split(';')
try:
resultset = database_connection.select_specific_address(dbcursor, int(addresssplit[0]), addresssplit[1], addresssplit[2])
geocoded_list.append(resultset)
except psycopg2.DatabaseError, e:
raiseException(e)
error_list.append(addresssplit)
database_connection.close_connection()
def select_specific_address(cursor, plz, town, address):
cursor.execute("SELECT plz,ort,strasse,breitengrad,laengengrad from addresses where plz=%s AND ort=%s AND strasse=%s", (plz, town, address))
resultset = cursor.fetchone()
return resultset
This Method reads a csv file and populates it in a list
def loadFileToList(addressfile, dlg):
del addressList[:]
if os.path.exists(addressfile):
if file_is_empty(addressfile):
raiseException("Ungueltige Quelldatei! Quelldatei ist leer!")
return -1
else:
with open(addressfile, 'rb') as csvfile:
filereader = csv.reader(csvfile, delimiter=';')
for row in filereader:
addressList.append(row)
return addressList
else:
raiseException("Pfad der Quelldatei nicht gefunden!")
return -1
Thanks!
EDIT: When I display the address containing the special charachter it shows as "Hauptstra\xdfe" instead of "Hauptstraße Sorry im Bad with Encoding, is this unicode? Does that mean, that it get sends to the database like that and i nead to encode it differently?
EDIT 2: I took a look at the orkaround and tried to implement it:
def loadFileToList(addressfile, dlg):
del addressList[:]
if os.path.exists(addressfile):
if file_is_empty(addressfile):
raiseException("Ungueltige Quelldatei! Quelldatei ist leer!")
return -1
else:
#with open(addressfile, 'rb') as csvfile:
#filereader = csv.reader(csvfile, delimiter=';')
reader = unicode_csv_reader(open(addressfile))
for row in reader:
addressList.append(row)
return addressList
else:
raiseException("Pfad der Quelldatei nicht gefunden!")
return -1
def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
for row in csv_reader:
yield [unicode(cell, 'utf-8') for cell in row]
But now i get the following Error Message when executing the code: for row in reader:
File "C:/Users/Constantin/.qgis2/python/plugins\Geocoder\logic.py", line 46, in unicode_csv_reader yield [unicode(cell, 'utf-8') for cell in row] UnicodeDecodeError: 'utf8' codec can't decode byte 0xdf in position 19: invalid continuation byte
I just dont get why it just cant decode it -.-
UPDATE:
Some lines from my csv file:
1190;Wien;Weinberggasse
1190;Wien;Hauptstraße
1190;Wien;Kärnterstraße