1

I have a Pandas DataFrame with many columns out of which one column is 'value' containing an HTML web page. I am doing an Upsert query for each row of the DataFrame but I am getting the following error:

I have tried escaping the HTML with the following methods:

  1. df.value = df.value.apply(lambda x: re.escape(x))
  2. df.value = df.value.apply(lambda x: MySQLdb.escape_string(x))

Here is my function:

non_key_cols = df.columns.tolist()
    non_key_cols.remove(primary_key)

#    df.value = df.value.apply(lambda x: re.escape(x))    
    df.value = df.value.apply(lambda x: MySQLdb.escape_string(x))

    enclose_with_quote = [True if type_name.name=='object' else False for type_name in df.dtypes]
    all_cols = df.columns.tolist()
    #enclose df columns in inverted commas
    for i in range(len(enclose_with_quote)):
        if enclose_with_quote[i]:
            df[all_cols[i]] = df[all_cols[i]].apply(lambda x: '"' + x + '"')
        else:
            df[all_cols[i]] = df[all_cols[i]].apply(lambda x: str(x))


    sql = "INSERT INTO " \
    + tablename \
    + "(" + ", ".join([col for col in df.columns]) + ")" \
    + " VALUES " \
    + ", ".join(["(" + ", ".join(list(row)) + ")" for row in df.itertuples(index=False, name=None)]) \
    + " ON CONFLICT (" + primary_key + ") DO UPDATE SET " \
    + ", ".join([col + "=EXCLUDED." + col for col in non_key_cols])

    conn = _getpostgres_connection()
    cur = conn.cursor()
    cur.execute(sql)
    cur.close()
    conn.commit()
    conn.close()

This is the error I get:

 ProgrammingError: syntax error at or near "margin" LINE 1:
 ...t_of_nums_not_in_table_regex) VALUES ("<p style=\"margin: 0p...
JGH
  • 15,928
  • 4
  • 31
  • 48
  • Is this postgresql or mysql? In either case, you shouldn't be concatenating your own values as it's exceptionally unsafe. Use parameterized queries. – Jeremy Jun 13 '19 at 16:52
  • It's Postgresql. I'll look into parameterized queries. Though do you know how to escape the HTML? Using MySQLdb.escape_string(x), I was able to do for my similar MySQL project – Shreyans Jasoriya Jun 13 '19 at 20:57
  • Don't escape the HTML yourself. Use parameterized queries, which should handle the escaping for you. – Jeremy Jun 14 '19 at 12:10

1 Answers1

0

The issue that you write string within doubles quotes. In Postgres, doubles quotes means a column/table name. You would have to use single quotes for strings.

if enclose_with_quote[i]:
        df[all_cols[i]] = df[all_cols[i]].apply(lambda x: "'" + x + "'")

That being said, you will then get an error if your string contains single quotes. The safest - and simplest - way is to use a parameterized query, which will handle the quote escaping by itself. Else have a look at this post for using a custom string delimiter.

JGH
  • 15,928
  • 4
  • 31
  • 48