0
<xmlnode id=2 text='&quot;Hello World!&quot;' />

How do i read the "Hello World" without unescaping in python. I'm using ElementTree like this

xmlstr = """\
         <?xml version="1.0" encoding="utf-8"?>
         <nodes>
         <xmlnode id=1 text='&quot;Hello World!&quot;' />
         <xmlnode id=2 text='&quot;Hello World!&quot;' />
         </nodes>
         """
elem = ElementTree.fromstring(xmlstr)
nodes = elem.findall('xmlnode')
for row in nodes:
    print str(row.get('id'))+ ": " + row.get('text')

output

1: "Hello World"
2: "Hello World"

I want

1: &quot;Hello World&quot;
2: &quot;Hello World&quot;
vikki
  • 2,766
  • 1
  • 20
  • 26
  • 1
    The real question is *why* do you want this? Because I'm pretty sure you actually don't. – Tomalak Feb 03 '12 at 09:57
  • To insert the values into a database – vikki Feb 03 '12 at 10:09
  • And *why* would you want to insert XML-escaped values into a database? In my experience this is begging for trouble. Always put the *actual* data into the database. Escape it only when you put it back to XML/HTML/whatever. – Tomalak Feb 03 '12 at 10:13
  • assume the text is actually '"Hello World!" &quotThis is a string"' This would return; "Hello World" "This is a string. MySql would then give you an error since it thinks the string ends after the first quot it finds, which is actually before Hello – vikki Feb 03 '12 at 10:18
  • Yes. What difference does that make? :) Again, why do you want `'"Hello World!"'` in the database? In what way do you want to use that particular value? – Tomalak Feb 03 '12 at 10:21
  • mySQL gives you an error because you do not use a parameterized query to insert the string value. This is the root cause of your problem, and *this* is the issue you should address. – Tomalak Feb 03 '12 at 10:23
  • using a parameterized string means mysql will escape the values for me, i'm dealing with a huge xml file, this becomes slow as python will first unescape the strings and later the same will be escaped by mysql, i'd like to skip both steps. – vikki Feb 03 '12 at 10:46
  • Do you merely assume that or have you tried? Parameterized queries are *very* efficient, especially when used in a loop. Also, there won't be any escaping with parameters *at all*; the raw strings are fed to the database. – Tomalak Feb 03 '12 at 10:51
  • the problem is that the row string contains unescaped data like quotes(") which i dont want in the db – vikki Feb 03 '12 at 10:58

1 Answers1

0

Try this, using prepared statements:

# set up database cursor...
sql   = "INSERT foo (id, text) VALUES (%d, %s)"
elem  = ElementTree.fromstring(xmlstr)
nodes = elem.findall('xmlnode')
for row in nodes:
    valueList = (row.get('id'), row.get('text'))
    cursor.execute(sql, valueList)

As indicated, I don't think storing the escaped value in the database will do you any good. Use the actual values in the DB, it will make your life very much easier down the road.

Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • Thats exactly what i'm doing, the problem is `row.get('text')` fetches the value of text and unescapes it. The query will fail the moment it encounters `"` which is now a quote (") , as it thinks this is the end of the string. Whatever else follows is considered a part of the query and not a value. – vikki Feb 03 '12 at 11:08
  • @vikki Note the comma. You are probably using %, which is the wrong thing to do. – Tomalak Feb 03 '12 at 11:17
  • Ok the comma thing worked. Thats fine. But this does escape the string by adding backslashes. Just for the sake of speed, i'd prefer skipping the unescaping and escaping parts, since the xml already has sanitized strings. Thanks for the help so far. – vikki Feb 03 '12 at 12:19
  • @vikki: Apparently, the MySQLdb module does not support *true* parameterization (I was unaware of that). [This answer](http://stackoverflow.com/a/2424562/18771) suggests a module that does, namely [oursql](https://launchpad.net/oursql). Maybe you could give that a shot as well. If you want to continue to use MySQLdb, I'm afraid you are out of luck (but I see your point now). – Tomalak Feb 03 '12 at 12:30
  • Nope, no reason to continue using it. Thanks again, i'll have a look at oursql – vikki Feb 03 '12 at 12:40
  • @vikki If you got it working it would be nice if you shared your code and maybe some insight into the performance differences. Actual prepared statemens that use the `MYSQL_STMT` API *should* be blazingly fast. – Tomalak Feb 03 '12 at 14:44