I have a script to get an XML file from an SQL database. Here is how I do this:
library(RODBC)
library(XML)
myconn <- odbcConnect("mydsn")
query.text <- "SELECT xmlfield FROM db WHERE id = 12345"
doc <- sqlQuery(myconn, query.text, stringsAsFactors=FALSE)
doc <- iconv(doc[1,1], from="latin1", to="UTF-8")
doc <- xmlInternalTreeParse(doc, encoding="UTF-8")
However, the parsing didn't work for a particular database row, although it worked when I copied the content of this field into a separate file and parsed from the file. After two days "trial-and-error" I identified the main problem. It seems that querying short XML files this way doesn't cause any problems, but when I query larger files, the string gets chopped off after 65534 characters. Therefore, the end of the XML file is missing and the file can't be parsed.
I thought this might be an overall restriction of the ODBC connections on my computer. However, another programme that also uses ODBC to get the same XML field from the same database does this without any problems. So I guess it's an R
-specific problem.
Any ideas how to fix it?