I have also found some of the RODBC documentation unhelpful if you are new to R or databases or both. Hopefully this can get you started.
As the comment above suggests, the first thing you need to do is establish an ODBC connection to your database. Since you did not specify, I'll assume you are using some version of MS-SQL Server. Instructions for making a connection to MS-SQL server are available here.
Once you have established a connection on your machine (let's call it textSQL
), you are going to want to assign that connection to an object in R. Here, I call the connection con
.
require(RODBC)
con <- odbcConnect("textSQL")
If you truly want to insert each value into your TextData
from vectors (it seems like combining everything to a dataframe and then updating the db via sqlSave()
would be be preferred), then you'll just need to run the SQL to do that. Before you can do that, however, you are going to need to convert your FileName
vector into something that can be received by your integer database field. Assuming that all your filesnames all have a three character suffix (plus a period) something like the following should probably work.
# as stated in your question
FileName <- "234567.txt"
FullTextstring <- "Complete text here"
# convert FileName to int
require(stringr)
#filename length
l <- nchar(FileName)
#filename suffix
s <- 4
FileName <- as.numeric(str_sub(FileName,1,l-s))
Now that you have your filename stored as a numeric in R, we can move to populating your table on the database server. For this, we can simply make use of the sqlQuery()
function in conjunction with the paste()
function as shown below. Note that the first parameter of the function is the connection object (con
) specified above.
sqlQuery(con
,paste("insert into TextData values ("
,FileName[1]
,",\'"
,FullTextstring[1]
,"\')"))
If you have multiple filenames and strings in a given vector, just update the [1]
accordingly.
Also, please be advised that there is an upper limit on field lengths built into RODBC (I think it is at 64k as mentioned here). If you are dealing with a large corpus of text, it's a good bet that you'll run up against this limit. If this happens, you can modify the source code (it's just one line) and recompile from source.
Update per request in comment
In order to make this change, you are going to need to first download the tar ball from CRAN here.
Next, you are going to need to navigate to the C code located in the \RODBC\src
directory.
Around line 735 or so, you are going to see a comment and line of code that looks something like this:
/* sanity check as the reports are sometimes unreliable */
if (datalen > 65535) datalen = 65535;
You are going to want to change it to something like this (the new value may be bigger or smaller depending on your needs):
/* sanity check as the reports are sometimes unreliable */
if (datalen > 10000000) datalen = 10000000;
Save your changes, and then re-install your modified RODBC package using the instructions provided here.
NOTE: This solution worked for me when I was trying to bring information into R from an RODBC connection. I am a novice C programmer at best and do not completely understand all of Ripley's code. This solution will not break RODBC but it may not allow you to pipe extremely large fields from R into your database. The truncation you are experiencing may be a problem with some of your SQL server settings as opposed to the datalen
cap in RODBC.