0

I am using RQDA which is a package in to code text manually. The final rqda-file is a database. I have coded statements in texts and used different codes and subsumed them in code categories (for instance: code category "actor_party" and then the related codes "socialist", "liberal", "conservative" etc.). I finished the coding and want to perform a social network analysis with it. For this purpose, I want to create a sql database, so that each code category gets its own column with the specific codes in each row. Every code can be identified by the following attributes: catid (=code category number), fid (file identification number) & selfirst (start of each code). By doing so, the specific catid, the fid & selfirst are selected for each coded statement so that sqlite can identify each coding as unique (additionally and as you can see below in the R script, status=1 for each valid coding has to be elected too).
I use in the version 0.99.879, in the version 0.2-7 and 1.0.0.

So, the following R code is used:

library(RSQLite) # load Package RSQLite
setwd("C:/...")

system("ls *.rqda", show=TRUE)
sqlite <- dbDriver("SQLite")
#specifing the file
qdadb <- dbConnect(sqlite,"My_data.rqda")


dbListTables(qdadb)
dbListFields(qdadb, "coding") # that's where the codings are stored


catid <- dbGetQuery(qdadb, "select distinct(catid) from treecode where status = 1 ORDER BY catid")
i <- 1
table <- dbGetQuery(qdadb, "select fid, selfirst from coding where status = 1 GROUP BY fid, selfirst")
while(i <= max(catid)) {
   ids <- dbGetQuery(qdadb, paste("select cid from treecode where (catid = ",i," and status = 1)", sep=""));
   t <- dbGetQuery(qdadb, paste("select cid, fid, selfirst from coding where (cid in (", paste(as.character(ids$cid), sep="' '", collapse=","), ") and status = 1)", sep=""));
   table <- merge(table, t, by = c("fid","selfirst"), all.x = T);
   i <- i + 1;
   }
# warnings are created because of the same columns which are duplicated by the merging

colnames(table) <- c("fid", "selfirst", dbGetQuery(qdadb, "select name from codecat where status = 1")[,1]) #each code has attributed a unique f(ile)id and selfirst (it's the unique starting point of each coding)

# see below for an example of such a created table

library(car) # Companion to Applied Regression package

# years - catid = 1
table$A00_time_frame <- recode(table$A00_time_frame, '1 = 2010; 2 = 2011; 3 = 2012; 4 = 2013; 5 = 2014; 6 = 2015')

# Sources - catid = 2
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 2 and status = 1)")[,1]
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1]
table$B00_source <- recode(table$B00_source, paste0("'", paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep=""))

# Claimant type - catid = 3
ids <- dbGetQuery(qdadb, "select cid from treecode where (catid = 3 and status = 1)")[,1]
values <- dbGetQuery(qdadb, paste("select name from freecode where (id in(", paste(ids, collapse = ","), ") and status = 1)"))[,1]
table$C00_claimant_type <- recode(table$C00_claimant_type, paste0("'", 
paste(ids,"'='", values, collapse = "';'", sep=""),"'", sep=""))

and so until "catid = 20"

This works out and looks like this: example_table [and this table goes on until row 844 - only the fid is ascending]

Even though, this works out and the created table matches with the total number of codings, some mistakes are happening. Some codes aren't linked to the correct statement (even though they are linked to the correct code category, but not to the correct and coded statement)

I am still a beginner in R(studio) and can't explain what went wrong.

Has anybody an idea what could be the problem or mistake here and how it could be fixed?

Upon request, I am happy to share my file :)

Any advice or help is very welcome!!

Edit: Here is a link to a subset of my data that you can reproduce it (the file is in the rqda-format, because I think, the conversion of it might be the problem itself).
Moreover and to give you two example where to look at.

By creating the 'table' in R, the following row can be identified

1. - fid 95, selfirst 4553 and then the codings 'Welt' and then 'E02_European_Commission' + 'G10_Cameroon' later
However, if you check the coding in the original rqda-file, the code 'Cameroon' is not in this file, rather in fid 70, selfirst 5082 and in 'Welt' in year '2010'

    • fid 90, selfirst 959 and year '2011' shows the codes 'CDU' and the last row 'special claimant' displays the name 'Martin Schulz'.
      However, if you check the coding in the original rqda-file, there is no coding attached to the code 'Martin Schulz' in the subset.

I hope, these two examples illustrate the problem and give you an idea where to look at respectively what the problem is.

Sorry that I haven't provided it in the first place!

www
  • 38,575
  • 12
  • 48
  • 84
Stefan_W
  • 163
  • 3
  • 12
  • The question is too long, could you please post a reproducible example? http://stackoverflow.com/q/5963269/946850 – krlmlr May 15 '17 at 14:49

1 Answers1

1

Perhaps simplify the code first, to see better what might be going wrong? Personally I would more heavily rely on SQL rather than R to collate all information:

t <- dbGetQuery(qdadb, "SELECT codecat.name, coding.cid, coding.fid, coding.selfirst 
       FROM treecode, coding, codecat 
       WHERE treecode.cid = coding.cid 
       AND treecode.catid = codecat.catid
       AND treecode.status = 1
       AND coding.status = 1")
head(reshape(t, idvar = c("fid", "selfirst"), timevar = "name", direction = "wide"))

Not sure this is what you are looking for or whether it works any better. But it seems simpler code to evaluate.

Jos Elkink
  • 26
  • 2
  • Thanks for the help. The simplification worked well and made it easier to re-think the "SELECT" categories. As far as I can tell now, the code above by @JosElkink has to be added for 'coding.selend' cloumn. So, `t <- dbGetQuery(qdadb, "Select codecat.name, coding.cid, coding.fid, coding.selfirst, coding.selend FROM ... =1")` Selend has to be added to the `head()` as well. This addition has helped me to identify tiny coding mistakes that I haven't deleted before! – Stefan_W May 16 '17 at 12:59