0

I am trying to produce a graph similar to this:

enter image description here

from a database with many tables (as x axis).

The Data:

I have several tables in the database as follow: Table1, Table2, Table3, etc Each table has 500+ rows and 10+ columns (attributes)

The Problem

one of those columns contains the condition (fair, good, very good and etc) of that message.

the query that returns this in the database:

SELECT message_condition Condition,
COUNT(message_id) NumMessage
FROM `table1` GROUP message_condition

this will return :

------------------------
Condition |  NumMessage
------------------------
          |  80 
Fair      |  20
Good      |  60
Ideal     |  50
Great     |  80

UPDATE: There are always 4 conditions with one null condition (for messages with no condition). So if I run the query for all tables, I will get same table as above but with different numbers.

Now, I want to apply these query to all tables in the database so I can produce the above graph ( with table as x axis).

I tried to use this method:

doCountQuerys <- function(con, table) {
    query <- paste('
        SELECT message_condition Condition,
        COUNT(message_id) NumMessage FROM`', table, '` GROUP message_condition', sep = '')
    ts     <- dbGetQuery(con, query)
    return(ts)
}

lists <- dbListTables(con)  # get list of all tables in the database

countz <- numeric(0)       # store the counts for all tables

for (i in 1:length(list)) {
    counta <- doCountQuerys(con, lists[i])
    countz[i] <- counta[[1]]
    #print(countz[[1]])
}

but I get this error:

## Warning in countz[i] <- counta[[1]]: number of items to replace is not a
## multiple of replacement length

I don't think I am doing this correctly,any idea how to run that query through all tables in R and produce that graph?

Rain Man
  • 1,163
  • 2
  • 16
  • 49
  • Will your query produce the same number of message conditions and their counts for each table in your database? – Tim Biegeleisen Jan 31 '16 at 03:12
  • @TimBiegeleisen no, the message count is different for each condition for each table. For example for table1, NumMessage for fair condition is 20 when for table2, NumMessage for fair condition is 80. And there are always 4 conditions with a null condition. – Rain Man Jan 31 '16 at 03:14

1 Answers1

3

A few tips.

First off, you need your dataframe to include the table name so you can group by this during plotting. Easiest way is to just add that into your query as a constant so it becomes something like

SELECT 'table1' TableName, etc etc

Just paste it into your existing query in your function:

query <-    paste0("SELECT '", table,"' TableName, COALESCE(NULLIF(message_condition, ''), 'default') message_condition, COUNT(message_id) NumMessage FROM '", table, "' GROUP BY message_condition",   sep = '')

You should also add a default category name for when your condition is null. You can do this with COALESCE or ISNULL as shown.

edit Thinking about it, you just need to rbind each resutlset to the end of your overall dataframe in your for loop. R - Concatenate two dataframes?

(BTW, apply is usually used instead of for loops)

Something like (untested...):

df <- data.frame(TableName=character(), message_condition=character(), NumMessage=integer())
for (i in 1:length(lists)) {
  rbind(df, doCountQuerys(con, lists[i]))
}

So, you should end up with a dataframe that looks a like:

TableName, message_condition, NumMessage
table1, default, 30
table1, fair, 20
table1, good, 60
table1, ideal, 50
table2, default, 15
table2, fair, 10
table2, good, 30
table2, ideal, 60
table3, default, 10
table3, fair, 5
table3, good, 25
table3, ideal, 40

You can simply plot this:

ggplot(df, aes(x=TableName, y=NumMessage, fill=message_condition)) + geom_bar(stat="identity")

enter image description here

Hope this helps and is what you were after

Community
  • 1
  • 1
Liesel
  • 2,929
  • 2
  • 12
  • 18
  • yes, the problem is that the loop doesn't work :( also if I add `SELECT 'table1'` to the query, isnt that going to be the same for all tables? – Rain Man Jan 31 '16 at 03:42
  • Edited - your loop can be replaced with a single `rbind` and added full query – Liesel Jan 31 '16 at 03:44
  • the thing is that I just start learning/using R and I am not sure how to use rbind – Rain Man Jan 31 '16 at 03:47
  • the mysql query selects the table name as if it is a column, I have no such column with table name in the data set – Rain Man Jan 31 '16 at 03:55
  • You're passing it into your function so it's coming from somewhere, no? – Liesel Jan 31 '16 at 03:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/102117/discussion-between-les-h-and-rain-man). – Liesel Jan 31 '16 at 03:57