Thanks for this wonderful community and its expert responses. This is my first question on Stack Overflow. I did some research but couldn't find what I am trying to do: how can I write efficient R code that creates a chart with a secondary Y axis and also groups the data to get total counts by a given variable? I want the grouping to happen within the plotting code, rather than having to create a separate data frame for every variable I want to plot on the X axis. My real data frame has thousands of rows and hundreds of columns. My sample data (20 x 5) looks like this:

tv  = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0)
pr1 = c("AA", "AB", "ZH", "AA", "ZA", "AB", "ZA", "ZA", "AA", "AA", "ZA", "AA", "ZG", "AA", "ZF", "AB", "AA", "AB", "AA", "AA")
pr2 = c("B", "F", "F", "J", "E", "E", "J", "B", "J", "F", "B", "B", "J", "B", "F", "J", "B", "F", "B", "E")
pr3 = c(13, 13, 25, 13, 13, 13, 13, 1, 13, 13, 13, 13, 25, 13, 25, 1, 13, 13, 13, 13)
sample_data = data.frame(SN = 1:20, Target_Vbl = tv, Predictor_1 = pr1, Predictor_2 = pr2, Predictor_3 = pr3)

From this sample data, I can create the chart I am looking for in Excel, but I am lost when it comes to plotting it in R. I want to reuse the code for any other predictor variable. The Y axes will always stay the same: the primary Y axis is the total count of Target_Vbl rows, and the secondary Y axis is the percentage of ones in Target_Vbl, both computed for each category of the predictor variable plotted on the X axis.
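To make the target summary concrete, here is a minimal sketch in base R. The helper name `summarise_predictor` and its arguments are hypothetical, chosen only for illustration; it assumes the target column holds 0/1 values. It takes the predictor column name as a string, so the same call can be reused for any predictor without building a separate data frame by hand.

```r
# Sample data from the question, rebuilt so the sketch is self-contained.
tv  <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0)
pr1 <- c("AA", "AB", "ZH", "AA", "ZA", "AB", "ZA", "ZA", "AA", "AA",
         "ZA", "AA", "ZG", "AA", "ZF", "AB", "AA", "AB", "AA", "AA")
pr3 <- c(13, 13, 25, 13, 13, 13, 13, 1, 13, 13,
         13, 13, 25, 13, 25, 1, 13, 13, 13, 13)
sample_data <- data.frame(SN = 1:20, Target_Vbl = tv,
                          Predictor_1 = pr1, Predictor_3 = pr3)

# Hypothetical helper: one row per category of `predictor`, with
#   tc    = number of rows in the category (primary Y axis)
#   conv  = number of ones in the target within the category
#   ratio = conv / tc, rounded to 2 decimals (secondary Y axis)
summarise_predictor <- function(df, predictor, target = "Target_Vbl") {
  tc   <- tapply(df[[target]], df[[predictor]], length)
  conv <- tapply(df[[target]], df[[predictor]], sum)
  data.frame(level = names(tc),
             tc    = as.vector(tc),
             conv  = as.vector(conv),
             ratio = round(as.vector(conv / tc), 2),
             stringsAsFactors = FALSE)
}

smry_p1 <- summarise_predictor(sample_data, "Predictor_1")
smry_p3 <- summarise_predictor(sample_data, "Predictor_3")
print(smry_p1)
```

Passing the column name as a string keeps the grouping inside one function, which matters when there are hundreds of candidate predictors.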

The chart should look like the one below, currently plotted for Predictor_1 (drawn in Excel):

[Image: target chart for Predictor_1, drawn in Excel]

Edit: after trying the plotrix package

Continuing with sample_data, I created a summary data frame to use with the plotrix package (thanks, lawyeR). twoord.plot gets me closer to what I am looking for, but a few problems remain:

1. I am not getting bars for tc (the total count per Predictor_1 category) on the left Y axis. I tried setting "bar" in the type option, but it did not work.
2. The X axis labels show default numbers (1, 2, 3, ...) instead of the values from the data ("AA", "AB", "ZA", etc.).
3. Is there a way to make the overall process more concise? My code feels crude at best.

Any pointers would be helpful.

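library(sqldf)
# Per-category summary: tc = number of rows, conv = number of ones in Target_Vbl
smry = sqldf("Select Predictor_1, count(Target_Vbl) as tc, sum(Target_Vbl) 
as conv from sample_data Group by Predictor_1")
# Share of ones within each category
smry$ratio = round((smry$conv/smry$tc),2)
library(plotrix)
# Left axis: tc, right axis: ratio; both are currently drawn as lines
twoord.plot(smry$Predictor_1, smry$tc,
        smry$Predictor_1, smry$ratio, 
        type= c("l", "l"), lcol=3,rcol=4,do.first="plot_bg(\"gray\")")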
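For comparison, the chart described above (bars for the count on the left axis, a line for the ratio on the right axis, category names on the X axis) can be sketched with base graphics alone. This is not the plotrix approach, only an alternative using `barplot()`, `par(new = TRUE)` and `axis(4)`; the function name `plot_count_and_ratio` is hypothetical, and the summary is recomputed with `tapply()` so the block runs on its own.

```r
# Rebuild the two columns needed from the question's sample data.
tv  <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0)
pr1 <- c("AA", "AB", "ZH", "AA", "ZA", "AB", "ZA", "ZA", "AA", "AA",
         "ZA", "AA", "ZG", "AA", "ZF", "AB", "AA", "AB", "AA", "AA")

# Per-category total count and share of ones.
tc   <- tapply(tv, pr1, length)
conv <- tapply(tv, pr1, sum)
smry <- data.frame(level = names(tc),
                   tc    = as.vector(tc),
                   ratio = as.vector(conv / tc))

# Hypothetical helper: bars on the left axis, line on the right axis.
plot_count_and_ratio <- function(smry, xlab = "Predictor") {
  op <- par(mar = c(5, 4, 4, 4) + 0.1)   # extra room for the right axis
  on.exit(par(op))

  # Bars for the total count; barplot() returns the bar midpoints.
  mids <- barplot(smry$tc, names.arg = smry$level, col = "gray",
                  ylim = c(0, max(smry$tc) * 1.1),
                  xlab = xlab, ylab = "Total count")
  usr <- par("usr")                       # x range actually used by the bars

  # Overlay the ratio on the same x coordinates with its own y scale.
  par(new = TRUE)
  plot(mids, smry$ratio, type = "b", pch = 16, col = "blue",
       axes = FALSE, xlab = "", ylab = "",
       xlim = usr[1:2], xaxs = "i", ylim = c(0, 1))
  axis(4)                                 # secondary Y axis on the right
  mtext("Ratio of ones", side = 4, line = 2.5)

  invisible(mids)
}

mids <- plot_count_and_ratio(smry, xlab = "Predictor_1")
```

Reusing the bar midpoints and the first plot's x range for the overlay is what keeps the line points centered over their bars.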

The graph now looks like this:

[Image: output of twoord.plot]

staggle