
I am fairly new to R but trying to learn by doing.

I am trying to plot a categorical variable (channel) against a continuous variable (sales).

Here is my data:

print(columnValues)

channel_final   tot_sales_year
1           Texas        5000.00
2           Mexico        8951.55
3           Mexico           0.23
4           Mexico          12.00
5           Mexico      250094.00
6           Texas      388859.38

Here is the code I am using to produce the graph

plot(columnValues[,1],columnValues[,2],xlab="independentColumnName",ylab="Test") 

However I get an error

Error in plot.window(...) : need finite 'xlim' values 

and some warnings

4: In min(x) : no non-missing arguments to min; returning Inf
5: In max(x) : no non-missing arguments to max; returning -Inf

What am I doing wrong? How do I fix this?

Thanks in advance for your help

Update #1: I have updated my plotting code to

boxplot(columnValues[,2]~columnValues[,1])

This works now.

Abhi
  • What does str(columnValues) return? I suspect your variable tot_sales_year holds character values rather than numeric ones – infominer Feb 08 '14 at 23:53
  • What sort of plot do you want? A barplot: barplot(df$tot_sales_year, names=df$channel_final , cex.names=0.7). Do you want to show the total or mean sales by country? – user20650 Feb 09 '14 at 00:00
  • @infominer Here is the output $ tot_sales_year: num 5.00e+03 8.95e+03 2.30e-01 1.20e+01 2.50e+05 ... Also I have outputted my dataset in the original post – Abhi Feb 09 '14 at 00:00
  • @user20650 I am trying to visualize correlation, hence something like a scatter plot – Abhi Feb 09 '14 at 00:01
  • I'm not sure about your use of a scatter plot. Maybe a boxplot is more applicable: boxplot(df$tot_sales_year ~ df$channel_final) – user20650 Feb 09 '14 at 00:08
  • @user20650 That is giving me an error as well: Error in x[floor(d)] + x[ceiling(d)] : non-numeric argument to binary operator. This is the third graph that has failed (scatter, box, bar). I am beginning to feel this is a data issue. I have printed the data in the original post. Is there anything obvious I am doing wrong? – Abhi Feb 09 '14 at 00:16
  • @user20650 I have an update. When I plotted boxplot(columnValues$tot_sales_year~columnValues$channel_final) I did not get any exceptions. I have another question though: why does boxplot(columnValues[,1],columnValues[,2]) fail? – Abhi Feb 09 '14 at 00:22
  • @Abhi, read ?boxplot to understand why your `boxplot(columnValues[,1],columnValues[,2])` fails. I see your data in the post, but that doesn't give the whole picture. The best way to share data is to use dput; see http://stackoverflow.com/a/5963610/2747709 for how – infominer Feb 09 '14 at 00:47
  • Abhi, I think you have extra "crud" in your data. I copied what you pasted and then cleaned it up. Here's the dput of your data; copy from the opening structure( up to the last parenthesis: structure(list(channel_final = structure(c(2L, 1L, 1L, 1L, 1L, 2L), .Label = c("Mexico", "Texas"), class = "factor"), tot_sales_year = c(5000, 8951.55, 0.23, 12, 250094, 388859.38)), .Names = c("channel_final", "tot_sales_year"), class = "data.frame", row.names = c(NA, -6L )). Save this to a file, and use dget(file) to read it back in. Now run all your plotting commands and see what happens – infominer Feb 09 '14 at 00:52
  • @Abhi; As infominer says, look at ?boxplot to see how to define it, particularly the formula argument, to see why passing two vectors separately does not work. The original plot didn't work because the plot command did not know what method to use: your x-variable (channel_final) is not numeric. – user20650 Feb 09 '14 at 01:40
  • @infominer Here is the dput for my test data: structure(list(channel_final = c("Mexico", "Texas", "Mexico", "Mexico", "Texas", "Texas"), tot_sales_year = c(600, 2855296, 4982.49, 1108690.76, 42954, 97170.48)), .Names = c("channel_final", "tot_sales_year"), row.names = c(NA, 6L), class = "data.frame"). Two questions: (1) This is different from what you have in your comment. (2) Can you point out what the "crud" is? I just don't see it – Abhi Feb 09 '14 at 01:51
  • (1) It's different because your first column is character. Run str() on the data you gave me and on the data I gave you, and spot the difference. (2) There is no crud in what you gave me via dput, but I "suspected" it from your original post, looking at the alignment of the numbers in the second column. (3) Run plot on the data I gave you and on the data you gave me, and see what happens. If you do what I outlined in (1) and (3), you should be able to figure out what's going on. – infominer Feb 09 '14 at 04:14
  • @user20650 thanks for pointing it out. It makes sense now. One more quick question, it seems I cannot create a scatter plot for categorical variables using the plot command. Is my understanding correct? – Abhi Feb 09 '14 at 04:27
  • @Abhi; here yes, but I don't think a scatterplot is the way to visualise your data; if you have more points I would go with a boxplot. If you only have a few observations, the dotplot below is a good option (note: the lattice package also has a nice dotplot function) – user20650 Feb 10 '14 at 16:32
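
To make the point from the comments concrete, here is a minimal sketch that rebuilds the test data from Abhi's dput() output and compares the two column types. Converting the column with factor() is one possible fix rather than the only one, and the axis labels are just illustrative choices.

```r
# Test data taken from the dput() output in the comments: channel_final is character.
columnValues <- data.frame(
  channel_final  = c("Mexico", "Texas", "Mexico", "Mexico", "Texas", "Texas"),
  tot_sales_year = c(600, 2855296, 4982.49, 1108690.76, 42954, 97170.48),
  stringsAsFactors = FALSE
)

# Inspect the column types: channel_final is reported as chr, not Factor.
str(columnValues)

# Convert the grouping column to a factor so plot() has a method to dispatch on.
columnValues$channel_final <- factor(columnValues$channel_final)

# With a factor x and a numeric y, plot() draws one box per level.
plot(columnValues$channel_final, columnValues$tot_sales_year,
     xlab = "channel_final", ylab = "tot_sales_year")

# The formula interface gives the same grouping. Note that
# boxplot(x, y) with two positional vectors treats each vector as its
# own group of numeric data, which is why passing the channel column
# as the first argument fails.
boxplot(tot_sales_year ~ channel_final, data = columnValues)
```

Running str() before and after the factor() call shows the difference infominer asks about: the same values, but a different column class.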

1 Answer


With so many comments it's hard to know what's been covered, but here's a "scatterplot" by category using ggplot. Is this what you had in mind?

library(ggplot2)
ggplot(columnValues)+
  geom_point(aes(x=channel_final, y=tot_sales_year),size=3)
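
If ggplot2 is not installed, a roughly comparable picture can be sketched in base R with stripchart(), which plots the individual points for each category. This is an alternative technique, not part of the ggplot approach above; the data frame is rebuilt here from the values printed in the question, and the pch and axis-label choices are only illustrative.

```r
# Data as printed in the question, with the channel stored as a factor.
columnValues <- data.frame(
  channel_final  = factor(c("Texas", "Mexico", "Mexico", "Mexico", "Mexico", "Texas")),
  tot_sales_year = c(5000, 8951.55, 0.23, 12, 250094, 388859.38)
)

# One column of points per channel; vertical = TRUE puts sales on the y-axis.
stripchart(tot_sales_year ~ channel_final, data = columnValues,
           vertical = TRUE, pch = 16,
           xlab = "channel_final", ylab = "tot_sales_year")
```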

jlhoward