
I have a dataset with 2 variables and over 30k observations. One variable is country and the other is price. I want to plot the countries on the x axis, but I only want to include the rows for certain countries, such as "UK" and "USA", not all 20 countries listed in the column.

I am using ggplot but I am not sure how I would subset the dataset to include only those countries and their prices.

one_plot <- subset(origin_price$product_origin == c["USA", "UK", "Australia", "China"])

I tried to subset using the code above, which is wrong, but I'm struggling to find any solutions online to this particular problem.

J_F
rkras
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Axeman Mar 13 '17 at 18:58
  • Subsetting your data would probably be the easiest way to avoid surprises. – Dan Slone Mar 13 '17 at 19:12
  • `subset( origin_price , product_origin %in% c( "USA", "UK", "Australia", "China" ) )` – Dan Slone Mar 13 '17 at 19:29
  • @DanSlone Thank you very much – rkras Mar 13 '17 at 19:36

1 Answer

y = sample(1:1000, 1000) # price
x = sample(letters, 1000, replace = TRUE) # country names

library(ggplot2)
d = data.frame(x,y)
d = subset(d, x == "a"| x == "b")

Use `subset()` to filter the data frame, and join one condition per country with `|` (logical OR) to keep only the countries you want to plot.

u = ggplot(d, aes(x = x, y = y))
u + geom_point()  # that is it. 
Gregor Thomas
programandoconro
  • `x %in% c("a", "b")` can be nicer (and generalizes nicely to more possibilities) than `x == "a" | x == "b"`. And inside `subset` you don't need `d$` - that's what the `data` argument is for. – Gregor Thomas Mar 14 '17 at 00:08
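
For reference, here is a minimal sketch of the `%in%` approach from the comments, applied to data shaped like the question's. The `origin_price` data frame below is a made-up stand-in, and the `price` column name is an assumption, since the question only names `product_origin`.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: 20 countries, 30,000 rows.
# The "price" column name is assumed; the question only names product_origin.
set.seed(1)
countries <- c("USA", "UK", "Australia", "China", paste0("Country", 1:16))
origin_price <- data.frame(
  product_origin = sample(countries, 30000, replace = TRUE),
  price = runif(30000, min = 1, max = 1000)
)

# Keep only the rows whose country is one of the wanted values.
wanted <- c("USA", "UK", "Australia", "China")
one_plot <- subset(origin_price, product_origin %in% wanted)

# Plot price by country using the filtered rows only.
ggplot(one_plot, aes(x = product_origin, y = price)) +
  geom_point()
```

`%in%` tests each row's country for membership in `wanted`. Comparing with `==` against a vector would instead recycle the vector along the column and compare element by element, which is not the filter wanted here.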