
I want to make a scatter plot of two datasets of different sizes.

Imagine I have two data.frames, df1 and df2. df1 has 100 rows and df2 has 50. Is there a way to make a scatter plot of both with ggplot2? I've searched but couldn't find anything: the online tutorials always assume that the datasets are the same size, with the same x-axis values. Also, I want to plot the two datasets in the same graph, not side by side.

Here's some example data:

```r
df1 <- data.frame(X1 = 1:10, Y11 = 11:20, Y12 = 21:30, Y13 = 31:40)
df2 <- data.frame(X2 = 1.5:10.5, Y21 = 1.5:10.5)
```

Let's imagine X1 is a column of distances in km. Y11 holds the fuel consumption of Car1, Y12 the fuel consumption of Car2, and so on. X2 is also a column of distances in km, with values different from X1 but in the same range, and Y21 is the fuel consumption of a modified Car1. I want to put all of them in the same scatter plot, with distance (km) on the x-axis and fuel consumption on the y-axis.
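Note that `from:to` with a non-integer start still steps by 1, so `1.5:10.5` gives 1.5, 2.5, ..., 10.5 and both example data frames have 10 rows; the row counts only differ in the real data (100 vs 50). A quick check of the example data as given:

```r
df1 <- data.frame(X1 = 1:10, Y11 = 11:20, Y12 = 21:30, Y13 = 31:40)
df2 <- data.frame(X2 = 1.5:10.5, Y21 = 1.5:10.5)

# `1.5:10.5` steps by 1 starting at 1.5, giving 10 values
print(df2$X2)

# Row counts of the two example data frames
print(c(df1 = nrow(df1), df2 = nrow(df2)))
```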

An old man in the sea.
  • This shouldn't be any problem. Just pass the data directly to the `geom_point()` layers. It would be helpful if you provided a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the code you tried. We could then offer specific fixes. – MrFlick Mar 10 '17 at 17:13
  • @MrFlick I've edited the question and included some data. I didn't post my code because I fail right at the beginning, where the ggplot command takes just one data.frame. – An old man in the sea. Mar 10 '17 at 18:18
  • Exactly which values do you want to plot from the sample data? It's unclear how you would want to visualize `df1` with its 4 columns. – MrFlick Mar 10 '17 at 18:45
  • @MrFlick I want to plot the points of the type (X1,Y11),..., (X1, Y13), and also those of the type (X2,Y21) in the same graph – An old man in the sea. Mar 10 '17 at 19:42
  • But you want to treat all those values in df1 as "one" scatter plot? Do you basically want to ignore the columns? I'm getting more confused about what the desired output is here. – MrFlick Mar 10 '17 at 19:49
  • @MrFlick let's imagine X1 is a column of distances in km. Y11 holds the fuel consumption of Car1, Y12 the fuel consumption of Car2, and so on. X2 is also a column of distances in km, with values different from X1 but in the same range, and Y21 is the fuel consumption of a modified Car1. I want to put all of them in the same scatter plot, with distance (km) on the x-axis and fuel consumption on the y-axis. – An old man in the sea. Mar 10 '17 at 19:58
  • @MrFlick Thanks, I think I got it. I posted an answer. It works with my original dataset; at least I don't get any errors. Your comment, plus some trial and error, helped. ;) – An old man in the sea. Mar 10 '17 at 20:06
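MrFlick's first suggestion, giving each `geom_point()` layer its own `data`, does not require the data frames to have the same number of rows or the same column names. A minimal sketch with simulated data of the sizes mentioned in the question (100 and 50 rows); the column names `km`, `fuel`, `dist`, `cons` and the simulated values are hypothetical:

```r
library(ggplot2)

set.seed(42)
# Hypothetical data: 100 rows and 50 rows, with different column names
df_a <- data.frame(km = runif(100, 0, 100))
df_a$fuel <- 5 + 0.02 * df_a$km + rnorm(100, sd = 0.2)
df_b <- data.frame(dist = runif(50, 0, 100))
df_b$cons <- 4.5 + 0.02 * df_b$dist + rnorm(50, sd = 0.2)

# No default data in ggplot(); each layer brings its own data and mapping.
# A constant string inside aes() becomes a legend entry for color.
p <- ggplot() +
  geom_point(data = df_a, aes(x = km, y = fuel, color = "df_a")) +
  geom_point(data = df_b, aes(x = dist, y = cons, color = "df_b")) +
  labs(x = "Distance (km)", y = "Fuel consumption", color = "Data set")
p
```

Because the layers are independent, ggplot2 only needs the x and y values within each layer to line up; the two layers' row counts and x values never have to match.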

4 Answers


This would be much easier to answer if you gave an example data set, but here is what you can do (make sure each data.frame has the same column names):

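```r
library(ggplot2)

# Two data frames of different sizes, but with the same column names
df1 <- data.frame(x = 1:50, y = 1:50)
df2 <- data.frame(x = 100:1, y = 1:100)

# Tag each row with the data frame it came from
df1$cat <- "df1"
df2$cat <- "df2"

# Stack them; rbind() needs matching column names, not matching row counts
df <- rbind(df1, df2)

ggplot(df, aes(x, y, color = cat)) +
  geom_point()
```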

That gives you a single scatter plot, with the points colored by the data frame they came from.
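For the question's `df1` and `df2`, whose column names differ and where `df1` has three y columns, the same `rbind()` idea works once each (x, y) pair is copied into common `x`, `y`, `cat` columns. A base-R sketch under that assumption (the `long1`/`long2` names are only for illustration):

```r
library(ggplot2)

df1 <- data.frame(X1 = 1:10, Y11 = 11:20, Y12 = 21:30, Y13 = 31:40)
df2 <- data.frame(X2 = 1.5:10.5, Y21 = 1.5:10.5)

# One block of rows per y column of df1, all sharing x = X1
long1 <- do.call(rbind, lapply(c("Y11", "Y12", "Y13"), function(col) {
  data.frame(x = df1$X1, y = df1[[col]], cat = col)
}))

# df2 contributes a single series
long2 <- data.frame(x = df2$X2, y = df2$Y21, cat = "Y21")

# Same column names now, so rbind() stacks them regardless of row counts
df <- rbind(long1, long2)

ggplot(df, aes(x, y, color = cat)) +
  geom_point()
```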

tbradley
  • Many thanks for the answer. I've edited the question to reflect my problem more precisely; sorry for not doing so earlier. In my situation I have two data.frames with different columns and rows... – An old man in the sea. Mar 10 '17 at 18:19

If you want to plot all the data together, it's best to reshape your data. Here's an example using other tidyverse functions:

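```r
library(tidyr)
library(dplyr)
library(ggplot2)

# Reshape each data frame to long format (one row per x/car/value),
# give the x column a common name, then stack them
dd <- bind_rows(
  df1 %>% gather(car, mpg, -X1) %>% rename(X = X1),
  df2 %>% gather(car, mpg, -X2) %>% rename(X = X2)
)

ggplot(dd, aes(X, mpg, color = car)) + geom_point()
```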
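In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. A sketch of the same reshaping with `pivot_longer()`, assuming the question's `df1` and `df2`:

```r
library(tidyr)
library(dplyr)
library(ggplot2)

df1 <- data.frame(X1 = 1:10, Y11 = 11:20, Y12 = 21:30, Y13 = 31:40)
df2 <- data.frame(X2 = 1.5:10.5, Y21 = 1.5:10.5)

# Every column except the x column becomes a (car, mpg) pair
dd <- bind_rows(
  df1 %>% pivot_longer(-X1, names_to = "car", values_to = "mpg") %>% rename(X = X1),
  df2 %>% pivot_longer(-X2, names_to = "car", values_to = "mpg") %>% rename(X = X2)
)

ggplot(dd, aes(X, mpg, color = car)) + geom_point()
```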
MrFlick

It's an old question, but I recently solved a similar problem with a quicker approach than the ones here (see the question "ggplot with 2 y axes on each side and different scales").

Maybe you can first scale down one of the datasets and then use the secondary ("dual") y-axis feature in ggplot2, namely:

```r
p <- ggplot(dataframe, ...) + ...
p + scale_y_continuous(name, ..., sec.axis = sec_axis(...))
```

where `sec.axis` defines the secondary (right-hand) axis. See https://www.r-graph-gallery.com/line-chart-dual-Y-axis-ggplot2.html for details; `?sec_axis` in R also helps.
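A runnable sketch of that idea with hypothetical data, where the second data set's y values are roughly ten times larger: divide them by 10 for plotting, then let `sec_axis()` multiply the left-axis values back by 10 for the right-hand labels. The factor of 10, the column names, and the data are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)
# Hypothetical data on different scales and with different row counts
df_a <- data.frame(x = runif(100, 0, 10))
df_a$y <- 2 + 0.5 * df_a$x + rnorm(100, sd = 0.3)
df_b <- data.frame(x = runif(50, 0, 10))
df_b$y <- 20 + 5 * df_b$x + rnorm(50, sd = 3)

# Scale df_b down so it fits the left axis range
df_b$y_scaled <- df_b$y / 10

p <- ggplot() +
  geom_point(data = df_a, aes(x, y, color = "df_a")) +
  geom_point(data = df_b, aes(x, y_scaled, color = "df_b")) +
  # The secondary axis undoes the scaling: right-axis label = left value * 10
  scale_y_continuous(name = "df_a y",
                     sec.axis = sec_axis(~ . * 10, name = "df_b y"))
p
```

`sec_axis()` only relabels the axis; the plotted points still use the scaled values, so the transformation passed to it must be the exact inverse of the scaling applied to the data.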


Thanks to a comment by MrFlick, I think I got it:

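```r
library(ggplot2)

# df1 is the plot's default data; df2 is passed only to the second layer.
# Both color labels go inside aes() so they are mapped to a legend.
ggplot(data = df1) +
  geom_point(aes(x = X1, y = Y11, color = "Car1")) +
  geom_point(data = df2, aes(x = X2, y = Y21, color = "ModCar2"))
```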
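Extending that approach to all the series in the question (Car1 to Car3 from `df1`, and the modified Car1 from `df2`), with axis titles taken from the question's description; the legend labels are illustrative:

```r
library(ggplot2)

df1 <- data.frame(X1 = 1:10, Y11 = 11:20, Y12 = 21:30, Y13 = 31:40)
df2 <- data.frame(X2 = 1.5:10.5, Y21 = 1.5:10.5)

# df1 and x = X1 are the defaults for the first three layers;
# the last layer overrides both the data and the x mapping
p <- ggplot(df1, aes(x = X1)) +
  geom_point(aes(y = Y11, color = "Car1")) +
  geom_point(aes(y = Y12, color = "Car2")) +
  geom_point(aes(y = Y13, color = "Car3")) +
  geom_point(data = df2, aes(x = X2, y = Y21, color = "Modified Car1")) +
  labs(x = "Distance (km)", y = "Fuel consumption", color = "Car")
p
```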
An old man in the sea.
  • OK, yeah. If you just wanted to plot 2 columns from each of those datasets, that looks fine. I might even move the `data=df1` out of `ggplot()` and into the first `geom_point()` to make it clearer what's happening. – MrFlick Mar 10 '17 at 20:07
  • @MrFlick I tried that, but it gave the error «Cannot handle data uneval», so I put it inside. – An old man in the sea. Mar 10 '17 at 20:10
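A likely cause of that error (an assumption, since the exact call isn't shown): the first argument of `geom_point()` is `mapping`, not `data`, so a data frame passed positionally lands in the wrong slot. Naming the argument avoids this:

```r
library(ggplot2)

df1 <- data.frame(X1 = 1:10, Y11 = 11:20)

# geom_point()'s first two arguments are mapping, then data
print(names(formals(geom_point))[1:2])

# Problematic: geom_point(df1, aes(X1, Y11)) would pass df1 as `mapping`
# and the aes() object as `data`.
# Works: name the data argument so argument order no longer matters.
p <- ggplot() +
  geom_point(data = df1, aes(x = X1, y = Y11))
p
```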