
I have a data frame df that summarizes the activity (Activity) of a fish species and the current intensity (C.I) and current direction (C.D) associated with this activity in the water column. An example data frame:

df<- data.frame(C.D=c(5,5,5,10,10,10,20,20,20,40,40,40,80,80,80,100,100,100,130,130,130,160,160,160,190,190,190,220,220,220,250,250,250,280,280,280,310,310,310,340,340,340,359,359,359),
                Activity=c(1.1,1.6,0.6,1.2,1.8,1.3,1.3,1.4,1.88,0.99,1.8,2.1,1.75,1.5,2.4,1.55,0.9,2.4,1.4,1.5,3.2,1.7,2.1,3.8,2.8,3.9,2.1,3.4,2.6,4.1,2.3,3.6,4.3,3.0,2.4,1.8,2.5,1.6,1.1,0.5,1.4,2.3,0.8,2.1,1.5),
                C.I=c(0.05,0.21,0.11,0.2,0.15,0.28,0.24,0.18,0.33,0.11,0.22,0.13,0.16,0.31,0.23,0.15,0.28,0.36,0.25,0.31,0.58,0.42,0.36,0.52,0.58,0.82,0.71,0.64,0.51,0.4,0.54,0.55,0.68,0.32,0.21,0.23,0.37,0.22,0.15,0.21,0.24,0.18,0.04,0.6,0.12))

df

   C.D Activity  C.I
1    5     1.10 0.05
2    5     1.60 0.21
3    5     0.60 0.11
4   10     1.20 0.20
.    .       .    .
.    .       .    .
.    .       .    . 

I want to explore whether the current direction C.D affects the activity of my fish species, for instance whether activity is higher for some values of C.D than for others. However, C.D and C.I may be strongly related (for some directions the current intensity may be higher than for others), so I need to add information about C.I to my plot in order to judge whether what I see is an effect of C.D or of the third variable C.I.

As a first approximation, I plotted Activity against C.D as points and added a smooth line to see the general trend. I also coloured the points by C.I to see whether the colours follow some pattern (for instance, whether specific colours are concentrated at specific C.D values, which would mean that some C.I values only occur with specific C.D values). In the example, high C.I values are associated with C.D between 140 and 250 degrees. The code and the image are below:

library(ggplot2)

# Scatter plot of Activity against C.D, with points coloured by C.I
P <- ggplot(df, aes(C.D, Activity)) +
  geom_point(aes(color = C.I)) +
  scale_colour_gradientn(colours = c("green", "black")) +
  theme_bw()
# Add the smooth trend line and a centred title
P <- P + geom_smooth() +
  ggtitle("Mean activity as a function of C.D.20m for winter from hourly data") +
  theme(plot.title = element_text(hjust = 0.5))

enter image description here

My problem arises when I have to plot thousands of points, because colouring the points is then no longer a useful way to show any C.I pattern associated with C.D. Here is a real plot of my data:

enter image description here

My question is how I could add a second smooth line, scaled with respect to the first y-axis, that shows the relationship between C.D and C.I. I've got this so far:

P<- P + geom_smooth(aes(C.D, C.I), color="red", se=FALSE)
P

enter image description here

Is it possible to scale the second y-axis to improve the interpretation?

Dekike
  • Does `P + geom_smooth(aes(C.D, C.I), se=FALSE)` not give you what you want? – deepseefan Nov 14 '19 at 12:06
  • Thanks for your comment @deepseefan. I have changed the question a little, since you showed very clearly how to draw the second smooth line in my plot. I wonder how I could scale the red line, since without scaling it sits in a very narrow range. – Dekike Nov 14 '19 at 12:46

1 Answer


First, I'd like to point out the usual warnings that go with secondary axes, as expressed in this answer elsewhere.

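Would simply transforming your data and applying the inverse transformation to the secondary axis not be appropriate? In the code below, C.I is multiplied by 6 so that its smooth line is drawn on the Activity scale, and the secondary axis divides the values by 6 again so that its labels read in C.I units.

Note that 6 is an arbitrary number, chosen only to make the data look reasonable.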

ggplot(df, aes(C.D, Activity)) +
  geom_point(aes(C.D, Activity, color = C.I)) + 
  scale_colour_gradientn(colours=c("green","black")) + 
  theme_bw() + 
  geom_smooth()  +
  ggtitle("Mean activity as a function of C.D.20m for winter from hourly data") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_smooth(aes(C.D, C.I * 6), se=FALSE, colour = "red", show.legend = TRUE) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 6, name = "CI"))

enter image description here
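
The factor 6 above was picked by eye. As a rough sketch of one alternative, the factor could be derived from the data so that the largest C.I value maps onto the largest Activity value. The name k and the max-ratio rule are assumptions made for illustration, not part of the original answer, and the rule only makes sense when both variables are non-negative, as they are in the example data.

```r
library(ggplot2)

# Example data copied from the question, so the sketch runs on its own.
df <- data.frame(
  C.D = c(5,5,5,10,10,10,20,20,20,40,40,40,80,80,80,100,100,100,130,130,130,160,160,160,190,190,190,220,220,220,250,250,250,280,280,280,310,310,310,340,340,340,359,359,359),
  Activity = c(1.1,1.6,0.6,1.2,1.8,1.3,1.3,1.4,1.88,0.99,1.8,2.1,1.75,1.5,2.4,1.55,0.9,2.4,1.4,1.5,3.2,1.7,2.1,3.8,2.8,3.9,2.1,3.4,2.6,4.1,2.3,3.6,4.3,3.0,2.4,1.8,2.5,1.6,1.1,0.5,1.4,2.3,0.8,2.1,1.5),
  C.I = c(0.05,0.21,0.11,0.2,0.15,0.28,0.24,0.18,0.33,0.11,0.22,0.13,0.16,0.31,0.23,0.15,0.28,0.36,0.25,0.31,0.58,0.42,0.36,0.52,0.58,0.82,0.71,0.64,0.51,0.4,0.54,0.55,0.68,0.32,0.21,0.23,0.37,0.22,0.15,0.21,0.24,0.18,0.04,0.6,0.12)
)

# Hypothetical scaling factor (assumption): ratio of the two maxima,
# used in place of the hand-picked 6.
k <- max(df$Activity) / max(df$C.I)

p <- ggplot(df, aes(C.D, Activity)) +
  geom_point(aes(colour = C.I)) +
  scale_colour_gradientn(colours = c("green", "black")) +
  geom_smooth() +
  # Draw the C.I trend on the Activity scale by multiplying by k ...
  geom_smooth(aes(y = C.I * k), se = FALSE, colour = "red") +
  # ... and undo the multiplication on the secondary axis labels.
  scale_y_continuous(sec.axis = sec_axis(~ . / k, name = "C.I")) +
  theme_bw()
p
```

Because the same k is used both in the aesthetic and in the secondary axis transformation, the red line and the right-hand axis stay consistent whatever value k takes. The usual caveats about secondary axes still apply: the visual relation between the two lines depends entirely on the chosen factor.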

EDIT: For proper legends for the lines, I'm afraid you would have to do a bit of manual specification (unless someone else has a more elegant solution):

ggplot(df, aes(C.D, Activity)) +
  geom_point(aes(C.D, Activity, color = C.I)) + 
  scale_colour_gradientn(colours=c("green","black")) + 
  theme_bw() + 
  geom_smooth(aes(linetype = "Activity"))  +
  ggtitle("Mean activity as a function of C.D.20m for winter from hourly data") +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_smooth(aes(C.D, C.I * 6, linetype = "C.I."), se=FALSE, colour = "red", show.legend = TRUE) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 6, name = "CI")) +
  scale_linetype_manual(
    values = c(1,1), 
    guide = guide_legend(override.aes = list(colour = c("blue", "red")))
  )

enter image description here

teunbrand
  • That's perfect @teunbrand. Thank you so much for your time – Dekike Nov 15 '19 at 08:06
  • Do you know if it is possible to show a legend indicating that the red line corresponds to the C.I variable? – Dekike Nov 15 '19 at 08:26
  • Since the colour aesthetic is already mapped to a continuous variable, it would be easiest to map a linetype to the red line, e.g. `geom_smooth(aes(C.D, C.I * 6, linetype = "C.I."), se=FALSE, colour = "red", show.legend = TRUE)` – teunbrand Nov 15 '19 at 08:38
  • Thanks @teunbrand !! And my last question (I promise). If I want to include a legend for both lines (Activity for the blue one and C.I for the red one), how should I do it? I added linetype = "Activity" in the geom_smooth() call, but I get something different: the legend shows two red lines, one continuous and one dashed. What would be appropriate is two continuous lines, one red for C.I and one blue for Activity. Do you know how I should write it? Thanks in advance... – Dekike Nov 15 '19 at 09:51
  • I've put a suggestion in an edit to my answer, but I don't know if it is the optimal solution. – teunbrand Nov 15 '19 at 10:08