
I am currently trying to reproduce a plot that looks like this: VIP values vs. weighted sum of absolute regression coefficients vs. masses

Ignoring the graduated scales on the right side, the graph effectively has two y-axes. The x-axis is the VIP score, and the y positions are determined by the weighted sum of absolute regression coefficients; however, that scale is NOT visible. Instead, the masses are shown as labels on the left y-axis. In this case the masses act as categorical variables, each one matching a value of the continuous weighted-sum variable.

How can I use ggplot2, or another R package, to reproduce this? Labelling the points directly with ggrepel is not an option because there are too many masses in my dataset. Is there a way to create a scatterplot with two y-axes where the second y-axis is a categorical variable?

Sample data:

            Masses      Overall        VIP1
    1     82.07010  38.26669006 1.484957089
    2     84.08570  34.22745192 1.328724766
    3     95.08570  38.65684978 1.500047945
    4     96.08571  13.13685100 0.512968559
    5     98.10140  36.07639404 1.400239372
    6     99.04410  17.37079280 0.676731759
    7    105.07530   9.38047849 0.367677099
    8    110.10130  36.66816959 1.423128458
    9    111.10160  13.64197654 0.532506138
    10   113.06040  10.09391101 0.395271714
Aaron left Stack Overflow
MBell
    Totally confused. What are the two y-axes on the plot you posted? All I see is a dotplot with a numerical label (dotplot meaning the Cleveland style, [like this](https://stackoverflow.com/questions/20197118/dotplot-with-error-bars-two-series-light-jitter); see [Wikipedia](https://en.wikipedia.org/wiki/Dot_plot_(statistics)) for comparison). – Aaron left Stack Overflow May 16 '18 at 18:55
  • You can use `scale_y_discrete` to label the y-axis however you want. I would be happy to demonstrate if you share the data you want plotted, or clarify: I think you want the x position determined by the `VIP1` column, the y labels to be the numeric `Masses` column that you say is categorical, and the y positions to be determined by the "weighted sum of absolute regression coefficients". Is that the `Overall` column? – Gregor Thomas May 16 '18 at 19:03
    I'm also not sure you are using "categorical" and "continuous" correctly. Your masses look numeric, not categorical. In your example graph, you say the Y scale is continuous, but the exactly even vertical spacing makes it look categorical.... – Gregor Thomas May 16 '18 at 19:08
  • Thank you for the replies! @Gregor The masses are accurate masses, so while they can be used as numerical variables, they can also be considered descriptive because each belongs to one compound. So one mass here is the equivalent of a name for an organic molecule. Your description is accurate: VIP1 needs to be on the x-axis, and I need Overall (i.e. the weighted sums) to determine the scale of the y-axis, but the y-axis should be labelled with the Masses (i.e. the equivalent of compound names). How would I use scale_y_discrete to do this? – MBell May 17 '18 at 12:26

1 Answer


This seems terrible, but it's what you are asking for. Calling your data `dd`:

library(ggplot2)

ggplot(dd, aes(x = VIP1, y = Overall)) +
    geom_point() +
    scale_y_continuous(breaks = dd$Overall, labels = dd$Masses)


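We use `scale_y_continuous` because the variable that defines the y-axis positions, `Overall`, is continuous. Setting `breaks` to the `Overall` values and `labels` to the matching `Masses` puts each mass label at the height of its point, while the `Overall` scale itself is never shown. The downside is that labels will crowd or overlap wherever two `Overall` values are close (in this sample, 38.27 and 38.66, or 36.08 and 36.67).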


Using this data:

dd = read.table(text = "        Masses      Overall        VIP1      
1     82.07010  38.26669006 1.484957089
2     84.08570  34.22745192 1.328724766 
3     95.08570  38.65684978 1.500047945
4     96.08571  13.13685100 0.512968559
5     98.10140  36.07639404 1.400239372
6     99.04410  17.37079280 0.676731759
7    105.07530   9.38047849 0.367677099 
8    110.10130  36.66816959 1.423128458
9    111.10160  13.64197654 0.532506138
10   113.06040  10.09391101 0.395271714", header = TRUE)
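If evenly spaced rows are acceptable (as noted in the comments, the example graph's even vertical spacing makes it look categorical), a discrete y-axis is an alternative that avoids overlapping labels. The sketch below is an assumption on my part, not the method used in the original graph: it turns each mass into a factor level and orders the levels by `Overall` with `reorder()`, so the rows are sorted by the weighted sum but no longer spaced in proportion to it. `MassLabel` is a hypothetical column name introduced for illustration.

```r
library(ggplot2)

# Same sample data as above, built directly instead of via read.table()
dd <- data.frame(
  Masses  = c(82.07010, 84.08570, 95.08570, 96.08571, 98.10140,
              99.04410, 105.07530, 110.10130, 111.10160, 113.06040),
  Overall = c(38.26669006, 34.22745192, 38.65684978, 13.13685100, 36.07639404,
              17.37079280, 9.38047849, 36.66816959, 13.64197654, 10.09391101),
  VIP1    = c(1.484957089, 1.328724766, 1.500047945, 0.512968559, 1.400239372,
              0.676731759, 0.367677099, 1.423128458, 0.532506138, 0.395271714)
)

# Hypothetical helper column: each mass becomes a label (factor level),
# and the levels are ordered by increasing Overall.
dd$MassLabel <- reorder(factor(dd$Masses), dd$Overall)

# A discrete y-axis draws levels from bottom to top, so the mass with the
# largest Overall ends up at the top; the Overall values themselves are not shown.
p <- ggplot(dd, aes(x = VIP1, y = MassLabel)) +
  geom_point() +
  scale_y_discrete(name = "Mass") +
  labs(x = "VIP score")

print(p)
```

The trade-off is that the vertical distance between rows no longer reflects differences in `Overall`; only their order does.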
Gregor Thomas
  • Thank you! You are right, I think I am going to have to find a different way to represent these results, but this is exactly what I wanted to do. – MBell May 17 '18 at 18:14