
Would you mind taking a look at this?

https://docs.google.com/spreadsheets/d/14vVWxhaQynPmnAsZHlrkkdeJTt0XlDzHc5JSd4DNF-Y/edit?usp=sharing

I have three variables: Year (2000-2017), each country's GDP over 2000-2017, and each country's soccer ranking over 2000-2017.

I would like to draw one big scatter plot: Year (2000-2017) on the x-axis and rank on the y-axis, reversed so that 200 is at the bottom and 1 is at the top, with each point's size varying with GDP.

All I can come up with is a scatter plot for a single country:

library(ggplot2)

rank <- read.csv("Test1.csv", header = TRUE)

# One country only: point size is taken from the Aruba (GDP) column
qplot(Year, Rank, data = rank, size = Aruba)

But I would like to fit all the countries into one scatter plot, with the y-axis reversed, and, if possible, draw a linear regression line through all the points.

Can someone help me on this?

GGANG
  • You need to reorganize your data into long form. Paste some reproducible data please. – Bing Nov 01 '18 at 16:21
  • Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Nov 01 '18 at 16:22
  • https://docs.google.com/spreadsheets/d/14vVWxhaQynPmnAsZHlrkkdeJTt0XlDzHc5JSd4DNF-Y/edit?usp=sharing Thanks guys! – GGANG Nov 01 '18 at 17:03
  • It's definitely preferred if you can post a representative sample of data *in your question*, such as with `dput` or one of the packages suggested above, rather than linking folks to a third-party download – camille Nov 01 '18 at 20:10
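Following the `dput` suggestion, a minimal sketch of how a sample could be shared in the question itself. The column layout (Year, then a GDP column and a rank column per country) is inferred from the answer's position-based code, and the column names `RankAruba`/`RankNamibia` and all values are hypothetical, made up only for illustration.

```r
# Hypothetical miniature of the wide layout: Year, then one GDP column and
# one rank column per country. Every number below is made up.
sample_wide <- data.frame(
  Year        = 2000:2002,
  Aruba       = c(1.87, 1.92, 1.94),  # GDP (made-up values)
  RankAruba   = c(180, 182, 185),     # soccer rank (made-up values)
  Namibia     = c(3.9, 3.5, 3.4),     # GDP (made-up values)
  RankNamibia = c(80, 85, 90)         # soccer rank (made-up values)
)

# dput() prints R code that recreates the object; paste that output into
# the question instead of linking to a spreadsheet.
dput(sample_wide)
```

Anyone answering can then paste the printed `structure(list(...))` straight into their R session.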

1 Answer


I'm not sure how you want the regression done, but here is the graph.

Edit: because there is a country column named "Rankmibia" (which I had never heard of), selecting the rank columns by the "Rank" prefix doesn't work, so this version selects the columns by position instead.
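To see why the prefix match breaks, here is a small sketch on a made-up table. It assumes "Rankmibia" is the misnamed Namibia GDP column (per the comments below) and that rank columns look like `RankAruba`; both the naming and the values are hypothetical.

```r
library(dplyr)

# Made-up miniature: "Rankmibia" stands in for the misnamed Namibia GDP
# column from the comments; every value here is hypothetical.
raw <- data.frame(
  Year        = 2000:2001,
  Aruba       = c(1.9, 2.0),
  RankAruba   = c(180, 182),
  Rankmibia   = c(3.9, 3.5),
  RankNamibia = c(80, 85)
)

# Prefix selection also grabs the GDP column "Rankmibia".
names(select(raw, starts_with("Rank")))
# returns c("RankAruba", "Rankmibia", "RankNamibia")

# Position selection only relies on the Year, GDP, rank, GDP, rank, ... layout.
names(select(raw, seq(3, ncol(raw), 2)))
# returns c("RankAruba", "RankNamibia")
```

The trade-off is that position-based selection silently gives wrong pairings if a column is ever missing or out of order, so it only works while the alternating layout holds.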

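library(tidyr)
library(ggplot2)
library(dplyr)

raw <- read.csv("Test1.csv", header = TRUE)

# Assumed layout: column 1 is Year, then each country has a GDP column
# followed by its rank column, so columns are selected by position.
r <- raw %>% select(seq(3, ncol(raw), 2)) %>% gather(id, rank)                # rank columns
g <- raw %>% select(1, seq(2, ncol(raw), 2)) %>% gather(country, GDP, -Year)  # Year + GDP columns

# Both gathers stack the columns in the same order, so the rows line up.
df <- cbind(g, rank = r$rank)

p <- ggplot(df, aes(Year, rank, size = GDP, color = country)) +
  geom_point() +
  scale_y_reverse()  # rank 1 at the top
ggsave("fig.png", p, width = 40, height = 20)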
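The answer leaves the regression out. One possible reading of "a linear regression of all scatter points" is a single pooled least-squares line; a sketch of that, on made-up long-format data shaped like `df` above (values hypothetical), might look like this. The 1 to 200 rank range comes from the question.

```r
library(ggplot2)

# Hypothetical long-format data with the same columns as `df` in the
# answer (Year, country, GDP, rank); all values are made up.
df <- data.frame(
  Year    = rep(2000:2003, times = 2),
  country = rep(c("Aruba", "Namibia"), each = 4),
  GDP     = c(1.9, 2.0, 2.1, 2.2, 3.9, 3.5, 3.4, 4.1),
  rank    = c(180, 182, 185, 184, 80, 85, 90, 88)
)

p <- ggplot(df, aes(Year, rank)) +
  # size/color are mapped only here, so the smooth below is not split by country
  geom_point(aes(size = GDP, color = country)) +
  # one pooled linear fit through all points, without a confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
  # rank 1 at the top; make sure the axis spans ranks 1 to 200
  scale_y_reverse() +
  expand_limits(y = c(1, 200))
```

If one line per country is wanted instead, moving `color = country` into the top-level `aes()` would make `geom_smooth()` fit each country separately.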


Bing
  • Thanks a lot! I tried your code with my full data and keep getting 'Error in data.frame: arguments imply differing number of rows'. Could you help me with the full data shown here? https://docs.google.com/spreadsheets/d/14vVWxhaQynPmnAsZHlrkkdeJTt0XlDzHc5JSd4DNF-Y/edit#gid=485969225 – GGANG Nov 02 '18 at 13:18
  • This is the error that appears: Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 3366, 3402 – GGANG Nov 02 '18 at 13:27
  • There is a country named "Rankmibia", which I had never heard of, and it gets caught by the "Rank" prefix. I have fixed the problem; please see the edited code. – Bing Nov 03 '18 at 02:30
  • I think it was meant to be Namibia but somehow it got merged with Rank :( – GGANG Nov 03 '18 at 12:11
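The row counts in that error fit this explanation: 3402 - 3366 = 36 = 2 × 18 years, which is consistent with exactly one column leaving the GDP set and joining the rank set. A sketch reproducing the effect on a made-up two-year table (column names and values are hypothetical):

```r
library(tidyr)
library(dplyr)

# Made-up miniature with the misnamed "Rankmibia" GDP column; 2 years only.
raw <- data.frame(
  Year        = 2000:2001,
  Aruba       = c(1.9, 2.0),
  RankAruba   = c(180, 182),
  Rankmibia   = c(3.9, 3.5),
  RankNamibia = c(80, 85)
)

# Prefix-based split (the version that failed): "Rankmibia" moves from
# the GDP side to the rank side.
r <- raw %>% select(starts_with("Rank")) %>% gather(id, rank)               # 3 columns x 2 years
g <- raw %>% select(-starts_with("Rank")) %>% gather(country, GDP, -Year)  # 1 column x 2 years

nrow(r) - nrow(g)  # 2 columns x 2 years = 4, so cbind() would refuse to combine them
```

Checking `nrow(r) == nrow(g)` before the `cbind()` step is a cheap way to catch this kind of header problem early.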