
My data frame df2:

          League freq
18       England  108
27         Italy   79
20       Germany   74
43         Spain   64
19        France   49
39        Russia   34
31        Mexico   27
47        Turkey   24
32   Netherlands   23
37      Portugal   21
49 United States   18
29         Japan   16
25          Iran   15
7         Brazil   13
22        Greece   13
14         Costa   11
45   Switzerland   11
5        Belgium   10
17       Ecuador   10
23      Honduras   10
42   South Korea    9
2      Argentina    8
48       Ukraine    7
3      Australia    6
11         Chile    6
12         China    6
15       Croatia    6
35        Norway    6
41      Scotland    6
34       Nigeria    5

I am trying to select the European countries:

europe <- subset(df2, nrow(x=18, 27, 20) select=c(1, 2))

What is the most effective way to select Europe, Africa, Asia, ... from df2?

Mark Miller
Teletubbi-OS X
Yes I know ;) For America:

Argentina <- df2[2, c("freq")]
Brazil <- df2[7, c("freq")]
Canada <- df2[10, c("freq")]
Chile <- df2[11, c("freq")]
Colombia <- df2[13, c("freq")]
Ecuador <- df2[17, c("freq")]
Costa <- df2[14, c("freq")]
Honduras <- df2[23, c("freq")]
Mexico <- df2[31, c("freq")]
Paraguay <- df2[36, c("freq")]
United_States <- df2[49, c("freq")]
Uruguay <- df2[50, c("freq")]

But with this procedure I'll lose all the important "freq" information. What I'm aiming for is a ggplot by continent:

continents <- c(europe, america, ...)

– Teletubbi-OS X Jun 06 '14 at 09:47

2 Answers


You either need to identify which countries are on which continents by hand, or you might be able to scrape this information from somewhere:

(basic strategy from Scraping html tables into R data frames using the XML package)

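library(XML)      ## readHTMLTable()
library(stringr)  ## str_extract()

## read every HTML table on the page into a list of data frames
theurl <- "http://en.wikipedia.org/wiki/List_of_European_countries_by_area"
tables <- readHTMLTable(theurl)

## extract the first run of letters and spaces from each entry of the Country column
europe_names <- str_extract(as.character(tables[[1]]$Country),"[[:alpha:] ]+")
head(sort(europe_names))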
## [1] "Albania"    "Andorra"    "Austria"    "Azerbaijan" "Belarus"     
## [6] "Belgium"   
## there's also a 'Total' entry in here but it's probably harmless ...
subset(df2,League %in% europe_names)

Of course you'd have to figure this out again for Asia, America, etc.
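
For the by-hand route, a minimal sketch might look like the following. The `continent_of` lookup vector and the six-row sample of df2 are made up for illustration; a real mapping would need one entry for every value of League.

```r
# Small sample of df2 from the question (illustration only)
df2 <- data.frame(
  League = c("England", "Italy", "Mexico", "Japan", "Brazil", "Nigeria"),
  freq   = c(108, 79, 27, 16, 13, 5),
  stringsAsFactors = FALSE
)

# Hand-written lookup (hypothetical): names are countries, values are continents
continent_of <- c(
  England = "Europe",  Italy   = "Europe",
  Mexico  = "America", Brazil  = "America",
  Japan   = "Asia",    Nigeria = "Africa"
)

# Label every row; as.character() guards against League being a factor.
# Countries missing from the lookup would get NA here.
df2$continent <- unname(continent_of[as.character(df2$League)])

# Select one continent, keeping both League and freq
europe <- subset(df2, continent == "Europe")

# Total freq per continent, e.g. as input for a plot
totals <- aggregate(freq ~ continent, data = df2, FUN = sum)

print(europe)
print(totals)
```

Keeping the continent as a column of df2, rather than as separate objects per country, means the freq values stay attached to their rows and the per-continent totals fall out of a single aggregate() call.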

Ben Bolker

So here's a slightly different approach from @BenBolker's, using the countrycode package.

library(countrycode)
cdb <- countrycode_data         # database of countries

df2[toupper(df2$League) %in% cdb[cdb$continent=="Europe",]$country.name,]
#         League freq
# 27       Italy   79
# 20     Germany   74
# 43       Spain   64
# 19      France   49
# 32 Netherlands   23
# 37    Portugal   21
# 22      Greece   13
# 45 Switzerland   11
# 5      Belgium   10
# 48     Ukraine    7
# 15     Croatia    6
# 35      Norway    6

One problem you're going to have is that "England" is not listed as a country in this database (it falls under "United Kingdom"), so you'll have to deal with that as a special case.
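
One way to handle such special cases is to recode the names before matching. The sketch below is only an illustration: `fixes` and `europe_upper` are hypothetical names (the latter stands in for cdb[cdb$continent=="Europe",]$country.name), and treating Scotland like England is an assumption, made because Scotland is also in df2 but missing from the output above.

```r
# Sample rows from df2 (illustration only)
df2 <- data.frame(
  League = c("England", "Italy", "Scotland", "Mexico"),
  freq   = c(108, 79, 6, 27),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the database's upper-case European country names
europe_upper <- c("ITALY", "GERMANY", "UNITED KINGDOM")

# Special cases: League value -> name used by the database (assumed mapping)
fixes <- c(England = "United Kingdom", Scotland = "United Kingdom")

# Build a matching key: use the fix where one exists, else the original name
key <- as.character(df2$League)
hit <- key %in% names(fixes)
key[hit] <- fixes[key[hit]]

# Match on the key but return the original rows, so League still reads "England"
europe <- df2[toupper(key) %in% europe_upper, ]
print(europe)
```

Matching on a separate key leaves df2$League untouched, so plot labels keep the names used in the original data.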

Also, this database considers the "Americas" as a continent.

df2[toupper(df2$League) %in% cdb[cdb$continent=="Americas",]$country.name,]

To get just South America, you have to use the region field instead:

df2[toupper(df2$League) %in% cdb[cdb$region=="South America",]$country.name,]
#       League freq
# 7     Brazil   13
# 17   Ecuador   10
# 2  Argentina    8
# 11     Chile    6
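
If the goal is a continent label for every row rather than one subset at a time, the package's countrycode() function can convert the names directly. This is a sketch under some assumptions: it expects a reasonably recent version of the package (newer releases may no longer ship the data under the name countrycode_data), the custom_match entries for England and Scotland are my own choice, and the sample rows are for illustration only.

```r
# Sample rows from df2 (illustration only)
df2 <- data.frame(
  League = c("England", "Italy", "Mexico", "Japan", "Scotland"),
  freq   = c(108, 79, 27, 16, 6),
  stringsAsFactors = FALSE
)

# Guarded so the sketch is skipped when the package is not installed
if (requireNamespace("countrycode", quietly = TRUE)) {
  # Convert country names to continent labels; custom_match supplies
  # labels for names that may not be recognised otherwise (assumed mapping)
  df2$continent <- countrycode::countrycode(
    df2$League,
    origin       = "country.name",
    destination  = "continent",
    custom_match = c(England = "Europe", Scotland = "Europe")
  )

  # Total freq per continent, ready to pass to a plotting function
  totals <- aggregate(freq ~ continent, data = df2, FUN = sum)
  print(totals)
}
```

Names that cannot be matched come back as NA, so it is worth checking df2[is.na(df2$continent), ] on the full data before plotting.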
jlhoward