picking 10 highest values from each column in data.frame and plot in descending order

Question

Based on suggestions by Jon Spring, I have tried to reformulate most of the code as follows.

library(tidyverse)
ka2 <- ka %>%
  gather(Year, Export, -c(Economy, Partner)) %>%
  group_by(Year) %>%
  arrange(Year, -Export) %>%
  top_n(10, wt = Export) %>%
  ungroup()

ka2$Year <- gsub("X", "", ka2$Year)
ka2$Economy <- NULL

ka2 <- droplevels(ka2)
sapply(ka2, class)
ka2$Year <- as.integer(ka2$Year)

library(ggplot2)
library(scales)


ggplot(ka2, aes(x=reorder(Partner, -Export), y = Export/1000000, fill = Partner)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = comma) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), 
        legend.position = "none") +
  labs(title = "Kazhakhstan Exports to Largest Partners, 2000-2015", 
       y = "Bln USD", x = element_blank()) +
  facet_wrap(~ Year, scales = 'free_x')

The data.frame generated in due course is as follows:

    structure(list(Partner = c("Switzerland", "Italy", "Russia", 
"China", "France", "Iran", "Netherlands", "Israel", "Azerbaijan", 
"Spain", "Switzerland", "Italy", "Russia", "France", "China", 
"Iran", "Netherlands", "USA", "Israel", "Canada", "Italy", "Switzerland", 
"Russia", "China", "France", "Iran", "Netherlands", "UK", "Spain", 
"Romania", "Italy", "Switzerland", "China", "Russia", "France", 
"Netherlands", "Iran", "UK", "Ukraine", "Israel", "Italy", "Switzerland", 
"China", "Russia", "France", "Netherlands", "Israel", "Iran", 
"Ukraine", "Turkey", "Italy", "China", "Russia", "France", "Switzerland", 
"Netherlands", "Canada", "Ukraine", "Iran", "UK"), Year = c(2004L, 
2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 
2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 2005L, 
2005L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 2006L, 
2006L, 2006L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 2007L, 
2007L, 2007L, 2007L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 
2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2009L), Export = c(3760396L, 3108975L, 
2836286L, 1966911L, 1468224L, 712011L, 464600L, 322423L, 287125L, 
281365L, 5509511L, 4190531L, 2926578L, 2665146L, 2422507L, 886118L, 
877836L, 666028L, 661641L, 528132L, 6891644L, 6721180L, 3730037L, 
3592514L, 3346969L, 2077598L, 1704555L, 1143876L, 968365L, 747116L, 
7774224L, 7475877L, 5635914L, 4658919L, 3982705L, 2464262L, 2451368L, 
1133234L, 1113097L, 1058817L, 11920317L, 11281326L, 7676609L, 
6227049L, 5388682L, 4638669L, 2226504L, 2039530L, 2003343L, 1903764L, 
6686756L, 5888593L, 3546967L, 3381509L, 2668219L, 2222452L, 1385352L, 
1289161L, 1279004L, 1235083L)), .Names = c("Partner", "Year", 
"Export"), row.names = c(NA, -60L), class = c("tbl_df", "tbl", 
"data.frame"))

And, the image I have got is as this.

The result

But, you can see even though each year displays the top 10 export destinations, they are not arranged in descending order. The data.frame was secured through arrangment of values in descending order, but display is not so. Hope this can be solved.

Based on the links and suggestions by Uwe, I have furthered my code as below.

ka2$ord <- rep(10:1,len=120)

ggplot(ka2, aes(x = -ord, y = Export/1000000, fill = Partner)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = comma) +
  scale_x_discrete(labels = ka2[, setNames(as.character("Partner"), "ord")]) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), 
        legend.position = "none") +
  labs(title = "Kazhakhstan Exports to Largest Partners, 2004-2015", 
       y = "Bln USD", x = "Partner") +
  facet_wrap(~ Year, scales = "free_x")

It gives me the result2 as follows.

The result 2

The issue is that of axis.x labels.

Question 2, if I can interpret your need correctly, are you looking to find the top 10 responses for each year and blend everything together to end up with a two column data frame (columns Partner and Year)? It looks like you're trying to identify the top 10 partners for each year. It could be useful to be more descriptive in telling us what you're trying to accomplish. — Dale Kube, Jan 06 '19 at 04:39
Please, ask a new question instead of completely changing the original question. Now, all the answers do no longer match the modified question. Even the original test data which were used to prepare these answers have been removed. Therefore, please, consider to roll-back your latest edits and ask a new question. Thank you. — Uwe, Jan 06 '19 at 13:46
Related question by the same OP: https://stackoverflow.com/q/54060076/3817004 — Uwe, Jan 06 '19 at 14:06
A solution to the edited question can be found here [ggplot2: reorder bars from highest to lowest in each facet](https://stackoverflow.com/a/43177953/3817004) — Uwe, Jan 06 '19 at 14:47
Sorry for inconvenience, but I won't compromise my learning curve, be what my scores. But, I thank all the contributiors. — , Jan 06 '19 at 16:37

score 1 · Answer 1 · answered Jan 06 '19 at 05:25

Using base R , we could select the columns for which we want to find top 10 entries (for this example, it is 3:5) and then for each column we use order with decreasing = TRUE to get entries in descending order and select only top 10. We then create a dataframe using these entries as indices and subset it from Partner column to get countries and finally cbind all the list of dataframes using do.call

do.call("cbind", lapply(df[3:5], function(x) {
     inds <- head(order(x, decreasing = TRUE), 10)
    data.frame(Partner = df$Partner[inds], year = x[inds])
}))


#   X2000.Partner X2000.year X2001.Partner X2001.year X2002.Partner X2002.year
#1         Russia    1710262        Russia    1733412        Russia    1497738
#2          Italy     917604         Italy     956196         China    1018680
#3          China     672549         China     646651         Italy     904222
#4        Germany     546887       Germany     495533   Switzerland     773596
#5    Switzerland     463171       Ukraine     490215           UAE     472277
#6        Ukraine     254241   Switzerland     378243        Poland     320482
#7    Netherlands     219459           UAE     330340          Iran     309882
#8             UK     219420            UK     271586       Ukraine     291469
#9           Iran     203270          Iran     208925       Germany     219732
#10           USA     176198        Poland     164157            UK     131824

Thanks! can you provide an additional column for the year also. — , Jan 06 '19 at 05:45
@AmbrishDhaka not sure, what you mean by that. Every alternate column is value for year. If you have more columns you need to change `3:5` to the column numbers say `3:ncol(ka)` in your case. — Ronak Shah, Jan 06 '19 at 05:49

Jon Spring · Answer 2 · 2019-01-06T05:53:13.820

Here's an approach using the tidyverse and purrr.

First part is straightforward, converting the wide data into a long list and collecting the top ten for each year.

library(tidyverse)
df2_long <- df %>%
  select(-Economy) %>%
  gather(year, value, X2000:X2002) %>%
  group_by(year) %>%
  arrange(year, -value) %>%
  top_n(10, wt = value) %>%
  ungroup()

To get the output into wide format with repeating headers, I first make a function that takes year as an input and outputs the countries and values for that year. Then I use purrr::map to run that function for each year, and then bind those together.

library(purrr)
year_cols <- unique(df2_long$year)
grab_chunk <- function(year_col) {
  df2_long %>%
  filter(year == year_col) %>%
  mutate(Partner = fct_reorder(Partner, -value)) %>%  # Added to keep order
  spread(year, value)
}
df2_wide <- map(year_cols, grab_chunk) %>%
  bind_cols()

Output

> df2_wide
# A tibble: 10 x 6
   Partner       X2000 Partner1      X2001 Partner2      X2002
   <fct>         <int> <fct>         <int> <fct>         <int>
 1 Russia      1710262 Russia      1733412 Russia      1497738
 2 Italy        917604 Italy        956196 China       1018680
 3 China        672549 China        646651 Italy        904222
 4 Germany      546887 Germany      495533 Switzerland  773596
 5 Switzerland  463171 Ukraine      490215 UAE          472277
 6 Ukraine      254241 Switzerland  378243 Poland       320482
 7 Netherlands  219459 UAE          330340 Iran         309882
 8 UK           219420 UK           271586 Ukraine      291469
 9 Iran         203270 Iran         208925 Germany      219732
10 USA          176198 Poland       164157 UK           131824

Thanks, I hadn't noticed that the `spread` step reordered alphabetically by Partner. Corrected by reordered Partner factor before the spread step. — Jon Spring, Jan 06 '19 at 05:55

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

Note: The answer below refers to the original question before the OP completely changed the question:

The original question demanded to rank Partner by descending value for each year and to display the top 10 partners and the corresponding value for each year in a wide format.

For the sake of completeness, here is also an answer which uses melt() and dcast(). The solution is flexible to work with an arbitrary number of year columns X2000, X2001, etc.

library(data.table)
top <- 10L
val_cols <- c("Partner", "value")
# reshape from wide to long format
long <- melt(setDT(ka), id.vars = c("Economy", "Partner"), variable.name = "year")
# order by decreasing value and append rank for each year
long[order(-value), rank := rowid(year)]
# reshape from long to wide format and keep only the top rows, remove rank
wide <- dcast(long, Economy + rank ~ year, value.var = val_cols)[, rank := NULL][1:top]
# reorder columns for convenience
setcolorder(wide, c("Economy",
            as.vector(outer(val_cols, unique(long$year), paste, sep = "_"))))
wide

       Economy Partner_X2000 value_X2000 Partner_X2001 value_X2001 Partner_X2002 value_X2002
 1: Kazakhstan        Russia     1710262        Russia     1733412        Russia     1497738
 2: Kazakhstan         Italy      917604         Italy      956196         China     1018680
 3: Kazakhstan         China      672549         China      646651         Italy      904222
 4: Kazakhstan       Germany      546887       Germany      495533   Switzerland      773596
 5: Kazakhstan   Switzerland      463171       Ukraine      490215           UAE      472277
 6: Kazakhstan       Ukraine      254241   Switzerland      378243        Poland      320482
 7: Kazakhstan   Netherlands      219459           UAE      330340          Iran      309882
 8: Kazakhstan            UK      219420            UK      271586       Ukraine      291469
 9: Kazakhstan          Iran      203270          Iran      208925       Germany      219732
10: Kazakhstan           USA      176198        Poland      164157            UK      131824

EDIT: rank column removed from result by request of the OP.

Addendum

The OP has posted a related question reshaping data frame with tidyverse as ggplot2 requires the plot data in long format.

The intermediate result long in this answer can be used for plotting:

long[rank <= 10]

As a complete answer:

library(ggplot2)
ggplot(
  long[, ord := sprintf("%03i", frank(long, year, -value, ties.method = "first"))][rank <= 10], 
  aes(x = ord, y = value/1000000, fill = Partner)) +
  geom_col() +
  scale_y_continuous(labels = scales::comma) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1), 
        legend.position = "none") +
  labs(title = "Kazhakhstan Exports to Largest Partners, 2000-2015", 
       y = "Bln USD", x = element_blank()) +
  facet_wrap(~ year, scales = 'free_x', drop = TRUE) + 
  scale_x_discrete(labels = long[, setNames(as.character(Partner), ord)])

Data

These are the data provided by the OP in the original question:

ka <- structure(list(Economy = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("Kazakhstan", "Kyrgyzstan", "Tajikistan", 
"Turkmenistan", "Uzbekistan"), class = "factor"), Partner = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 16L, 
17L, 18L, 19L, 20L, 31L, 21L, 22L, 23L, 25L, 26L, 27L, 28L, 24L, 
29L, 30L, 32L, 33L, 35L, 36L, 37L, 34L, 40L, 38L, 39L, 41L, 42L, 
15L), .Label = c("Austria", "Azerbaijan", "Bangladesh", "Belarus", 
"Belgium", "Canada", "China", "Czechia", "France", "Georgia", 
"Germany", "Greece", "Hungary", "India", "Indonesia", "Iran", 
"Israel", "Italy", "Japan", "Kazakhstan", "Kyrgyzstan", "Malaysia", 
"Mexico", "Moldova", "Mongolia", "Netherlands", "Pakistan", "Poland", 
"Romania", "Russia", "SouthKorea", "Spain", "Switzerland", "Tajikistan", 
"Thailand", "Turkey", "Turkmenistan", "UAE", "UK", "Ukraine", 
"USA", "Uzbekistan"), class = "factor"), X2000 = c(556L, 46816L, 
2839L, 16155L, 7584L, 4393L, 672549L, 7740L, 4319L, 7646L, 546887L, 
473L, 2717L, 29497L, 203270L, 7033L, 917604L, 9733L, NA, 36430L, 
57247L, 10005L, 6342L, 12240L, 219459L, 1455L, 56626L, 1076L, 
523L, 1710262L, 6514L, 463171L, 15174L, 62280L, 7100L, 52602L, 
254241L, 11008L, 219420L, 176198L, 133526L, NA), X2001 = c(3170L, 
69309L, 3111L, 5120L, 3627L, 333L, 646651L, 4167L, 7217L, 2954L, 
495533L, 4381L, 7904L, 9279L, 208925L, 3599L, 956196L, 15474L, 
NA, 43400L, 83778L, 1374L, NA, 9768L, 141327L, 489L, 164157L, 
2709L, 7837L, 1733412L, 6782L, 378243L, 3153L, 74231L, 14178L, 
61198L, 490215L, 330340L, 271586L, 142412L, 150234L, NA), X2002 = c(3307L, 
112657L, 1170L, 11815L, 7384L, 9492L, 1018680L, 15064L, 27434L, 
5238L, 219732L, 30678L, 12626L, 4213L, 309882L, 5839L, 904222L, 
22998L, NA, 48866L, 108462L, 2624L, NA, 8329L, 123553L, 438L, 
320482L, 2734L, 121044L, 1497738L, 11549L, 773596L, 17849L, 97449L, 
15254L, 45740L, 291469L, 472277L, 131824L, 116942L, 101022L, 
NA)), row.names = c(NA, -42L), class = "data.frame")

ka

      Economy      Partner   X2000   X2001   X2002
1  Kazakhstan      Austria     556    3170    3307
2  Kazakhstan   Azerbaijan   46816   69309  112657
3  Kazakhstan   Bangladesh    2839    3111    1170
4  Kazakhstan      Belarus   16155    5120   11815
5  Kazakhstan      Belgium    7584    3627    7384
6  Kazakhstan       Canada    4393     333    9492
7  Kazakhstan        China  672549  646651 1018680
8  Kazakhstan      Czechia    7740    4167   15064
9  Kazakhstan       France    4319    7217   27434
10 Kazakhstan      Georgia    7646    2954    5238
11 Kazakhstan      Germany  546887  495533  219732
12 Kazakhstan       Greece     473    4381   30678
13 Kazakhstan      Hungary    2717    7904   12626
14 Kazakhstan        India   29497    9279    4213
15 Kazakhstan         Iran  203270  208925  309882
16 Kazakhstan       Israel    7033    3599    5839
17 Kazakhstan        Italy  917604  956196  904222
18 Kazakhstan        Japan    9733   15474   22998
19 Kazakhstan   Kazakhstan      NA      NA      NA
20 Kazakhstan   SouthKorea   36430   43400   48866
21 Kazakhstan   Kyrgyzstan   57247   83778  108462
22 Kazakhstan     Malaysia   10005    1374    2624
23 Kazakhstan       Mexico    6342      NA      NA
24 Kazakhstan     Mongolia   12240    9768    8329
25 Kazakhstan  Netherlands  219459  141327  123553
26 Kazakhstan     Pakistan    1455     489     438
27 Kazakhstan       Poland   56626  164157  320482
28 Kazakhstan      Moldova    1076    2709    2734
29 Kazakhstan      Romania     523    7837  121044
30 Kazakhstan       Russia 1710262 1733412 1497738
31 Kazakhstan        Spain    6514    6782   11549
32 Kazakhstan  Switzerland  463171  378243  773596
33 Kazakhstan     Thailand   15174    3153   17849
34 Kazakhstan       Turkey   62280   74231   97449
35 Kazakhstan Turkmenistan    7100   14178   15254
36 Kazakhstan   Tajikistan   52602   61198   45740
37 Kazakhstan      Ukraine  254241  490215  291469
38 Kazakhstan          UAE   11008  330340  472277
39 Kazakhstan           UK  219420  271586  131824
40 Kazakhstan          USA  176198  142412  116942
41 Kazakhstan   Uzbekistan  133526  150234  101022
42 Kazakhstan    Indonesia      NA      NA      NA

rank is not needed, in fact, the rank for each year would be different so additional 15 columns would be created that makes cumbersome. — , Jan 06 '19 at 12:13
Of course, `rank` can be safely removed from the final result as it is equivalent to the row number. I have left it in just for demonstration. However, the task at hand can be described as "rank the Partner by descending value for each year and display in a wide format". So, rank (= row) no.3 is occupied by China in 2000 and 2001 and by Italy in year 2002. — Uwe, Jan 06 '19 at 12:27
I have a suggestion, we need not complicate the code. I can use `ka2$ord <- rep(10:1,len=120)` to create ka2$ord, which gives me descending order of values as I am already fixing the top 10 values. this now leaves the question of how to display ka2$Partner values instead of ka2$ord in scale_x_discrete. — , Jan 06 '19 at 17:34

picking 10 highest values from each column in data.frame and plot in descending order

3 Answers3

Addendum

Data