
I'm producing a set of graphs in two languages with ggplot2 (Hadley Wickham). I could produce them in two separate workflows by renaming variables inside the original dataset. Instead, I would like to modify a ggplot object: first produce the graphs in English, then translate the labels into French. How could/should I change the legend keys inside the ggplot object? And how can I then sort the legend keys?

The reason I am exploring this approach is that I would like my plot colours and symbols to be the same in English and French, while at the same time having the legend keys ordered alphabetically. The problem is that French and English legend keys do not have the same alphabetical order (e.g. Spain versus Espagne). Compare the legend keys obtained from the MWE: they are ordered alphabetically in the English legend, but not in the French legend.

[Image: legend keys of the English plot and of the French plot produced by the MWE]
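A minimal base-R illustration of why the colours shift, using a three-country subset of the MWE (the subset is an assumption for brevity). factor() sorts its levels alphabetically, and a default discrete colour scale hands out colours by level position, so the same country lands in a different slot depending on the language of its label.

```r
# A three-country subset of the MWE, in English and in French
en <- c("Spain", "Germany", "Denmark")
fr <- c("Espagne", "Allemagne", "Danemark")

# factor() sorts its levels alphabetically within each language
lev_en <- levels(factor(en))   # "Denmark" "Germany" "Spain"
lev_fr <- levels(factor(fr))   # "Allemagne" "Danemark" "Espagne"

# Position of the same country in each level set: a default discrete
# colour scale picks colours by this position
match("Germany", lev_en)     # 2
match("Allemagne", lev_fr)   # 1
```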

Replacing the xlab, ylab and ggtitle, and modifying the styles of the axis labels (e.g. number formatting), is rather straightforward, so my focus really is on the legend keys and their order inside the legend.

An MWE with lots of names, to illustrate how tedious it is to copy the names several times in the approach below (once for the group, again for the colour, again for the shape, etc.):

    df <- structure(list(year = c("2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007"), country = c("Australia", "Australia", 
    "Austria", "Austria", "Belgium", "Belgium", "Canada", "Canada", 
    "Denmark", "Denmark", "Finland", "Finland", "France", "France", 
    "Germany", "Germany", "Greece", "Greece", "Italy", "Italy", "Japan", 
    "Japan", "Netherlands", "Netherlands", "New Zealand", "New Zealand", 
    "Norway", "Norway", "Portugal", "Portugal", "Spain", "Spain", 
    "Sweden", "Sweden", "Switzerland", "Switzerland", "United Kingdom", 
    "United Kingdom", "United States", "United States"), value = c(33, 
    33, 33, 33, 30, 30, 34, 34, 30, 30, 33, 33, 28, 29, 27, 27, 40, 
    39, 35, 35, 35, 35, 27, 27, 33, 33, 27, 27, 37, 37, 32, 32, 31, 
    31, 32, 31, 32, 32, 33, 33)), .Names = c("year", "country", "value"
    ), row.names = c(NA, -40L), class = "data.frame")

    library("ggplot2")
    ggplot(data = df, aes(x = year, y = value, group = country, colour = country)) + 
        geom_line(size = 0.5) + geom_point(size = 1)
    ggsave("stackoverflow-1.png", plot = last_plot())

    ggplot(data = df, aes(x = year, y = value,
        group = factor(country, labels = c("Australie", "Autriche", "Belgique", "Canada",
            "Danemark", "Finlande", "France", "Allemagne", "Grèce", "Italie", "Japon",
            "Pays-Bas", "Nouvelle-Zélande", "Norvège", "Portugal", "Espagne", "Suède",
            "Suisse", "Royaume-Uni", "États-Unis")),
        colour = factor(country, labels = c("Australie", "Autriche", "Belgique", "Canada",
            "Danemark", "Finlande", "France", "Allemagne", "Grèce", "Italie", "Japon",
            "Pays-Bas", "Nouvelle-Zélande", "Norvège", "Portugal", "Espagne", "Suède",
            "Suisse", "Royaume-Uni", "États-Unis")))) +
        geom_line(size = 0.5) + geom_point(size = 1) + theme(legend.title = element_blank())
    ggsave("stackoverflow-2.png", plot = last_plot())

I would like a method that does not break if I use only a subset of the values (countries in the example). The most convenient and least error-prone approach would be to define a mapping like this:

    list("A Cuckoo Land" = "Un Pays Idyllique", # This mapping is not used
         "Australia" = "Australie", 
         "Austria" = "Autriche", 
         "Belgium" = "Belgique", 
         "Canada" = "Canada",
         "Denmark" = "Danemark", 
         "Finland" = "Finlande", 
         "France" = "France", 
         "Germany" = "Allemagne", 
         "Greece" = "Grèce", 
         "Italy" = "Italie", 
         "Japan" = "Japon", 
         "Netherlands" = "Pays-Bas", 
         "New Zealand" = "Nouvelle-Zélande", 
         "Norway" = "Norvège", 
         "Portugal" = "Portugal", 
         "Spain" = "Espagne", 
         "Sweden" = "Suède", 
         "Switzerland" = "Suisse", 
         "United Kingdom" = "Royaume-Uni", 
         "United States" = "États-Unis")

and substitute, within the legend keys, every occurrence of the left-hand side with the right-hand side. Even better if the method could handle a trilingual approach, e.g. a mapping like "Belgium" = c("Belgique", "Bélgica").
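A minimal sketch of such a lookup, assuming each entry of the mapping holds one translation per target language in a fixed order (here French first, Spanish second). The helper `translate()` is hypothetical, not part of ggplot2: it ignores unused entries, and values without a translation pass through unchanged, so a subset of countries does not break it.

```r
# Hypothetical helper (not part of ggplot2): translate a character vector with
# a named list that holds one translation per target language.
translate <- function(x, mapping, lang = 1L) {
  x <- as.character(x)  # a factor would otherwise index by its integer codes
  # keep the lang-th translation of every entry -> named character vector
  lookup <- vapply(mapping, function(tr) tr[[lang]], character(1))
  out <- unname(lookup[x])    # NA wherever x has no entry in the mapping
  ifelse(is.na(out), x, out)  # fall back to the untranslated value
}

# French first, Spanish second; unused entries such as "Germany" are harmless
mapping <- list("Belgium" = c("Belgique", "Bélgica"),
                "Germany" = c("Allemagne", "Alemania"),
                "Spain"   = c("Espagne", "España"))

countries <- c("Spain", "Belgium", "Japan")
translate(countries, mapping)            # "Espagne" "Belgique" "Japan"
translate(countries, mapping, lang = 2)  # "España" "Bélgica" "Japan"
```

The result is a plain character vector, so it can replace a column of a copy of `df` before plotting.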

PatrickT

2 Answers


I might actually approach this by creating a list of data frames with the same column names, but with the country names in different languages. Creating the list of data frames might be a bit of work if there are a lot of them, but I'm fairly certain it will be less cumbersome than mucking about with grobs and gtables. An example:

    key <- unlist(list("A Cuckoo Land" = "Un Pays Idyllique", # This mapping is not used
                       "Australia" = "Australie", 
                       "Austria" = "Autriche", 
                       "Belgium" = "Belgique", 
                       "Canada" = "Canada",
                       "Denmark" = "Danemark", 
                       "Finland" = "Finlande", 
                       "France" = "France", 
                       "Germany" = "Allemagne", 
                       "Greece" = "Grèce", 
                       "Italy" = "Italie", 
                       "Japan" = "Japon", 
                       "Netherlands" = "Pays-Bas", 
                       "New Zealand" = "Nouvelle-Zélande", 
                       "Norway" = "Norvège", 
                       "Portugal" = "Portugal", 
                       "Spain" = "Espagne", 
                       "Sweden" = "Suède", 
                       "Switzerland" = "Suisse", 
                       "United Kingdom" = "Royaume-Uni", 
                       "United States" = "États-Unis"))
    df_eng <- df
    df_fra <- df
    df_fra$country <- unlist(key[df_eng$country])

    dfs <- list('english' = df_eng, 'french' = df_fra)

    library("ggplot2")
    # Now you can create one "default" plot...
    p <- ggplot(data = dfs[['english']], 
                aes(x = year, y = value, 
                    group = country, colour = country)) + 
      geom_line(size = 0.5) + 
      geom_point(size = 1)
    print(p)

    # And simply swap out the data frame...
    p %+% dfs[['french']]
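Since `%+%` only swaps the data, the same default plot can be turned into one plot per language in a loop. A sketch, assuming a three-country subset of the MWE so that it runs on its own; the file names and the use of `tempdir()` are illustrative only.

```r
library("ggplot2")

# Toy data: a three-country subset of the MWE
df_eng <- data.frame(year    = rep(c("2006", "2007"), times = 3),
                     country = rep(c("Denmark", "Germany", "Spain"), each = 2),
                     value   = c(30, 30, 27, 27, 32, 32),
                     stringsAsFactors = FALSE)
key <- c("Denmark" = "Danemark", "Germany" = "Allemagne", "Spain" = "Espagne")

df_fra <- df_eng
df_fra$country <- unname(key[df_eng$country])
dfs <- list(english = df_eng, french = df_fra)

p <- ggplot(dfs[["english"]],
            aes(x = year, y = value, group = country, colour = country)) +
  geom_line() +
  geom_point()

# One plot per language: %+% replaces the data and keeps every layer and scale
plots <- lapply(dfs, function(d) p %+% d)

# Save each version as a PDF in the session's temporary directory
files <- vapply(names(plots), function(lang) {
  f <- file.path(tempdir(), paste0("countries-", lang, ".pdf"))
  ggsave(f, plot = plots[[lang]], width = 6, height = 4)
  f
}, character(1))
```

Adding a third language is then just one more element in `dfs`.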
joran
  • Nice, I didn't know about ``p %+% df``! Thanks. – PatrickT Apr 29 '16 at 12:32
  • @PatrickT It did for me, at least both versions had the countries sorted alphabetically in the legend. – joran Apr 29 '16 at 12:43
  • This does not preserve the order of the values: it re-orders both the values and the corresponding levels, so the order of the colors and symbols is not preserved. For the particular colour scheme selected here (the default), this is desirable, because the colours come in shades of green, blue and red, and it would look strange to have "Espagne" in pink squeezed between a green "Danemark" and a green "États-Unis". So for most applications, where preserving the order of the values is not a requirement, this is a great approach. – PatrickT Apr 29 '16 at 12:47
  • Oh, I'm sorry, Joran. Your answer is great and ultimately I'll probably use your approach, although I'm going to explore re-ordering the levels of the factors, to see if that can be made as convenient as ``p %+% df``! (I did write "I would like my plot colours and symbols to be the same in English and French, while at the same time having the legend keys ordered alphabetically." in paragraph 2.) – PatrickT Apr 29 '16 at 12:54
  • I'm reading this, very promising, for keeping a consistent order, by you once again! http://stackoverflow.com/questions/6919025/how-to-assign-colors-to-categorical-variables-in-ggplot2-that-have-stable-mappin – PatrickT Apr 29 '16 at 15:43
  • I'm going to have to put this issue to rest for the weekend. Will be back on Monday to close the matter. Thanks for the help so far! – PatrickT Apr 29 '16 at 16:06
  • I have got it to work. To keep the order of colors and symbols, I have used a call to ``scale_color_manual``. I'll post more code later. But one remark, Joran: as I was having problems, I used a variant of your code, with a call to the match function and an ifelse along the lines of: ``matched <- match(df_eng$country, names(key))`` and ``df_fra$country <- ifelse(is.na(matched), df_fra$country, key[matched])``, where ``key`` is ``c("Australia" = "Australie", etc.`` – PatrickT May 02 '16 at 18:25
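One way to combine fixed colours with an alphabetical legend in each language is to key the palette by name and pass the legend order separately. A sketch, assuming a three-country subset of the MWE and three colours taken from RColorBrewer's Set1: named `values` tie each colour to a name whatever the level order, while `breaks` only controls which keys appear in the legend and in what order.

```r
library("ggplot2")

# Toy data: a three-country subset of the MWE, with a French column added
df <- data.frame(year    = rep(c("2006", "2007"), times = 3),
                 country = rep(c("Denmark", "Germany", "Spain"), each = 2),
                 value   = c(30, 30, 27, 27, 32, 32),
                 stringsAsFactors = FALSE)
key <- c("Denmark" = "Danemark", "Germany" = "Allemagne", "Spain" = "Espagne")
df$country.fr <- unname(key[df$country])

# One colour per country, fixed once and keyed by the English name
pal_en <- c("Denmark" = "#E41A1C", "Germany" = "#377EB8", "Spain" = "#4DAF4A")
# The same colours, re-keyed by the French name
pal_fr <- setNames(unname(pal_en), key[names(pal_en)])

# Named values fix the colour of each country; breaks sort the legend keys
p_en <- ggplot(df, aes(year, value, group = country, colour = country)) +
  geom_line() + geom_point() +
  scale_colour_manual(name = "country", values = pal_en,
                      breaks = sort(names(pal_en)))
p_fr <- ggplot(df, aes(year, value, group = country.fr, colour = country.fr)) +
  geom_line() + geom_point() +
  scale_colour_manual(name = "pays", values = pal_fr,
                      breaks = sort(names(pal_fr)))
```

Spain and Espagne share `"#4DAF4A"`, yet each legend is sorted in its own language.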

In this answer to my own question, I would like to detail some further tweaking based on joran's answer, for the record and/or further discussion.

To summarize, the goal is to generate sets of graphs in two languages with consistent colors, shapes, line types, and so on, across the two sets of graphs. The difficulty is that ggplot orders the legend keys by the factor levels, but the level labels have a different alphabetical order in the two languages: e.g. one expects "Spain" to be listed towards the end of the list in English, as it starts with the letter S, but near the beginning in French, as "Espagne" starts with the letter E.

In the following, I create a country factor with labels written in English and ordered according to the English alphabetical order, and a country.fr factor with labels written in French and ordered according to the French alphabetical order. The same logic would apply to shapes, line types, fill, etc. My code is a little verbose and various shortcuts are no doubt possible.

    ### Create a fixed assignment for colors, shapes, linetypes, etc.
    ### The same for both the English and French versions
    ### Data
    df <- structure(list(year = c("2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007", "2006", "2007", "2006", "2007", "2006", 
    "2007", "2006", "2007"), country = c("Australia", "Australia", 
    "Austria", "Austria", "Belgium", "Belgium", "Canada", "Canada", 
    "Denmark", "Denmark", "Finland", "Finland", "France", "France", 
    "Germany", "Germany", "Greece", "Greece", "Italy", "Italy", "Japan", 
    "Japan", "Netherlands", "Netherlands", "New Zealand", "New Zealand", 
    "Norway", "Norway", "Portugal", "Portugal", "Spain", "Spain", 
    "Sweden", "Sweden", "Switzerland", "Switzerland", "United Kingdom", 
    "United Kingdom", "United States", "United States"), value = c(33, 
    33, 33, 33, 30, 30, 34, 34, 30, 30, 33, 33, 28, 29, 27, 27, 40, 
    39, 35, 35, 35, 35, 27, 27, 33, 33, 27, 27, 37, 37, 32, 32, 31, 
    31, 32, 31, 32, 32, 33, 33)), .Names = c("year", "country", "value"
    ), row.names = c(NA, -40L), class = "data.frame")

    ## Create a unique country ID and a language map
    key <- read.table(textConnection("
    AUS,Australia,Australie
    AUT,Austria,Autriche
    BEL,Belgium,Belgique
    CAN,Canada,Canada
    CHE,Switzerland,Suisse
    DEU,Germany,Allemagne
    DNK,Denmark,Danemark
    ESP,Spain,Espagne
    FIN,Finland,Finlande
    FRA,France,France
    GBR,United Kingdom,Royaume-Uni
    GRC,Greece,Grèce
    ITA,Italy,Italie
    JPN,Japan,Japon
    NLD,Netherlands,Pays-Bas
    NZL,New Zealand,Nouvelle-Zélande
    NOR,Norway,Norvège
    PRT,Portugal,Portugal
    SWE,Sweden,Suède
    USA,United States,États-Unis"), 
    sep = ',', strip.white = TRUE, stringsAsFactors = FALSE)
    names(key) <- c('country.code', 'country.name', 'country.name.fr')
    ##  Check the types of data
    ##  ! Make sure country is a 'string' not a 'factor' !
    ##  ! otherwise, the 'translation' will be incorrect !
    ##  strip.white = TRUE removes the indentation in front of each row;
    ##  otherwise the leading spaces end up in the values and match() fails
    str(key)
    ##'data.frame': 20 obs. of  3 variables:
    ## $ country.code   : chr  "AUS" "AUT" "BEL" "CAN" ...
    ## $ country.name   : chr  "Australia" "Austria" "Belgium" "Canada" ...
    ## $ country.name.fr: chr  "Australie" "Autriche" "Belgique" "Canada" ...

    ## Create a unique code variable for each country
    df$country.code <- NA
    matched <- match(df$country, key$country.name)
    df$country.code <- ifelse(is.na(matched), df$country, key$country.code[matched])

    ## translate country name with translation key
    df$country.fr <- NA
    matched <- match(df$country, key$country.name)
    df$country.fr <- ifelse(is.na(matched), NA, key$country.name.fr[matched])

    ## Set the country names to be factors (they are currently strings)
    ## as.factor() sorts the levels alphabetically, following the collation
    ## rules of the current locale (in the C locale, "États-Unis" sorts last)
    # English
    df$country <- as.factor(df$country)
    # French
    df$country.fr <- as.factor(df$country.fr)

    ## Define some colors (here manually combining Set1 and Set3 of RColorBrewer)
    ## The Palette could also have been embedded in the key dataframe earlier...
    colorPalette <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#FFFF33", "#A65628", "#F781BF", "#999999","#8DD3C7", "#FFFFB3", "#BEBADA", "#FB8072", "#80B1D3", "#FDB462", "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F")
    length(colorPalette)  # Make sure we have enough colors
    ## [1] 21

    ## Set the colors to each country within the dataframe
    ## There is no need for that, but I felt it was idiot-proof
    names(colorPalette) <- levels(df$country)
    df$colors <- NA
    matched <- match(df$country, names(colorPalette))
    df$colors <- ifelse(is.na(matched), NA, colorPalette[matched])
    str(df)
    ##'data.frame': 40 obs. of  6 variables:
    ## $ year        : chr  "2006" "2007" "2006" "2007" ...
    ## $ country     : Factor w/ 20 levels "Australia","Austria",..: 1 1 2 2 3 3 4 4 5 5 ...
    ## $ value       : num  33 33 33 33 30 30 34 34 30 30 ...
    ## $ country.code: chr  "AUS" "AUS" "AUT" "AUT" ...
    ## $ country.fr  : Factor w/ 20 levels "Allemagne","Australie",..: 2 2 3 3 4 4 5 5 6 6 ...
    ## $ colors      : chr  "#E41A1C" "#E41A1C" "#377EB8" "#377EB8" ...

    ### Make the English plot
    ##  use the country factor to order variables
    library("ggplot2")
    p <- ggplot(data = df, aes(x = year, y = value, 
                    group = country, colour = country)) + 
      geom_line(size = 0.5) + 
      geom_point(size = 1) +
      guides(colour = guide_legend(ncol = 2))
    p

    ### Swap out the colors with custom scheme using scale_colour_manual
    ## To ensure correct mapping, use named vectors in scale_colour_manual
    colors <- df$colors
    names(colors) <- df$country
    str(colors)
    ## Named chr [1:40] "#E41A1C" "#E41A1C" "#377EB8" ...
    ## - attr(*, "names")= chr [1:40] "Australia" "Australia" "Austria" "Austria" ...

    p + scale_colour_manual(name = "country", values = colors)

    ### Make the French plot
    ##  use the country.fr factor to order variables
    colors.fr <- df$colors
    names(colors.fr) <- df$country.fr
    str(colors.fr)
    ##Named chr [1:40] "#E41A1C" "#E41A1C" "#377EB8" ...
    ## - attr(*, "names")= chr [1:40] "Australie" "Australie" "Autriche" "Autriche" ...
    p <- ggplot(data = df, aes(x = year, y = value, 
                    group = country.fr, colour = country.fr)) + 
      geom_line(size = 0.5) + 
      geom_point(size = 1) +
      guides(colour = guide_legend(ncol = 2))
    p

    p + scale_colour_manual(name = "pays", values = colors.fr)

Here are the corresponding legends side by side:

[Image: the English and French legends side by side]
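The same logic extends to shapes, line types or fill. A variant, sketched here under the assumption of a three-country subset of the MWE, maps every aesthetic to the language-neutral country.code, so colours and shapes never depend on a translated name; each language then only supplies `breaks` (codes sorted by the translated name) and matching `labels`. The helpers `legend_order()` and `plot_in()` are hypothetical.

```r
library("ggplot2")

# Toy data: a three-country subset of the MWE, identified by country code only
df <- data.frame(year         = rep(c("2006", "2007"), times = 3),
                 country.code = rep(c("DNK", "DEU", "ESP"), each = 2),
                 value        = c(30, 30, 27, 27, 32, 32),
                 stringsAsFactors = FALSE)
key <- data.frame(country.code    = c("DEU", "DNK", "ESP"),
                  country.name    = c("Germany", "Denmark", "Spain"),
                  country.name.fr = c("Allemagne", "Danemark", "Espagne"),
                  stringsAsFactors = FALSE)

# Colours and shapes are fixed once per code and shared by both languages
pal    <- c(DEU = "#E41A1C", DNK = "#377EB8", ESP = "#4DAF4A")
shapes <- c(DEU = 15, DNK = 16, ESP = 17)

# Hypothetical helper: country codes sorted by their name in one language column
legend_order <- function(name_col) key$country.code[order(key[[name_col]])]

# Hypothetical helper: the same plot, with the legend in one language
plot_in <- function(name_col, title) {
  brks <- legend_order(name_col)
  lbls <- key[[name_col]][match(brks, key$country.code)]
  ggplot(df, aes(year, value, group = country.code,
                 colour = country.code, shape = country.code)) +
    geom_line() +
    geom_point(size = 2) +
    # identical name, breaks and labels let ggplot2 merge the two legends
    scale_colour_manual(name = title, values = pal, breaks = brks, labels = lbls) +
    scale_shape_manual(name = title, values = shapes, breaks = brks, labels = lbls)
}

p_en <- plot_in("country.name", "country")
p_fr <- plot_in("country.name.fr", "pays")
```

The design choice here is to keep the mapped variable language-neutral and translate only at the scale level, so no translated factor has to be stored in the data.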

PatrickT