
I'm quite new to programming as well as data analysis, so please bear with me. My data currently consists of a list of 14 matrices (lom), each holding the data for one country and named by its two-letter country code.

Here is a full sample for Austria:

> lom["AT"]
$`AT`
   Year    AllKey    AllSub    SelKey    SelSub
1  2000  1.622279 0.5334964  1.892894 0.8057591
2  2001  1.903745 0.5827514  2.291335 0.8295899
3  2002  1.646538 0.4873866  2.006873 0.7360566
4  2003  1.405250 0.8692641  2.105648 1.2711968
5  2004  1.511154 1.5091751  1.970236 1.9407666
6  2005  1.459177 0.6781008  1.808982 1.1362805
7  2006  1.604652 0.5038658  1.942126 0.7992008
8  2007  2.107326 0.9260200  2.683072 1.3302627
9  2008  1.969735 0.6178362  2.994758 1.2051339
10 2009  1.955768 0.7365529  2.896198 1.2272024
11 2010  2.476157 0.7952590  3.715950 1.5686643
12 2011  2.092459 0.4970011  2.766169 0.6476707
13 2012  1.913122 0.5338756  2.450942 0.6022315
14 2013  2.086200 0.6739412  2.786736 0.9211941
15 2014  2.579428 0.8424793  3.152541 1.0225888
16 2015 10.662568 5.8472436  9.769320 3.8840780
17 2016 11.088286 4.6504581 10.567789 3.2383420
18 2017  7.225053 1.7528594  6.747515 1.2781224

I'd like to plot all 14 countries with x = Year and y = each of the other variables, i.e. four plots with 14 lines each. Hence the requirement in the question title.

I keep coming up with non-working attempts that combine a for loop with some apply function, for example:

for (i in colnames(lom$anyCountry)) {
    ggplot(lapply(lom, function(x) x[,1:14]), aes(x=Year, y=i))
}

which, apart from many other problems I can now see, throws:

Error: data must be a data frame, or other object coercible by fortify(), not a list

which led me to combine the list of matrices into one big matrix, inspired by this approach:

bigDF <- do.call(rbind, lom)
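One thing to watch with this step: when the list elements are matrices, rbind() takes row names only from the matrices' own row names and ignores the list names, so the stacked result no longer records which country each row came from. A minimal sketch with two hypothetical two-row matrices (made-up values, not the real data):

```r
# Two toy matrices standing in for the country matrices (made-up values)
toy <- list(
  AT = cbind(Year = 2000:2001, AllKey = c(1.6, 1.9)),
  DE = cbind(Year = 2000:2001, AllKey = c(1.2, 1.4))
)

# Stack them the same way as do.call(rbind, lom)
stacked <- do.call(rbind, toy)

dim(stacked)       # 4 rows, 2 columns
rownames(stacked)  # NULL: the names "AT" and "DE" were not carried over
```

Both answers below deal with this by adding an explicit country column.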

I suppose I could restructure my data some other way, or perhaps I'm missing some functionality that would help... probably both. I would appreciate any pointers on how to achieve this as efficiently as possible.

Fons MA
  • Welcome to SO! This community has a few rules and norms, and following them will help you get a good answer to your question. In particular, it's best to provide an [MCVE](https://stackoverflow.com/help/mcve) (a minimum, complete, and verifiable example). Check out [this page](https://stackoverflow.com/a/5963610/4573108) for tips regarding R-specific MCVEs. It's also best to avoid using images of code/data and [here's why](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557). Thanks and good luck! – DanY Aug 21 '18 at 15:57
  • Thanks @DanY ! Really useful, I never posted before because it seemed hard enough already to provide the "test" code... I think I've got some good pointers now! :) – Fons MA Aug 22 '18 at 00:43
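Following that advice, printed data like the Austria sample can be shared as code with base R's dput(), which prints an R expression that recreates the object. A minimal sketch using the first three rows of the Austria data above (the names AT and lom_sample are just for illustration):

```r
# First three rows of the Austria sample, rebuilt as a data frame
AT <- data.frame(
  Year   = 2000:2002,
  AllKey = c(1.622279, 1.903745, 1.646538),
  AllSub = c(0.5334964, 0.5827514, 0.4873866),
  SelKey = c(1.892894, 2.291335, 2.006873),
  SelSub = c(0.8057591, 0.8295899, 0.7360566)
)
lom_sample <- list(AT = AT)

# dput() prints code that recreates the object; that output is what goes into the question
dput(lom_sample)
```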

2 Answers


Consider combining all the matrices into a single master data frame with a country indicator column, which you can then map to the color aesthetic of the line plots:

library(ggplot2)

# CREATE LARGE DATAFRAME FROM MATRIX LIST
lom_df <- do.call(rbind, lapply(lom, data.frame))

# CREATE country COLUMN FROM ROWNAMES (e.g. "AT.1" -> "AT")
lom_df$country <- gsub("\\..*$", "", row.names(lom_df))
row.names(lom_df) <- NULL

# EXTRACT ALL FOUR Y COLUMN NAMES (MINUS Year AND country)
y_columns <- colnames(lom_df[2:(ncol(lom_df)-1)])

# PRODUCE LIST OF FOUR PLOTS EACH WITH COUNTRY LINES
# (.data[[col]] replaces aes_string(), which is deprecated in recent ggplot2)
plot_list <- lapply(y_columns, function(col)
  ggplot(lom_df, aes(x = Year, y = .data[[col]], color = country)) +
     geom_line() +
     labs(y = col)
)

# PRINT EACH PLOT IN THE LIST
plot_list
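The country column relies on how rbind() names rows: when it stacks a named list of data frames, each row gets a name of the form listname.rownumber, and the regex trims that back to the country code. A sketch with a hypothetical two-country list (made-up values):

```r
# Hypothetical two-country list of matrices (made-up values)
lom_toy <- list(
  AT = cbind(Year = 2000:2001, AllKey = c(1.6, 1.9)),
  DE = cbind(Year = 2000:2001, AllKey = c(1.2, 1.4))
)

# Converting to data frames first makes rbind() use the list names in the row names
df <- do.call(rbind, lapply(lom_toy, data.frame))
row.names(df)   # "AT.1" "AT.2" "DE.1" "DE.2"

# "\\..*$" removes the first dot and everything after it
country <- gsub("\\..*$", "", row.names(df))
country         # "AT" "AT" "DE" "DE"
```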
Parfait
  • Thanks heaps! That was indeed _parfait_ ! :D Concise and worked straight out of the box. I have a couple of follow up questions, which may be related. Why is lapply required? I think I get the same with `do.call(rbind, lom)`? On the regex, I see it works... but I am quite flummoxed, so I'm asking a separate question for lack of space here. – Fons MA Aug 21 '18 at 23:49
  • Sorry, after much mucking around I actually do understand the regex. Is the $ always necessary at the end? – Fons MA Aug 22 '18 at 01:13
  • Great to hear solution worked! Being a loop, `lapply` is to iterate through the columns and the `$` in regex is to indicate the end of the string and may not be necessary in some cases. – Parfait Aug 22 '18 at 15:09
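On the `$` question: because `.*` is greedy, it already consumes everything up to the end of the string, so for this particular pattern the end anchor changes nothing. A quick check on a few example strings shaped like the row names above:

```r
x <- c("AT.1", "AT.12", "DE.3")

# Greedy ".*" runs to the end of the string anyway, so "$" is redundant here
with_anchor    <- gsub("\\..*$", "", x)
without_anchor <- gsub("\\..*", "", x)

with_anchor                            # "AT" "AT" "DE"
identical(with_anchor, without_anchor) # TRUE
```

The anchor does matter for patterns without a greedy tail: `\\.1$` matches only strings ending in `.1`, whereas `\\.1` also matches inside `AT.12`.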

This solution uses package ggplot2.

It has two steps: data preparation and plotting.

First, the list must be transformed into one large data frame with an id column. I searched SO for a function that does this but couldn't find one, so here is my own.

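rbindWithID <- function(x, id.name = "ID", sep = "."){
    # Give unnamed list elements default names such as "ID.1", "ID.2", ...
    if(is.null(names(x))) names(x) <- paste(id.name, seq_along(x), sep = sep)
    res <- lapply(names(x), function(nm){
        DF <- as.data.frame(x[[nm]])    # also accepts matrices
        DF[[id.name]] <- nm             # add the id column
        DF[c(ncol(DF), seq_len(ncol(DF) - 1))]    # move the id column to the front
    })
    do.call(rbind, res)
}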

lom_df <- rbindWithID(lom, "Country")
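For comparison, a shorter base-R sketch of the same idea (a swapped-in alternative, not the function above) pairs each element with its name via Map(). Outside base R, dplyr::bind_rows() has an .id argument and data.table::rbindlist() an idcol argument for the same job. The toy list is hypothetical:

```r
# Hypothetical toy list (made-up values)
lom_toy <- list(
  AT = data.frame(Year = 2000:2001, AllKey = c(1.6, 1.9)),
  DE = data.frame(Year = 2000:2001, AllKey = c(1.2, 1.4))
)

# Map() walks the elements and their names in parallel;
# cbind() puts the id column in front of each data frame
with_id <- Map(function(df, nm) cbind(Country = nm, df), lom_toy, names(lom_toy))

lom_df <- do.call(rbind, with_id)
row.names(lom_df) <- NULL   # drop the "AT.1"-style row names
```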

Now reshape the data frame from wide to long.

molten <- reshape2::melt(lom_df, id.vars = c("Country", "Year"))
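To see what "long" means here, the following base-R sketch builds the same layout by hand on a hypothetical two-row wide table: one row per Country/Year/variable combination, with all measurements in a single value column. It is only an illustration; reshape2::melt() does this in one call.

```r
# Hypothetical wide table: one row per Country/Year, one column per measure
wide <- data.frame(
  Country = c("AT", "DE"),
  Year    = c(2000L, 2000L),
  AllKey  = c(1.6, 1.2),
  AllSub  = c(0.5, 0.7)
)

# Every column except the id columns is a measure to be stacked
value_cols <- setdiff(names(wide), c("Country", "Year"))

# One block of rows per measure: id columns, the measure's name, and its values
long <- do.call(rbind, lapply(value_cols, function(v)
  data.frame(wide[c("Country", "Year")], variable = v, value = wide[[v]])
))

long
```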

Finally, plot it.

library(ggplot2)

ggplot(molten, aes(Year, value, colour = Country)) +
    geom_line() +
    facet_wrap(~ variable)
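Because the four measures span different ranges (in the Austria sample, AllKey reaches about 11 while AllSub stays below 6), letting each facet use its own y axis may make the smaller series easier to read. A sketch using facet_wrap()'s scales = "free_y" on hypothetical long-format data shaped like molten:

```r
library(ggplot2)

# Hypothetical long-format data in the same shape as molten (made-up values)
molten_toy <- data.frame(
  Country  = rep(c("AT", "DE"), each = 4),
  Year     = rep(c(2000, 2001), times = 4),
  variable = rep(rep(c("AllKey", "AllSub"), each = 2), times = 2),
  value    = c(1.6, 1.9, 0.5, 0.6, 1.2, 1.4, 0.7, 0.9)
)

# scales = "free_y" gives each panel its own y range instead of a shared one
p <- ggplot(molten_toy, aes(Year, value, colour = Country)) +
  geom_line() +
  facet_wrap(~ variable, scales = "free_y")

p
```

The trade-off is that panels can no longer be compared by eye on a common scale, so the shared default may still be preferable when absolute levels matter.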


DATA.

set.seed(1234)    # Make the results reproducible

lom <- lapply(1:4, function(i){
    data.frame(
        Year = 2000:2008,
        AllKey = runif(9, 1, 2),
        AllSub = runif(9, 0, 2),
        SelKey = runif(9, 1, 2),
        SelSub = runif(9, 0, 2)
    )
})

names(lom) <- c("AT", "DE", "FR", "PT")
Rui Barradas