I have a list of boroughs and a list of localities (like this one). Each locality lies in exactly one borough. What's the best way to store this kind of hierarchical structure in R, considering that I'd like a convenient and readable way of accessing these, and of using this list to aggregate locality-level data to the borough level?

I've come up with the following:

localities <- list("Mitte" = c("Mitte", "Moabit", "Hansaviertel", "Tiergarten", "Wedding", "Gesundbrunnen"),
                   "Friedrichshain-Kreuzberg" = c("Friedrichshain", "Kreuzberg")
                  )

But I am not sure if this is the most elegant and accessible way.

If I wanted to attach additional information at the locality level, I could replace the c(...) with some other call, such as rbind(c('0201', '0202'), c("Friedrichshain", "Kreuzberg")). But how would I add information at the borough level (like an abbreviated name and a full name for each list)?

Edit: For example, I'd like to condense a table like this into a borough-wise version.

Roland
  • Roland, you should make a small **[reproducible example of your problem](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)**. – BrodieG Mar 27 '14 at 14:48
  • @BrodieG : You mean of my example condensing? – Roland Mar 27 '14 at 14:50
  • Possibly, although that one is complicated because the data there appears to be a distance matrix so is not as simple as say finding the population by borough (so there are two questions there, how do I condense data, and how do I condense data in an element wise "distance" matrix). If I were you, I would generate a version of the distance matrix that has 9 cities, and then provide the mapping of those 9 cities to each borough (and probably pick cities with ASCII character names). – BrodieG Mar 27 '14 at 14:52
  • R throws an error when I run your code, because there's a misplaced comma. Here's how to make a 2-element list with your data: `localities <- list("Mitte" = c("Mitte", "Moabit", "Hansaviertel", "Tiergarten", "Wedding", "Gesundbrunnen"), "Friedrichshain-Kreuzberg" = c("Friedrichshain", "Kreuzberg"))` – rsoren Mar 27 '14 at 14:53
  • You might also find it useful to extract the data directly from the page using `readHTMLTable()`. I'll put an example below. – tcash21 Mar 27 '14 at 14:54
  • Removed the extra comma. I'm going to provide a nice reproducible example (without any umlauts in the names). The task is rather simple at the moment, but I'd like to choose the option which offers the best compatibility for the future. – Roland Mar 27 '14 at 14:57

3 Answers

Hard to know without having a better view on how you intend to use this, but I would strongly recommend moving away from a nested list structure to a data frame structure:

library(reshape2)
loc.df <- melt(localities)               

This is what the molten data looks like:

           value                       L1
1          Mitte                    Mitte
2         Moabit                    Mitte
3   Hansaviertel                    Mitte
4     Tiergarten                    Mitte
5        Wedding                    Mitte
6  Gesundbrunnen                    Mitte
7 Friedrichshain Friedrichshain-Kreuzberg
8      Kreuzberg Friedrichshain-Kreuzberg

You can then use all the standard data frame and other computations:

loc.df$population <- sample(100:500, nrow(loc.df))    # make up population
tapply(loc.df$population, loc.df$L1, mean)            # mean locality population by borough

This gives the mean population by borough:

Friedrichshain-Kreuzberg                    Mitte 
                278.5000                 383.8333     

For more complex calculations you can use the data.table or dplyr packages.
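To address the borough-level part of the question, here is a minimal base R sketch, not the only way to do it. It assumes the two-borough mapping from the question, made-up population figures, and hypothetical abbreviations ("MI", "FK"). The idea is to keep borough attributes in their own table and join them after aggregating:

```r
# Mapping from the question (two boroughs only)
localities <- list(
  "Mitte" = c("Mitte", "Moabit", "Hansaviertel", "Tiergarten", "Wedding", "Gesundbrunnen"),
  "Friedrichshain-Kreuzberg" = c("Friedrichshain", "Kreuzberg")
)

# One row per locality; stack() plays the role of melt() here
loc.df <- stack(localities)
names(loc.df) <- c("locality", "borough")
loc.df$borough <- as.character(loc.df$borough)

# Made-up population figures, purely for illustration
loc.df$population <- c(100, 200, 300, 400, 500, 600, 700, 800)

# Borough-level attributes in a separate table (abbreviations are hypothetical)
borough.df <- data.frame(
  borough = c("Mitte", "Friedrichshain-Kreuzberg"),
  abbrev  = c("MI", "FK"),
  stringsAsFactors = FALSE
)

# Sum locality populations per borough, then attach the borough-level columns
pop.by.borough <- aggregate(population ~ borough, data = loc.df, FUN = sum)
result <- merge(pop.by.borough, borough.df, by = "borough")
```

Here `result` has one row per borough, carrying both the aggregated population and the abbreviation. Keeping the two levels in separate tables means a borough attribute is stored once rather than repeated for every locality.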

BrodieG
  • What are the up/downsides of using lists? I'll add an example. – Roland Mar 27 '14 at 14:40
  • @Roland Most of the R tools that allow simple and efficient manipulation of tabular data work on table like (read data frames) objects, not lists. You can do it with lists, but you'll be bending over backwards to adapt the existing tools. It could be that your specific application will benefit from list like treatment, but again, without a better view of what you're trying to achieve, it's hard for me to know. – BrodieG Mar 27 '14 at 14:44

You can extract all of this data directly into a data.frame using the XML package.

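library(XML)
theurl <- "http://en.wikipedia.org/wiki/Boroughs_and_localities_of_Berlin#List_of_localities"
tables <- readHTMLTable(theurl)        # one data frame per HTML table on the page

boroughs <- tables[[1]]$Borough        # borough names from the first table
localities <- tables[3:14]             # tables 3 to 14, one per borough
names(localities) <- as.character(boroughs)
all <- do.call("rbind", localities)    # stack the per-borough tables into one data frame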
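One detail worth noting: when `rbind` is applied to a named list of data frames, the list names end up only in the row names (e.g. `Mitte.1`). A small sketch of one way to keep the borough as a proper column instead, using two tiny made-up tables (the `Locality` column name and the contents are assumptions, not the actual scraped data):

```r
# Two tiny made-up tables standing in for the scraped per-borough tables
tabs <- list(
  "Mitte" = data.frame(Locality = c("Moabit", "Wedding"), stringsAsFactors = FALSE),
  "Friedrichshain-Kreuzberg" = data.frame(Locality = "Kreuzberg", stringsAsFactors = FALSE)
)

# Record the borough in a column before binding, instead of relying on row names
tabs <- Map(function(df, b) { df$Borough <- b; df }, tabs, names(tabs))
all.df <- do.call("rbind", tabs)
rownames(all.df) <- NULL
```

With the borough stored as a column, the combined table can be passed straight to `tapply` or `aggregate` for borough-wise summaries.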
tcash21

@Roland, I think you will find data frames superior to lists for the reasons cited earlier, but also because there is other data on the web page you reference. Loading to a data frame will make it easy to go further if you wish. For example, making comparisons based on population density or other items provided "for free" on the page will be a snap from a data frame.

spatton