
I would like to add a column containing the year (found in the file name) to each data set. I've spent several hours googling this but can't get it to work. Am I making a simple error?

Conceptually, I'm making a list of the files and then using lapply to add a year column to each file in the list.

I'm using data from Census OnTheMap (fresh download). All files are named like this: "points_2013", "points_2014", etc. I read in the data with the following code:

library(maptools)  # provides readShapePoints()
library(sp)
shps <- dir(getwd(), pattern = "\\.shp$")  # pattern is a regex: names ending in .shp
for (shp in shps) assign(shp, readShapePoints(shp))
# assign() takes the string in shp (e.g. "points_2013.shp")
# and creates a variable of that name holding the spatial points data
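A common alternative to creating one global variable per file with assign() is to read every file into a named list and add the year as each file is read. The sketch below is an illustration under stated assumptions, not the OnTheMap workflow itself: so that it runs without the Census download, it writes two small CSV files to a temporary directory and reads them with read.csv() in place of readShapePoints(). The columns `id` and `jobs` are hypothetical.

```r
# Stand-in data: two small CSV files named like the OnTheMap downloads.
# (Hypothetical: the real data are shapefiles read with readShapePoints().)
tmp <- tempfile("onthemap_")
dir.create(tmp)
write.csv(data.frame(id = 1:2, jobs = c(5, 7)),
          file.path(tmp, "points_2013.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:2, jobs = c(6, 9)),
          file.path(tmp, "points_2014.csv"), row.names = FALSE)

# List the files; pattern is a regular expression, so anchor it with "$"
files <- list.files(tmp, pattern = "^points_.*\\.csv$", full.names = TRUE)

# Read each file into one element of a list and add the year column
pts <- lapply(files, function(f) {
  dat <- read.csv(f)
  # "points_2014.csv" -> "2014": keep the four digits after the underscore
  dat$year <- sub("^points_([0-9]{4}).*$", "\\1", basename(f))
  dat
})
names(pts) <- basename(files)  # name each element after its file

pts[["points_2014.csv"]]$year
```

Keeping the data sets in one list also makes the later steps (binding, splitting, comparing years) operate on a single object instead of on names looked up with get().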

My question is very similar to this one, except that I don't have a list of file names; I just want to extract the value for a column from the file name. This thread has a question, but no answers. This person tried to use [[ instead of $, with no luck. That seems to imply the fault may be in cbind vs. rbind, but I'm not sure. I'm not trying to write the output to CSV, so that thread is not fully relevant.

This is almost exactly what I am trying to do. Adapting the code from that example to my purpose yields the following:

dat <- ls(pattern="points_")
dat
ldf = lapply(dat, function(x) {
  # Add a column with the year
  dat$Year = substr(x,8,11)
  return(dat)
})
ldf
points_2014.shp$Year

But the last line still returns NULL!
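The code above returns NULL because `dat` inside the function is the character vector of object names, not the object named by `x`. Assigning `dat$Year` therefore modifies a local copy of that character vector (R coerces it to a list, with a warning), and the objects such as `points_2014.shp` are never touched. A minimal sketch of the fix, assuming small data frames as hypothetical stand-ins for the spatial objects, fetches each object with get() and keeps the modified copies in a named list:

```r
# Hypothetical stand-ins for the SpatialPointsDataFrames read from disk
points_2013.shp <- data.frame(id = 1:2)
points_2014.shp <- data.frame(id = 3:4)

obj_names <- ls(pattern = "^points_")   # a character vector of object *names*

ldf <- lapply(obj_names, function(x) {
  dat <- get(x)                  # fetch the object whose name is the string x
  dat$Year <- substr(x, 8, 11)   # characters 8-11 of "points_2014.shp" are "2014"
  dat                            # return the modified copy
})
names(ldf) <- obj_names

ldf[["points_2014.shp"]]$Year    # the copy has the new column
points_2014.shp$Year             # still NULL: the original object was not changed
```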

I adapted the solution from this thread. Omitting the do.call and rbind, this seems to work:

lapply(points,
  function(x) {
    dat=get(x)
    dat$year = sub('.*_(.*)$','\\1',x)
    return(dat)
    })
points_2014.shp$year

But the last line again returns NULL.
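lapply() never modifies the objects it is given; it returns a new list, and that result is discarded here because it is not assigned to anything. If the original objects really should be overwritten, one option is to name the returned list and copy it back with list2env(). Note also that a pattern like `'.*_(.*)$'` captures everything after the last underscore, so for "points_2014.shp" it yields "2014.shp"; the sketch below anchors on four digits instead. The data frames are hypothetical stand-ins for the spatial objects.

```r
# Hypothetical stand-ins for the objects created by assign()
points_2013.shp <- data.frame(id = 1:2)
points_2014.shp <- data.frame(id = 3:4)

obj_names <- ls(pattern = "^points_")
res <- lapply(obj_names, function(x) {
  dat <- get(x)
  # keep only the four digits after the underscore: "points_2014.shp" -> "2014"
  dat$year <- sub(".*_([0-9]{4}).*$", "\\1", x)
  dat
})
names(res) <- obj_names

# lapply() only returns new objects; copy them back over the originals
invisible(list2env(res, envir = globalenv()))

points_2014.shp$year
```

Overwriting global variables this way works, but keeping `res` as a named list is usually easier to manage than a set of separately named objects.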

I'm starting to wonder whether something is wrong with my R installation. I tested it with this example, and it works, so the trouble must be elsewhere.

# a dataframe
a <- data.frame(x = 1:3, y = 4:6)
a
# make a list of several dataframes, then apply function 
#(change column names, e.g.):
my.list <- list(a, a)
my.list <- lapply(my.list, function(x) {
  names(x) <- c("a", "b") 
  return(x)})
my.list
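This test is consistent with the comments on the question: lapply() works on copies of the list elements (R's copy-on-modify semantics) and returns the results, so the renamed columns appear only in the returned list, not in the original data frame `a`. A short check:

```r
a <- data.frame(x = 1:3, y = 4:6)
my.list <- list(a, a)
my.list <- lapply(my.list, function(x) {
  names(x) <- c("a", "b")   # renames a local copy of each data frame
  x
})

names(a)             # the original data frame is untouched
names(my.list[[1]])  # the change lives in the returned list
```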

After some help from this site, my final code was:

#-------takes all the points files, adds the year, and then binds them together
points2 <- do.call(rbind, lapply(ls(pattern = "^points_"),
                                 function(x) {
                                   dat <- get(x)                 # fetch the object named x
                                   dat$year <- substr(x, 8, 11)  # "points_2014.shp" -> "2014"
                                   dat
                                 }))
points2$year
names(points2)

It does, however, use rbind, which is helpful in the short term. In the long term, I will need to split it again and use cbind so I can subtract two columns from each other.
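For the longer-term step of splitting the stacked data by year and subtracting one year's column from another, one option is split() followed by merge() on a shared identifier. This is a sketch only: it uses a small data frame in place of points2, and the `id` and `value` columns are hypothetical; the real data would need some column that identifies the same point across years. Joining with merge() instead of cbind() avoids relying on both years' rows being in the same order.

```r
# Hypothetical stand-in for points2: two years stacked with rbind(),
# sharing an identifier column "id" and a numeric column "value"
points2 <- data.frame(id    = c(1, 2, 1, 2),
                      value = c(10, 20, 15, 25),
                      year  = c("2013", "2013", "2014", "2014"))

# Split the stacked data back into one data frame per year
by_year <- split(points2, points2$year)

# Join the two years on id; shared columns get the year as a suffix
wide <- merge(by_year[["2013"]], by_year[["2014"]],
              by = "id", suffixes = c("_2013", "_2014"))

# Subtract one year's column from the other
wide$change <- wide$value_2014 - wide$value_2013
wide
```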

Mox
    you're looping over parts of `dat` so you need to return the part and not the whole `dat` – rawr Jan 17 '17 at 03:49
  • Your last example also works. Take a look at `my.list` after you've run it - the `colnames` have changed. – thelatemail Jan 17 '17 at 03:53
  • Ok, excellent to know. In the last example, I'm looking at the wrong file: a vs my.list. I will edit it accordingly. – Mox Jan 17 '17 at 03:58
  • Ok, I got it. I had not assigned what I was lapply'ing and cbind'ing to anything, so it was running but not modifying anything. – Mox Jan 17 '17 at 04:14

1 Answer


I use the following code:

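# names.of.objects: a character vector of object names, e.g. ls(pattern = "^points_")
for (i in names.of.objects) {
  temp <- get(i)     # fetch the object whose name is the string i
  # do transformations on temp
  assign(i, temp)    # write the modified copy back under the same name
}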

This works, but it is not efficient: because R passes data by value (copy-on-modify), the whole data set is copied when temp is modified and then assigned back over the original.
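Applied to the question, the loop might look like the sketch below, where the transformation is adding the year taken from characters 8 to 11 of each object name. The data frames are hypothetical stand-ins for the SpatialPointsDataFrames.

```r
# Hypothetical stand-ins for the objects created by assign() in the question
points_2013.shp <- data.frame(id = 1:2)
points_2014.shp <- data.frame(id = 3:4)

names.of.objects <- ls(pattern = "^points_")

for (i in names.of.objects) {
  temp <- get(i)                 # fetch the object named by the string i
  temp$year <- substr(i, 8, 11)  # the transformation: add the year
  assign(i, temp)                # overwrite the original object
}

points_2014.shp$year
```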

snaut