
I have variables that are named team.1, team.2, team.3, and so forth.

First of all, I would like to know how to go through each of these and assign a data frame to each one. So team.1 would have data from one team, then team.2 would have data from a second team. I am trying to do this for about 30 teams, so instead of typing the code out 30 times, is there a way to cycle through each with a counter or something similar?

I have tried things like

    vars = list(sprintf("team.x%s", 1:33))

to create my variables, but then I have no luck assigning anything to them.
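For reference: `sprintf()` (like `paste0()`) only produces a character vector of names. It does not create any variables, and wrapping it in `list()` just gives a list that holds that vector. Base R's `assign()` creates a variable whose name is stored in a string, and `get()` reads it back. Below is a minimal sketch with three hypothetical toy data frames. The names use `"team.%s"` rather than `"team.x%s"` so that they match `team.1`, `team.2`, and so on.

```r
# sprintf() only builds the names as a character vector; no variables exist yet
vars <- sprintf("team.%s", 1:3)
vars

# assign() creates a variable whose name is given as a string.
# The toy data frames are hypothetical placeholders for the real team data.
for (j in seq_along(vars)) {
  assign(vars[j], data.frame(Team = paste("Team", j), Wins = 10 * j))
}

# get() reads a variable back by its name
get("team.2")
```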

Along those same lines, I would like to be able to run a function I made for cleaning and sorting the individual data sets on all of them at once.

For this, I have tried a for loop

    for (j in 1:33) {
      assign(paste("team.",j, sep = ""), cleaning1(paste("team.",j, sep =""), j))
    }

where cleaning1 is my function, which takes two arguments, e.g.

    cleaning1(team.1, 1)

This produces the error message

    Error in who[, -1] : incorrect number of dimensions

So I am hoping the loop would count through my data sets, pass each one to my function along with its number, and reassign each data set with the newly cleaned data.
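The error comes from the first argument. `paste("team.", j, sep = "")` returns the character string `"team.1"`, not the data frame stored under that name. Inside `cleaning1`, `who[, -1]` is therefore applied to a plain string, which has no dimensions. Wrapping the name in `get()` passes the data frame instead. The sketch below reproduces this with a hypothetical toy data frame:

```r
team.1 <- data.frame(Rk = 1:2, Team = c("A*", "B"))  # hypothetical toy data

name <- paste("team.", 1, sep = "")
class(name)         # "character": only the name, not the data
class(get(name))    # "data.frame": get() fetches the object with that name

# Indexing a string with [, -1] reproduces the error from the question
msg <- tryCatch(name[, -1], error = function(e) conditionMessage(e))
msg                 # "incorrect number of dimensions"

# Indexing the data frame itself works
get(name)[, -1]
```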

Is something like this possible? I am a complete newbie, so the more basic, the better.

Edit:

cleaning1:

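    library(dplyr)    # tbl_df() (deprecated in newer dplyr; tibble::as_tibble() replaces it)
    library(stringr)  # str_sub()

    cleaning1 = function (who, year) {
      who[,-1]  # no effect: the result is not assigned (column 1 is dropped by the reordering below)
      who$SeasonEnd = rep(year, nrow(who))   # add the season number as a new column
      who = (who[-nrow(who),])               # drop the last row
      who = tbl_df(who)
      # flag playoff teams, whose names end in "*"
      for (i in 1:nrow(who)) {
        if ((str_sub(who$Team[i], -1)) == "*") {
          who$Playoffs[i] = 1
        } else {
          who$Playoffs[i] = 0
        }
      }
      who$Team = gsub("[[:punct:]]",'',who$Team)  # remove the "*" and other punctuation
      who = who[c(27:28,2:26)]                    # SeasonEnd, Playoffs, then columns 2 to 26
      return(who)
    }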

This works just fine when I run it on the data sets I have compiled myself.

To run it though, I have to go through and reassign each data set, like this:

    team.1 = cleaning1(team.1, 1)
    team.2 = cleaning1(team.2, 2)

So, I'm trying to find a way to automate that part of it.
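The two lines above can be generalized to a loop: look up each data frame by name with `get()` and store the result back with `assign()`, both from base R. Here is a sketch. `clean_toy` is a hypothetical, simplified stand-in for `cleaning1`, and the data are toy data frames. With the real data you would loop over `1:33` and call `cleaning1` instead.

```r
# Hypothetical stand-in for cleaning1: adds the season number and
# drops the last row, just to have something runnable
clean_toy <- function(who, year) {
  who$SeasonEnd <- rep(year, nrow(who))
  who[-nrow(who), ]
}

# Toy data: team.1 ... team.3, three rows each
for (j in 1:3) {
  assign(paste0("team.", j), data.frame(Team = c("A", "B", "Avg"), Pts = j * 1:3))
}

# Automated version of  team.j = cleaning1(team.j, j)
for (j in 1:3) {
  nm <- paste0("team.", j)
  assign(nm, clean_toy(get(nm), j))
}

team.2
```

As akrun's comment below suggests, a single list of data frames is usually easier to work with than many separate variables.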

Matt Collins
  • Could you show `cleaning1`? Also, I guess the `vars` would be either `vars = sprintf("team.x%s", 1:33)` or `vars <- as.list(sprintf("team.x%s", 1:33))`. It may be better to work with a single list of all the datasets rather than creating multiple dataset objects in the global environment. – akrun Apr 19 '15 at 06:29
  • Alright, I added it to my original post. I guess that is etiquette; I am also new to Stack Overflow, haha. – Matt Collins Apr 19 '15 at 06:43
  • It is just to make your code reproducible so that others can run it easily. I noticed that there are other variables in `cleaning1` that are not clear to me. A reproducible example is always preferred. – akrun Apr 19 '15 at 06:47
  • Please also check here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – akrun Apr 19 '15 at 06:49
  • Ok, I see. I'll try to put together something reproducible and post. – Matt Collins Apr 19 '15 at 06:59

1 Answer

I think your problem would be better solved by using a list of data frames instead of many variables containing one data frame each.

You do not say where you get your data from, so I am not sure how you would create the list. But assuming you have your data frames already stored in the variables team.1 etc., you could generate the list with

    team.list <- list(team.1, team.2, ..., team.33)

where the dots stand for the variables that I did not write explicitly (you will have to do that). This is tedious, of course, and could be simplified as follows

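    team.list <- do.call(list, mget(paste0("team.", 1:33)))

The paste0 command creates the variable names as strings, mget fetches the objects with those names and returns them as a named list, and do.call applies the list command to them. Since mget already returns a list, team.list <- mget(paste0("team.", 1:33)) would work just as well.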
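Here is a small sketch of that step with three hypothetical toy data frames. It shows that `mget()` returns a list named after the variables, and that the `do.call(list, ...)` wrapper does not change the result:

```r
# Hypothetical toy data frames standing in for team.1 ... team.3
team.1 <- data.frame(Team = "A", Pts = 1)
team.2 <- data.frame(Team = "B", Pts = 2)
team.3 <- data.frame(Team = "C", Pts = 3)

# paste0() builds the names; mget() returns the objects as a named list
team.list <- mget(paste0("team.", 1:3))
names(team.list)   # "team.1" "team.2" "team.3"

# The do.call(list, ...) wrapper gives an identical list
same <- identical(team.list, do.call(list, mget(paste0("team.", 1:3))))
same               # TRUE
```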

Now that you have all your data in a list, it is much easier to apply a function to all of them. I am not quite sure how the year argument should be used, but from your example, I assume that it just runs from 1 to 33 (let me know if this is not true and I'll change the code). So the following should work:

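    team.list.cleaned <- mapply(cleaning1, team.list, 1:33, SIMPLIFY = FALSE)

It will go through all elements of team.list and 1:33 in parallel and apply the function cleaning1 with the elements as its arguments. SIMPLIFY = FALSE is needed here. By default mapply tries to simplify its result, and because every cleaned data frame has the same number of columns (27), the 33 data frames would be collapsed into a 27 x 33 matrix of individual columns (891 elements). With SIMPLIFY = FALSE, the result is again a list containing the output of each call, i.e.,

    list(cleaning1(team.list[[1]], 1), cleaning1(team.list[[2]], 2), ...)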
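The comments below work out why `SIMPLIFY = FALSE` matters. This small sketch reproduces it with a hypothetical stand-in function `add_season` and three toy data frames. With the default `SIMPLIFY = TRUE`, all returned data frames have the same length (number of columns), so `mapply` binds their columns into a matrix. With `SIMPLIFY = FALSE`, or with `Map()`, which is `mapply` with `SIMPLIFY = FALSE`, you get one data frame per team:

```r
# Hypothetical stand-in for cleaning1 that returns a data frame
add_season <- function(who, year) {
  who$SeasonEnd <- year
  who
}

# Hypothetical toy list of three data frames, named the way mget() would name them
team.list <- list(team.1 = data.frame(Team = "A"),
                  team.2 = data.frame(Team = "B"),
                  team.3 = data.frame(Team = "C"))

# Default SIMPLIFY = TRUE: the data frames are collapsed into a
# 2 x 3 matrix of columns, i.e. 6 elements instead of 3 data frames
bad <- mapply(add_season, team.list, 1:3)
dim(bad)    # 2 3

# SIMPLIFY = FALSE keeps one data frame per team
good <- mapply(add_season, team.list, 1:3, SIMPLIFY = FALSE)
good$team.2

# Map() gives the same list
same <- identical(good, Map(add_season, team.list, 1:3))
```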

Since you are new to R, I strongly recommend that you read the help on the apply family of functions (apply, lapply, tapply, mapply). They are very useful, and once you get used to them, you will use them all the time.
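To give a flavour of these functions on a list of data frames, here is a short sketch with hypothetical toy data. `lapply` always returns a list. `sapply` additionally tries to simplify the result, here to a named vector. That simplification is the same mechanism that `SIMPLIFY` controls in `mapply`.

```r
# Hypothetical toy list of data frames
team.list <- list(team.1 = data.frame(Pts = c(100, 95)),
                  team.2 = data.frame(Pts = c(90, 110, 105)))

rows.list <- lapply(team.list, nrow)   # a list: one row count per team
rows.vec  <- sapply(team.list, nrow)   # the same, simplified to a named vector
rows.vec

sapply(team.list, function(d) mean(d$Pts))   # mean points per team
```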

There is probably also a simple way to directly generate the list of data frames using lapply. As an example: if the data frames are read in from files and you have the file names stored in a character vector file.names, then something along the lines of

    team.list <- lapply(file.names, read.table)

might work.
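Here is a self-contained sketch of that idea. It first writes two small hypothetical CSV files to a temporary directory as stand-ins for the real data files, then reads all of them with a single `lapply` call. It uses `read.csv` (which is `read.table` with `header = TRUE` and `sep = ","`) because these toy files are comma-separated with a header row. Which reader fits depends on the actual files.

```r
# Write two small toy CSV files to a temporary directory
dir <- tempdir()
file.names <- file.path(dir, c("team1.csv", "team2.csv"))
write.csv(data.frame(Team = "A", Pts = 1), file.names[1], row.names = FALSE)
write.csv(data.frame(Team = "B", Pts = 2), file.names[2], row.names = FALSE)

# Read every file into one list of data frames and name the elements
team.list <- lapply(file.names, read.csv)
names(team.list) <- paste0("team.", seq_along(team.list))
team.list$team.2
```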

Stibu
  • Thanks for the help! You are correct: 1:33 are the years, so year 1, year 2, year 3, etc. I played around with trying to use lists, but I could never get anything to work properly that way. I kept getting an error message for incorrect dimensions. However, what you described above is exactly what I am trying to do, so I will try what you posted above and try to figure out the apply commands! – Matt Collins Apr 19 '15 at 17:36
  • So, I tried this, and I was successfully able to put my 33 data frames into a list. However, when I run the function on _team.list_ `team.list.cleaned <- mapply(cleaning1,team.list,1:33)`, instead of returning a list of my 33 data frames, it returns a list of 891. The data frames have 27 variables each, and 33 x 27 = 891, so it is taking my data out of the data frames and just returning a long list of all my variables. Any idea how to return them in the form they were in after using `team.list <- do.call(list,mget(paste0("team.",1:33)))`? – Matt Collins Apr 22 '15 at 16:27
  • Can you try `team.list.cleaned <- mapply(cleaning1,team.list,1:33,SIMPLIFY=FALSE)` instead? – Stibu Apr 22 '15 at 16:47
  • Ignore my previous comment if you saw it. That worked, you are the best! Thank you. – Matt Collins Apr 22 '15 at 17:14
  • You are most welcome. Just to be sure: it worked without `SIMPLIFY=FALSE`? If not, I should change my answer, so please let me know. – Stibu Apr 22 '15 at 17:16