Here is how I created a number of data sets named data_1, data_2, data_3 and so on from an initial data set with 500 rows and 17 columns:

for ( i in 1:length(unique( data$cluster ))) {
  assign(paste("data", i, sep = "_"),subset(data[data$cluster == i,]))
}

Up to this point everything is fine.

Now I am trying to use these one by one inside another loop, like this:

for (i in 1:5) {
  data<-  paste(data, i, sep = "_")
}

However, this does not give me the data in the required format. Any help will be really appreciated.

Thank you in advance

Sotos
R_Learner
  • Use `data <- get(paste("data", i, sep = "_"))` – GGamba Mar 02 '17 at 13:04
  • Thank you for the quick response. I also expect the output to be a matrix, i.e. dim(get(paste(data, i, sep = "_"))) should give m x n, but the above code is not helping. – R_Learner Mar 02 '17 at 13:08
  • Your second for loop tells me that you are trying to combine the first 5 datasets into one? You can either write `rbind(data_1, data_2, data_3, ...)` or first store your datasets in a list and use `do.call(rbind, dataList)` to combine your datasets. I hope I am understanding this correctly. – acylam Mar 02 '17 at 13:21
  • @R_Learner If you need help with a specific problem, you need to provide a minimal reproducible example as explained here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Joris Meys Mar 02 '17 at 13:24
  • @Joris Meys Thanks a lot for your suggestion. From next time on I will definitely keep these standards in mind. Thank you – R_Learner Mar 03 '17 at 05:47

1 Answer

Let me give you a tip here: don't assign everything in the global environment; use lists instead. That way you avoid all the things that can go wrong when meddling with the global environment. The second loop in your question overwrites the original dataset data, so you'll be in trouble if you want to rerun the code after something goes wrong: you'll have to reconstruct the original data frame first.
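As a minimal sketch of the list approach, assuming a small made-up data frame (the column x and the three cluster values are only for illustration), the subsets can live in one list and be reached by index rather than through assign() and get(). If objects such as data_1, data_2 already exist, mget() can gather them into a list by name:

```r
# Toy stand-in for the 500 x 17 data set; the columns are illustrative only.
set.seed(1)
data <- data.frame(x = rnorm(12), cluster = rep(1:3, each = 4))

# One list element per cluster, named after the cluster value ("1", "2", "3").
data_list <- split(data, data$cluster)

# Loop over the subsets without overwriting the original `data` object.
for (i in seq_along(data_list)) {
  current <- data_list[[i]]
  print(dim(current))  # each subset is still a rows-by-columns data frame
}

# If data_1, data_2, ... were already created with assign(),
# mget() collects them into a named list.
data_1 <- data_list[["1"]]
data_2 <- data_list[["2"]]
data_3 <- data_list[["3"]]
collected <- mget(paste("data", 1:3, sep = "_"))
```

Because the loop variable is called current rather than data, the code can be rerun as often as needed.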

Second: If you need to split a data frame based on a factor and carry out some code on each part, you should take a look at split, by and tapply, or at the plyr and dplyr packages.
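For a quick feel of tapply and by, here is a small sketch on a made-up data frame (the column name value is an assumption for illustration). tapply() summarises a single vector per group, while by() hands the whole sub-data-frame of each group to a function:

```r
# Illustrative data: six rows in three clusters.
data <- data.frame(value = c(1, 2, 3, 4, 5, 6),
                   cluster = c(1, 1, 2, 2, 3, 3))

# tapply(): one summary value per cluster for a single column.
cluster_means <- tapply(data$value, data$cluster, mean)

# by(): apply a function to the full subset belonging to each cluster.
cluster_rows <- by(data, data$cluster, nrow)
```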

Using Base R

With base R, it depends on what you want to do. In the most general case you can use a combination of split() and lapply or even a for loop:

mylist <- split( data, f = data$cluster)
for(mydata in mylist){
  # results are not auto-printed inside a loop, so print explicitly
  print(head(mydata))
  ...
}

Or

mylist <- split( data, f = data$cluster)
result <- lapply(mylist, function(mydata){
  doSomething(mydata)
})

Which one you use depends largely on what the result should be. If you need some kind of summary for every subset, lapply will give you a list with the results per subset. If you need this for a simulation, plotting or similar, you are better off using the for loop.
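To make the summary case concrete, here is a sketch with a made-up data frame; the per-cluster summary stands in for whatever doSomething() would do in your own code. lapply() returns one small data frame per cluster, and do.call(rbind, ...) stacks them into a single table:

```r
# Illustrative data: the column `value` is an assumption.
data <- data.frame(value = c(2, 4, 6, 8, 10, 12),
                   cluster = c("a", "a", "b", "b", "c", "c"))

mylist <- split(data, f = data$cluster)

# One summary row per subset, returned as a list.
result <- lapply(mylist, function(mydata) {
  data.frame(cluster = mydata$cluster[1],
             n = nrow(mydata),
             mean_value = mean(mydata$value))
})

# Combine the per-cluster rows into one data frame.
summary_table <- do.call(rbind, result)
```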

If you want to add some variables based on other variables, then the plyr or dplyr packages come in handy.

Using plyr and dplyr

These packages come in especially handy if the result of your code is going to be an array or data frame of some kind. This is similar to using split and lapply, but in a way Hadley approves of :-)

For example:

library(plyr)
result <- ddply(data, .(cluster),
                function(mydata){
                  doSomething(mydata)
                })

Use dlply if the result should be a list.
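For completeness, a rough dplyr counterpart might look like the sketch below. It assumes the dplyr package is installed, and the data frame and its value column are invented for illustration; group_by() marks the groups and summarise() computes one row per cluster:

```r
library(dplyr)

# Illustrative data: four rows in two clusters.
data <- data.frame(value = c(2, 4, 6, 8),
                   cluster = c(1, 1, 2, 2))

# One row per cluster with the group size and the group mean.
result <- data %>%
  group_by(cluster) %>%
  summarise(n = n(), mean_value = mean(value))
```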

Joris Meys