I have a data frame with 9880 records. I am trying to split it into 10 groups, the first 9 with 1000 records each and the last with 880, and to name them accordingly. I used a for loop for the first 9 groups and handled the last 880 records manually, but I am sure there are better ways to achieve this:

library(sqldf)
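for (i in 0:8)
{
  # rows 1-1000, 1001-2000, ..., 8001-9000, columns 1 to 53
  assign(paste("test", i, sep = "_"), as.data.frame(final_9880[((1000 * i) + 1):(1000 * (i + 1)), 1:53]))
}
# the remaining 880 rows, handled manually
test_9 <- final_9880[9001:9880, 1:53]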

I am also unable to append all the parts in a single for loop:

#append all parts
all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)

Any help is appreciated, thanks!

mnel
Vishesh Tayal
  • Let me get this straight. You're trying to move per 1000 observations to an individual object (test_1, test_2...) and then at the end, rbind this together? Wouldn't you get the same object as when you started? – Roman Luštrik Jul 23 '12 at 07:35

2 Answers

No for loop is required; use split:

data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))

splitter <- (data$a-1) %/% 1000

.list <- split(data, splitter)

lapply(0:9, function(i){
  assign(paste('test',i,sep='_'), .list[[(i+1)]], envir = .GlobalEnv)
  return(invisible())
})

all_9880<-rbind(test_0,test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9)

identical(all_9880,data)
## [1] TRUE
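
If the pieces only exist so that each one can be treated and then appended again, it may be simpler to keep them in the list that split returns and skip assign altogether. The following is a minimal sketch under that assumption; `chunks`, `processed`, `all_rows` and the `chunk_id` column are illustrative names, and the per-chunk step is only a placeholder for the real treatment.

```r
# Sketch only: `data` mirrors the example above
data <- data.frame(a = 1:9880, b = sample(letters, 9880, replace = TRUE))

# One list element per block of 1000 rows (the last block holds 880)
chunks <- split(data, (seq_len(nrow(data)) - 1) %/% 1000)

# Placeholder per-chunk treatment: tag each row with the chunk it came from
processed <- Map(function(df, id) {
  df$chunk_id <- id
  df
}, chunks, names(chunks))

# Bind the chunks back together in order; unname() stops rbind from
# prefixing the row names with the list names
all_rows <- do.call(rbind, unname(processed))
```

This scales to any number of chunks, since nothing has to be listed by name in the rbind call.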
mnel

A small variation on this solution

# `parts` rather than `ls`, so the base function ls() is not shadowed
parts <- split(final_9880, rep(0:9, each = 1000, length.out = 9880))  # edited to Roman's suggestion
for (i in 1:10) assign(paste("test", i, sep = "_"), parts[[i]])

Your rbind command should then work, with the names changed to test_1 through test_10.
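
To see why the grouping vector gives nine groups of 1000 and one of 880, it can be tabulated on its own. This is a small sketch; `grp` and `sizes` are illustrative names.

```r
# Labels 0..9, each repeated 1000 times, then truncated to 9880 entries
grp <- rep(0:9, each = 1000, length.out = 9880)

# Count how many rows fall under each label
sizes <- table(grp)
print(sizes)

# The same labels can be obtained by integer division of the row index
same_as_division <- all(grp == (seq_len(9880) - 1) %/% 1000)
```

The truncation by `length.out` is what leaves the last label with only 880 entries.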

Edit

If you have many dataframes you can use a parse-eval combo. I use the package gsubfn for readability.

library(gsubfn)
nms <- paste("test", 1:10, sep="_", collapse=",")
eval(fn$parse(text='do.call(rbind, list($nms))'))

How does this work? First I create a string containing the comma-separated names of the data frames:

> paste("test", 1:10, sep="_", collapse=",")
[1] "test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10"

Then I use this string to construct the list

list(test_1,test_2,test_3,test_4,test_5,test_6,test_7,test_8,test_9,test_10)

using parse and eval with string interpolation.

eval(fn$parse(text='list($nms)'))

String interpolation is implemented via the fn$ prefix of parse; its effect is to intercept $nms and substitute the string contained in the variable nms. Parsing and evaluating the string "list($nms)" creates the list needed. In the solution, the rbind is included in the parse-eval combo.

Edit 2

You can collect all variables with a certain pattern, put them in a list and bind them by rows.

do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))

ls finds all variables with a pattern "test_"

sapply retrieves all those variables and stores them in a list

do.call passes the elements of that list as arguments to rbind, which binds them row-wise.
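
One caveat worth checking: ls() returns names sorted as strings, so with ten or more pieces test_10 comes before test_2 and the rows may come back in a different order than they started. A possible workaround, sketched here with toy data frames standing in for the real pieces, is to build the names in numeric order and fetch them with mget(). The `part` column and the name `combined` are illustrative only.

```r
# Toy stand-ins for the split pieces (hypothetical data, 2 rows each)
for (i in 1:10) assign(paste("test", i, sep = "_"), data.frame(part = rep(i, 2)))

# Names come back in string order here: test_1, test_10, test_2, ...
found <- ls(pattern = "^test_[0-9]+$")

# Build the names in numeric order and fetch the objects with mget()
nms <- paste("test", 1:10, sep = "_")
combined <- do.call(rbind, unname(mget(nms)))
```

Anchoring the pattern with `^` and `$` also avoids picking up unrelated objects whose names merely contain "test_".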

Ryogi
  • Thanks! i know the rbind works..but suppose the file is of million records and i am splitting it into 100 parts...it will be tiresome to rbind individually, is there any way to append them using for loop or something? – Vishesh Tayal Jul 23 '12 at 05:44
  • Why do you need to bind them anyway? Surely it will return the original data? – mnel Jul 23 '12 at 05:49
  • mnel: actually, I am treating the data splits and then appending them. @RYogi: yup, this did it. Thanks a lot, people :) – Vishesh Tayal Jul 23 '12 at 05:56
  • @RYogi: i really want to vote up your answer but #reps restricting me :) thanks for the backend edit – Vishesh Tayal Jul 23 '12 at 06:09
  • zodiac: no worries, I am glad you gave me an excuse to pull out `eval(fn$parse())`. – Ryogi Jul 23 '12 at 06:12
  • You could do `rep(0:9, each = 1000, length.out = 9880)`. – Roman Luštrik Jul 23 '12 at 07:18
  • @RomanLuštrik, interesting. How can you use that in this context? `list(ls(pattern = "test_"))` returns a list of strings, the names of the variables matching the pattern. – Ryogi Jul 23 '12 at 07:40
  • @RYogi, I would approach this like so: `test_1 <- runif(10); test_2 <- runif(10); test_3 <- runif(10); do.call("rbind", sapply(ls(pattern = "test_"), get, simplify = FALSE))` – Roman Luštrik Jul 23 '12 at 11:30