6

A question from a relative n00b: I'd like to split a vector into three vectors of different lengths, with the values assigned to each vector at random. For example, I'd like to split the vector of length 12 below into vectors of length 2, 3, and 7.

I can get three equal-sized vectors using this:

test<-1:12
split(test,sample(1:3))

Any suggestions on how to split test into vectors of length 2, 3, and 7 instead of three vectors of length 4?

Simon O'Hanlon
Emilio M. Bruna

4 Answers

14

You could use rep to create the indices for each group and then split based on them:

split(1:12, rep(1:3, c(2, 3, 7)))

If you want the items to be randomly assigned, so that it's not just the first 2 items in the first vector, the next 3 items in the second vector, and so on, you could just add a call to sample:

split(1:12, sample(rep(1:3, c(2, 3, 7))))

If you don't have the specific lengths (2,3,7) in mind but just don't want it to be equal length vectors every time then SimonO101's answer is the way to go.
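
As a minimal sketch, the two lines above can be wrapped into a small helper. The name `random_split` and its `sizes` argument are made up for illustration and are not part of base R; the size check is an added assumption about how you would want bad input handled.

```r
# Hypothetical helper: split x into groups of the given sizes, at random.
random_split <- function(x, sizes) {
  # The group sizes must account for every element of x
  stopifnot(sum(sizes) == length(x))
  # One label per element: 1,1,2,2,2,3,... for sizes c(2, 3, ...)
  labels <- rep(seq_along(sizes), sizes)
  # Shuffle the labels by index; sample.int() avoids sample()'s special
  # handling of a single number when there is only one label
  split(x, labels[sample.int(length(labels))])
}

parts <- random_split(1:12, c(2, 3, 7))
lengths(parts)  # group sizes are 2, 3 and 7; membership changes per run
```

Because the labels are shuffled rather than the data, the values inside each group stay in their original order, as discussed in the comments below.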

Dason
  • I would have thought: `split(sample(1:12), rep(1:3, c(2, 3, 7)))`. Permute first, then split. But I guess it comes out the same in the end. I didn't like the fact that your method seems to leave the samples ordered. – IRTFM Aug 23 '13 at 15:50
  • @DWin - Guess it depends on what you want. The way I have it now the vectors will be sorted (or at least in the original order). If that isn't what they want then your way would be better. – Dason Aug 23 '13 at 15:52
  • Thanks! This worked great. Appreciate the speedy responses from you all. – Emilio M. Bruna Aug 23 '13 at 15:59
5

How about using sample slightly differently...

set.seed(123)
test<-1:12
split( test , sample(3, 12 , repl = TRUE) )

#$`1`
#[1] 1 6

#$`2`
#[1]  3  7  9 10 12

#$`3`
#[1]  2  4  5  8 11

set.seed(1234)
test<-1:12
split( test , sample(3, 12 , repl = TRUE) )

#$`1`
#[1] 1 7 8

#$`2`
#[1]  2  3  4  6  9 10 12

#$`3`
#[1]  5 11

The first argument to sample is the number of groups to split the vector into, and the second is the number of elements in the vector (repl = TRUE is partial matching for replace = TRUE). Each successive element is assigned at random to one of the 3 vectors, so the group sizes vary from run to run. For 4 vectors, just use split( test , sample(4, 12 , repl = TRUE) ).
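
One thing to watch with this approach: since the labels are drawn with replacement, a group can come up empty, and split then leaves it out of the result. A hedged sketch of one way to always get the same number of groups is to pass a factor with all levels declared; the variable names `k`, `groups` and `parts` are only illustrative.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
test <- 1:12
k <- 3
# Draw one group label per element; sizes vary and a group can be empty
groups <- sample(k, length(test), replace = TRUE)
# Declaring all k levels keeps an empty group in the result as integer(0)
parts <- split(test, factor(groups, levels = seq_len(k)))
length(parts)        # always k
sum(lengths(parts))  # always length(test)
```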

Simon O'Hanlon
1

It is easier than you think. To split the vector into three new randomly chosen sets, run the following code:

test <- 1:12
split(sample(test), 1:3)

This way, every time you run this code you get a new random distribution of the values across three sets of equal size (perfect for k-fold cross-validation).

You get:

> split(sample(test), 1:3)
$`1`
[1] 5 8 7 3

$`2`
[1]  4  1 10  9

$`3`
[1]  2 11 12  6

> split(sample(test), 1:3)
$`1`
[1] 12  6  4  1

$`2`
[1] 3 8 7 5

$`3`
[1]  9  2 10 11
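
This works cleanly here because 12 is a multiple of 3, so the labels 1:3 are recycled exactly four times. When the length is not a multiple of the number of folds, R still recycles the labels but, as far as I know, warns that the data length is not a multiple of the split variable. A minimal sketch that spells the recycling out with rep_len, assuming a vector of length 10 and `k` folds (both chosen only for illustration):

```r
test <- 1:10   # length deliberately not a multiple of k
k <- 3
# rep_len() repeats 1..k until there is exactly one fold label per element
folds <- split(sample(test), rep_len(seq_len(k), length(test)))
lengths(folds)  # fold sizes differ by at most one
```
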
theLudo
0

You could use an auxiliary vector to specify how you want to split your data. Example:

Data <- c(1,2,3,4,5,6)

Format <- c("X","Y","X","Y","Z","Z")

output <- split(Data,Format)

This generates the output:

$X
[1] 1 3

$Y
[1] 2 4

$Z
[1] 5 6
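
To tie this back to the question, the auxiliary vector can also be built with rep and shuffled with sample, which gives named groups of fixed sizes with random membership. This is only a sketch; the group names "X", "Y", "Z" and the sizes 2, 3, 7 are taken from the examples in this thread.

```r
test <- 1:12
# One name per element: 2 x "X", 3 x "Y", 7 x "Z"
labels <- rep(c("X", "Y", "Z"), times = c(2, 3, 7))
# Shuffling the labels assigns the values to the groups at random
output <- split(test, sample(labels))
lengths(output)  # sizes stay 2, 3 and 7 for X, Y and Z
```
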
darmat