Sample random column in dataframe

Question

I have the following object, model$data:

model$data
[[1]]
                    Category1  Category2  Category3 Category4
3555                        1          0          0         0
6447                        1          0          0         0
5523                        1          0          1         0
7550                        1          0          1         0
6330                        1          0          1         0 
2451                        1          0          0         0
4308                        1          0          1         0
8917                        0          0          0         0
4780                        1          0          1         0
6802                        1          0          1         0
2021                        1          0          0         0
5792                        1          0          1         0
5475                        1          0          1         0 
4198                        1          0          0         0
223                         1          0          1         0
4811                        1          0          1         0
678                         1          0          1         0

I am trying to use the following call to get an index of the column names:

sample(colnames(model$data), 1)

But I receive the following error message:

 Error in sample.int(length(x), size, replace, prob) : 
  invalid first argument

Is there a way to avoid that error?

Possible duplicate of [Random rows in dataframe in R](http://stackoverflow.com/questions/8273313/random-rows-in-dataframe-in-r). There are tons of similar existing questions. — smci, Mar 20 '17 at 06:31
I expect you're trying to sample random rows, not columns? Also, it helps if you tell us what `model$data` is: it looks like a list with one element which is a dataframe: `model$data[[1]]`. Rather than a plain dataframe. — smci, Mar 20 '17 at 06:33
Thanks a lot, but the question is different! I am actually trying to create an index of the column names for model$data. — sebastian-montero, Mar 20 '17 at 14:39
It's the same answer as I cited. You just sample from `1:ncol(df)` instead of `1:nrow()`, and then use those column indices on the RHS of the comma in `df[, ...]` — smci, Mar 20 '17 at 14:47
Your `model$data` appears to be a list containing a data frame, not a data frame as such. — Hong Ooi, Mar 20 '17 at 14:57

Hong Ooi · Accepted Answer · 2017-03-20T15:20:29.660

3

Notice this?

model$data
[[1]]

The [[1]] means that model$data is a list, whose first component is a data frame. To do anything with it, you need to pass model$data[[1]] to your code, not model$data.

sample(colnames(model$data[[1]]), 1)
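
As a rough sketch of why the original call fails and the corrected one works: the `model` object below is a made-up stand-in for the asker's data (a list whose `data` element is itself a list holding one data frame), so the values are assumptions rather than the real data.

```r
# Hypothetical stand-in for the asker's object: model$data is a list
# that contains a single data frame.
model <- list(data = list(data.frame(
  Category1 = c(1, 1, 0),
  Category2 = c(0, 0, 0),
  Category3 = c(0, 1, 0),
  Category4 = c(0, 0, 0)
)))

class(model$data)      # "list", not "data.frame"
colnames(model$data)   # NULL: a plain list has no column names

# With nothing to draw from, sample() fails; try() captures the error
res <- try(sample(colnames(model$data), 1), silent = TRUE)
inherits(res, "try-error")

# Extracting the data frame first gives sample() real names to draw from
df <- model$data[[1]]
picked <- sample(colnames(df), 1)
picked %in% colnames(df)
```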

edited Mar 20 '17 at 15:20

answered Mar 20 '17 at 14:56

Hong Ooi


smci · Answer 2 · 2019-02-13T18:56:05.230

2

This seems to be a near-duplicate of Random rows in dataframes in R and should probably be closed as duplicate. But for completeness, adapting that answer to sampling column-indices is trivial:

- You don't need to generate a vector of column-names, only their indices. Keep it simple.
- Sample your col-indices from 1:ncol(df) instead of 1:nrow(df)
- Then put those column-indices on the RHS of the comma in df[, ...]

df[, sample(ncol(df), 1)]

- The 1 is because you apparently want to take a sample of size 1.
- One minor complication is that your dataframe is model$data[[1]], since your model$data looks like a list with one element which is a dataframe, rather than a plain dataframe. So first, assign df <- model$data[[1]]
- Finally, if you really really want the sampled column-name(s) as well as their indices:

samp_col_idxs <- sample(ncol(df), 1)
samp_col_names <- colnames(df)[samp_col_idxs]
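
Put together, the steps in this answer might look like the sketch below. The small `df` is invented for illustration (in the question it would be model$data[[1]]), and `drop = FALSE` is an optional extra not mentioned in the answer, shown only because a single-column subset otherwise comes back as a plain vector.

```r
# Hypothetical data frame standing in for model$data[[1]]
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

set.seed(42)                       # only to make the draw repeatable
idx <- sample(ncol(df), 1)         # one column index drawn from 1:ncol(df)
col_name <- colnames(df)[idx]      # the matching column name

col_vec <- df[, idx]               # the sampled column as a vector
col_df  <- df[, idx, drop = FALSE] # the same column kept as a data frame
```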

edited Feb 13 '19 at 18:56

answered Mar 20 '17 at 14:50

smci


This: df[, sample(ncol(df), 1)] doesn't work if you want to get a value from a different column for each row. It always returns the same column (no randomization of columns between rows). See example: iris[,sample(ncol(iris),1)]. – Omri374 Feb 12 '19 at 06:47
@Omri374: the answer I wrote was to the [originally-asked question, revision #1 3/20/2017 "Error when using sample formula in R"](https://stackoverflow.com/posts/42895794/revisions). After my answer, someone revised the question and OP added clarifications. Nowhere does the question say *"get a value from a different column for each row"*; it said *"create an index of the column names for model$data"*. It's not even like subsequent revisions invalidated this; the question was unclear and the OP did not clarify it. It's grossly unreasonable for you to essentially call for downvoting it. – smci Feb 12 '19 at 19:48
@Omri374: as you can clearly see here, this answer represents my best-guess effort to respond to the OP and help them clarify what output they wanted based on what they wrote. They never even bothered to reply. Or update their own question. How is that somehow my fault? You can even see that I wrote *"your `model$data` looks like a list with one element which is a dataframe, rather than a plain dataframe. So first, assign `df <- model$data[[1]]`"*. That's essentially the same as the accepted answer. Plus 300% more. – smci Feb 12 '19 at 20:01
I understand now, but SO doesn't allow me to upvote this unless the answer is edited. If you'd like, please do a minor edit and I'll remove the downvote. I agree the answer is unclear, didn't mean to be harsh here. – Omri374 Feb 13 '19 at 12:13
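
For the per-row case raised in the comments above (a value from a different randomly chosen column for each row), one possible approach is matrix indexing with a two-column (row, column) index. This is only a sketch on an invented `df`; it assumes all columns share a type so that as.matrix() does not coerce them, and it is not something the answers themselves propose.

```r
# Hypothetical data frame with columns of a single type
df <- data.frame(a = 1:4, b = 11:14, c = 21:24)

set.seed(1)  # only to make the draw repeatable
# One column index per row, drawn independently with replacement
col_idx <- sample(ncol(df), nrow(df), replace = TRUE)

# Row i is paired with column col_idx[i]; indexing a matrix with a
# two-column matrix picks exactly those (row, column) cells
vals <- as.matrix(df)[cbind(seq_len(nrow(df)), col_idx)]
```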
