I need to shuffle the rows of a data frame, turning this:

A foo
B bar
C baz

into this:

B foo
C bar
A baz

I.e., the first column should be shuffled while keeping the rest intact. I am doing this using sample() from the kimisc library as suggested here. A minimal working code example is:

>df<-read.table("file1", header=F, skip=1)
>library(kimisc)
>names<-read.table("file2")
>df1<- transform(sample(df,size=nrow(names)),V1=names)
>df1
  V1    V2
5  A 21266
8  C 22109
7  F 17971
1  J 11137

Where file1 is

Name Value
A 28463
B 11137
C 24966
D 24611
E 14980
F 21266
G 23441
H 17971
I 22109
J 31746

and file2 is:

A
C
F
J

I then want to write this data frame to a file and my expected output is

A 21266
C 22109
F 17971
J 11137

However, the kimisc library provides its own sample function which (unlike the vanilla one) shuffles a data frame the way I want, but the result seems to break the output of write.table:

write.table(df1, "file3", quote=FALSE, sep='\t', col.names=FALSE)

This produces the following output:

5   1:4 21266
8   1:4 22109
7   1:4 17971
1   1:4 11137

If I use the vanilla sample, the resulting data frame is written as expected, but it is not shuffled in the way I need (i.e., columns instead of rows are shuffled).

So, how can I use sample from kimisc which allows me to sample the rows and not the columns of a data frame, and still print it in the way write.table would work with a data frame returned by base::sample?


PS: I am using a list of names because I am actually trying to assign random values from a file containing 143558041 lines to a subset (39953) of the names mentioned in that file.


As requested, the output of dput(df1) is

> dput(df1)
structure(list(V1 = structure(list(V1 = structure(1:4, .Label = c("A", 
"C", "F", "J"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-4L)), V2 = c(24611L, 14980L, 22109L, 21266L)), .Names = c("V1", 
"V2"), row.names = c(3L, 4L, 8L, 5L), class = "data.frame")
terdon
  • Could you please post the output of `dput(df1)`? – krlmlr Aug 28 '13 at 16:49
  • @krlmlr thanks for helping, I've added the output you requested. It appears like your sample function returns a list linked to specific rows of the object that was sampled, is that correct? – terdon Aug 28 '13 at 16:52

1 Answer

I have reworked your input to a reproducible example:

library(kimisc)

## Loading required package: Rcpp

## Loading required package: logging


set.seed(20130828L)

df <- read.table(text="Name Value
A 28463
B 11137
C 24966
D 24611
E 14980
F 21266
G 23441
H 17971
I 22109
J 31746", header=F, skip=1)

names <- read.table(text="A
C
F
J")

df.s <- sample.data.frame(df,size=nrow(names))
df1<- transform(df.s,V1=names)

dput(df1)

## structure(list(V1 = structure(list(V1 = structure(1:4, .Label = c("A", 
## "C", "F", "J"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
## -4L)), V2 = c(14980L, 21266L, 17971L, 24966L)), .Names = c("V1", 
## "V2"), row.names = c(5L, 6L, 8L, 3L), class = "data.frame")

As you can see, the resulting dput output is similar to yours.

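The actual cause is that names is itself a data frame, so transform(df.s, V1=names) embeds a whole data frame as column V1 of another data frame. This is why write.table prints 1:4 instead of the names, and it has nothing to do with sample.data.frame. Two possible remedies: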

  • Use names$V1 in the transform call
  • Read names as a vector and not as a table (careful: nrow(names) returns NULL for a vector, so use length(names) instead)
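
Both remedies can be sketched roughly as follows. This is only an illustration under a few assumptions: row sampling is done with base R indexing (`df[sample(nrow(df), n), ]`) as a stand-in for `sample.data.frame`, so that the snippet runs without kimisc; `names.vec`, `df2` and the temporary output file are names made up for this sketch; and `row.names=FALSE` is added to `write.table` only because the expected output in the question shows no row names.

```r
set.seed(20130828L)

df <- read.table(text="Name Value
A 28463
B 11137
C 24966
D 24611
E 14980
F 21266
G 23441
H 17971
I 22109
J 31746", header=FALSE, skip=1)

names <- read.table(text="A
C
F
J")

# The root of the problem: names is a data frame, names$V1 is a plain column
is.data.frame(names)     # TRUE
is.data.frame(names$V1)  # FALSE

# Base-R stand-in for sample.data.frame (assumption): sample row indices
df.s <- df[sample(nrow(df), size=nrow(names)), ]

# Remedy 1: pass the column, not the whole data frame
df1 <- transform(df.s, V1=names$V1)

# Remedy 2: read the names as a character vector; note length(), not nrow()
names.vec <- scan(text="A
C
F
J", what=character(), quiet=TRUE)
df2 <- transform(df[sample(nrow(df), size=length(names.vec)), ], V1=names.vec)

# Write without row names; a temporary file stands in for "file3"
out <- tempfile()
write.table(df1, out, quote=FALSE, sep="\t", col.names=FALSE, row.names=FALSE)
readLines(out)
```

With either variant, V1 is an ordinary column, so each written line should contain one name and one sampled value separated by a tab.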
krlmlr
  • Argh! Thanks, using `names$V1` works perfectly, and thanks again for your package, it is exactly what I needed. To improve any future questions, if I understand the question you linked to correctly, a "reproducible example" is one that contains the seed if a random process is used and where the data are loaded directly in the code with no need for external files. So I should avoid using files and instead hard code my data in the minimal example? – terdon Aug 28 '13 at 17:09
  • @terdon: Yes, it's much easier if the one who wants to answer can just paste and run the code. I have generated my output with the [`soR`](https://github.com/krlmlr/scriptlets/blob/master/soR) script. – krlmlr Aug 28 '13 at 17:12