
I am trying to use the stack command on data loaded from two text csv files that I want to compare. I want to use crossprod(table(stack(data))) to see how many strings the different columns have in common (in the example, "dog" and "cat"). In this example the csv files contain columns with different numbers of strings.

> one<-read.delim("one.csv",sep="\t",header=F)
> two<-read.delim("two.csv",sep="\t",header=F)

> one
       V1
1     dog
2 hamster
3   mouse
4     cat

> two
      V1
1    dog
2    cat
3 rabbit

> data<-list(one,two)
> stack(data)
Error in stack.default(data) : at least one vector element is required

If I manually create the vectors with one<-c("dog",...) it works. What am I doing wrong, and how can I do this right?

zx8754
aldorado
  • what exactly are you trying to do with your files? How are you going to compare them? – Paulo E. Cardoso May 23 '14 at 08:43
  • Why don't you try `df <- data.frame(one, two); stack(df)` – Paulo E. Cardoso May 23 '14 at 08:49
  • @PauloCardoso data.frame does not work because the columns have different lengths. I need to use stack because I want to compare how many strings the columns of the different files have in common. I want to use `crossprod(table(stack(data)))` to compare. – aldorado May 23 '14 at 09:05
  • Do you have only two data.frames and both with only one column? In that case `x <- as.character(one[,1]); y <- as.character(two[,1]); sum(x %in% y)` should work – Andromeda May 23 '14 at 09:12
  • @Andromeda I have tab-delimited csv files containing several columns, of which I need to compare the one holding the nominal values. So each data frame I get has only one column, containing those nominal values. – aldorado May 23 '14 at 09:15
  • you could try to `plyr::rbind.fill(...)` or see [here](http://stackoverflow.com/a/17309310/640783) – Paulo E. Cardoso May 23 '14 at 09:20
  • @Andromeda This approach seems to work on the example. I will try it with my original data. Thank you all! – aldorado May 23 '14 at 09:28
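A runnable sketch of Andromeda's %in% suggestion, using the example data typed in directly so no files are needed. The names n_shared and shared are illustrative only.

```r
# Example columns from the question, entered directly instead of read from file
one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors = FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors = FALSE)

x <- as.character(one[, 1])  # as.character() also covers factor columns
y <- as.character(two[, 1])

n_shared <- sum(x %in% y)    # how many elements of x also occur in y
shared   <- intersect(x, y)  # the shared strings themselves (unique, in x's order)
n_shared
shared
```

Note that sum(x %in% y) counts a string once per occurrence in x, whereas intersect() returns each shared string only once.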

1 Answer


You have a few problems here that you need to address in order to get stack to work as you intend it to.

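  1. stack ignores factor variables: it keeps only list elements for which is.vector() is TRUE, and a factor is not a plain vector.
  2. stack works with named lists: the names become the ind column.
  3. stack does not work with nested lists, and a data.frame is a special type of list.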
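Points 1 and 3 can be checked directly. This sketch assumes the behaviour of stack.default in base R's utils package, which drops any element for which is.vector() is FALSE and stops if nothing is left. The objects s and msg are illustrative only.

```r
# stack() keeps only list elements for which is.vector() is TRUE
is.vector(c("dog", "cat"))                                   # TRUE: plain character vector
is.vector(factor(c("dog", "cat")))                           # FALSE: factor carries class/levels attributes
is.vector(data.frame(V1 = "dog", stringsAsFactors = FALSE))  # FALSE: a data.frame is a classed list

# The factor element is dropped (stack warns about it); the character element is kept
s <- suppressWarnings(stack(list(a = factor(c("x", "y")), b = c("z", "w"))))
s

# A list made only of data.frames leaves no usable element, hence the error in the question
msg <- tryCatch(
  stack(list(one = data.frame(V1 = "dog", stringsAsFactors = FALSE))),
  error = function(e) conditionMessage(e)
)
msg
```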

Let's look at addressing each of these:

Make sure that your read.delim call (read.delim is a wrapper around read.table) includes stringsAsFactors = FALSE (this became the default in R 4.0.0). Here, I'm creating the two data.frames directly with that argument included.

one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors=FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors=FALSE)
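For completeness, the same data.frames can come straight out of read.delim. In this sketch the file contents are passed through read.table's text argument so it runs without files; with the real files you would pass "one.csv" and "two.csv" as in the question. The inline strings stand in for those files and are an assumption.

```r
# Inline text stands in for one.csv / two.csv; sep = "\t" is read.delim's default
one <- read.delim(text = "dog\nhamster\nmouse\ncat", header = FALSE,
                  stringsAsFactors = FALSE)
two <- read.delim(text = "dog\ncat\nrabbit", header = FALSE,
                  stringsAsFactors = FALSE)
str(one)  # V1 is a character column, not a factor
```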

Make sure that your list is a named list.

data <- list(one = one, two = two)

That takes care of two of the requirements. Test it, and the error remains:

stack(data)
# Error in stack.default(data) : at least one vector element is required

"Flatten" your list, but only by one level, using recursive = FALSE. Then test with stack again:

stack(unlist(data, recursive=FALSE))
#    values    ind
# 1     dog one.V1
# 2 hamster one.V1
# 3   mouse one.V1
# 4     cat one.V1
# 5     dog two.V1
# 6     cat two.V1
# 7  rabbit two.V1
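An alternative to unlist(recursive = FALSE), sketched here under the assumption that every data.frame has a column named V1: pull that column out of each element first, giving a named list of character vectors that stack accepts directly. The ind labels then become one and two instead of one.V1 and two.V1. The names cols and s are illustrative.

```r
one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors = FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors = FALSE)
data <- list(one = one, two = two)

# Extract the V1 column from each data.frame, keeping the list names
cols <- lapply(data, `[[`, "V1")
s <- stack(cols)
s
```

Either form works with table() and crossprod; the unlist(recursive = FALSE) version is the one used in the rest of this answer.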

From there, you can apply tcrossprod (or crossprod) to the resulting table:

tcrossprod(table(stack(unlist(data, recursive=FALSE))))
#          values
# values    cat dog hamster mouse rabbit
#   cat       2   2       1     1      1
#   dog       2   2       1     1      1
#   hamster   1   1       1     1      0
#   mouse     1   1       1     1      0
#   rabbit    1   1       0     0      1
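tcrossprod gives a values-by-values matrix. For the count the question asks about (how many strings each pair of columns has in common), crossprod gives an ind-by-ind matrix instead. A minimal sketch with the example data; tab and common are illustrative names.

```r
one <- data.frame(V1 = c("dog", "hamster", "mouse", "cat"), stringsAsFactors = FALSE)
two <- data.frame(V1 = c("dog", "cat", "rabbit"), stringsAsFactors = FALSE)
data <- list(one = one, two = two)

# Rows are values, columns are the source columns (one.V1, two.V1)
tab <- table(stack(unlist(data, recursive = FALSE)))

# t(tab) %*% tab: the diagonal holds each column's length,
# the off-diagonal holds the number of strings two columns share
common <- crossprod(tab)
common
```

Here the off-diagonal entry is 2 (dog and cat). If a string can appear more than once within a column, the counts are multiplied rather than counted once, so apply unique() to each column first if you want distinct shared strings.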
A5C1D2H2I1M1N2O1R2T1