
I have a data frame Likes for n users and m likes, with userid and likeid1 : likeidm as my variables. The user ids are stored in column 1 (Likes$userid), and the remaining cells contain 1 or 0 depending on whether the user liked the page with the respective like id.

library(Matrix)

Likes <- data.frame(userid=c("n1","n2"),
                      m1=c(0,1),
                      m2=c(0,0),
                      m3=c(0,0),
                      m4=c(1,0)
                      )

Likes[1, 1:5]

  userid m1 m2 m3 m4
1     n1  0  0  0  1

Now, I want to create a sparse matrix. How would I specify j in the following code? I know the way I did it is not right, since the like ids are not stored in a column but are already the variable names of my data frame.

sM_Likes <- sparseMatrix(Likes, i=likes$userid, j=1,c(2:ncol(Likes)), x=1)

Thanks in advance (and please excuse the very basic question).

asked by Sarah, edited by Hack-R
  • Welcome to StackOverflow! For a first-time question in the r tag, this was a nice attempt. However, for code debugging please always ask with [reproducible](https://stackoverflow.com/q/5963269/1422451) code/data per the [MCVE](https://stackoverflow.com/help/mcve) and [`r`](https://stackoverflow.com/tags/r/info) tag description, with the desired output. You may wish to use `dput()` for sharing per the tag description. It is an easy way to reproduce R data. You can put the messy `dput(data_frame_name)` or `dput(head(data_frame_name))` output at the bottom of your question (click edit). – Hack-R Jul 06 '18 at 00:09
  • See if this question can be useful: https://stackoverflow.com/questions/29479198/sparsematrix-with-numerical-and-categorical-data – dmi3kno Jul 06 '18 at 00:13
  • @Hack-R thanks for the tip on reproducible code, I will try that. Regarding the solution from the post (thanks dmi3kno for sharing) I feel like this only answers the first question (i.e., I should create a dataframe without the first column beforehand) but not the second question, or am I mistaken? – Sarah Jul 06 '18 at 00:49
  • Thanks. So the 2nd question is "How would I specify j in the following code?", is that correct? Because I think part of the problem is that you're only supposed to be specifying 2 out of 3 parameters and I think you have all 3 specified. – Hack-R Jul 06 '18 at 01:09
  • @Hack-R I think what you are referring to is the p parameter - you are only supposed to specify 2 out of the 3 parameters (i, j, p), however x does not count. And yes, you are right about what my question is. My problem is how to define j when the like ids are not a column but the variables 2:ncol. I am following a tutorial where they used two columns of a different dataframe for i and j, however, I thought I could also directly use the data frame I want to transform to sparse since all the information is there already. Any thoughts? :) – Sarah Jul 06 '18 at 11:04
  • When I create a reproducible example of `Likes` based on your question and run the code, it says `Error in sparseMatrix(Likes, i = likes$userid, j = 1, c(2:ncol(Likes)), : exactly one of 'i', 'j', or 'p' must be missing from call` – Hack-R Jul 06 '18 at 12:42
  • @Hack-R I think this is due to the wrong specification of j. I have now created a new dataframe with two columns and created the sparse matrix with those two columns as i and j, and it worked (even with i, j and x specified). Therefore I solved my problem, however I would have been interested in whether there is a way to directly use the column names of my matrix or simply 'sparse' the old matrix. – Sarah Jul 06 '18 at 14:49
  • @Sarah wouldn't you say it was more like *we* solved your problem :D I was telling you again about that error specifically to help you diagnose the problem after all ;) Had a feeling it was important. That also goes back to the reason that we require reproducible examples. – Hack-R Jul 06 '18 at 15:09
  • Regarding the latter question, I'll give it a try and see if I can do that. I'd do it now but I've got to set up an example to work with again hehe. It may be a separate question anyway. – Hack-R Jul 06 '18 at 15:12
  • @Hack-R I didn't mean to underestimate your efforts ;) What I meant by "I solved the problem" is that I created a new df (not like in the original post) the same way they did it in a tutorial (df with columns userid and likeid) and then called sparseMatrix(i=df$user_row, j=df$like_row, x=1). I created this question because I was wondering if it is also possible to avoid creating such a df (and instead 'sparse' my existing wide df). If you find a way, it would be great if you could share it, however, no rush, since I can now continue working with the new df. THANKS!! :) – Sarah Jul 06 '18 at 15:30
  • Thanks, Sarah. By the way, I've updated my answer to show a way that preserves column names. To clarify the last issue you described in your most recent comment -- do you mean that instead of going from `data.frame` to sparse you want to go from `matrix` to sparse? Or what is the original data type that you'd prefer to convert to sparse? – Hack-R Jul 06 '18 at 15:42
  • Regarding your question, actually it was a data.frame originally, too. So I wanted to go from a wide data.frame to sparse matrix as opposed to the data.frame with 2 columns. But your recent post solved the problem with the column names I had, so thanks a lot :) – Sarah Jul 07 '18 at 15:03
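For the follow-up about "sparsing" the existing wide data frame directly, here is a minimal sketch, assuming the Matrix package is installed. It drops the userid column, turns the 0/1 columns into a numeric matrix and lets Matrix(..., sparse = TRUE) store it in compressed sparse form. The names dense and sM_Likes are only illustrative.

```r
library(Matrix)

# Example data from the question: one row per user, one 0/1 column per like id
Likes <- data.frame(userid = c("n1", "n2"),
                    m1 = c(0, 1),
                    m2 = c(0, 0),
                    m3 = c(0, 0),
                    m4 = c(1, 0))

# Drop the userid column and coerce the 0/1 columns to a dense numeric matrix;
# the like ids (m1 ... m4) become the column names
dense <- as.matrix(Likes[, -1])
rownames(dense) <- as.character(Likes$userid)

# Matrix(..., sparse = TRUE) keeps only the non-zero cells
# (a dgCMatrix for this input) and carries the row and column names over
sM_Likes <- Matrix(dense, sparse = TRUE)
sM_Likes
```

This skips building the two-column userid/likeid data frame, but the dense matrix is materialised once. For a very large n × m table, building i/j index vectors and calling sparseMatrix() may use less memory.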

1 Answer


I tried to reproduce the problem by constructing an object like you described in the question (which I've now edited into the question) and by appending some additional fake rows to it.

library(Matrix)

Likes <- data.frame(userid=c("n1","n2"),
                      m1=c(0,1),
                      m2=c(0,0),
                      m3=c(0,0),
                      m4=c(1,0)
                      )

I found that running your code on this threw a different error:

sM_Likes <- sparseMatrix(Likes, i=likes$userid, j=1,c(2:ncol(Likes)), x=1)

Error in sparseMatrix(Likes, i = likes$userid, j = 1, c(2:ncol(Likes)), : exactly one of 'i', 'j', or 'p' must be missing from call

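I mentioned this a couple of times in the comments as what I thought was causing the problem. Because i, j and x are passed by name, R matches the unnamed Likes argument positionally to p, so all three of i, j and p end up supplied. You corrected the specification of your j argument and now it works :)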
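For reference, a call with a working j can be built from the wide data frame without an intermediate data frame. This is an illustrative sketch, not the asker's actual fix (which used a separate two-column data frame). which(..., arr.ind = TRUE) returns the row and column position of every 1, i.e. the (i, j) pairs, so p is the one argument left missing. dims keeps the all-zero like columns, and dimnames keeps the user and like ids. The variable names are hypothetical.

```r
library(Matrix)

Likes <- data.frame(userid = c("n1", "n2"),
                    m1 = c(0, 1),
                    m2 = c(0, 0),
                    m3 = c(0, 0),
                    m4 = c(1, 0))

likes_mat <- as.matrix(Likes[, -1])   # 0/1 part only, like ids as column names

# Position of every 1: column 1 is the user (i), column 2 is the like (j)
idx <- which(likes_mat == 1, arr.ind = TRUE)

# i and j are supplied, p is left missing, x = 1 is recycled to every entry
sM_Likes <- sparseMatrix(i = idx[, 1],
                         j = idx[, 2],
                         x = 1,
                         dims = dim(likes_mat),   # keep all-zero columns m2, m3
                         dimnames = list(as.character(Likes$userid),
                                         colnames(likes_mat)))
sM_Likes
```

Without dims, sparseMatrix() would size the result from the largest i and j it sees, so trailing all-zero users or likes would be dropped.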

There's also a follow-up question you asked in the comments about column names. I think this should solve that:

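# Install once, from CRAN or GitHub:
# install.packages("mltools")
# devtools::install_github("ben519/mltools")
library(data.table)  # needed for data.table(); sparsify() expects a data.table
library(mltools)

dt <- data.table(
  intCol=c(1L, NA_integer_, 3L, 0L),
  realCol=c(NA, 2, NA, NA),
  logCol=c(TRUE, FALSE, TRUE, FALSE),
  ofCol=factor(c("a", "b", NA, "b"), levels=c("a", "b", "c"), ordered=TRUE),
  ufCol=factor(c("a", NA, "c", "b"), ordered=FALSE)
)

sparsify(dt)                                       # default handling of NAs
sparsify(dt, sparsifyNAs=TRUE)                     # NAs converted to 0 and sparsified
sparsify(dt[, list(realCol)], naCols="identify")   # NA-indicator columns; see ?sparsify
sparsify(dt[, list(realCol)], naCols="efficient")  # for how the two options differ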
answered by Hack-R