
I'm working on a master's degree project and am stuck on basic data manipulation. I'm importing several tables from the Prestashop database into R; one of them is a data frame with cart IDs and the products each cart contains (see below).

[image: the cart/product data frame]

What I want is a matrix that represents the same data, with one row per cart and one column per product. Here's a draft of the desired layout:

[image: draft of the desired matrix]

Any hints on how the code should look? Thank you in advance for any help!

EDIT:

Code sample (dataframe):

x <- data.frame(order_id   = c("12", "13", "13", "13", "14", "14", "15", "16"),
                product_id = c("123", "123", "378", "367", "832", "900", NA, "378"))

SOLUTION:

xtabs works, but by default it drops any row that contains NA, so order 15 disappears from the result (example 1). Passing addNA = TRUE keeps that row, but it also adds an <NA> column and counts the missing product as 1 (example 2).

y <- xtabs(formula = ~., data = x)

Output - example 1 (addNA=FALSE):

        product_id
order_id 123 367 378 832 900
      12   1   0   0   0   0
      13   1   1   1   0   0
      14   0   0   0   1   1
      16   0   0   1   0   0

Output - example 2 (addNA=TRUE):

        product_id
order_id 123 367 378 832 900 <NA>
      12   1   0   0   0   0    0
      13   1   1   1   0   0    0
      14   0   0   0   1   1    0
      15   0   0   0   0   0    1
      16   0   0   1   0   0    0

The igraph approach seems to be more accurate.
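
If the goal is an all-zero row for order 15 without the extra <NA> column, base R's table() may be enough. This is only a minimal sketch, assuming the same x as above; the names tab and m are illustrative. table() leaves NA products out of the counts but still keeps every order_id level.

```r
# Same sample data as in the question
x <- data.frame(order_id   = c("12", "13", "13", "13", "14", "14", "15", "16"),
                product_id = c("123", "123", "378", "367", "832", "900", NA, "378"))

# table() excludes NA values by default (useNA = "no") but keeps every
# order_id level, so order 15 stays as a row of zeros
tab <- table(order_id = x$order_id, product_id = x$product_id)

# Convert the counts to a plain 0/1 integer matrix (illustrative name: m)
m <- matrix(as.integer(tab > 0), nrow = nrow(tab), dimnames = dimnames(tab))
m
```

Unlike the igraph result, the columns here come out in sorted order of product_id rather than in order of first appearance.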

Bart
  • 2
    Please read about [how to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and update your question accordingly. Include a sample of your data by pasting the output of `dput()` into your post or `dput(head())` if you have a large data frame. Also include code you have tried and any relevant errors. If you cannot post your data, then please post code for creating representative data. – LMc May 13 '22 at 18:06
  • Check out the `tidyverse` package(s) to pivot your table. After applying some simple logic to the pivoted table, you should be able to use the `as.matrix(df)` function. – memo May 13 '22 at 18:09
  • 2
    Look at `help("xtabs")` – G. Grothendieck May 13 '22 at 18:21
  • 1
    Images are not a good way for posting data (or code). See [this Meta](https://meta.stackoverflow.com/a/285557/8245406) and a [relevant xkcd](https://xkcd.com/2116/). Can you post sample data in `dput` format? Please edit **the question** with the code you've tried and with the output of `dput(df)`. Or, if it is too big with the output of `dput(head(df, 20))`. (Note: `df` is the name of your dataset.) – Rui Barradas May 13 '22 at 18:57

1 Answer


What you want is the adjacency matrix of a bipartite network (orders on one side, products on the other) for which you already have the edge list. You can use the igraph package to build the adjacency matrix directly from that edge list and then trim it down to the order rows and product columns.

From x:

  order_id product_id
1       12        123
2       13        123
3       13        378
4       13        367
5       14        832
6       14        900
7       15       <NA>
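8       16        378

# graph.data.frame() and get.adjacency() are the older igraph names for
# graph_from_data_frame() and as_adjacency_matrix()
graph_from_dataframe <- igraph::graph.data.frame(x)
adjacency_matrix <- igraph::get.adjacency(graph_from_dataframe, sparse = FALSE)
# removing redundant entries: keep order rows and product columns only
adjacency_matrix <- adjacency_matrix[rownames(adjacency_matrix) %in% x$order_id,
                                     colnames(adjacency_matrix) %in% x$product_id]

Result:

   123 378 367 832 900
12   1   0   0   0   0
13   1   1   1   0   0
14   0   0   0   1   1
15   0   0   0   0   0
16   0   1   0   0   0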

More resources on this SO question and this RPubs blog post.

hrvg