0

I want to create a cross table with R

I have a csv file with the following information:

Client_ID    Buying_Category
   123          A
   123          B 
   567          A
   567          C
   ...

I load that csv into R

data <- read.csv ("file", header = TRUE)

Now I want to create a cross table

so that it looks like this

       A    B   C
A      2    1   1
B      1    1   0
C      1    0   1

I tried this

crosstable <- data %>% group_by(Client_ID, Buying_Category) %>% summarize(records = sign(n()))

But get this error:

Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "function"

So i tried this

with(data, tapply(PRODUCT, list(Client_ID, Buying_Category), FUN = function(x) length(unique(x))))

and get this error:

Error in eval(substitute(expr), data, enclos = parent.frame()) : invalid 'envir' argument of type 'closure'

Any help appreciated!

jeangelj
  • 4,338
  • 16
  • 54
  • 98

1 Answers1

0

This process uses dplyr (data manipulation) and tidyr (data reshaping).

dt = read.table(text="
Client_ID    Buying_Category
123          A
123          B 
567          A
567          C", header=T)

library(dplyr)
library(tidyr)

dt %>% 
  group_by(Client_ID) %>%                  # for each client
  do(expand.grid(.$Buying_Category,        # create pariwise combinations of categories  
                 .$Buying_Category)) %>%
  count(Var1, Var2) %>%                    # count how many times you have each combination
  spread(Var2,n, fill=0)                   # reshape to get your table


# Source: local data frame [3 x 4]
# 
#      Var1     A     B     C
#    (fctr) (dbl) (dbl) (dbl)
# 1      A     2     1     1
# 2      B     1     1     0
# 3      C     1     0     1

Based on your latest comment I'll provide some more info about how to proceed with the plot you want to create. I'll create the simplest version as an example:

dt %>% 
  group_by(Client_ID) %>%                  # for each client
  do(expand.grid(.$Buying_Category,        # create pariwise combinations of categories  
                 .$Buying_Category)) %>%
  count(Var1, Var2) -> dt_circ             

No need to reshape here because you can use this format of dt_circ

#     Var1   Var2     n
#   (fctr) (fctr) (int)
# 1      A      A     2
# 2      A      B     1
# 3      A      C     1
# 4      B      A     1
# 5      B      B     1
# 6      C      A     1
# 7      C      C     1

Where the following function will understand Var1 is "from", Var2 is "to", n is the "value".

library(circlize)

chordDiagram(dt_circ)

You'll get something like:

enter image description here

AntoniosK
  • 15,991
  • 2
  • 19
  • 32
  • This worked! Thank you; How can I save the reshaped table into excel? When I use 'write.xlsx(dt, "F:/dt.xlsx")' it saves the original format into excel. – jeangelj Sep 21 '15 at 14:55
  • Yes, because `dt` is still your original dataset. The process didn't change `dt`. You should save the whole process as another dataset and save that one into excel. For example `... %>% spread(Var2,n, fill=0) -> dt2` . – AntoniosK Sep 21 '15 at 15:01
  • Perfect! Worked like a charm; thank you so much for your quick response! Last question, can I use this dataframe in this format for the circlize package? – jeangelj Sep 21 '15 at 15:31
  • How are you planning to use that package with this dataset? What will be your objective? If you give me an a example I'll try to check it. – AntoniosK Sep 21 '15 at 17:13
  • Thank you in advance; The objective is to create a circos diagram to show how clients buy between the different buying categories. This is the type of diagram I am aiming for [http://stackoverflow.com/questions/27500041/r-make-circle-chord-diagram-with-circlize-from-dataframe] – jeangelj Sep 21 '15 at 20:47
  • I provided some more info by updating my answer. I don't have a deep knowledge of that package, so I'm providing a simple version of what you need to do. You will be able to change colors, remove numbers, etc. Also, check what alternative inputs you can pass to this function. As an alternative to my example you can pass a cross-tab, but in a matrix format where A,B,C must be row names and column names and not a column with values A,B,C. Check which is faster and better for you. – AntoniosK Sep 21 '15 at 22:54