0

I have a data frame

Name     Address    War   
Stacy_A  Stacy_A_1  A  
Stacy_B  Stacy_B_1  B  
Stacy_C  Stacy_C_1  C  
Stacy_A  Stacy_A_2  A  
Stacy_B  Stacy_B_2  B  
Stacy_C  Stacy_C_2  C  
Stacy_D  Stacy_D_2  O

so I have four unique names and under each name I have.

  • Stacy_A has 2 addresses and both in War A
  • Stacy_B has 2 addresses and both in War B
  • Stacy_C has 2 addresses and both in War c
  • Stacy_D has only 1 address and in War O

What I want is the output in the format having just count.

I need to have a format where I can get

names    A  B  C   O    
Stacy_A  2  0  0   0    
Stacy_B  0  2  0   0  
Stacy_C  0  0  2   0  
Stacy_D  0  0  0   1 
sgibb
  • 25,396
  • 3
  • 68
  • 74
iamashish
  • 57
  • 7

1 Answers1

2

As the OP mentioned about having "Name" as a separate column, we can use dcast from reshape2. We convert the 'long' to 'wide' format and specify the fun.aggregate i.e. in this case it is length.

library(reshape2)
dcast(df1, Name~War, value.var="Address", length)
#     Name A B C O
#1 Stacy_A 2 0 0 0
#2 Stacy_B 0 2 0 0
#3 Stacy_C 0 0 2 0
#4 Stacy_D 0 0 0 1

Or using dplyr/tidyr, we group by "Name", "War" column, get the frequency of the unique combinations with tally and then spread (tidyr) from 'long' to 'wide' format.

library(dplyr)
library(tidyr)
df1 %>%
   group_by(Name, War) %>%
   tally() %>%
   spread(War, n, fill=0)
#    Name     A     B     C     O
#    (chr) (dbl) (dbl) (dbl) (dbl)
#1 Stacy_A     2     0     0     0
#2 Stacy_B     0     2     0     0
#3 Stacy_C     0     0     2     0
#4 Stacy_D     0     0     0     1

Or as @docendo discimus mentioned, count from dplyr can be used in combination with spread

 count(df, Name, War) %>%
           spread(War, n, fill = 0) 
akrun
  • 874,273
  • 37
  • 540
  • 662