0

I'm fairly new to R and using the dplyr package currently. I have a dataframe that looks something like this simplified table:

year category
2009 A
2009 B
2009 B
2010 A
2010 B
2011 A
2011 C
2011 C

I want to count for each year hence I used:

df %>% count(year, category)

and got

year category count
2009 A 1
2009 B 2
2010 A 1
2010 B 1
2011 A 1
2011 C 2

However I would like to use the year as column names, to get the following:

2009 2010 2011
A 1 1 1
B 2 1 0
C 0 0 2

What is an easy way to get this? I would like to get this in absolute numbers, and if possible as a normalized table (percentages of the total of each year).

I hope you guys can help me out!

2 Answers2

2
df %>% count(year, category) %>%
  pivot_wider(
    category,
    names_from = year,
    names_prefix = "year_",
    values_from = n, 
    values_fill = 0
  )


# A tibble: 3 x 4
  category year_2009 year_2010 year_2011
  <chr>        <int>     <int>     <int>
1 A                1         1         1
2 B                2         1         0
3 C                0         0         2
Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
1

Using reshape:

df2 = df %>% count(year, category)

df2 = reshape(df2, idvar='category', timevar='year', direction='wide')

rownames(df2) = df2$category

df2[is.na(df2)] = 0

df2 = df2[,c(2:4)]
noan
  • 121
  • 4