0

I have a data frame called 'data' that contains multiple columns:

Grade EMPID PayBand
A 12345 15001-20000
c 64859 30001-35000
A 61245 20001-25000
D 75134 45001-50000
D 78451 40001-45000
C 31645 30001-35000
A 62513 20001-25000
D 91843 25001-30000
D 91648 35001-40000

I need R code to create a data frame that counts the number of each Grade within each PayBand that looks like this. E.g:

PayBand A C D
15001-20000 1 0 0
20001-25000 2 0 0
25001-30000 0 0 1
30001-35000 0 2 0
35001-40000 0 0 1
40001-45000 0 0 1
45001-50000 0 0 1

I am unsure how to create the new dataframe and the new columns that are based off the first dataframe. Any help is much appreciated.

atm1984
  • 19
  • 3
  • 2
    See this similar question from yesterday: [How to reshape long format data and count the values](https://stackoverflow.com/questions/74616896/how-to-reshape-long-format-data-and-count-the-values) – zephryl Nov 30 '22 at 12:15

2 Answers2

1

With table:

table(df[c(3, 1)])

Or with pivot_wider:

tidyr::pivot_wider(df, -EMPID, 
                   names_from = "Grade", values_from = "Grade", 
                   values_fn = length, values_fill = 0)

#   PayBand         A     C     D
# 1 15001-20000     1     0     0
# 2 30001-35000     0     2     0
# 3 20001-25000     2     0     0
# 4 45001-50000     0     0     1
# 5 40001-45000     0     0     1
# 6 25001-30000     0     0     1
# 7 35001-40000     0     0     1
Maël
  • 45,206
  • 3
  • 29
  • 67
  • 1
    Thanks so much. I have decided to use your first option but then need to convert it to a dataframe. Much appreciated. – atm1984 Nov 30 '22 at 13:15
1

Good simple answer from Maël. Here are some related alternatives, including some options for conversion of the result to a data frame. atm1984, please make it easier next time with a reproducible example.

> df <- data.frame(Grade = c('A', 'C', 'A', 'D'),
+                  PayBand = c('15001-20000', '30001-35000', '20001-25000', '450
01-50000'))

> table(df$PayBand, df$Grade)

              A C D
  15001-20000 1 0 0
  20001-25000 1 0 0
  30001-35000 0 1 0
  45001-50000 0 0 1

> as.data.frame(table(df$PayBand, df$Grade))
          Var1 Var2 Freq
1  15001-20000    A    1
2  20001-25000    A    1
3  30001-35000    A    0
4  45001-50000    A    0
5  15001-20000    C    0
6  20001-25000    C    0
7  30001-35000    C    1
8  45001-50000    C    0
9  15001-20000    D    0
10 20001-25000    D    0
11 30001-35000    D    0
12 45001-50000    D    1

> as.data.frame.matrix(table(df$PayBand, df$Grade))
            A C D
15001-20000 1 0 0
20001-25000 1 0 0
30001-35000 0 1 0
45001-50000 0 0 1
sashahafner
  • 435
  • 1
  • 7
  • Hi, thank you for your options for converting to a dataframe, much appreciated. I'm fairly new to stackoverflow - what do you suggest I do next time to give a reproducible example? Thanks – atm1984 Nov 30 '22 at 13:17
  • You are welcome. Regarding a reproducible example, it was really just the data object that was missing from your question. If you already have a data frame like the one shown in your question you can use `dput()` to get something you can share with copy/paste. See here for some details and more information: – sashahafner Nov 30 '22 at 14:24
  • Regarding your original question, you might also check out the data.table package. – sashahafner Nov 30 '22 at 14:26
  • Great, I will try to use that dput() next time I ask a question. Thanks – atm1984 Nov 30 '22 at 14:29