2

If my writing seems troublesome, please see the attached pic. enter image description here I would like to have generalized code, so that I can apply that to a very large file. Any numeric calculation for formula for 2/3 rows/columns won't be helpful for me. Suppose I have a following CSV table:

Class1  Count1
1        1
2        1
1        0
3        1
1        1
1        1
3        0
3        1
2        1

Now I would like to have summation based on the values of class in a new column. Suppose we get two new columns named Class2 and Count2, as per my desire the top most cell (in the first row) of Count2 should have the sum of all the values in column "Count1" corresponding to "1" in column "Class1", the next-to top most cell (in the 2nd row) of Count2 should have the sum of all the values in column "Count1" corresponding to "2" in Class1, and so on. After running the code, I would like to have a table like following:

Class1  Count1  Class1  Count2
1        1        1       3
2        1        2       2
1        0        3       2
3        1
1        1
1        1
3        0
3        1
2        1

I work on social sciences, so this type of work is very tough for me. It will be of great help if anyone give me R code for executing this. Thanks in Advance.

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • You could try `cbind(df1, rbind(as.matrix(aggregate(Count1~Class1, df1, sum)), matrix(NA, nrow(df1) - length(unique(df1$Class1)), ncol(df1))))` where `df1` is your csv table – Onyambu Jun 16 '21 at 18:16

3 Answers3

3

you could use summarise from dplyr package: First group by Class1 then sum.

library(dplyr)
df %>% 
  group_by(Class1) %>% 
  summarize(Count2 = sum(Count1))

output:

  Class1 Count2
*  <dbl>  <dbl>
1      1      3
2      2      2
3      3      2

data:

df <- structure(list(Class1 = c(1, 2, 1, 3, 1, 1, 3, 3, 2), Count1 = c(1, 
1, 0, 1, 1, 1, 0, 1, 1)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))
TarJae
  • 72,363
  • 6
  • 19
  • 66
2

Here is a base R solution.

y <- with(df1, table(Count1 * Class1))
y <- as.data.frame(y[names(y) != 0])

y
#  Var1 Freq
#1    1    3
#2    2    2
#3    3    2

The names can be changed, if needed.

names(y) <- c("Class2", "Count2")

To combine the input data and this result, use merge, see @G.Grothendieck, point 1.

merge(df1, y, by = 0, all = TRUE)[-1]
#  Class1 Count1 Class2 Count2
#1      1      1      1      3
#2      2      1      2      2
#3      1      0      3      2
#4      3      1   <NA>     NA
#5      1      1   <NA>     NA
#6      1      1   <NA>     NA
#7      3      0   <NA>     NA
#8      3      1   <NA>     NA
#9      2      1   <NA>     NA

Data

df1 <- read.table(text = "
Class1  Count1
1        1
2        1
1        0
3        1
1        1
1        1
3        0
3        1
2        1
", header = TRUE)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
1

An option with rowsum and transform in base R

transform(df1, Class2 = `length<-`(unique(Class1), nrow(df1)),
     Count2 = `length<-`(rowsum(Count1, Class1)[,1], nrow(df1)))

-output

   Class1 Count1 Class2 Count2
1      1      1      1      3
2      2      1      2      2
3      1      0      3      2
4      3      1     NA     NA
5      1      1     NA     NA
6      1      1     NA     NA
7      3      0     NA     NA
8      3      1     NA     NA
9      2      1     NA     NA

-output

df1 <- structure(list(Class1 = c(1L, 2L, 1L, 3L, 1L, 1L, 3L, 3L, 2L), 
    Count1 = c(1L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)), 
    class = "data.frame", row.names = c(NA, 
-9L))
akrun
  • 874,273
  • 37
  • 540
  • 662