section wise add up data

Question

hello as the title suggest i need to sum section wise data --->

can.id       status            qid    marks
001    section 1 question 1    112     3
001    section 1 question 2    117     3
001    section 1 question 3    116     3
001    section 2 question 1    115     3
001    section 2 question 2    114    -1
001    section 2 question 3    111     3
001    section 3 question 1    112    -1
001    section 3 question 2    116     3
002    section 1 question 1    114     3
002    section 1 question 2    111     3
002    section 2 question 2    111    -1
002    section 3 question 1    111    -1

i want to display sum of marks for each can.id for every section, help is appreciated....

Do you need a `R` solution as there is a `r` tag? – akrun Jun 23 '16 at 08:58 — akrun, Jun 23 '16 at 08:58
take out status from your group by clause – Rich Benner Jun 23 '16 at 09:05 — Rich Benner, Jun 23 '16 at 09:05

akrun · Accepted Answer · 2016-06-23T09:42:57.790

In R, we can use dplyr. We extract (from tidyr), substring from 'status' to create 'section', then grouped by 'can.id' and 'section', get the sum of 'marks'.

library(dplyr)
library(tidyr)
df1 %>% 
  extract(status, into = "section", "(.*\\d+)\\s+[[:alpha:]].*") %>%
  group_by(can.id, section) %>% 
  summarise(SumMarks = sum(marks))
#  can.id   section SumMarks
#   <int>     <chr>    <int>
#1      1 section 1        9
#2      1 section 2        5
#3      1 section 3        2
#4      2 section 1        6
#5      2 section 2       -1
#6      2 section 3       -1

Or using data.table

library(data.table)
setDT(df1)[,.(SumMarks = sum(marks)), .(can.id, 
                 section = sub("\\s+[[:alpha:]].*", "", status))]

data

 df1 <- structure(list(can.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
 2L, 2L, 2L), status = c("section 1 question 1", "section 1 question 2", 
 "section 1 question 3", "section 2 question 1", "section 2 question 2", 
 "section 2 question 3", "section 3 question 1", "section 3 question 2", 
 "section 1 question 1", "section 1 question 2", "section 2 question 2", 
 "section 3 question 1"), qid = c(112L, 117L, 116L, 115L, 114L, 
  111L, 112L, 116L, 114L, 111L, 111L, 111L), marks = c(3L, 3L, 
 3L, 3L, -1L, 3L, -1L, 3L, 3L, 3L, -1L, -1L)), .Names = c("can.id", 
 "status", "qid", "marks"), class = "data.frame",
  row.names = c(NA, -12L))

score 0 · Answer 2 · edited May 23 '17 at 11:52

0

I've answered this with some SQL.

The issue you appear to have is that your section needs to be split out from the status field, you could do something like this;

SELECT
    [can.id]
    ,SUBSTRING([status],1,8) Section
    ,SUM(marks) Total
FROM samp_data
GROUP BY
    [can.id]
    ,SUBSTRING([status],1,8)

If you just want the top 3 for each group, check the related link below

How to select top 3 values from each group in a table with SQL which have duplicates

edited May 23 '17 at 11:52

Community

1
1

answered Jun 23 '16 at 09:11

Rich Benner

7,873
9
33
39

Take a look at the link in my answer. If you have any further questions then it would probably be best to post another question. – Rich Benner Jun 28 '16 at 07:05

section wise add up data

2 Answers2

data