0

I have a dataset with two values in a cell/column per row:

Gene  Count
ACE     1,2
BRCA    3,7
HER2    100,300

I am trying to sum the numbers in each row of the count column to output:

Gene  Count
ACE     3
BRCA    10
HER2    400

I've been trying to build something to get the numeric values but I don't know how to then specify the addition of values per row:

getnum = function(col) str_extract_all(col,"[0-9\\.-]+") %>%
  lapply(.,function(x)as.numeric(x) ) %>%
  unlist() 

Any help or suggestions on what to use would be appreciated.

Input data:

structure(list(Gene = c("ACE", "BRCA", "HER2"), Count = c(3L, 
10L, 400L)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))
DN1
  • 234
  • 1
  • 13
  • 38

6 Answers6

2

Straight in base R:

df$Count <- sapply(strsplit(df$Count, ","), function(x) sum(as.integer(x)))
df

#   Gene Count
# 1  ACE     3
# 2 BRCA    10
# 3 HER2   400

Data:

df <- data.frame(
  Gene = c("ACE", "BRCA", "HER2"), 
  Count = c("1,2", "3,7", "100,300")
)
s_baldur
  • 29,441
  • 4
  • 36
  • 69
1

Here's a dplyr solution, may not be the best, for your reference.

library("dplyr")

data %>% 
  group_by(row_number()) %>% 
  mutate(Count2 = sum(as.numeric(unlist(strsplit(Count, ","))))) %>% 
  ungroup() %>% 
  select(-"row_number()")
# A tibble: 3 x 3
  Gene  Count   Count2
  <chr> <chr>    <dbl>
1 ACE   1,2          3
2 BRCA  3,7         10
3 HER2  100,300    400
BellmanEqn
  • 791
  • 3
  • 11
1

Extract the numbers from Count, convert to numeric and sum.

This can be done using stringr

sapply(stringr::str_extract_all(df$Count, '\\d+'), function(x) sum(as.integer(x)))
#[1]   3  10 400

Or base R :

sapply(regmatches(df$Count, gregexpr('\\d+', df$Count)), function(x) sum(as.integer(x)))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

You also could change the comma to a plus and evaluate the string:

sapply(df$Count, function(x) eval(parse(text = gsub(",", "+", x))))
r.user.05apr
  • 5,356
  • 3
  • 22
  • 39
0
library(data.table)

dt <- fread('Gene  Count
ACE     1,2
BRCA    3,7
HER2    100,300')

sum_num <- function(col){
unlist(lapply(strsplit(col,","),function(x) sum(as.numeric(x))))
}


dt[,result:=sum_num(Count)][]
#>    Gene   Count result
#> 1:  ACE     1,2      3
#> 2: BRCA     3,7     10
#> 3: HER2 100,300    400

Created on 2020-05-20 by the reprex package (v0.3.0)

Frank Zhang
  • 1,670
  • 7
  • 14
0

You mentioned converting string to numeric and summing. You've gotten answers for that. I wanted to point out another possibility.

In your earlier question, you wanted to generate output that also contained items separated by commas. The answers included generating that column from data.table directly such that the column actually contains a list of values.

If in this case, your dataset had been constructed similarly from data.table, you wouldn't have a character string with a comma, rather a column of lists. (The comma shows up in the printout for convenience.)

The summation would be straightforward.

library(data.table)
# create such a dataset 
y <- data.table(Gene=c("ACE","BRCA","HER2"),
                Count1=c(1,3,100),
                Count2=c(2,7,300))
D <- y[,.(Gene=rep(Gene,2),x=c(Count1,Count2))] 
D <- D[,.(Count=list(x)),by=Gene] 
D

# convert Count to the sum      
D[,Count:=sapply(Count,sum)]
D

Maybe this isn't applicable to your actual case, but I thought I'd mention it.

DaveTurek
  • 1,297
  • 7
  • 8