
I have a group of 200 mouse IDs with a list of gene expression values for each mouse, but the same gene appears multiple times for each mouse. I would like each gene to be listed only once per mouse, with its value equal to the sum of all of that gene's values for that mouse.

For example this data:

   mouse_number      value   gene
1           64    2.00000 Lypla1
2           65    1.00000 Lypla1
3           64    7.00000 Lypla1
4           65    3.00000 Lypla1
7           64    4.00000 Pck1
8           65    2.00000 Pck1
9           64    1.00000 Pck1
10          65    5.00000 Pck1

Should be:

   mouse_number      value   gene
1           64    9.00000 Lypla1
2           65    4.00000 Lypla1
3           64    5.00000 Pck1
4           65    7.00000 Pck1

Please assist, thank you!

neilfws

1 Answer


You can use base R's aggregate() function, which sums value over every unique mouse_number/gene combination:

 df <- data.frame(
     mouse_number = c(64, 65, 64, 65, 64, 65, 64, 65),
     value = c(2.0, 1.0, 7.0, 3.0, 4.0, 2.0, 1.0, 5.0),
     gene = c("Lypla1", "Lypla1", "Lypla1", "Lypla1", "Pck1", "Pck1", "Pck1", "Pck1"))

 # Sum `value` within each mouse_number/gene group
 df.collapsed <- aggregate(value ~ mouse_number + gene, data = df, FUN = sum)

 df.collapsed
 #   mouse_number   gene value
 # 1           64 Lypla1     9
 # 2           65 Lypla1     4
 # 3           64   Pck1     5
 # 4           65   Pck1     7
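As an alternative sketch (not part of the original answer), base R's xtabs() computes the same per-mouse, per-gene sums but lays them out wide, one row per mouse and one column per gene, which may be easier to scan with 200 mice. It assumes the same df as in the answer; note that any mouse/gene combination absent from the data would show up as 0 rather than being dropped.

```r
# Example data from the question
df <- data.frame(
  mouse_number = c(64, 65, 64, 65, 64, 65, 64, 65),
  value = c(2, 1, 7, 3, 4, 2, 1, 5),
  gene = c("Lypla1", "Lypla1", "Lypla1", "Lypla1", "Pck1", "Pck1", "Pck1", "Pck1")
)

# With a left-hand side, xtabs() sums `value` within each mouse_number x gene cell
wide <- xtabs(value ~ mouse_number + gene, data = df)
print(wide)

# Convert back to one row per mouse/gene pair, naming the summed column "value"
long <- as.data.frame(wide, responseName = "value")
print(long)

# Collapsing duplicates should leave the grand total unchanged
stopifnot(sum(wide) == sum(df$value))
```

In long, mouse_number comes back as a factor; wrap it in as.numeric(as.character(...)) if numeric IDs are needed.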
Maurits Evers