How to create a variable dependent on identifier?

Question

Total R newbie here. I am trying to calculate a risk-measuring variable for the companies in my dataset.

My dataset looks as follows:

   # A tibble: 6,971 x 14
   ISIN   Jahr Prüfungsurteil Prüfungshonorar Returns Name  Branchencode Bilanzsumme Wirtschaftsprue~ Eigenkapital
   <chr> <dbl> <chr>                    <dbl>   <dbl> <chr> <chr>              <dbl> <chr>                   <dbl>
 1 AU00~  2015 uneingeschrän~              NA   NA    Marl~ G47919          15687199 NA                   15012287
 2 AU00~  2016 uneingeschrän~              NA   NA    Marl~ G47919          29921136 Pricewaterhouse~     24797985
 3 DE00~  2005 uneingeschrän~              NA   NA    FinL~ M70101              8087 NA                       3788
 4 DE00~  2006 uneingeschrän~              NA   NA    FinL~ M70101          27565119 Oberfränkische ~     14858993
 5 DE00~  2007 uneingeschrän~              NA    4.48 FinL~ M70101          79490000 Verhülsdonk & P~     58038000
 6 DE00~  2008 uneingeschrän~           44000  -52.9  FinL~ M70101          61159000 Verhülsdonk & P~     49004000
 7 DE00~  2009 uneingeschrän~           60000  -66.1  FinL~ M70101          61092000 Verhülsdonk & P~     48635000
 8 DE00~  2010 uneingeschrän~           65000  -25.   FinL~ M70101          61689000 Verhülsdonk & P~     52334000
 9 DE00~  2011 uneingeschrän~           60000  -65.6  FinL~ M70101          40725000 ifb Treuhand Gm~     33143000
10 DE00~  2012 uneingeschrän~              NA  -82.1  FinL~ M70101          29232000 ifb Treuhand Gm~     24047000

I thought about defining my risk measurement as (Company-Return-Standard-Deviation)-(Total-Return-Standard-Deviation).

The Total-Return-Standard-Deviation would be calculated as:

sd(Returns, na.rm=TRUE)

What I can't figure out is how to calculate the standard deviation for each company separately. I tried something like

sd(Returns[ISIN], na.rm=True)

But the output was NA.

Please provide a proper reproducible example using dput(). It would also be great if you could provide the expected output. – Hunaidkhan, Dec 27 '18 at 11:13

akrun · Answer 1 · 2018-12-27T11:35:26.680

`sd` operates on a vector or a single column. Here, the OP wants the standard deviation of the column 'Returns' grouped by 'ISIN':

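library(dplyr)
# df1 is the OP's tibble; one output row per ISIN
df1 %>%
  group_by(ISIN) %>%
  # standard deviation of Returns within each company, ignoring NAs
  summarise(returnsD = sd(Returns, na.rm = TRUE))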
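
To get from the grouped standard deviation to the risk measure described in the question (company SD minus total SD), one possible sketch is to use `mutate` instead of `summarise`, so that the value is attached to every row. The toy data and the names `total_sd`, `company_sd`, `risk`, and `df1_risk` are assumptions made for illustration; they do not come from the OP's dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for the OP's tibble
df1 <- tibble(
  ISIN    = c("A", "A", "A", "B", "B", "B"),
  Jahr    = c(2010, 2011, 2012, 2010, 2011, 2012),
  Returns = c(1, 2, 3, 10, NA, 30)
)

# Standard deviation over all returns, ignoring missing values
total_sd <- sd(df1$Returns, na.rm = TRUE)

# Per-company SD on every row, then the difference as the risk measure
df1_risk <- df1 %>%
  group_by(ISIN) %>%
  mutate(
    company_sd = sd(Returns, na.rm = TRUE),
    risk       = company_sd - total_sd
  ) %>%
  ungroup()
```

Note that `sd` returns NA for a company with fewer than two non-missing returns, so such companies would end up with an NA risk value.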

edited Dec 27 '18 at 11:35

answered Dec 27 '18 at 11:13


Thanks for replying. I am not 100% sure I understand you completely. I only wanted to get the standard deviation of the column "Returns" grouped by "ISIN". Your code would calculate the standard deviations of all numeric columns, wouldn't it? – Arny Dec 27 '18 at 11:34
@Amy In that case, you just need `summarise`. Updated the code – akrun Dec 27 '18 at 11:35
