How can I get the standard deviation for a set of rows in my dataframe based on a condition in one column?

Question

I have a data frame like this:

years, latitude, longitude
1971, 30.212, -87.423
1971, 30.211, -87.455
1971, 30.111, -94.444
1972, 24.114, -94.231
1972, 25.114, -92.121

I want to find the standard deviation of the latitude column by year and store it in a new column, so that every 1971 row carries the same standard deviation, every 1972 row carries a different one, and so on.

I believe this may be somewhere in the dplyr universe, but I am having difficulties with this one.

As a logical expression, I am asking: what is the standard deviation of df$latitude for each distinct value of df$years?

`df %>% group_by(year) %>% mutate(lat_sd = sd(lat))` – alistaire, Sep 21 '17 at 20:43

score 0 · Answer 1 · answered Sep 21 '17 at 20:43

df %>% group_by(years) %>% mutate(lat_sd = sd(latitude, na.rm = TRUE))
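
For anyone who wants to run this end to end, here is a minimal sketch using the sample rows from the question. It assumes the columns are named `years`, `latitude`, and `longitude` as in the question's header row; `result` is just an illustrative name, and the trailing `ungroup()` is an optional addition so later operations are not grouped by accident.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  years     = c(1971, 1971, 1971, 1972, 1972),
  latitude  = c(30.212, 30.211, 30.111, 24.114, 25.114),
  longitude = c(-87.423, -87.455, -94.444, -94.231, -92.121)
)

# mutate() on a grouped data frame computes sd() once per group and
# repeats that value on every row of the group
result <- df %>%
  group_by(years) %>%
  mutate(lat_sd = sd(latitude, na.rm = TRUE)) %>%
  ungroup()

print(result)
```

Unlike `summarise()`, `mutate()` keeps all five rows, which matches the "repeating" column the question asks for.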

Djork

score 0 · Answer 2 · answered Sep 21 '17 at 20:44

Assuming your data frame is built like this and is stored in a variable called "df":

year, lat, long
1971, 20, 40

You would need this code using dplyr:

output <- df %>% group_by(year) %>% summarise(dev = sd(lat))

merge(df, output, by = "year")
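
A possible variation on the same idea, sketched under the assumption that the columns are named `year`, `lat`, and `long` as above (the values are the ones from the question). It swaps `merge()` for dplyr's `left_join()`, because `merge()` sorts the result by the key column by default, while `left_join()` keeps the rows of `df` in their original order. The names `per_year` and `joined` are illustrative only.

```r
library(dplyr)

# Question's values under the column names assumed in this answer
df <- data.frame(
  year = c(1971, 1971, 1971, 1972, 1972),
  lat  = c(30.212, 30.211, 30.111, 24.114, 25.114),
  long = c(-87.423, -87.455, -94.444, -94.231, -92.121)
)

# One row per year holding that year's standard deviation
per_year <- df %>%
  group_by(year) %>%
  summarise(dev = sd(lat))

# Attach the per-year value back onto every original row
joined <- left_join(df, per_year, by = "year")

print(joined)
```

This two-step form is handy when the per-year table is also wanted on its own; otherwise a grouped `mutate()` does the same thing in one step.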

leeum

score 0 · Accepted Answer · answered Sep 21 '17 at 20:45

Another way, using base R...

df$lat_sd <- ave(df$lat, df$year, FUN=sd)
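
A short runnable sketch of the same call, assuming columns named `year` and `lat` and using the values from the question. Note that `FUN` has to be passed by name, since `ave()` treats unnamed extra arguments as grouping variables. The `tapply()` cross-check and the names `per_year` and `expected` are only there for illustration.

```r
# Question's values, with column names as used in this answer
df <- data.frame(
  year = c(1971, 1971, 1971, 1972, 1972),
  lat  = c(30.212, 30.211, 30.111, 24.114, 25.114)
)

# ave() returns a vector as long as df$lat, with each group's sd repeated
df$lat_sd <- ave(df$lat, df$year, FUN = sd)

# Cross-check: one sd per year, looked up by each row's year
per_year <- tapply(df$lat, df$year, sd)
expected <- as.vector(per_year[as.character(df$year)])

print(df)
print(isTRUE(all.equal(df$lat_sd, expected)))
```

If the column may contain missing values, one option is to wrap the function: `ave(df$lat, df$year, FUN = function(x) sd(x, na.rm = TRUE))`.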

Andrew Gustar
