How do I extract means/max/min/median from a subset of rows in a column in a dataframe?

Question

EDITED FOR CLARITY

Sorry for the previous version. In my mind I was being thorough while in reality I was making it difficult to help me.

I have several files (about 80) where I have observations of temperatures at different coordinates. Each file corresponds to observations of a species. It is set up as below:

dput(head(BIRD_NAME, 10))
structure(list(x = c(-4.752939, 1.129787, 51.918896, 21.512894, 
-9.319702, -0.046992, 6.38507, -7.441907, 33.9505, -3.165023), 
    y = c(42.067673, 52.03018, 47.105537, 61.84814, 38.7668, 
    51.59226, 53.170395, 37.208645, 36.302677, 40.64759), Feb = c(5.80000019073486, 
    3.90000009536743, -7.30000019073486, -6.09999990463257, 12.1000003814697, 
    4.80000019073486, 2.20000004768372, 12.5, 11.3000001907349, 
    6.40000009536743), Mar = c(8.19999980926514, 6, 0.100000001490116, 
    -2.5, 13.6000003814697, 6.90000009536743, 4.90000009536743, 
    14.1000003814697, 13.8999996185303, 9), Apr = c(9.80000019073486, 
    7.90000009536743, 11.8000001907349, 2.40000009536743, 14.5, 
    8.80000019073486, 7.5, 15.6000003814697, 17.6000003814697, 
    11), May = c(13.3999996185303, 11.5, 18.8999996185303, 8.30000019073486, 
    16.6000003814697, 12.5, 11.8000001907349, 18.2999992370605, 
    21.2999992370605, 14.8000001907349), Jun = c(17.7999992370605, 
    14.5, 24.3999996185303, 13.1999998092651, 19.2000007629395, 
    15.6000003814697, 14.5, 22, 24.7999992370605, 19.7999992370605
    ), Sep = c(17.7999992370605, 14.3999996185303, 17.7999992370605, 
    10.1999998092651, 20.3999996185303, 14.8000001907349, 13.6000003814697, 
    22.8999996185303, 25.7999992370605, 19.6000003814697), MONTH = structure(c(3L, 
    4L, 4L, 5L, 4L, 4L, 5L, 3L, 4L, 5L), levels = c("2", "3", 
    "4", "5", "6", "9"), class = "factor"), DATE = structure(c(3L, 
    4L, 4L, 5L, 4L, 4L, 5L, 3L, 4L, 5L), levels = c("2", "3", 
    "4", "5", "6", "9"), class = "factor")), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

The column 'MONTH' records the month in which the observation was made, while the columns with month names hold the temperature at the observation site in each of those months.

What I want is the median/minimum/maximum/mean for each month column, but ONLY for the rows whose observation was made in that same month. E.g., for the month column Feb I only want the values from rows where MONTH is 2, for Mar 3, for Apr 4, and so on (that is why I went through all the confusing steps).
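
The per-month filtering described above can be sketched in base R roughly as follows. This is only an illustration under assumptions: the data frame `birds` is made up (it just mimics the layout of `BIRD_NAME`), and `month_lookup` is an assumed mapping from month column names to the numbers stored in `MONTH`. Because `MONTH` is a factor, it is converted through `as.character()` first; calling `as.integer()` directly would return the internal level codes instead of 2, 3, 4.

```r
# Made-up data with the same layout as BIRD_NAME (values are illustrative only).
birds <- data.frame(
  Feb   = c(5.8, 3.9, -7.3, 12.1),
  Mar   = c(8.2, 6.0, 0.1, 13.6),
  Apr   = c(9.8, 7.9, 11.8, 14.5),
  MONTH = factor(c("2", "3", "3", "4"))
)

# Assumed lookup: month column name -> month number as stored in MONTH.
month_lookup <- c(Feb = 2, Mar = 3, Apr = 4, May = 5, Jun = 6, Sep = 9)

# Convert the factor labels ("2", "3", ...) to numbers, not to level codes.
obs_month <- as.integer(as.character(birds$MONTH))

# Only summarise month columns that are actually present in the data.
month_cols <- intersect(names(month_lookup), names(birds))

# For each month column, keep only the rows observed in that month, then summarise.
stats <- t(sapply(month_cols, function(col) {
  v <- birds[[col]][obs_month == month_lookup[[col]]]
  c(max = max(v), min = min(v), med = median(v), mean = mean(v))
}))

stats  # one row per month column, one column per statistic
```

Note that a month with no matching rows would make `max()` and `min()` return infinite values with a warning, so real data may need a check for empty subsets.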

My end product would be a new df whose first column is the original filename, followed by one column per statistic per month:

TEMP_MAXMINMEDMEAN <- data.frame(
  filename = character(),
  feb_max = numeric(), feb_min = numeric(), feb_med = numeric(), feb_mean = numeric(),
  mar_max = numeric(), mar_min = numeric(), mar_med = numeric(), mar_mean = numeric(),
  apr_max = numeric(), apr_min = numeric(), apr_med = numeric(), apr_mean = numeric(),
  sept_max = numeric(), sept_min = numeric(), sept_med = numeric(), sept_mean = numeric()
)
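
A table of that shape, with one row per file, could be assembled along these lines. This is a hedged base R sketch, not a tested solution for the real files: `summarise_species()` is a hypothetical helper, `month_lookup` is an assumed mapping, and the two small data frames and their file names are invented. Column prefixes are derived from the month column names, so September comes out as `sep_` rather than `sept_`.

```r
# Assumed lookup: month column name -> month number as stored in MONTH.
month_lookup <- c(Feb = 2, Mar = 3, Apr = 4, May = 5, Jun = 6, Sep = 9)

# Hypothetical helper: returns a one-row data frame of statistics for one species.
summarise_species <- function(df, filename) {
  # MONTH is a factor, so go through character to get the month numbers.
  obs_month <- as.integer(as.character(df$MONTH))
  out <- list(filename = filename)
  for (col in intersect(names(month_lookup), names(df))) {
    v <- df[[col]][obs_month == month_lookup[[col]]]
    if (length(v) == 0) v <- NA_real_  # no observations in that month -> NA
    prefix <- tolower(col)
    out[[paste0(prefix, "_max")]]  <- max(v)
    out[[paste0(prefix, "_min")]]  <- min(v)
    out[[paste0(prefix, "_med")]]  <- median(v)
    out[[paste0(prefix, "_mean")]] <- mean(v)
  }
  as.data.frame(out)
}

# Invented example input: a named list of data frames, names standing in for file names.
# With real files, the list would be built by reading each file (paths are not given in the post).
species <- list(
  "bird_a.csv" = data.frame(Feb = c(5.8, 3.9), Mar = c(8.2, 6.0),
                            MONTH = factor(c("2", "3"))),
  "bird_b.csv" = data.frame(Feb = c(-7.3, 12.1), Mar = c(0.1, 13.6),
                            MONTH = factor(c("3", "3")))
)

# One row per file, stacked into a single data frame.
TEMP_MAXMINMEDMEAN <- do.call(rbind, Map(summarise_species, species, names(species)))
rownames(TEMP_MAXMINMEDMEAN) <- NULL
TEMP_MAXMINMEDMEAN
```

Stacking with `rbind` works here because every data frame has the same month columns; if the files differ in which month columns they contain, the rows would need to be aligned first.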

I hope this is clearer than the goose chase I wrote before.

Hi @ArchaeoAmos! You may get better, faster help if you edit your question for clarity and down to exactly what you need. Right now it is a bit confusing - are you simply trying to get the summary statistics by month? Try including your original "start" data, and your desired "end" data. In your current example data, you don't have `TEMP`. Try editing to add in the output of `dput(head(TEMP_WIDE, 10))`. Also, a lot of information you put here isn't particularly useful and makes the question convoluted. Try editing out anything that is unrelated to the question. Just trying to help, good luck! — jpsmith, Mar 08 '23 at 12:06
I would recommend reorganizing the data with `pivot_longer` from package **tidyr**, so that the month columns become rows in two columns, e.g. "month" and "values". Then you can easily use `group_by` and `summarize` from **dplyr**. — tpetzoldt, Mar 08 '23 at 12:21
What is speciesname? Please provide a [minimal, reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Captain Hat, Mar 08 '23 at 12:27
@jpsmith thanks for that. I thought I was helping, but I was hurting. Hope this is clearer. — ArchaeoAmos, Mar 08 '23 at 13:46
No worries, we're all just trying to help each other. I don't think you need to create MONTH and DATE the way you have. Try `BIRD_NAME[, -c(9:10)] %>% pivot_longer(Feb:Sep, names_to = "month") %>% group_by(month) %>% summarize(mean_val = mean(value), max_val = max(value), min_val = min(value), median_val = median(value))` and see if that gives what you want — jpsmith, Mar 08 '23 at 16:56
