
I have a data.frame "DF" with 2020 observations and 79066 variables. The first column is "Year", running continuously from 1 to 2020; the other columns hold the values.

As a first step, I averaged each row to get one mean value per year.

E.g.

Aver <- apply(DF[,2:79066], 1, mean, na.rm=TRUE)

However, I would like to compute a weighted average, where the weight of each column depends on a number encoded in its column name.

The header consists of "Year" (first column) followed by 79065 value columns. Each value column name has three parts: a number from 50 to 300, then ".R1" to ".R15", then a number from 10 to 30 followed by ".yr". This gives 251 (50-300) x 15 (R1-R15) x 21 (10-30) = 79065 columns. E.g.: "Year", "50.R1.10.yr", "50.R1.11.yr", "50.R1.12.yr", ... "50.R1.30.yr", "51.R1.10.yr", "51.R1.11.yr", "51.R1.12.yr", ... "51.R1.30.yr", ... "300.R1.10.yr", "300.R1.11.yr", "300.R1.12.yr", ... "300.R1.30.yr", "50.R2.10.yr", "50.R2.11.yr", "50.R2.12.yr", ... "50.R2.30.yr", "51.R2.10.yr", "51.R2.11.yr", "51.R2.12.yr", ... "51.R2.30.yr", ... "300.R2.10.yr", "300.R2.11.yr", "300.R2.12.yr", ... "300.R2.30.yr", ... "50.R15.10.yr", "50.R15.11.yr", "50.R15.12.yr", ... "300.R15.30.yr".

The weight I would like to assign to each column depends on the leading number (50 to 300) in its name. Columns starting with "50." should get the most weight, and the weight should decrease following a power function down to the columns starting with "300.".
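The naming scheme described above can be reproduced with a short sketch. The object names `grid` and `col_names` are illustrative and not part of the original data; the ordering (the ".yr" number varies fastest, then the leading number, then the ".R" index) is inferred from the example names listed in the question.

```r
# expand.grid() varies its first argument fastest, which matches the
# order of the example names: yr first, then the 50-300 value, then R.
grid <- expand.grid(yr = 10:30, av = 50:300, r = 1:15)

# Assemble names of the form "<av>.R<r>.<yr>.yr"
col_names <- paste0(grid$av, ".R", grid$r, ".", grid$yr, ".yr")

length(col_names)   # number of value columns
head(col_names, 3)  # first few names
```

A small data frame with these names is handy for testing weighting code before running it on all 79065 columns.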

The equation fitting my values is a power function: y = 2305.2*x^-1.019.

E.g.

av.classes <- data.frame(av=seq(50, 300, 1))
library(dplyr)
av.classes.weight <- av.classes %>% mutate(weight = 2305.2*av^-1.019)

Thank you for any help.

  • You said that the first column is year and the remaining columns are the values (I'm inferring numbers), where then are these "string" columns that are used for determining the weights? In the end, discussion about these columns is generally not good, give us real sample data using `dput(x)`. While I recognize you have lots of data, this can easily be demonstrated by a dozen columns and perhaps a dozen or so rows. Just enough to *start* using the logic. See https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info – r2evans Jan 11 '22 at 15:47

2 Answers


I guess you could get your weight vector like this:

library(tidyverse)

# Take the part of each column name before the first "." (skipping "Year")
weights_precursor <- str_split(names(DF)[-1], pattern = "\\.", n = 2, simplify = TRUE)[, 1] %>% 
  as.numeric()

weights <- 2305.2 * weights_precursor ^ -1.019
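Once a weight vector exists, one possible way to get the weighted row average is base R's `weighted.mean()` applied over rows. This is only a sketch on hypothetical toy data: the small `DF` below and the names `av`, `w` and `Aver.w` are assumptions for illustration, and the numeric prefix is extracted with `sub()` rather than `str_split()`.

```r
# Hypothetical toy data using the same naming scheme as the question
DF <- data.frame(
  Year = 1:3,
  `50.R1.10.yr` = c(1, 2, NA),
  `300.R1.10.yr` = c(3, 4, 5),
  check.names = FALSE  # keep names that start with a digit unchanged
)

# Numeric prefix of every value column (everything before the first ".")
av <- as.numeric(sub("\\..*$", "", names(DF)[-1]))

# Power-function weights from the question
w <- 2305.2 * av^-1.019

# Weighted mean per row; with na.rm = TRUE, weighted.mean() drops
# missing values together with their weights
Aver.w <- apply(DF[, -1], 1, weighted.mean, w = w, na.rm = TRUE)
```

Because the weights are renormalised within each row, a row with missing values is averaged over its available columns only.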
saae

Setting up some sample data:

# check.names = FALSE keeps the digit-leading names as they are (no "X" prefix)
DF <- data.frame(year=2020,`50.R1.10.yr`=1,`300.R15.30.yr`=10,check.names=FALSE)

Getting numerical vector:

weights <- stringr::str_split(names(DF),"\\.")
# keep the first piece of each name and drop the "year" column
weights <- sapply(weights, `[`, 1)[-1]
weights <- as.numeric(weights)
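The numeric vector still has to be passed through the power function and applied to the rows. With tens of thousands of columns, a matrix-based calculation may be preferable to a row-wise loop. A minimal sketch, assuming toy data with a second row added for illustration; the names `av`, `w`, `X`, `W` and `Aver.w` are hypothetical:

```r
# Hypothetical sample data; the second row has a missing value
DF <- data.frame(
  year = 2020:2021,
  `50.R1.10.yr` = c(1, NA),
  `300.R15.30.yr` = c(10, 4),
  check.names = FALSE
)

# Leading number of each value column, then the question's power function
av <- as.numeric(sub("\\..*$", "", names(DF)[-1]))
w <- 2305.2 * av^-1.019

X <- as.matrix(DF[, -1])

# One copy of the weight vector per row
W <- matrix(w, nrow = nrow(X), ncol = ncol(X), byrow = TRUE)
W[is.na(X)] <- 0  # missing values contribute no weight

# Weighted row means: sum(x * w) / sum(w), ignoring missing cells
Aver.w <- rowSums(X * W, na.rm = TRUE) / rowSums(W)
```

A row that is entirely `NA` has zero total weight and yields `NaN`, so such rows would need separate handling.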