0

How to loop through columns in a Data Frame and cap the values at 97.5th percentile of that column?

Eg. if one particular column has values 1 to 100 filled in it, the value >97.5, i.e 98, 99 and 100 should all be given 97.5.

Please see, I want to do this for columns 4 to last in the data frame.

Prince
  • 69
  • 5
  • 1
    Welcome to SO! Can you give example of data ? Also, read more about positing questions here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – YOLO Mar 24 '18 at 16:42

3 Answers3

1

Here's a minimal example of what you are trying to do. Here I'm modifying the last 2 columns:

set.seed(2)

library(data.table)
df <- data.table(a = runif(10, 90, 100), 
                 b = runif(10, 95, 105),
                 c = runif(10, 90, 100))

df[,c('b','c') := lapply(.SD, function(x) pmin(x, quantile(x, 0.975))), .SDcols = c('b','c')]

print(df)

           a         b        c
 1: 91.82174 103.40371 99.49889
 2: 93.60763 104.45268 91.01073
 3: 99.03800  95.44965 92.56751
 4: 93.94048 102.58383 98.95147
 5: 97.79881  97.96888 93.87944
 6: 92.84159 101.51054 97.94285
 7: 98.53721  95.84990 93.49397
 8: 91.72242 104.68683 91.38744
 9: 90.79264  95.13625 96.50509
10: 92.92065 100.38869 95.44004
YOLO
  • 20,181
  • 5
  • 20
  • 40
1

Using the data that @ManishSaraswat set up, I believe you want something like this,

df <- data.frame(a = runif(10, 90, 100), 
                 b = runif(10, 95, 105),
                 c = runif(10, 90, 100))
apply(df, 2, function(x){
  quant <- quantile(x, 0.975)
  ifelse(x > quant, quant, x)
})

I hope this helps!

smanski
  • 541
  • 2
  • 7
1

You can do this in one line in base R

#set up the data
df <- data.frame(a = sample(100,replace=TRUE), 
                 b = sample(100,replace=TRUE),
                 c = sample(100,replace=TRUE))

df2 <- as.data.frame(lapply(df, function(x) pmin(x, quantile(x, 0.975))))

To just modify columns 4 to 10 (for example) of your dataframe, you could do

data[,4:10] <- as.data.frame(lapply(data[,4:10], function(x) pmin(x, quantile(x, 0.975))))
Andrew Gustar
  • 17,295
  • 1
  • 22
  • 32