I am new to R, and I have a dataset with 5000 variables. I want to add 1 to each of 100 numeric variables, but I am not sure how to pick these variables out of such a large dataset or what the code should look like. To make it clearer, the variables I want to add 1 to are the columns named B through ARP, so I want something like this: B-ARP + 1. Perhaps this is a simple question, but I only started using R yesterday; before that I was using Stata.

I found the code below in another question, but I am not sure whether it applies to my situation. library(dplyr) df %>% mutate(across(

N_H
  • Do you want `df %>% mutate(across(where(is.numeric), ~ .x + 1))` or is it `df %>% mutate(across(seq(100, ncol(.), by = 100), ~ .x + 1))`? – akrun Feb 14 '23 at 04:55
  • Thank you for your response. I tried the second code, but I got an error message (Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "function"). The column names do not follow a numeric pattern; they are like B, C, D, ..., ARP. – N_H Feb 14 '23 at 05:10
  • Sorry for the confusion; I only started using R yesterday. – N_H Feb 14 '23 at 05:15
  • Suppose you have a dataset of 20 columns and want to add 1 to every 5th column. Build the column index with `seq(5, 20, by = 5)`, then use it to subset and add: `df[seq(5, 20, by = 5)] <- df[seq(5, 20, by = 5)] + 1` – akrun Feb 14 '23 at 05:19
  • Just to make it clear, suppose you read your data, `df <- read.csv("yourfile.csv")`, then `df` is the data.frame object – akrun Feb 14 '23 at 05:23
  • Thanks for your help. The columns I need are consecutive, but I can't work out their positions: they are 100 columns out of 5000, and I don't want to add to them one by one. – N_H Feb 14 '23 at 05:26
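
The error quoted in the comments is consistent with `df` never having been assigned: in that case the name `df` resolves to a function from the stats package rather than to a data frame. A minimal sketch of that point and of the index-based approach from the comments, assuming a small inline CSV as a hypothetical stand-in for the real file:

```r
# Without an assignment, `df` resolves to a function in the stats package
class(stats::df)

# Hypothetical stand-in for read.csv("yourfile.csv"): parse inline CSV text
dat <- read.csv(text = "A,B,C,D\n1,2,3,4\n5,6,7,8")
class(dat)

# Index-based selection as in the comments: here, every 2nd column
idx <- seq(2, ncol(dat), by = 2)
dat[idx] <- dat[idx] + 1
dat
```

The object is called `dat` here only to avoid shadowing the built-in `df`; any name works as long as it is assigned before it is used.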

1 Answer

Consider this data frame, and say we want to add 999 to columns X4 through X7.

dat
#   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
# 1  0  0  0  0  0  0  0  0  0   0
# 2  0  0  0  0  0  0  0  0  0   0
# 3  0  0  0  0  0  0  0  0  0   0
# 4  0  0  0  0  0  0  0  0  0   0
# 5  0  0  0  0  0  0  0  0  0   0

First, we could identify which columns have those names,

rg <- which(names(dat) %in% c('X4', 'X7'))
rg
# [1] 4 7

and make a numerical sequence out of it.

cols <- seq.int(rg[1], rg[2])
cols
# [1] 4 5 6 7

(Alternatively using do.call:)

cols <- do.call(seq.int, as.list(rg))
## or altogether:
cols <- do.call(seq.int, as.list(which(names(dat) %in% c('X4', 'X7'))))

Then just add 999 to the subset dat[cols] and assign the result back.

dat[cols] <- dat[cols] + 999
dat
#   X1 X2 X3  X4  X5  X6  X7 X8 X9 X10
# 1  0  0  0 999 999 999 999  0  0   0
# 2  0  0  0 999 999 999 999  0  0   0
# 3  0  0  0 999 999 999 999  0  0   0
# 4  0  0  0 999 999 999 999  0  0   0
# 5  0  0  0 999 999 999 999  0  0   0

Data:

dat <- data.frame(matrix(0, 5, 10))
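
The steps above can also be wrapped in a small function, so that only the first and last column names need to be supplied (for the question's data, that would be "B" and "ARP"). This is a sketch only: `add_to_range` is a hypothetical helper name, not part of any package, and it assumes both names exist and that every column in the range is numeric.

```r
# Hypothetical helper: add `value` to every column from `from` to `to` (by name)
add_to_range <- function(dat, from, to, value = 1) {
  # Positions of the two boundary columns
  idx <- match(c(from, to), names(dat))
  if (anyNA(idx)) {
    stop("column name not found: ", paste(c(from, to)[is.na(idx)], collapse = ", "))
  }
  cols <- seq.int(idx[1], idx[2])
  # Refuse to touch non-numeric columns instead of silently producing NA
  is_num <- vapply(dat[cols], is.numeric, logical(1))
  if (!all(is_num)) {
    stop("non-numeric columns in range: ",
         paste(names(dat)[cols][!is_num], collapse = ", "))
  }
  dat[cols] <- dat[cols] + value
  dat
}

dat <- data.frame(matrix(0, 5, 10))
dat <- add_to_range(dat, "X4", "X7", value = 999)
dat[1:3, 3:8]  # look at a small corner to check the result
```

Using `match()` on the two names keeps their order as given, and the numeric check turns the "NA in every column" symptom into an explicit error message.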
jay.sf
  • Thank you very much; it helped me. But how can I see the changes in the data? I ask because the dataset is huge. – N_H Feb 14 '23 at 06:10
  • @Noor You can look at a subset of the data `dat[1:3, 3:5]` to check if it worked. – jay.sf Feb 14 '23 at 06:14
  • I got "NA" in all of the columns. They didn't increase by 1. – N_H Feb 14 '23 at 06:20
  • @Noor You probably have factor or character columns; use `dat[cols] <- lapply(dat[cols], function(x) as.numeric(as.character(x)))` before adding. Be sure to make a [reproducible example](https://stackoverflow.com/a/5963610/6574038) next time. – jay.sf Feb 14 '23 at 06:23
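
A minimal sketch of the conversion suggested in the last comment, using made-up data in which the numbers are stored as a factor and as character strings (the column names and values are assumptions for illustration only):

```r
# Hypothetical example data: numbers stored as factor and as character
dat <- data.frame(
  B = factor(c("10", "20", "30")),
  C = c("1.5", "2.5", "3.5"),
  stringsAsFactors = FALSE
)
cols <- match("B", names(dat)):match("C", names(dat))

# Inspect how each selected column is stored
sapply(dat[cols], class)

# Go through character first so that factor labels, not internal codes, are parsed
dat[cols] <- lapply(dat[cols], function(x) as.numeric(as.character(x)))

# Now the addition behaves as expected
dat[cols] <- dat[cols] + 1
dat
```

Entries that cannot be parsed as numbers become NA with a warning during the conversion, so it is worth checking `sapply(dat[cols], class)` and a few rows before and after.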