R Optimal conditional editing

Question

I'm exploring the simplest way to edit the data frame below. I'd like to replace values in A and B with NA wherever the corresponding test value (testA or testB) is < 2 (0 or 1), then drop the test columns. I know we could just replace them by hand without conditionals, but this example illustrates a problem from a much larger data frame.

> df <- data.frame(list(A=c(100, 30, 200, 80, 5), B=c(12, 40, 100,70,50), testA=c(17, 1, 120,400,70), testB=c(5, 4, 1, 10, 0)))

It looks like this:

  A   B testA testB
100  12    17     5
 30  40     1     4
200 100   120     1
 80  70   400    10
  5  50    70     0

It should look like this:

  A   B
100  12
 NA  40
200  NA
 80  70
  5  NA

Thank you in advance!

There are a number of ways to conditionally replace values: https://stackoverflow.com/a/41585689/5088194 — leerssej, Nov 08 '17 at 01:56

bringtheheat · Answer 1 · 2017-11-08T14:39:27.003

0

Like @Jens Leerssen points out, there are tons of ways. The simplest way I can see would be to have a conditional for each column. If you want to scale it, use *apply or a for loop:

with(df, ifelse(testA < 2, NA, A))

Not sure how much data you're dealing with, but this works on my box (osx, 16gb, i5):

df <- data.frame(list(A=c(100, 30, 200, 80, 5), B=c(12, 40, 100,70,50), testA=c(17, 1, 120,400,70), testB=c(5, 4, 1, 10, 0)))

# create two vectors, one for each set of columns
vec_nam = names(df)
vec_split = tolower(grepl('^test.*', vec_nam)) # tolower() gives the strings "true"/"false", so list_df$true and list_df$false are valid names
list_df = split(vec_nam, vec_split)

num_comparisons = length(list_df$false)
list_return = vector('list', length = num_comparisons)

# columns are paired by position: the i-th test column checks the i-th value column
for (i in seq_len(num_comparisons)){
  col_test = list_df$true[i]
  col_valu = list_df$false[i]

  list_return[[i]] = ifelse(df[, col_test] < 2, NA, df[, col_valu])
}

final_df = setNames(do.call(cbind.data.frame, list_return), list_df$false)
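The "*apply" route mentioned above could look roughly like the sketch below. It pairs columns by name instead of by position, which assumes every value column X has a test column literally named testX (true for the example data, but an assumption for the real data set). The names value_cols, test_cols and masked are made up for this illustration.

```r
df <- data.frame(A = c(100, 30, 200, 80, 5), B = c(12, 40, 100, 70, 50),
                 testA = c(17, 1, 120, 400, 70), testB = c(5, 4, 1, 10, 0))

# Value columns are those whose name does not start with "test"
value_cols <- names(df)[!grepl("^test", names(df))]
# Assumption: each value column X has a matching test column named testX
test_cols <- paste0("test", value_cols)

# Apply the same conditional to every (value, test) pair of columns
masked <- Map(function(value, test) ifelse(test < 2, NA, value),
              df[value_cols], df[test_cols])

# Map() keeps the names of its first list argument, so the result is named A, B
final_df <- as.data.frame(masked)
print(final_df)
```

Pairing by name means the column order in df no longer matters, at the cost of depending on the naming convention.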

edited Nov 08 '17 at 14:39

answered Nov 08 '17 at 03:23

bringtheheat


Thanks, but yes, I am also looking for the loop as, in reality, my data is much larger. Also, memory utilization is important – Joe Black Nov 08 '17 at 03:33
so you're looking to use multiple pairs of columns? so you need testZ, Z, N, where if testZ <= N, then NA, else Z? if you're worried about memory, then your function would replace the column instead of creating a new vector/df column. – bringtheheat Nov 08 '17 at 14:12
That is the problem I am dealing with. I think I figured out how to tackle it, but only by creating a new vector, which is not very efficient. – Joe Black Nov 08 '17 at 18:20
Well, imagine the minimum number of columns you're going to need to load is 2 * final_number_of_columns. I imagine you're going to deal with (1) memory and (2) performance. For (1), you can process and write out in chunks. For (2), I'm not sure how you can be memory efficient without creating new memory locations. I know data.table as a memory-efficient merge method. – bringtheheat Nov 08 '17 at 19:11
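As a rough illustration of the "replace the column instead of creating a new one" idea from the comments, the base R sketch below overwrites only the failing rows of each existing column and drops each test column as soon as it has been used. The testX naming convention is an assumption taken from the example data. Note that R has copy-on-modify semantics, so a column may still be copied internally; this only avoids building a second data frame, it does not guarantee zero extra allocation.

```r
df <- data.frame(A = c(100, 30, 200, 80, 5), B = c(12, 40, 100, 70, 50),
                 testA = c(17, 1, 120, 400, 70), testB = c(5, 4, 1, 10, 0))

value_cols <- c("A", "B")                  # columns to edit
test_cols  <- paste0("test", value_cols)   # assumed naming convention: testX checks X

for (i in seq_along(value_cols)) {
  # Overwrite only the rows whose test value is below 2
  df[[value_cols[i]]][df[[test_cols[i]]] < 2] <- NA
  # Drop the test column once it is no longer needed
  df[[test_cols[i]]] <- NULL
}

print(df)
```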

score 0 · Answer 2 · answered Nov 08 '17 at 04:11

> df2 <- data.frame(A = with(df, ifelse(testA < 2, NA, A)), B = with(df, ifelse(testB < 2, NA, B)))

It works for the given example, but it doesn't scale to many columns, and I doubt it is efficient.
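One way this might be made to scale without writing out each column by hand is to build a logical mask from all the test columns at once and use it to blank out the value columns. This is only a sketch: it assumes the value and test column vectors line up one to one (here via the testX naming in the example), and it still creates a new data frame df2.

```r
df <- data.frame(A = c(100, 30, 200, 80, 5), B = c(12, 40, 100, 70, 50),
                 testA = c(17, 1, 120, 400, 70), testB = c(5, 4, 1, 10, 0))

value_cols <- c("A", "B")
test_cols  <- paste0("test", value_cols)   # assumed one-to-one pairing with value_cols

# Keep only the value columns
df2 <- df[value_cols]

# Comparing a data frame with a scalar gives a logical matrix of the same shape,
# which can index df2 cell by cell
df2[df[test_cols] < 2] <- NA

print(df2)
```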


Joe Black

