Is there an R function for finding the value below specific value

Question

Is there an R function to find value below specific value. Example : Below is the input table. I need value in another column (COL3) below value " Age " that is seen in column 2 (COL2)

COl1    COl2    
James   Age 
James   23  
Andrew  Age 
Andrew  24

I need another column with

COl1    COl2    COl3
James   Age     23
James   23      23
Andrew  Age     24
Andrew  24      24

Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — jogo, May 11 '19 at 04:48

NelsonGon · Answer 1 · 2019-05-11T04:32:04.993

2

With dplyr:

 df %>% 
  mutate_if(is.factor,as.character) %>% 
   mutate(COL3=ifelse(COl2=="Age",lead(COl2),COl2))
    COl1 COl2 COL3
1  James  Age   23
2  James   23   23
3 Andrew  Age   24
4 Andrew   24   24

With base, we could do the following and remove unwanted columns:

 df$COL3<-expand.grid(df[which(df$COl2=="Age")+1,])
df
    COl1 COl2 COL3.COl1 COL3.COl2
1  James  Age     James        23
2  James   23    Andrew        23
3 Andrew  Age     James        24
4 Andrew   24    Andrew        24

edited May 11 '19 at 04:32

answered May 11 '19 at 04:23

NelsonGon

13,015
7
27
57

Thanks :) I got it :) In case if I need 2 cells below Age, I tried with COl2+1 in the last line. But getting error – Vinay May 11 '19 at 04:35
It depends on what your data looks like. Th error might be due to length or choosing a factor or character column. What is the exact error you get? – NelsonGon May 11 '19 at 04:36
1

Error in col2 + 1 : non-numeric argument to binary operator – Vinay May 11 '19 at 04:41
Yes, it won't work for a `factor`. Use the first solution and set `lead(COL2,2)`. – NelsonGon May 11 '19 at 04:45
Perfect Thanks. But a general question, When I export from R to csv and then convert to excel. Excel becomes very slow. I have the same excel prior importing to R it is fine and working fast, Why is this? – Vinay May 11 '19 at 05:26
Sorry I don't quite follow. Perhaps it's system dependent. I'm not sure how big your data is and I honestly don't always use Excel so I'm not an expert on that. Perhaps ask a new question and tag it with both R and Excel? Also, you can accept an answer that helps you by clicking the tick. – NelsonGon May 11 '19 at 06:10

mpalanco · Answer 2 · 2019-05-11T08:00:20.133

Subsetting the data frame by COL2 and joining back to the original data frame.

Base

merge(df, subset(df, COl2 != "Age"), by = c("COl1"))

dplyr

library(dplyr)
df %>% 
  left_join(df %>% filter(COl2 != "Age") , by = "COl1")

sqldf

library(sqldf)
sqldf('SELECT *
      FROM df
      LEFT JOIN(SELECT *
      FROM df WHERE COl2 != "Age" )USING (COl1)')

Output

    COl1 COl2.x COl2.y
1 Andrew    Age     24
2 Andrew     24     24
3  James    Age     23
4  James     23     23

Data

df <- structure(list(COl1 = structure(c(2L, 2L, 1L, 1L), .Label = c("Andrew", 
"James"), class = "factor"), COl2 = structure(c(3L, 1L, 3L, 2L
), .Label = c("23", "24", "Age"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))

Ronak Shah · Answer 3 · 2019-05-11T04:35:30.327

One way you can try with dplyr is to create groups using cumsum and then select the next COl2 value after "Age" in each group.

library(dplyr)

df %>%
  group_by(group = cumsum(COl2 == "Age")) %>%
  mutate(Col3 = COl2[which.max(COl2 == "Age") + 1]) %>%
  ungroup() %>%
  select(-group)

 #  COl1   COl2  Col3 
 #  <chr>  <chr> <chr>
 #1 James  Age   23   
 #2 James  23    23   
 #3 Andrew Age   24   
 #4 Andrew 24    24

Or since we are incrementing at "Age" we can select the second value from the group

library(dplyr)
df %>%
  group_by(group = cumsum(COl2 == "Age")) %>%
  mutate(Col3 = COl2[2L])

Or using base R ave

with(df ,ave(COl2, cumsum(COl2 == "Age"), FUN = function(x) x[2L]))
#[1] "23" "23" "24" "24"

@Vinay Sorry, your comment seems incomplete ? – Ronak Shah May 11 '19 at 04:43 — Ronak Shah, May 11 '19 at 04:43

OmG · Answer 4 · 2019-05-12T04:59:35.530

1

A solution is using from sqldf‍‍ by joining the data frame df to itself with the specified constraint:

library(sqldf)
result <- sqldf("SELECT df_origin.*, df_age.Col2 as Col3 FROM 
       df df_origin join
          (SELECT Col1, Col2, cast(Col2 as int) as Col2Int FROM df WHERE Col2Int > 0) df_age 
       on (df_origin.Col1 = df_age.Col1)")

edited May 12 '19 at 04:59

answered May 11 '19 at 04:54

OmG

18,337
10
57
90

utubun · Answer 5 · 2019-05-11T07:37:30.320

Using dplyr/tidyr once more¹:

library(tidyverse)

dat %>%
  mutate(COl3 = na_if(COl2, "Age")) %>%
  fill(COl3,     .direction = "up")

Data:

#dat <- read.table(
#  text = "COl1    COl2
#  James   Age
#  James   23
#  Andrew  Age
#  Andrew  24",
#  header = T,
#  stringsAsFactors = F
#)

Output:

#    COl1 COl2 COl3
#1  James  Age   23
#2  James   23   23
#3 Andrew  Age   24
#4 Andrew   24   24

¹ Which is correct only if !(any(is.na(dat$COl2)).

score 0 · Answer 6 · answered May 19 '19 at 17:51

in base R :

df <- read.table(text="COl1    COl2    
James   Age 
James   23  
Andrew  Age 
Andrew  24 ", h = T)

transform(df, COl3 = ave(COl2, COl1, FUN = function(x) tail(x,1)))
#     COl1 COl2 COl3
# 1  James  Age   23
# 2  James   23   23
# 3 Andrew  Age   24
# 4 Andrew   24   24

Is there an R function for finding the value below specific value

6 Answers6