1

Is there an R function to find value below specific value. Example : Below is the input table. I need value in another column (COL3) below value " Age " that is seen in column 2 (COL2)

COl1    COl2    
James   Age 
James   23  
Andrew  Age 
Andrew  24  

I need another column with

COl1    COl2    COl3
James   Age     23
James   23      23
Andrew  Age     24
Andrew  24      24
Vinay
  • 39
  • 3

6 Answers6

2

With dplyr:

 df %>% 
  mutate_if(is.factor,as.character) %>% 
   mutate(COL3=ifelse(COl2=="Age",lead(COl2),COl2))
    COl1 COl2 COL3
1  James  Age   23
2  James   23   23
3 Andrew  Age   24
4 Andrew   24   24

With base, we could do the following and remove unwanted columns:

 df$COL3<-expand.grid(df[which(df$COl2=="Age")+1,])
df
    COl1 COl2 COL3.COl1 COL3.COl2
1  James  Age     James        23
2  James   23    Andrew        23
3 Andrew  Age     James        24
4 Andrew   24    Andrew        24
NelsonGon
  • 13,015
  • 7
  • 27
  • 57
  • Thanks :) I got it :) In case if I need 2 cells below Age, I tried with COl2+1 in the last line. But getting error – Vinay May 11 '19 at 04:35
  • It depends on what your data looks like. Th error might be due to length or choosing a factor or character column. What is the exact error you get? – NelsonGon May 11 '19 at 04:36
  • 1
    Error in col2 + 1 : non-numeric argument to binary operator – Vinay May 11 '19 at 04:41
  • Yes, it won't work for a `factor`. Use the first solution and set `lead(COL2,2)`. – NelsonGon May 11 '19 at 04:45
  • Perfect Thanks. But a general question, When I export from R to csv and then convert to excel. Excel becomes very slow. I have the same excel prior importing to R it is fine and working fast, Why is this? – Vinay May 11 '19 at 05:26
  • Sorry I don't quite follow. Perhaps it's system dependent. I'm not sure how big your data is and I honestly don't always use Excel so I'm not an expert on that. Perhaps ask a new question and tag it with both R and Excel? Also, you can accept an answer that helps you by clicking the tick. – NelsonGon May 11 '19 at 06:10
2

Subsetting the data frame by COL2 and joining back to the original data frame.

Base

merge(df, subset(df, COl2 != "Age"), by = c("COl1"))

dplyr

library(dplyr)
df %>% 
  left_join(df %>% filter(COl2 != "Age") , by = "COl1")

sqldf

library(sqldf)
sqldf('SELECT *
      FROM df
      LEFT JOIN(SELECT *
      FROM df WHERE COl2 != "Age" )USING (COl1)')

Output

    COl1 COl2.x COl2.y
1 Andrew    Age     24
2 Andrew     24     24
3  James    Age     23
4  James     23     23

Data

df <- structure(list(COl1 = structure(c(2L, 2L, 1L, 1L), .Label = c("Andrew", 
"James"), class = "factor"), COl2 = structure(c(3L, 1L, 3L, 2L
), .Label = c("23", "24", "Age"), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))
mpalanco
  • 12,960
  • 2
  • 59
  • 67
1

One way you can try with dplyr is to create groups using cumsum and then select the next COl2 value after "Age" in each group.

library(dplyr)

df %>%
  group_by(group = cumsum(COl2 == "Age")) %>%
  mutate(Col3 = COl2[which.max(COl2 == "Age") + 1]) %>%
  ungroup() %>%
  select(-group)

 #  COl1   COl2  Col3 
 #  <chr>  <chr> <chr>
 #1 James  Age   23   
 #2 James  23    23   
 #3 Andrew Age   24   
 #4 Andrew 24    24   

Or since we are incrementing at "Age" we can select the second value from the group

library(dplyr)
df %>%
  group_by(group = cumsum(COl2 == "Age")) %>%
  mutate(Col3 = COl2[2L])

Or using base R ave

with(df ,ave(COl2, cumsum(COl2 == "Age"), FUN = function(x) x[2L]))
#[1] "23" "23" "24" "24"
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
1

A solution is using from sqldf‍‍ by joining the data frame df to itself with the specified constraint:

library(sqldf)
result <- sqldf("SELECT df_origin.*, df_age.Col2 as Col3 FROM 
       df df_origin join
          (SELECT Col1, Col2, cast(Col2 as int) as Col2Int FROM df WHERE Col2Int > 0) df_age 
       on (df_origin.Col1 = df_age.Col1)") 
OmG
  • 18,337
  • 10
  • 57
  • 90
0

Using dplyr/tidyr once more1:

library(tidyverse)

dat %>%
  mutate(COl3 = na_if(COl2, "Age")) %>%
  fill(COl3,     .direction = "up")
Data:
#dat <- read.table(
#  text = "COl1    COl2
#  James   Age
#  James   23
#  Andrew  Age
#  Andrew  24",
#  header = T,
#  stringsAsFactors = F
#)
Output:
#    COl1 COl2 COl3
#1  James  Age   23
#2  James   23   23
#3 Andrew  Age   24
#4 Andrew   24   24


1 Which is correct only if !(any(is.na(dat$COl2)).

utubun
  • 4,400
  • 1
  • 14
  • 17
0

in base R :

df <- read.table(text="COl1    COl2    
James   Age 
James   23  
Andrew  Age 
Andrew  24 ", h = T)

transform(df, COl3 = ave(COl2, COl1, FUN = function(x) tail(x,1)))
#     COl1 COl2 COl3
# 1  James  Age   23
# 2  James   23   23
# 3 Andrew  Age   24
# 4 Andrew   24   24
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167