I have a data frame with several thousand rows, ordered by a column of numerical values. I want to add a column indicating whether each row is the first row in which that column's value appears. The indicator should depend only on that one column.

Data frame A is an example of how my data is organized right now and B is how I would like it to be organized.

A <- data.frame(V1 = c(22, 27, 32, 32, 33, 33, 37),
                V2 = c(121, 243, 765, 322, 433, 435, 728))

B <- data.frame(V1 = c(22, 27, 32, 32, 33, 33, 37),
                V2 = c(121, 243, 765, 322, 433, 435, 728),
                V3 = c("y", "y", "y", "n", "y", "n", "y"))
erikfjonsson

1 Answer

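You are essentially looking for the values that are not duplicates. duplicated() returns TRUE for every repeat of a value after its first occurrence, so negating it marks the first occurrences, i.e.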

!duplicated(A$V1)
#[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

or

ifelse(!duplicated(A$V1), 'y', 'n')
#[1] "y" "y" "y" "n" "y" "n" "y"

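We can also avoid ifelse (thanks to @jogo) by indexing a two-element lookup vector: adding 1 to the logical vector turns FALSE into index 1 ("n") and TRUE into index 2 ("y")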

c("n", "y")[1 + !duplicated(A$V1)]
#[1] "y" "y" "y" "n" "y" "n" "y"
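
To get the layout of B from the question, the result can be assigned back as a new column. The following is a minimal sketch that rebuilds the example data A from the question and assumes the new column should be called V3, as in B. Note that duplicated() marks the first occurrence anywhere in the vector, so it does not actually rely on the data being sorted.

```r
# Rebuild the example data from the question
A <- data.frame(V1 = c(22, 27, 32, 32, 33, 33, 37),
                V2 = c(121, 243, 765, 322, 433, 435, 728))

# "y" for the first row holding each V1 value, "n" for later repeats
A$V3 <- ifelse(!duplicated(A$V1), "y", "n")

A$V3
#[1] "y" "y" "y" "n" "y" "n" "y"
```

Only V1 is passed to duplicated(), so V2 plays no part in the result, which is what the question asks for.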
Sotos