
I want to create a new column that flags whether each row is a duplicate of the row before it. My data is ordered by user number, then by date. The new column should check whether the value in the first column equals the value in the previous row, and then do the same for the date.

For example, I have the first two columns of data and want to create a boolean third column indicating whether each row is a new user/date combination:

User#   Date     Unique   
1       1/1/17    1 
1       1/1/17    0
1       1/2/17    1
2       1/1/17    1
3       1/1/17    1
3       1/2/17    1
PMo
    Welcome to [Stack Overflow](http://stackoverflow.com)! At this site you are expected to try to **write the code yourself**. After [doing more research](http://meta.stackoverflow.com/questions/261592) if you have a problem you can **post what you've tried** with a **clear explanation of what isn't working** and providing a **[Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve)**. I suggest reading [How to Ask a good question](http://stackoverflow.com/questions/how-to-ask). Also, be sure to take the [tour](http://stackoverflow.com/tour) – AWinkle May 30 '17 at 18:07

2 Answers


There might be a typo in the sample data set, as the last row is unique per the given criteria.

df1$Unique <- c(1, diff(df1$User) != 0 | diff(df1$Date) != 0)

  User       Date Unique
1    1 2017-01-01      1
2    1 2017-01-01      0
3    1 2017-01-02      1
4    2 2017-01-01      1
5    3 2017-01-01      1
6    3 2017-01-02      1
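
The one-liner above assumes `df1` already exists, with a numeric `User` column and a `Date` column of class `Date`. A minimal sketch of that setup, rebuilt from the sample data in the question (the column names and the `%m/%d/%y` format are assumptions, not something the question states):

```r
# Rebuild the question's sample data; User is numeric, Date is parsed to class Date
df1 <- data.frame(
  User = c(1, 1, 1, 2, 3, 3),
  Date = as.Date(c("1/1/17", "1/1/17", "1/2/17", "1/1/17", "1/1/17", "1/2/17"),
                 format = "%m/%d/%y")
)

# The first row is always flagged; every later row is flagged when
# either the user or the date differs from the row before it
df1$Unique <- c(1, diff(df1$User) != 0 | diff(df1$Date) != 0)

print(df1$Unique)  # 1 0 1 1 1 1
```

If `Date` is left as a character column, `diff()` cannot compute differences on it, so the conversion with `as.Date()` has to happen first.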

Update

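If the users are stored as factors, then the following will work. Here a row is flagged as unique when the user changes or the date is more than one day after the previous row: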

User <- c(1, 1, 1, 2, 3, 3)
User <- letters[User]
Date <- c("1/1/17", "1/1/17", "1/4/17", "1/1/17", "1/1/17", "1/2/17")
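# make User an explicit factor; since R 4.0 data.frame() no longer converts strings automatically
df1 <- data.frame(User = factor(User), Date)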
df1$Date <- as.Date(df1$Date, "%m/%d/%y")

df1$Unique <- c(1, diff(as.numeric(df1$User)) != 0 | diff(df1$Date) > 1)

  User       Date Unique
1    a 2017-01-01      1
2    a 2017-01-01      0
3    a 2017-01-04      1
4    b 2017-01-01      1
5    c 2017-01-01      1
6    c 2017-01-02      0
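
For user IDs that mix letters and numbers, one option is to skip the numeric conversion entirely and compare each ID with the one in the previous row. `as.numeric()` returns `NA` for character strings such as `"a1"`, which is one way to end up with `NA` in every row after the first. A minimal sketch, where `df2` and its ID values are hypothetical and not taken from the question:

```r
# Hypothetical data with alphanumeric user IDs kept as plain character strings
df2 <- data.frame(
  User = c("a1", "a1", "a1", "b2", "c3", "c3"),
  Date = as.Date(c("1/1/17", "1/1/17", "1/4/17", "1/1/17", "1/1/17", "1/2/17"),
                 format = "%m/%d/%y"),
  stringsAsFactors = FALSE
)

n <- nrow(df2)

# Compare rows 2..n with rows 1..(n-1); works for character IDs without conversion
user_changed <- df2$User[-1] != df2$User[-n]

# Same date rule as above: more than one day after the previous row
date_gap <- diff(df2$Date) > 1

# The first row has no predecessor, so it is always flagged
df2$Unique <- as.integer(c(TRUE, user_changed | date_gap))

print(df2$Unique)  # 1 0 1 1 1 0
```

The threshold in `date_gap` is the part to adjust if a different date window is wanted.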
manotheshark
  • Is there a way you would adapt the code above if the user ID has a mix of letters and numbers? And adapt to be "unique" if the date difference is less than 2 days? I applied your method to my data and got a "1" for the first row then NA the rest of the way down. Thanks in advance. – PMo Jun 04 '17 at 14:50
  • @PMo updated answer to include Users that are stored as text and date range – manotheshark Jun 14 '17 at 18:40

This may give you what you are looking for:

library(dplyr)

User <- c(1,1,1,2,3,3)
Date <- c("1/1/17","1/1/17","1/2/17","1/1/17","1/1/17","1/2/17")

df <- data.frame(User,Date,stringsAsFactors = FALSE)

df <- df %>%
  group_by(User, Date) %>%
  # the first row of each User/Date group gets 1; later repeats get 0
  mutate(Unique = if_else(duplicated(Date), 0, 1)) %>%
  ungroup()
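
The same grouping idea can be sketched in base R with `duplicated()` applied to both columns at once, which avoids the dplyr dependency. Note that, like the grouped version, this flags a User/Date pair as a repeat wherever it occurs in the data, not only when it directly follows an identical row; with data sorted by user and then date the two rules should agree.

```r
# Same sample data as above, dates left as character strings
df <- data.frame(
  User = c(1, 1, 1, 2, 3, 3),
  Date = c("1/1/17", "1/1/17", "1/2/17", "1/1/17", "1/1/17", "1/2/17"),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame marks rows whose User/Date pair has appeared before
df$Unique <- as.integer(!duplicated(df[c("User", "Date")]))

print(df$Unique)  # 1 0 1 1 1 1
```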
Matt Jewett