0

I have read excel file in R, where sheet1 has 51500 rows and 5 column and sheet 2 has user ID of buyers (only one column). Objective: Aim to extract the user in sheet_1 whose User Id are occurred in sheet 2. Here is the two example input files and desired output:

df <- data.frame(User.ID=c(12: 17), Group="Test", Spend=c(15:20), Purchase=c(5:10))

    df
  User.ID Group Spend Purchase
1      12  Test    15        5
2      13  Test    16        6
3      14  Test    17        7
4      15  Test    18        8
5      16  Test    19        9
6      17  Test    20       10

 hash.ID <- data.frame(User.ID= c(13:16))

  User.ID
1      13
2      14
3      15
4      16

desired output : 

  User.ID Group Spend Purchase     Redem_Status
1      12  Test    15        5    Test_NonRedeemer
2      13  Test    16        6    Test_Redeemer
3      14  Test    17        7    Test_Redeemer
4      15  Test    18        8    Test_Redeemer
5      16  Test    19        9    Test_Redeemer
6      17  Test    20       10    Test_NonRedeemer

based on above example, we can see that if user Id from df is existed in hash.ID table, then we add new column and label it as Test_Redeemer, otherwise label it as Test_NonRedeemer. Is there any straightforward approach that can do this task ? Thanks a lot !!

Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
  • Or (as it seems like all data is already contained in f2), f2[f2$status=="pass",] – Heroka Feb 27 '16 at 12:38
  • @Heroka: thanks. But, to be clarify, in my toy example, f1 is only stdId column, while f2 is whole table that also include some stdId from f1. so based on the stdId from f1, find out which instance from f2 are "pass". Thank you –  Feb 27 '16 at 12:48
  • You say you want to extract students that failed the test, but your desired output shows only those that passed? – Roman Luštrik Feb 28 '16 at 21:14
  • @RomanLuštrik: I have updated my post. Could you give me some idea how to get through with this in R ? –  Feb 29 '16 at 09:47
  • What about `sheet_1[sheet1[, "User ID"] %in% sheet_2, ]`? – Roman Luštrik Feb 29 '16 at 09:50
  • @RomanLuštrik: I tried, but this code couldn't fetch any data that I want to get. My approach is, based on the User_ID in sheet_2, I am searching whether user in sheet_1 whose User_ID match or not, if there is matched User_ID in sheet_1 found, I am gonna add new column into sheet_1 and give name for it. according to my approach, could you give me some IDEA to get easily done this in R ? Thank you –  Feb 29 '16 at 09:57
  • Please provide a small reproducible example (something we can copy/paste into our R sessions) and what the expected result would look like. The above implementation will take sheet_1 and filter out all users that appear in sheet_2. I was under the impression that this is what you were after. To create a new column, just `sheet_1[sheet1[, "User ID"] %in% sheet_2, "newcolumn"] <- "newvalue"`. – Roman Luštrik Feb 29 '16 at 14:04
  • @RomanLuštrik: I have updated my post again. I added reproducible example and desired output. Could you give me some IDEA to get done with this ? Thanks a lot !! –  Feb 29 '16 at 15:21

1 Answers1

0

The testcase you presented helped, thanks. As mentioned in the comments, you need to subset the rows you're interested in and assign them value. By placing ! in front of the statement (notice the braces!) you negate the statement and thus select all records not selected in the previous call.

df[df$User.ID %in% hash.ID$User.ID, "Redem_Status"] <- "Test_Redeemer"
df[!(df$User.ID %in% hash.ID$User.ID), "Redem_Status"] <- "Test_NonRedeemer"
df

  User.ID Group Spend Purchase     Redem_Status
1      12  Test    15        5 Test_NonRedeemer
2      13  Test    16        6    Test_Redeemer
3      14  Test    17        7    Test_Redeemer
4      15  Test    18        8    Test_Redeemer
5      16  Test    19        9    Test_Redeemer
6      17  Test    20       10 Test_NonRedeemer
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197