0

Data:

DB1 <- data.frame(orderItemID  = c(1,2,3,4,5,6,7,8,9,10), 
orderDate = c("1.1.12","1.1.12","1.1.12","1.1.12","1.1.12", "1.1.12","1.1.12","1.1.12","2.1.12","2.1.12"),  
itemID = c(2,3,2,5,12,4,2,3,1,5),  
size = factor(c("l", "s", "xl", "xs","m", "s", "l", "m", "xxs", "xxl")), 
color = factor(c("blue", "black", "blue", "orange", "red", "navy", "red", "purple", "white", "black")),  
customerID = c(33, 15, 1, 33, 14, 55, 33, 78, 94, 23))

Expected output:

selection_order = c("yes","no","no","no","no","no","yes","no","no","no")

In the data set I have items with the same size or the same color, the same ItemID. Every registered user has his unique customerID.

I want to identify when a user orders products (more then one) with the same itemID (in different sizes or colors = for example the user with the customerID = 33 orders the same item (ItemID = 2) in two different colors) and mark it in a new column named like "selection order"(for example) with "Yes" or "No". It should NOT show me a "Yes", when he or she orders an item with an other ID. I just want to get a "yes", when there is an order (at the same day or in the past) with the same ID more then once - regardless from other ID´s (other products).

I've tried a lot already,but nothing works. There are a few thousand different userID's and ItemId's-so I can´t subset for every Id. I tried it with the duplicated function - but it's not leading to a satisfactory solution:

The problem is, that if the same person orders more then one object (customerID is duplicated then) and another person(customerId) orders an item with the same Id (itemId is duplicated then) it gives me a "yes": and it must be a "No" in this case. (in the example the duplicate function will give me an "yes" at orderItemID 4 instead of an "no")

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Jarvis
  • 71
  • 1
  • 1
  • 8
  • 2
    Please make a simple reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example - we're not that awesome that we're going to waste time trying to create a data set based on your wordy description when you could supply code. – Spacedman Oct 28 '14 at 14:38
  • Also post what you've tried. – shadowtalker Oct 28 '14 at 14:38
  • No example, so just a pointer: the `?ave` function might be helpful or maybe `?table`. Check those out and make your question reproducible with expected output. – talat Oct 28 '14 at 14:42
  • Tried my best do put it into a form you can tolerate (I hope) – Jarvis Oct 28 '14 at 21:50
  • This is not what I wanted. I did not want to transfer the data to another table or sort the data. The records are to remain in their order. It should only be introduced an additional column that a "right" / "wrong"; "yes" / "no" .. outputs. It should show "right" "yes" ... if the (respective) order is an item that a the same person ordered in several colors and/or sizes (ie the same Item ID from the same person more than once) was ordered – Jarvis Oct 29 '14 at 18:53
  • Ok, I've edited my answer, if this isn't what you looking for, then I give up on this – David Arenburg Oct 29 '14 at 19:23
  • Thanks till now David - now I (think) get what´s the communication problem: it was my mistake (of course). Now it´s gonna show me "yes" everytime a person orders more than one item (sometimes of the same itemID) but it also shows me when he or she orders an item with an other ID, too. I just want to get an "yes", when there is an order (at the same day or in the past) with the same ID more then once - regardless from other ID´s (other products). I hope you have not lost patience with me already :) – Jarvis Oct 29 '14 at 20:34
  • 3
    Can you provide what the output should be given the data frame you provided? – n8sty Oct 30 '14 at 14:06
  • Output: "Yes","No","No","No","No","No", "Yes", "No","No","No": Only "Yes", when one costumerID orders the same itemID more then once-ALSO "No", when one customerID orders different items (with diffent ItemIDs) – Jarvis Oct 30 '14 at 22:06
  • This endless wording doesn't help. Just provide the actual values that `selection_order` column should have for the provided data set – David Arenburg Oct 31 '14 at 10:37
  • did it now- hope it will help :) – Jarvis Oct 31 '14 at 11:04

3 Answers3

2

I think I understand what is your desired output now, try

library(data.table)
setDT(DB1)[, selection_order := .N > 1, by = list(customerID, itemID)]
DB1
#     orderItemID orderDate itemID size  color customerID selection_order
#  1:           1    1.1.12      2    l   blue         33            TRUE
#  2:           2    1.1.12      3    s  black         15           FALSE
#  3:           3    1.1.12      2   xl   blue          1           FALSE
#  4:           4    1.1.12      5   xs orange         33           FALSE
#  5:           5    1.1.12     12    m    red         14           FALSE
#  6:           6    1.1.12      4    s   navy         55           FALSE
#  7:           7    1.1.12      2    l    red         33            TRUE
#  8:           8    1.1.12      3    m purple         78           FALSE
#  9:           9    2.1.12      1  xxs  white         94           FALSE
# 10:          10    2.1.12      5  xxl  black         23           FALSE

In order to convert back to a data.frame, use DB1 <- as.data.frame(DB1) (for older versions) or setDF(DB1) for the lates data.table version.


You can do it (less efficiently) with base R too

transform(DB1, selection_order = ave(itemID, list(customerID, itemID), FUN = function(x) length(x) > 1))

Or using the dplyr package

library(dplyr)
DB1 %>%
  group_by(customerID, itemID) %>%
  mutate(selection_order = n() > 1)
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • The only way we will understand what you actually want is providing the actual desired output (not in words, but with actual data set) for the provided data (similar to how you provided your data set) instead of keep posting duplicate questions when you keep providing confusing information – David Arenburg Oct 31 '14 at 10:35
  • just did it-hope it´s understandable now :) – Jarvis Oct 31 '14 at 11:06
  • Yes-that´s it! But is there another possibility without turning the data frame into a data.table?(or how can I turn it back into a data frame afterwards again? David, you´re my hero for today :) Thanks a lottle – Jarvis Oct 31 '14 at 16:18
  • See my edit. I've added some alternative solutions too – David Arenburg Nov 01 '14 at 19:26
0

The following code will append a new column selection.order to your data frame if the row represents a duplicate (customerID, itemID) tuple.

# First merge together the table to itself
m<- merge(x=DB1,y=DB1,by=c("customerID","itemID"))

# Now find duplicate instances of orderItemID, note this is assumed to be UNIQUE
m$selection.order<-sapply(m$orderItemID.x,function(X) sum(m$orderItemID.x==X)) > 1
m <- m[,c("orderItemID.x","selection.order")]

# Merge the two together
DB1<- merge(DB1, unique(m), by.x="orderItemID",by.y="orderItemID.x",all.x=TRUE,all.y=FALSE)
vpipkt
  • 1,710
  • 14
  • 17
0

If you just want the subset, as you say in the title, then do this:

DB1[duplicated(DB1[c("itemID", "customerID")]),]

If you want the column, then:

f <- interaction(DB1$itemID, DB1$customerID)
DB1$multiple <- table(f)[f] > 1L

Note that is also easy to get the actual count by simplifying the last line above.

Michael Lawrence
  • 1,031
  • 5
  • 6
  • will not work-shows also yes, when the ID is orderd by another user – Jarvis Oct 30 '14 at 22:07
  • The problem with this is now, that if the same person orders more then one object (customerID is dublicated then) and another person(customerID) orders an item with the same ID (itemID is dublicated then) it gives me a yes: and it must be a "No" – Jarvis Oct 30 '14 at 23:10