0

So I have been trying to merge these 2 data.tables that look like this

structure(list(orderDate = structure(c(18414, 18444, 18475, 18506, 
18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779
), class = "Date"), productName = c("A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady"), totalOrders = c(2L, 
15L, 52L, 225L, 27L, 10L, 5L, 19L, 36L, 41L, 58L, 16L, 2L)), row.names = c(NA, 
-13L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000024e1b7d1ef0>, sorted = "orderDate")

and

structure(list(returnDate = structure(c(18444, 18475, 18506, 
18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779
), class = "Date"), productName = c("A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
"A. De La Sota Lady"), totalReturns = c(5L, 10L, 129L, 73L, 18L, 
3L, 8L, 15L, 43L, 44L, 30L, 6L), orderDate = structure(c(18444, 
18475, 18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718, 
18748, 18779), class = "Date")), row.names = c(NA, -12L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000024e1b7d1ef0>, sorted = "orderDate")

the result is a merged data.table

structure(list(orderDate = structure(c(18444, 18475, 18506, 18536, 
18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779), class = "Date"), 
    productName = c("A. De La Sota Lady", "A. De La Sota Lady", 
    "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
    "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
    "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady", 
    "A. De La Sota Lady"), totalOrders = c(15L, 52L, 225L, 27L, 
    10L, 5L, 19L, 36L, 41L, 58L, 16L, 2L), totalReturns = c(5L, 
    10L, 129L, 73L, 18L, 3L, 8L, 15L, 43L, 44L, 30L, 6L)), sorted = "orderDate", class = c("data.table", 
"data.frame"), row.names = c(NA, -12L), .internal.selfref = <pointer: 0x0000024e1b7d1ef0>)

however in the returnTest table there there's a date row that's missing.

I tried merging using the productName column as a key column but for some reason it kept giving me an error and that was the only way I could have merged both tables without an error. Ultimately, I want to have a data table to check the return rate of a certain product, but with this method I'm always missing a month where I could have orders but no returns or vice versa. Can anyone please help? I've been trying to solve this for about a week now.

test1  <- ordersByProductNameAndSize[`productName` == 'A. De La Sota Lady'  ] 
setkeyv(test1, 'orderDate')

test2 <- returnsByProductNameAndSize[`productName` == 'A. De La Sota Lady'  ]
test2[, 'orderDate' := returnDate]
setkeyv(test2, 'orderDate'
  
returnTest <- merge(test1, test2[, c('orderDate', 'totalReturns'), all = TRUE, with = FALSE]) # , 'totalReturns' 
returnTest[, 'returnRate' := ((totalReturns / totalOrders) *100)] 
Greg
  • 3,054
  • 6
  • 27
  • 1
    Please don't post images of data, few people will attempt to transcribe your data into something usable. It's best to paste the output from `dput(x)` (into a [code block](https://stackoverflow.com/editing-help)) where `x` is enough rows/columns to get the point across. – r2evans Jul 01 '21 at 13:41
  • From @Skaggs: are you using `all=TRUE` in your merge? – r2evans Jul 01 '21 at 13:43
  • I would love to do that but I actually don't know what you're talking about nor how to do it, I have tried using the ```dput()``` function on my test 1 variable but didn't know what to do with it not how to upload it here, and yes I did just use ```all = TRUE``` in my merge, but still have the same issue @r2evans – Andrew Kfoury Jul 01 '21 at 13:49
  • I said *"paste the output from `dput(x)` (into a code block)"* (by [edit]ing your question). I don't know how else to say it. Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. – r2evans Jul 01 '21 at 13:54
  • 1
    @r2evans perhaps it is time to update your .NORM package ;-).. https://xkcd.com/2116/ – Wimpel Jul 01 '21 at 13:57
  • @r2evans haha thanks, I'm updating it now, I don't know if that makes sense now xD – Andrew Kfoury Jul 01 '21 at 13:59
  • @Greg you're an absolute angel man, thanks for the help in the edit. I'm honestly very new to programming and don't really know how my way around stackoverflow yet – Andrew Kfoury Jul 01 '21 at 14:06
  • AndrewKfoury, FYI, in Stack sites, the "code fence" (`\`\`\``) must be on a line of its own, starting on the first character of the line with nothing after it. The only exception is that the first one *may* (optionally) have a language hint, so the first line can be `\`\`\`` (in which case Stack guesses, often good enough) or `\`\`\`lang-r`, then a newline, then your code/data. – r2evans Jul 01 '21 at 14:12
  • Happy to help! For future reference, the formatting tips can always be accessed when creating or editing your question, by clicking [More](https://stackoverflow.com/editing-help). If you're new to R, and using `data.table` is not truly a performance requirement, I'd highly recommend the [`dplyr`](https://dplyr.tidyverse.org) package, which intuitively handles [`join`s](https://dplyr.tidyverse.org/reference/join.html), `filter`ing, aggregation, sorting, and so forth. – Greg Jul 01 '21 at 14:13
  • 1
    @Greg I'm using data.table for speed improvements as I'm also trying to develop a shiny app, so I thought I'd need all the performance boosts I can get hehe. But, I also need to work with data.table for my internship as it's required. – Andrew Kfoury Jul 01 '21 at 14:21

1 Answers1

0

Thanks for posting your data! If I am understanding this correctly, your "missing" value is simply the case where an item was ordered on 2020-06-01, but nothing was returned on that date, correct?

t1 <- structure(list(
  orderDate = structure(c(18414, 18444, 18475, 18506,  18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779 ), class = "Date"),
  productName = c("A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady"),
  totalOrders = c(2L,  15L, 52L, 225L, 27L, 10L, 5L, 19L, 36L, 41L, 58L, 16L, 2L)),
  row.names = c(NA,  -13L),
  class = c("data.table", "data.frame"))

t2 <- structure(list(
  returnDate = structure(c(18444, 18475, 18506,  18536, 18567, 18597, 18628, 18659, 18687, 18718, 18748, 18779 ), class = "Date"),
  productName = c("A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady", "A. De La Sota Lady", "A. De La Sota Lady",  "A. De La Sota Lady"),
  totalReturns = c(5L, 10L, 129L, 73L, 18L,  3L, 8L, 15L, 43L, 44L, 30L, 6L),
  orderDate = structure(c(18444,  18475, 18506, 18536, 18567, 18597, 18628, 18659, 18687, 18718,  18748, 18779), class = "Date")),
  row.names = c(NA, -12L),
  class = c("data.table",  "data.frame"))

rt <- merge(t1, t2, by = "orderDate", all = TRUE)

# calculate return rate
rt$returnRate <- (rt$totalReturns / rt$totalOrders) * 100
Skaqqs
  • 4,010
  • 1
  • 7
  • 21
  • Yes exactly! I used the argument ```all = TRUE``` but I'm still having this missing value. I also thought about maybe adding a "0" value for the month where I had orders but no returns but then I realized that I'll be running into dividing by 0 at some points where I have returns but no orders on a specific month. – Andrew Kfoury Jul 01 '21 at 14:16
  • Using my approach, your `returnRate` on a date when nothing is returned is `NA`. If you prefer for this value to be 0, then use `rt$returnRate2 <- ifelse(is.na(rt$returnRate), 0, rt$returnRate)` – Skaqqs Jul 01 '21 at 14:17
  • Thank you very much! it worked, kinda. when I tried the ```rt$returnRate <- (rt$totalReturns / rt$totalOrders) * 100``` it gave back an ```Error in rt$totalReturns : object of type 'closure' is not subsettable ``` However, I was able to use the same way of calculating the return rate without using ```$``` – Andrew Kfoury Jul 01 '21 at 14:43
  • 1
    data.table code golf: `rt <- merge(t1, t2, all = TRUE)[, returnRate := 100 * totalReturns / totalOrders]`. In general, it usually best to explicitly list the `by=` fields, in case it is ever evaluated programmatically/unsupervised. – r2evans Jul 01 '21 at 14:44
  • @AndrewKfoury, it sounds like you assigned to something other than `rt`, make sure to update your `rt$` references to your own variable name. – r2evans Jul 01 '21 at 14:45
  • @r2evans hahahah thanks! I honestly feel very dumb right now xD. But anyways thank you very much for the help today! – Andrew Kfoury Jul 01 '21 at 15:01
  • 2
    FYI, in R a "closure" is a fancy term for a function and its enclosing environment. It doesn't make sense, for example, to do `func <- function(x) {...}` and then do `func$quux`, since one doesn't use `$` (subsetting) on functions. Well, now realizing that, if not otherwise defined, R sees `rt` as the function `stats::rt`, one might start to realize the value of not naming variables the same as base R function names. It's a hard habit to break, I still do it frequently, but the consequence can be confusing. But now you know what `'closure' is not subsettable` really means :-) – r2evans Jul 01 '21 at 15:19
  • Good lesson for me too. Thanks for your advice and help on this one @r2evans – Skaqqs Jul 01 '21 at 15:22