0

I have a dataframe titled FilteredData with many columns. Specifically, there are two columns I am interested in: Date and Sale number.

I want to group all Sale number entries by dates. Date is a date-type field, and Sale number is a character-type field. If I'm not mistaken, I think these types are the reason why other Q&As on S.O. haven't been much help to me.

How can I do this?

I've tried the following:

aggregate(FilteredData$`Sale number`, by FilteredData$Date, FUN = count)
group_by(FilteredData$`Sale number`, FilteredData$Date)

Neither worked, and neither did the solution found here when I tried it.

I tried the following:

library(sqldf)
Freq = sqldf('SELECT Date, COUNT('Sale Number') FROM FilteredData GROUP BY Date')

and it surprisingly worked. However, is there a way to obtain this result without having to use SQL syntax, i.e. something "purely" in R?

daOnlyBG
  • 595
  • 4
  • 20
  • 49
  • 3
    Maybe just `table(FilteredData$FundedDate)` or `rowSums(table(FilteredData$FundedDate, FilteredData$FundedDate) > 0)` for unique counts by date. – lmo May 24 '17 at 20:05
  • @lmo I've been banging my head around trying to get what you just suggested. Thank you – daOnlyBG May 24 '17 at 20:08
  • I notice that `by FilteredData$FundedDate` cannot work. Did you have an = in there? – G5W May 24 '17 at 20:09
  • @G5W, why can it not work? – daOnlyBG May 24 '17 at 20:12
  • or `rowSums(table(FilteredData$FundedDate, FilteredData$\`Sale number\`) > 0)` for unique counts by date. (typo in previous comment). – lmo May 24 '17 at 20:12
  • `by FilteredData$FundedDate` won't parse. You need `by = FilteredData$FundedDate ` – G5W May 24 '17 at 20:13

2 Answers2

1

You can use data.table as follows:

library(data.table)
setDT(FilteredData)

FilteredData[ , uniqueN(`Sale number`), by = Date]

I'm not sure if dplyr has a tailored function for this... you may just want length(unique(`Sale number`)) there.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
1

Your question is a little unclear... So you want to group by date and then count the number of non-duplicate entries within a date?

dplyr can do this:

FilteredData %>% # take filtered data
  group_by(FundedDate) %>% # group by the date
  subset(!duplicated('Sale number')) %>% # remove rows that are duplicated sales numbers 
  count('Sale number') # count sales numbers
Mark White
  • 1,228
  • 2
  • 10
  • 25