I have this data in a flat file:

State Date HasASale
CA 2013-01-01 1
SC 2013-01-01 1
NY 2013-02-01 1
MN 2013-03-01 1
WA 2013-04-01 1
CA 2013-05-01 1
SC 2013-05-01 1

The relationship between state and date is many-to-many.

Which months have the most sales? Which state has the most sales?

I want to be able to plot the results.

I am using R to get this information. I am able to read the data:

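library(zoo)  # provides zoo() and as.yearmon()

hm <- read.table("states.data", header = TRUE, sep = "")
# Name the columns explicitly; the file's header is State, Date, HasASale.
df <- data.frame(Date = as.Date(hm$Date), State = hm$State, HasASale = hm$HasASale)
az <- with(df, zoo(HasASale, Date))
df.TS <- aggregate(az, as.yearmon, sum)  # total sales per month
# Total sales per state; assumes the rows are already sorted by Date.
df.sts <- aggregate(az, list(h = df$State), sum)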

This gives me the aggregates. How can I get the top 20 states by sales, or the top 20 sale dates?

Didzis Elferts
blue01
  • Please read some of the guidelines for SO: [**here**](http://stackoverflow.com/help/on-topic), [**here**](http://meta.stackexchange.com/questions/156810/stack-overflow-question-checklist) and [**here**](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). "Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results". Thanks! – Henrik Jan 07 '14 at 00:28
  • A dump of your data (or at least a sample) would be preferable. You can get it with `dput(data)` – user974514 Jan 07 '14 at 01:44

1 Answer

I think a simple solution using `by` and `with` should work, even on your starting dataset (the data frame as read from the file, with columns `State`, `Date` and `HasASale`).

statesBYsales <- c(with(df, by(HasASale, State , sum)))

Similarly for dates:

datesBYsales <- c(with(df, by(HasASale, Date, sum)))
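To make the result concrete, here is a minimal sketch that assumes the sample rows from the question are held in a data frame named `df`. The data frame is built inline here instead of being read from `states.data`, so the block runs on its own.

```r
# Sample rows copied from the question, built inline (an assumption made
# so the sketch is self-contained; normally df comes from read.table()).
df <- data.frame(
  State = c("CA", "SC", "NY", "MN", "WA", "CA", "SC"),
  Date = c("2013-01-01", "2013-01-01", "2013-02-01", "2013-03-01",
           "2013-04-01", "2013-05-01", "2013-05-01"),
  HasASale = 1
)

# by() sums HasASale within each group; c() drops the "by" class and
# leaves a plain numeric vector named by group.
statesBYsales <- c(with(df, by(HasASale, State, sum)))
datesBYsales <- c(with(df, by(HasASale, Date, sum)))

print(statesBYsales)
print(datesBYsales)
```

Each vector has one element per state or per date, and the element names are the group labels.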

Once you have those vectors, simply sort each one in decreasing order and print the first 20 values.

head(sort(datesBYsales, decreasing = TRUE), 20)
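The question also asks about months and about plotting, so here is a hedged sketch of one way to cover both in base R. The `Month` column and the names `monthsBYsales`, `topMonths` and `topStates` are introduced only for illustration. The sample data is built inline, and the dates are assumed to be in `yyyy-mm-dd` format.

```r
# Sample rows copied from the question, built inline for a self-contained sketch.
df <- data.frame(
  State = c("CA", "SC", "NY", "MN", "WA", "CA", "SC"),
  Date = c("2013-01-01", "2013-01-01", "2013-02-01", "2013-03-01",
           "2013-04-01", "2013-05-01", "2013-05-01"),
  HasASale = 1
)

# Month key such as "2013-01" (illustrative helper column).
df$Month <- format(as.Date(df$Date), "%Y-%m")

# Named totals per month and per state.
monthsBYsales <- c(with(df, by(HasASale, Month, sum)))
statesBYsales <- c(with(df, by(HasASale, State, sum)))

# head() stops at the end of the vector, so having fewer than 20 groups
# does not pad the result with NA.
topMonths <- head(sort(monthsBYsales, decreasing = TRUE), 20)
topStates <- head(sort(statesBYsales, decreasing = TRUE), 20)

# Simple bar charts of the sorted totals.
barplot(topStates, main = "Sales by state", ylab = "Sales", las = 2)
barplot(topMonths, main = "Sales by month", ylab = "Sales", las = 2)
```

Grouping on a formatted month string keeps everything in base R. If `zoo` is already loaded, `as.yearmon` could serve as the month key instead.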
Blue Magister
user974514
  • First you have to attach your data frame to use this, so run `attach(df)`. Then, assuming your dates are factors (the default) and `HasASale` is equal to one, use this command: `unique(df[Date%in%as.factor(seq(as.Date("2013-01-01"), as.Date("2013-01-12"), "days")),1])` – user974514 Jan 10 '14 at 19:52
  • Can you explain why you unaccepted my answer? It answers your original question completely, and you had accepted it before. – user974514 Jan 10 '14 at 19:56
  • Sorry that was accidental. – blue01 Jan 11 '14 at 00:10