I have a table (a CSV file) whose first two attributes are Store and Dept, followed by other attributes such as Date and Sales. The table looks like this:

Store Dept Date Sales Holiday  
1      1    ...  ...   ...  (... means some random value)
1      1    ...  ...   ...  
1      2    ...  ...   ...  
1      2    ...  ...   ...  
1      3    ...  ...   ...  
2      1    ...  ...   ...   
2      1    ...  ...   ...  
2      2    ...  ...   ...   
2      2    ...  ...   ...  
  1. First, I loaded this file into a variable called train:

    train<- read.csv("train.csv")

  2. Then, I divided/grouped it based on Store:

    dataByStore<- split(train, train$Store)

  3. Now, I want to take dataByStore and divide it further by department, so that I end up with the data for each department of each store. I think that for this I will have to initialise an array whose size is the number of stores, e.g. dataByStoreByDept, and for each store i do

    dataByStoreByDept[i]<- split(dataByStore[i], dataByStore[i]$Dept)

So, dataByStoreByDept[i][0] will contain the data for the first department of store i, and so on. Can anyone tell me the syntax to do this, as I don't know how to declare such a 2D array? A short explanation with a few lines of code would suffice.

Do mention if any of my presumptions above are wrong.

Update:
For the third step, I want to write a function which should go as follows (it's only the syntax that I don't know):

dataByStoreByDept<- array(seq_len(dataByStore)) -------> seq_len(dataByStore) is the number of stores

for(i in seq_len(dataByStore)){
dataByStoreByDept[i]<- split(dataByStore, dataByStore$dept) 
}
Brian Tompsett - 汤莱恩
AvinashK
  • Hi -- could you produce a small sample of your data and what you have tried? The "array" you mention is called a list in R-speak, as noted under "Value" in `help("split")`, and starts with an index of 1, not zero. – Frank Mar 17 '14 at 18:57
  • @Frank..I mentioned the table structure and the two lines I have written till now in the question itself. I didn't mention the date and other attribute values as it is not important here – AvinashK Mar 17 '14 at 19:09
  • I guess I should have said "stripped-down example" not "sample" data. Here's the standard reference: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example It looks like @ChristopherLouden has done this for you in his answer :) – Frank Mar 17 '14 at 19:18
  • @Frank...thanks for the link – AvinashK Mar 17 '14 at 19:33

1 Answer

Making up some data:

df <- data.frame(Store = sample(1:2, 20, replace = TRUE), 
                 Dept  = sample(1:2, 20, replace = TRUE))

Suppose we want to split the data.frame df first by Store and then by Dept. We can do that as follows:

lapply(split(df, as.factor(df$Store)), FUN = function(x) split(x, x$Dept))

The split(df, as.factor(df$Store)) part does the first split, by Store. The result of that is a list. We then use lapply to apply split on each element of the list created by split(df, as.factor(df$Store)). I put the split into a wrapper function so that I could pass the second split factor to split.

This will give you a list of lists as you describe.
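To show how the individual pieces can then be pulled out, here is a minimal sketch. The toy `train` data frame and its `Sales` values are made up for illustration and only stand in for your train.csv; the variable names `store1dept2` and `firstDeptOfSecondStore` are likewise hypothetical. It assumes you want one data frame per store/department pair.

```r
# Hypothetical toy data standing in for train.csv
train <- data.frame(
  Store = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  Dept  = c(1, 1, 2, 2, 3, 1, 1, 2, 2),
  Sales = c(10, 20, 30, 40, 50, 60, 70, 80, 90)
)

# Split by Store, then split each store's rows by Dept
dataByStoreByDept <- lapply(split(train, train$Store),
                            function(storeRows) split(storeRows, storeRows$Dept))

# List elements are named after the Store and Dept values,
# and `[[` extracts a single element from a list
store1dept2 <- dataByStoreByDept[["1"]][["2"]]

# Positional indexing also works; positions start at 1, not 0
firstDeptOfSecondStore <- dataByStoreByDept[[2]][[1]]
```

Note that no array has to be declared up front: lapply builds the outer list, and each inner split returns its own list, so stores with different numbers of departments are handled naturally.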

Christopher Louden