I have sales data from walmart like superstore.
It has 5 variables:
ProductId, Category, Mode of Operations, Sales, Profit Margin
Out of these 5 variables two are unique i.e
ProductId and Category
. The other 3 can change its value for each month.The data is for 23 months.
So, if I had one product, life would be simple, and it would be a 2 dimensional data. But as my number of products are over 20,000, I am a little confused how to tackle this data so I can run linear regression
, cart
and other functions on it.
Edit1:example
read.table(text="
A B C1 D1 E1 C2 D2 E2 C3 D3 E3 C4 D4 E4 C5 D5 E5 C6 D6 E6
1 K X 5 5 X 6 7 Y 2 1 Z 0 0 X 6 7 X 7 9
2 L Y 5 4 X 6 9 Z 2 3 Z 0 0 X 6 6 X 7 10
3 M X 5 5 Z 6 7 X 2 1 Y 0 0 Y 6 7 Y 7 9",
header=TRUE)
C1,D1,E1
are values for 1st month, C2,D2,E2
are values for second month and so on.
I would expect it to become
A B C D E MONTH
1 K X 5 5 1
1 K X 6 7 2
1 K Y 2 1 3
1 K Z 0 0 4
1 K X 6 7 5
1 K X 7 9 6
2 L Y 5 4 1
2 L X 6 9 2
. . .
structure of raw data
structure of data after applying melt, transform and cast function
Which is not something I was expecting.
Under A should have 4 levels (i.e Joint Venture, Reseller, Distribution, 0)
Under B should have prices
Under C should have profit margin which in my data is always in decimal places 0.xxx
what am I doing wrong?
edit 3
dcast function is not picking values from value column to assign to variables A,B and C. Where it is getting all these 1s to put under A, B and C?
> names(wushang_transformed2)
[1] "Productcode" "Category" "CAT" "Month" "value"
> wushang_casted = dcast(wushang_transformed2, Productcode+Category+Month~CAT, value.var="value")
What's wrong with this command?