I am new to R, so perhaps this is a silly question. I will first describe my data and then the problem.
I have an unbalanced panel of monthly household consumption data from Jan 2000 to Dec 2010. In Jan 2005, the consumption tax increased from 7% to 10%. At the moment, I am trying to get a deeper understanding of the data.
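Because the panel is unbalanced, some households may not be observed in all 12 months of 2004, or not at all, so their pre-tax average would rest on fewer months. A minimal base-R check might look like the following. It assumes the column names `id`, `con`, `year` and `month` from the sample `data2` in this question, rebuilt compactly here as a plain data frame so the snippet runs on its own. `months_2004`, `no_baseline` and `complete_ids` are illustrative names.

```r
# Sample data from the question, rebuilt compactly (same values as data2)
data2 <- data.frame(
  id    = rep(c(1223, 1224), each = 24),
  year  = rep(rep(c(2004, 2005), each = 12), times = 2),
  month = rep(1:12, times = 4),
  con   = c(1954, 1965, 2220, 1789, 2855, 2192, 1028, 2745, 1190, 2892, 1941, 1045,
            1778, 1660, 1037, 1259, 1655, 1429, 1617, 1927, 1105, 1948, 1929, 1673,
            7309, 9420, 9849, 7824, 7522, 7448, 7370, 6717, 9024, 7635, 9316, 5173,
            9071, 5997, 6315, 6636, 9978, 8077, 9170, 5440, 9442, 6668, 5732, 8460)
)

# Count the distinct 2004 months observed for each household
months_2004 <- aggregate(month ~ id, data = data2[data2$year == 2004, ],
                         FUN = function(m) length(unique(m)))
names(months_2004)[2] <- "n_months_2004"

# Households with no 2004 observation at all (no pre-tax baseline possible)
no_baseline <- setdiff(unique(data2$id), months_2004$id)

# Households observed in all 12 months of 2004
complete_ids <- months_2004$id[months_2004$n_months_2004 == 12]

print(months_2004)
print(no_baseline)
```

In the sample, both households have all 12 months of 2004. With the real data, you could decide whether to keep households with a partial 2004 baseline or restrict the analysis to `complete_ids`.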
To do this, I would like to compute each household's average consumption over the 12 months before the tax increase, that is, Jan 2004 to Dec 2004. Using this mean, I would then like to classify households into 4 categories: first category USD 1000-2500, second category USD 2501-5000, third category USD 5001-7500, and fourth category USD 7501-10000. (In the data set, the minimum monthly consumption expenditure is USD 1,000 and the maximum is USD 10,000.)
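One way to do the classification step, shown as a minimal base-R sketch: average `con` over the 2004 rows for each `id`, then bin that mean with `cut()`. The stated ranges leave gaps for non-integer means (a mean of 2500.40 is in neither 1000-2500 nor 2501-5000). The sketch therefore assumes the intervals [1000, 2500], (2500, 5000], (5000, 7500] and (7500, 10000]. `baseline`, `base_mean` and `category` are illustrative names, and the sample data is rebuilt compactly so the snippet runs on its own.

```r
# Sample data from the question, rebuilt compactly (same values as data2)
data2 <- data.frame(
  id    = rep(c(1223, 1224), each = 24),
  year  = rep(rep(c(2004, 2005), each = 12), times = 2),
  month = rep(1:12, times = 4),
  con   = c(1954, 1965, 2220, 1789, 2855, 2192, 1028, 2745, 1190, 2892, 1941, 1045,
            1778, 1660, 1037, 1259, 1655, 1429, 1617, 1927, 1105, 1948, 1929, 1673,
            7309, 9420, 9849, 7824, 7522, 7448, 7370, 6717, 9024, 7635, 9316, 5173,
            9071, 5997, 6315, 6636, 9978, 8077, 9170, 5440, 9442, 6668, 5732, 8460)
)

# 1. Pre-tax baseline: each household's mean monthly consumption, Jan-Dec 2004
pre <- data2[data2$year == 2004, ]
baseline <- aggregate(con ~ id, data = pre, FUN = mean)
names(baseline)[2] <- "base_mean"

# 2. Classify households by their baseline mean.
# Assumed intervals: [1000, 2500], (2500, 5000], (5000, 7500], (7500, 10000]
baseline$category <- cut(
  baseline$base_mean,
  breaks = c(1000, 2500, 5000, 7500, 10000),
  labels = c("1000-2500", "2501-5000", "5001-7500", "7501-10000"),
  include.lowest = TRUE  # keep a mean of exactly 1000 in the first category
)

print(baseline)
```

With the sample data, household 1223 has a 2004 mean of about 1984.67 and falls in the first category. Household 1224 has a mean of about 7883.92 and falls in the fourth.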
Using these categories, I would like to check by how much expenditure increased in each month from Jan 2005 through Dec 2010 for each category. I have been struggling with this for about 3 weeks and cannot figure out how to even start. I would be very grateful for any suggestions and help. Thank you very much in advance.
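For the second step, one possible approach works in three parts. First, attach each household's 2004 mean and category to its observations from Jan 2005 onward. Next, compute the change against that baseline. Finally, average the change by category and month. This sketch assumes the increase is measured as the difference (and percentage difference) from the household's own 2004 monthly mean. `post`, `change`, `pct_change` and `by_cat` are illustrative names, and the classification step is repeated so the snippet runs on its own.

```r
# Sample data from the question, rebuilt compactly (same values as data2)
data2 <- data.frame(
  id    = rep(c(1223, 1224), each = 24),
  year  = rep(rep(c(2004, 2005), each = 12), times = 2),
  month = rep(1:12, times = 4),
  con   = c(1954, 1965, 2220, 1789, 2855, 2192, 1028, 2745, 1190, 2892, 1941, 1045,
            1778, 1660, 1037, 1259, 1655, 1429, 1617, 1927, 1105, 1948, 1929, 1673,
            7309, 9420, 9849, 7824, 7522, 7448, 7370, 6717, 9024, 7635, 9316, 5173,
            9071, 5997, 6315, 6636, 9978, 8077, 9170, 5440, 9442, 6668, 5732, 8460)
)

# Baseline (2004 mean) and category per household, as in the previous step
baseline <- aggregate(con ~ id, data = data2[data2$year == 2004, ], FUN = mean)
names(baseline)[2] <- "base_mean"
baseline$category <- cut(baseline$base_mean,
                         breaks = c(1000, 2500, 5000, 7500, 10000),
                         labels = c("1000-2500", "2501-5000", "5001-7500", "7501-10000"),
                         include.lowest = TRUE)

# Attach baseline and category to every post-tax observation.
# merge() is an inner join by default, so households without a 2004 baseline drop out.
post <- merge(data2[data2$year >= 2005, ], baseline, by = "id")
post$change     <- post$con - post$base_mean               # absolute change (USD)
post$pct_change <- 100 * (post$con / post$base_mean - 1)   # percentage change

# Average the household-level changes per category and calendar month
by_cat <- aggregate(cbind(change, pct_change) ~ category + year + month,
                    data = post, FUN = mean)
by_cat <- by_cat[order(by_cat$category, by_cat$year, by_cat$month), ]

print(by_cat)
```

Because the filter is `year >= 2005`, the same code covers Jan 2005 through Dec 2010 on the full panel. Computing the change household by household compares each household with its own baseline. This matters in an unbalanced panel, where the households observed in a given month can differ from those observed in 2004.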
I am using confidential data from the tax office, so I cannot share the actual dataset. However, I created a similar sample dataset:
data2 <- structure(list(id = c(1223, 1223, 1223, 1223, 1223, 1223, 1223,
1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223, 1223,
1223, 1223, 1223, 1223, 1223, 1223, 1224, 1224, 1224, 1224, 1224,
1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224,
1224, 1224, 1224, 1224, 1224, 1224, 1224, 1224), con = c(1954,
1965, 2220, 1789, 2855, 2192, 1028, 2745, 1190, 2892, 1941, 1045,
1778, 1660, 1037, 1259, 1655, 1429, 1617, 1927, 1105, 1948, 1929,
1673, 7309, 9420, 9849, 7824, 7522, 7448, 7370, 6717, 9024, 7635,
9316, 5173, 9071, 5997, 6315, 6636, 9978, 8077, 9170, 5440, 9442,
6668, 5732, 8460), year = c(2004, 2004, 2004, 2004, 2004, 2004,
2004, 2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2004, 2004, 2004, 2004,
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2005, 2005, 2005,
2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005, 2005), month = c(1,
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12)), row.names = c(NA, -48L), class = c("tbl_df",
"tbl", "data.frame"))