
I have a dataset with irregular time intervals, and I am trying to visualize it as time series data and predict values for 2019. How can I convert it to a 'ts' object in R, and what would the frequency be when the data are available only at irregular (yearly) intervals? My data look like this (this is an extract; the complete dataset has 2102 observations):

structure(list(Year = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1993", 
"1999", "2006", "2011", "2016"), class = "factor"), Region = c("South", 
"South", "South", "South", "South", "South", "South", "South", 
"South", "South", "North", "North", "North", "North", "North", 
"North", "North", "North", "North", "North", "North", "North", 
"North", "North", "North", "North", "East", "East", "East", "East", 
"East", "East", "North", "North", "North", "North", "North", 
"North", "North", "North", "North", "North", "North", "South", 
"South", "South", "South", "West", "West", "West", "West", "West", 
"West", "West"), statename = c("Andhra Pradesh", "Andhra Pradesh", 
"Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", 
"Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", 
"Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", 
"Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", 
"Andhra Pradesh", "Andhra Pradesh", "Andhra Pradesh", "Uttarakhand", 
"Haryana", "NCT of Delhi", "Rajasthan", "uttar Pradesh", "Bihar", 
"Sikkim", "Arunachal Pradesh", "Nagaland", "Manipur", "Mizoram", 
"Bihar", "Bihar", "Bihar", "Bihar", "Bihar", "Bihar", "Bihar", 
"Bihar", "Bihar", "Bihar", "Bihar", "KERALA", "KERALA", "KERALA", 
"LAKSHADWEEP", "MADHYA PRADESH", "MADHYA PRADESH", "MADHYA PRADESH", 
"MADHYA PRADESH", "MADHYA PRADESH", "MADHYA PRADESH", "MADHYA PRADESH"
), statecode = c(28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 
28, 28, 28, 28, 28, 28, 28, 28, 28, 28, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 32, 
32, 32, 31, 23, 23, 23, 23, 23, 23, 23), disctrictcode = c(1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 8, 9, 10, 11, 12, 13, 14, 15, 17, 
18, 20, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 218, 219, 
220, 221, 222, 223, 224, 225, 226, 227, 228, 601, 594, 590, 587, 
465, 461, 459, 457, 441, 447, 420), LPG = c(1.5625, 1.79640718562874, 
2.40963855421687, 0.609756097560976, 5.76923076923077, 19.6319018404908, 
5.07246376811594, 1.05263157894737, 1.69491525423729, 3.2, 5.94059405940594, 
1.11111111111111, 5.23255813953488, 8, 4.6875, NA, 1.08108108108108, 
5, 4.54545454545455, 1.5748031496063, 4.76190476190476, 18.2117388919364, 
10.1745936183022, 55.607476635514, 2.84514925373134, 3.67709936719685, 
2.55157437567861, 29.6979865771812, 16.6825548141087, 8.89787664307381, 
22.8630278063852, 35.2459016393443, 16.0183066361556, 11.5853658536585, 
14.9032992036405, 11.4190687361419, 11.9521912350598, 10.4426787741203, 
10.2941176470588, 8.53658536585366, 14.2228739002933, 10.6060606060606, 
7.45098039215686, 25.0891561083135, 35.0948454610251, 11.2079289927582, 
2.85374554102259, 1.94829277137229, 1.83006535947712, 2.22847511653655, 
1.54357439899654, 3.90050051315975, 2.78252669830342, 1.60864503942542
)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 388L, 
389L, 390L, 391L, 392L, 393L, 394L, 395L, 396L, 397L, 398L, 810L, 
811L, 812L, 813L, 814L, 815L, 816L, 817L, 818L, 819L, 820L, 910L, 
911L, 912L, 913L, 914L, 915L, 916L, 917L, 918L, 919L, 920L, 1740L, 
1741L, 1742L, 1743L, 1744L, 1745L, 1746L, 1747L, 1748L, 1749L, 
1750L), class = "data.frame")
Jyoti
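A sketch of the conversion, under stated assumptions: `ts()` expects equally spaced observations, and its `frequency` is the number of observations per unit of time (1 for annual data). The survey rounds labelled 1993, 1999, 2006, 2011 and 2016 are 6, 7, 5 and 5 years apart, so no single frequency fits them directly, and each year also contains many district rows. One option, assuming `df` is the data frame built by the `structure(...)` call above, is to average the districts per survey year and place those means in an annual series (frequency = 1), leaving `NA` for years without a survey. The helper name `yearly_lpg_ts()` is hypothetical.

```r
# Minimal sketch, assuming `df` is the question's data frame
# (Year stored as a factor of survey years, LPG as numeric percentages).
# yearly_lpg_ts() is a hypothetical helper, not part of any package.
yearly_lpg_ts <- function(df) {
  # as.numeric() on a factor returns the level codes (1, 2, ...), not the
  # years themselves, so convert through character first.
  years <- as.numeric(as.character(df$Year))
  # One value per survey year: the mean district-level LPG share, ignoring NAs.
  means <- tapply(df$LPG, years, mean, na.rm = TRUE)
  survey_years <- as.numeric(names(means))
  # ts() needs equally spaced points, so build an annual series
  # (frequency = 1) and leave the non-survey years as NA.
  all_years <- seq(min(survey_years), max(survey_years))
  values <- rep(NA_real_, length(all_years))
  values[match(survey_years, all_years)] <- as.numeric(means)
  ts(values, start = min(survey_years), frequency = 1)
}
```

For example, `lpg_ts <- yearly_lpg_ts(df)` followed by `plot(lpg_ts, type = "p")` shows one point per survey round. Some forecasting functions do not accept internal `NA` values, so regressing LPG on the year itself may be the simpler route to a 2019 estimate.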
  • Your data has problems you'd need to resolve first. Each `Year` has 10 or 11 wildly different values. Are these actually monthly data but you've lost the month? Or does each row for the same Year refer to some other missing variable? – pseudospin Aug 15 '20 at 22:16
  • Have you tried the answers in this post https://stackoverflow.com/questions/29046311/how-to-convert-dataframe-into-time-series ? – Ronak Shah Aug 16 '20 at 05:21
  • The data show the percentage of households in India (at district level) that use LPG for cooking, and I have shared only an extract. The complete dataset has 2102 observations. It is survey data, and the reports were published for 1992-1993, 1998-1999, 2005-2006, 2010-2011 and 2015-2016. – Jyoti Aug 16 '20 at 08:57
  • Then you need a column to distinguish the different regions; otherwise your sample makes no sense. I wouldn't worry too much about the minor irregularity in the frequency of the surveys. Just plot them all on a graph and decide how you want to predict future values. Maybe a linear regression is a good enough first try? – pseudospin Aug 16 '20 at 10:10
  • Sir, first I looked at the change in the LPG usage pattern over the years for India as a whole, and then across the four regions (North, South, East and West); for that I plotted box plots with the ggplot2 package. Now I want to see the change in trend by plotting a trend line and getting its equation, but I cannot figure out how to proceed. I am updating the dataset accordingly. It would be kind of you if you could help me with that. – Jyoti Aug 16 '20 at 10:52
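Following the linear-regression suggestion in the comments, a minimal sketch for getting a trend equation per region: fit `lm(LPG ~ year)` on the district-level rows of each region, report the intercept and slope (so the equation is LPG = intercept + slope * year), and extrapolate to 2019. It assumes `df` is the question's data frame with `Year`, `Region` and `LPG` columns; the helper name `lpg_trends()` is hypothetical, and a straight line is only one possible trend model.

```r
# Minimal sketch, assuming `df` has columns Year (factor of survey years),
# Region (character) and LPG (numeric percentage).
# lpg_trends() is a hypothetical helper, not part of any package.
lpg_trends <- function(df, new_year = 2019) {
  # Convert factor labels to numeric years (not the factor's level codes).
  df$year_num <- as.numeric(as.character(df$Year))
  # Keep complete cases only, so each fit ignores missing LPG values.
  df <- df[!is.na(df$LPG), ]
  rows <- lapply(split(df, df$Region), function(d) {
    # Straight-line trend for this region: LPG = intercept + slope * year.
    fit <- lm(LPG ~ year_num, data = d)
    data.frame(
      Region = d$Region[1],
      intercept = unname(coef(fit)[1]),
      slope = unname(coef(fit)[2]),  # change in percentage points per year
      predicted = unname(predict(fit, newdata = data.frame(year_num = new_year)))
    )
  })
  # One row per region.
  do.call(rbind, rows)
}
```

Calling `lpg_trends(df)` returns one row per region. A region that appears in only one survey year (as East and West appear to in the extract) cannot have a slope estimated, so its `slope` is `NA` and `predict()` warns about a rank-deficient fit. Because a fitted line is unbounded while the data are percentages, the 2019 value is best read as a rough indication. To draw the same kind of lines on a ggplot2 chart, `geom_smooth(method = "lm")` with the region mapped to `colour` adds one fitted line per group.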

0 Answers