
I have a year variable that I want to convert into a categorical variable with 3 levels. Right now I relabel the levels by hand with the levels function, which is really painful:

traintest$YearBuilt <- as.factor(traintest$YearBuilt)
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1872,1875,1879,1880,1882,
                                                             1885,1890,1892,1893,1895,
                                                             1896,1898,1900,1901,1902,
                                                             1904,1905,1906,1907,1908,
                                                             1910,1911,1912,1913,1914,
                                                             1915,1916,1917,1918,1919,
                                                             1920,1921,1922,1923,1924,
                                                             1925,1926,1927,1928,1929,
                                                             1930,1931,1932,1934,1935,
                                                             1936,1937,1938,1939,1940,
                                                             1941,1942,1945,1946,1947,
                                                             1948,1949)] <- "Before1950"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(1950,1951,1952,1953,1954,
                                                             1955,1956,1957,1958,1959,
                                                             1960,1961,1962,1963,1964,
                                                             1965,1966,1967,1968,1969,
                                                             1970,1971,1972,1973,1974,
                                                             1975,1976,1977,1978,1979,
                                                             1980,1981,1982,1983,1984,
                                                             1985,1986,1987,1988,1989,
                                                             1990,1991,1992,1993,1994,
                                                             1995,1996,1997,1998,1999)] <- "Between1950-2000"
levels(traintest$YearBuilt)[levels(traintest$YearBuilt)%in%c(2000,2001,2002,2003,2004,
                                                             2005,2006,2007,2008,2009,
                                                             2010)] <- "After2000"

I tried using the cut function, but it didn't quite work for me: it put all the values into the first category, and the other two categories ended up with zero counts.
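The exact cut call isn't shown, so this is only a guess at the cause: one common way to get exactly this symptom is calling as.numeric() directly on the factor. That returns the internal level codes (1, 2, 3, ...) rather than the years, so every value falls below 1949. The toy vector `years` below is hypothetical:

```r
# Hypothetical toy data: years stored as a factor, like traintest$YearBuilt
years <- factor(c(1872, 1965, 2008, 1930))

# as.numeric() on a factor returns the level codes, not the years
codes <- as.numeric(years)
print(codes)  # 1 3 4 2

# Going through as.character() first recovers the actual years
yrs <- as.numeric(as.character(years))
print(yrs)

brks <- c(-Inf, 1949, 1999, Inf)
labs <- c("Before1950", "Between1950-2000", "After2000")

# Cutting the codes puts every value in the first bin
print(table(cut(codes, breaks = brks, labels = labs)))
# Cutting the real years gives the intended split
print(table(cut(yrs, breaks = brks, labels = labs)))
```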

Is there an easier way to do this?

Khasteh
  • `cut` didn't work how? Try to make this [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – camille Jan 21 '20 at 18:25

1 Answer


One option is to convert the levels to numbers and build logical vectors that select each group of levels:

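# numeric year behind each level (the levels are strings like "1872")
v1 <- as.numeric(levels(traintest$YearBuilt))
i1 <- v1 < 1950
i2 <- !i1 & v1 < 2000
i3 <- v1 >= 2000
# build all new labels first: assigning to levels() one group at a time
# merges levels and shifts positions, so i2/i3 would no longer line up
lv <- levels(traintest$YearBuilt)
lv[i1] <- "Before1950"
lv[i2] <- "Between1950-2000"
lv[i3] <- "After2000"
levels(traintest$YearBuilt) <- lv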
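A small check on a hypothetical toy factor (standing in for traintest$YearBuilt) shows why the order of assignment matters with logical indices: assigning a duplicated label to levels() merges those levels, so the levels vector gets shorter and indices computed beforehand go stale. Building the complete label vector and assigning it once avoids this:

```r
# Hypothetical toy factor standing in for traintest$YearBuilt
f <- factor(c(1872, 1930, 1965, 2008))
v1 <- as.numeric(levels(f))
i1 <- v1 < 1950
i2 <- !i1 & v1 < 2000
i3 <- v1 >= 2000

# Assigning a duplicated label merges those levels into one
levels(f)[i1] <- "Before1950"
print(levels(f))  # "Before1950" "1965" "2008"

# The levels vector is now shorter than i2/i3, so they no longer line up
print(c(length(levels(f)), length(i2)))  # 3 4

# Safer: build the full label vector first, then assign levels() once
g <- factor(c(1872, 1930, 1965, 2008))
lv <- levels(g)
lv[i1] <- "Before1950"
lv[i2] <- "Between1950-2000"
lv[i3] <- "After2000"
levels(g) <- lv
print(as.character(g))
```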

Or use cut on the numeric levels:

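v1 <- as.numeric(levels(traintest$YearBuilt))
# one label per original level; duplicated labels are merged into 3 levels
levels(traintest$YearBuilt) <- cut(v1, breaks = c(-Inf, 1949, 1999, Inf),
       labels = c("Before1950", "Between1950-2000", "After2000"))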
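Another variation (not from the original answer) is to cut the years themselves into a new column instead of relabelling levels. This sketch assumes a hypothetical toy data frame in place of traintest, converts through as.character() so the real years rather than level codes are cut, and uses right = FALSE so the breaks are the boundary years themselves:

```r
# Hypothetical stand-in for traintest
traintest <- data.frame(YearBuilt = factor(c(1872, 1949, 1950, 1999, 2000, 2010)))

# Recover the actual years (as.numeric() alone would give level codes)
years <- as.numeric(as.character(traintest$YearBuilt))

# right = FALSE gives the bins [-Inf, 1950), [1950, 2000), [2000, Inf)
traintest$YearGroup <- cut(years,
                           breaks = c(-Inf, 1950, 2000, Inf),
                           labels = c("Before1950", "Between1950-2000", "After2000"),
                           right = FALSE)
print(table(traintest$YearGroup))
```

If YearBuilt is still numeric (i.e. before the as.factor() call), the as.character() step can be dropped and cut applied to the column directly.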
akrun