
I have a puzzling problem. I filtered a data frame to keep only selected values of one column (`Series`). When I check the levels of that column afterwards, I get the same levels as before, even though the filtered data frame no longer contains those values (see the images). Why does `levels(gd2$Series)` return the same levels as `levels(gd1$Series)`? The code is as follows:

# stringsAsFactors = TRUE reproduces the pre-R 4.0 default, where text columns became factors
gd <- read.csv("d3.csv", stringsAsFactors = TRUE)
names(gd) <- sub("^X", "", names(gd))  # drop the "X" that read.csv prepends to year column names
levels(gd$Series)

names(gd)
sapply(gd, class)
gd1 <- gd[order(gd$Series), ]  # sort rows by series
rownames(gd1) <- NULL          # renumber the rows after sorting
gd1[gd1 == ".."] <- NA         # treat ".." as missing

levels(gd1$Series)  # COMPARE THE LEVELS HERE WITH gd2$Series.

selected <- c("Air transport, freight (million ton-km)",
              "Air transport, passengers carried",
              "Railways, goods transported (million ton-km)",
              "Railways, passengers carried (million passenger-km)",
              "Rail lines (total route-km)")

gd2 <- gd1[gd1$Series %in% selected, ]  # keep only the selected series
levels(gd2$Series)
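
For readers without the file, here is a minimal self-contained reproduction of the same behaviour with made-up data (the series "Some other series" is hypothetical). `stringsAsFactors = TRUE` is set explicitly because since R 4.0.0 strings are no longer converted to factors by default, and `levels()` of a character column returns `NULL`.

```r
# Hypothetical data: three series, only two of which are kept below.
gd1 <- data.frame(
  Series = c("Rail lines (total route-km)",
             "Air transport, passengers carried",
             "Some other series"),
  value  = c(1, 2, 3),
  stringsAsFactors = TRUE
)

selected <- c("Rail lines (total route-km)",
              "Air transport, passengers carried")

# Row subsetting keeps the factor's full set of levels.
gd2 <- gd1[gd1$Series %in% selected, ]

nrow(gd2)           # only the 2 selected rows remain
levels(gd2$Series)  # ...but all 3 levels are still attached to the factor
table(gd2$Series)   # the filtered-out series shows up with a count of 0
```

This is the documented behaviour of factor subsetting: `[` keeps unused levels unless they are explicitly dropped.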

The data (a small CSV file) can be downloaded from this link: https://1drv.ms/u/s!AtnYqHF_dUb1gdFSRkzrlSanIIbMPg

[Image: the gd2 data frame]
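
A possible refinement of the cleaning step (not part of the original question): `read.csv()` can convert the `..` placeholders to `NA` while reading, via its `na.strings` argument, so the year columns are parsed as numbers instead of text. The sketch below uses a small inline CSV with hypothetical values in place of `d3.csv`.

```r
# Hypothetical stand-in for d3.csv: a text column plus year columns
# that use ".." for missing values.
csv_text <- 'Series,X2000,X2001
"Rail lines (total route-km)",100,..
"Air transport, passengers carried",..,250'

# na.strings = ".." turns the placeholders into NA while reading, so the
# year columns can be parsed as integers instead of text/factors.
gd <- read.csv(text = csv_text, na.strings = "..", stringsAsFactors = TRUE)
names(gd) <- sub("^X", "", names(gd))  # strip the "X" prefix from year names

sapply(gd, class)  # Series is a factor; the year columns are integer
```

With this, `Series` is the only factor column, and the later `gd1[gd1 == ".."] <- NA` step becomes unnecessary.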

ambrish dhaka
  • Filtering out specific values does not drop them from the levels of the factor. You can change that by using the `droplevels` command on the new series. https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/droplevels – User2321 Aug 21 '18 at 15:15
  • Thanks! Sorted it out with `droplevels`. A follow-up question: should this be run on the affected column only, or on the entire data frame? – ambrish dhaka Aug 21 '18 at 22:55
  • I would say based on what you have provided, only on the column since it is the only factor column you have. – User2321 Aug 22 '18 at 14:03
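
Both options from the comments can be sketched as follows. `droplevels()` works on a single factor and also has a data-frame method that drops unused levels from every factor column, with an `except` argument for columns to leave alone; `factor()` applied to a factor likewise recomputes its levels from the values present. The columns `Region` and `value` are hypothetical, added only to show the difference when there is more than one factor column (in the question's data, `sapply(gd, class)` shows which columns are factors).

```r
# Hypothetical data: two factor columns and one numeric column.
df <- data.frame(
  Series = factor(c("a", "b", "c")),
  Region = factor(c("x", "y", "z")),
  value  = 1:3
)

# Subsetting rows keeps every original level in both factor columns.
sub_df <- df[df$Series %in% c("a", "b"), ]
levels(sub_df$Series)  # "a" "b" "c"

# Column only (in the question: gd2$Series <- droplevels(gd2$Series)).
col_only <- sub_df
col_only$Series <- droplevels(col_only$Series)

# Alternative for one column: factor() keeps only the levels that occur.
series_alt <- factor(sub_df$Series)

# Whole data frame: unused levels are dropped from every factor column.
all_cols <- droplevels(sub_df)

# Whole data frame, but leave column 2 (Region) with all its levels.
keep_region <- droplevels(sub_df, except = 2)
```

With a single factor column, the column-level and data-frame calls give the same result; the choice only matters once several columns are factors.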
