
I have a very simple problem that I somehow cannot solve, despite two hours of trying. Since I can't send you the data, I have to explain my problem with words and very little code.

I have a dataframe (elecData) with several variables and a factor (Partido). All I want to do is create a new dataframe containing only one factor level (Podemos), i.e. only the rows where Partido is Podemos. The code I use is the following:

PodemosSort=subset(elecData, subset=elecData$Partido=="Podemos")

For some reason, the new dataframe does not contain only the intended level (Podemos) but all levels of the factor. I also tried the subset function on a simple dataframe I made up, and there it worked. Why is it not working in this case?
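The symptom can be reproduced with a small made-up data frame. This is a sketch under assumptions: the levels PP and PSOE and the id column are hypothetical stand-ins for the real elecData, which is not shown. The row filter itself works; only the factor's set of levels stays the same.

```r
# Hypothetical stand-in for elecData; only the factor Partido matters here
elecData <- data.frame(
  id = 1:4,
  Partido = factor(c("Podemos", "PP", "PSOE", "Podemos"),
                   levels = c("Podemos", "PP", "PSOE"))
)

PodemosSort <- subset(elecData, Partido == "Podemos")

nrow(PodemosSort)              # 2: only the Podemos rows were kept
levels(PodemosSort$Partido)    # "Podemos" "PP" "PSOE": levels are not dropped
table(PodemosSort$Partido)     # PP and PSOE show counts of 0
```

So checking the result with `levels()` is misleading; `table()` or `nrow()` shows what was actually kept.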

Thank you in advance.

Sowmya S. Manian
Spaniel
  • Try ``PodemosSort=subset(elecData, Partido=="Podemos")`` – Melissa Key Apr 09 '18 at 18:26
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Are you sure there are rows with different levels? What does `table(PodemosSort$Partido)` return? You need to use `droplevels()` to remove levels from factors. Factors "remember" what's not there when subsetting by default. – MrFlick Apr 09 '18 at 18:26
  • It returns 260 cases for Podemos and 0 for all other levels. When I run `levels(Podemos)`, however, it gives me all the level names. OMG, I cannot believe this took me two hours: I was using `levels()` to check whether the subsetting had worked. – Spaniel Apr 09 '18 at 18:31
  • Look at `table(elecData$Partido)`. That should tell you how many observations have each value of the factor. – Melissa Key Apr 09 '18 at 18:33
  • On a possibly related note, if I may: I ran into a similar problem with something else, so I am posting it here. I also wanted to apply `PodemosSort1=PodemosSort[-which(PodemosSort$votePerc2015==0),]` to drop the rows where votePerc2015 is 0. Previously, I had renamed 3 levels and given them the same name (Podemos). Now, when I apply the code above, nothing (that I can see) happens. Is this problem related? – Spaniel Apr 09 '18 at 18:39
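One thing worth checking with the `-which(...)` idiom in the follow-up (not necessarily the cause here): when no row matches, `which()` returns `integer(0)`, and indexing with `-integer(0)` selects zero rows rather than all rows. A logical condition avoids that edge case. A minimal sketch on a hypothetical toy data frame in which no value is 0:

```r
# Hypothetical toy data: no row has votePerc2015 == 0
PodemosSort <- data.frame(
  Partido = factor(c("Podemos", "Podemos", "Podemos")),
  votePerc2015 = c(20.1, 15.3, 18.7)
)

idx <- which(PodemosSort$votePerc2015 == 0)
idx                                   # integer(0): nothing matched

# Negative indexing with an empty index selects zero rows, not all rows
nrow(PodemosSort[-idx, ])             # 0

# A logical condition keeps every row whose value is non-zero
PodemosSort1 <- subset(PodemosSort, votePerc2015 != 0)
nrow(PodemosSort1)                    # 3
```

`subset()` also drops rows where the condition evaluates to `NA`, whereas `PodemosSort[PodemosSort$votePerc2015 != 0, ]` would return rows of `NA`s for them.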

1 Answer

df <- data.frame(A = 1:4, B = factor(c("AA","BB","BB","AA"), levels = c("AA","BB")))
df
#   A  B
# 1 1 AA
# 2 2 BB
# 3 3 BB
# 4 4 AA

Check the class of each column:

sapply(df,class)
#        A         B 
# "integer"  "factor" 

Levels of the factor column "B":

levels(df$B)
# [1] "AA" "BB"

Rows with level "BB":

df[df$B == "BB",]
#   A  B
# 2 2 BB
# 3 3 BB

Number of rows with each level ("AA", "BB"), using the table() function:

table(df$B)

#  AA BB 
#   2  2 

Using the subset() function:

subset(df, B == "BB")  # inside subset(), columns can be referred to by name
#  A  B
# 2 2 BB
# 3 3 BB

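subset() keeps all levels of a factor, including those that no longer occur in the result. To drop levels with a frequency count of 0, use the droplevels() function (this only changes something if such levels exist):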

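df$B <- droplevels(df$B)  # assign the result back to the column, not to levels()
levels(df$B)              # both levels are in use in df, so nothing is dropped here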
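Applied to the result of subset(), droplevels() removes the level that subsetting left behind. A minimal sketch that rebuilds the toy data frame from this answer; the names bb and bb2 are only illustrative:

```r
# Toy data frame from this answer: factor column B with levels "AA" and "BB"
df <- data.frame(A = 1:4, B = factor(c("AA", "BB", "BB", "AA"), levels = c("AA", "BB")))

bb <- subset(df, B == "BB")
levels(bb$B)                # "AA" "BB": the unused level "AA" is still there
table(bb$B)                 # AA has a count of 0

# Drop unused levels from a single column ...
bb$B <- droplevels(bb$B)
levels(bb$B)                # "BB"

# ... or from every factor column of a data frame at once
bb2 <- droplevels(subset(df, B == "BB"))
levels(bb2$B)               # "BB"
```

Calling droplevels() on the whole data frame is convenient when it has several factor columns, since each one may have levels left unused by the subset.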
Sowmya S. Manian