
I am working with a video game dataset.

I'm trying to calculate the average user score (the User_Score column in the dataset).

The issue I'm facing is that whenever I try to use the mean function to get the average user score, I always get this message:

"‘>’ not meaningful for factors"

and I get NaN as a result.

I looked up this problem online, and it seems to happen because I'm trying to find the mean of a categorical variable. However, when I use typeof() to check the data type of User_Score, it says it's an integer, which is the same as another column I found the mean of (Critic_Score). I tried removing all rows that have NaN or NA values to make it work, but it hasn't helped.

Here is what I have tried so far:

game_data = read.csv('Video_Games_Sales_as_at_22_Dec_2016.csv')
game_data <- mutate(game_data, Critic_Score = ifelse(Critic_Score > 100, NA, Critic_Score))
game_data <- game_data[complete.cases(game_data), ]
 
typeof(game_data$User_Score)
typeof(game_data$Critic_Score)


#game_data$User_Score = as.numeric(game_data$User_Score)
game_data <- mutate(game_data, User_Score = ifelse(User_Score > 10, NA, User_Score))

head(game_data)
ncol(game_data)
nrow(game_data)
mean(game_data$Critic_Score, na.rm = T)
mean(game_data$User_Score,na.rm = T)

Here are the results:

[1] "integer"
[1] "integer"
‘>’ not meaningful for factors
[1] 16
[1] 7017
[1] 70.24982
[1] NaN

I was wondering if anyone could help.

  • Could you add a minimal sample of the data with `dput(head(game_data, n))`? – NelsonGon Jun 21 '20 at 06:08
  • Can you try doing `game_data$User_Score = as.numeric(as.character(game_data$User_Score))` – Ronak Shah Jun 21 '20 at 06:14
  • @NelsonGon User_score – internshiphopeful Jun 21 '20 at 06:34
  • @RonakShah I was wondering if you could explain what you did? Did you make the column a string data type and then convert it to a numeric data type? – internshiphopeful Jun 21 '20 at 06:43
  • As of R version 4.0, `read.csv` no longer converts strings to factors by default, so if you use version 4.0 or later this behavior shouldn't happen. `read_csv` from the `readr` package doesn't convert to factors in any R version. – Valeri Voev Jun 21 '20 at 07:59
  • This post would help you to understand https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information – Ronak Shah Jun 21 '20 at 08:09

1 Answer

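It seems to be a data cleaning issue: some values in the User_Score column are not numeric but "tbd", which is why the column is imported as character instead of numeric. Moreover, in R versions before 4.0, read.csv() converts character columns to factors by default (stringsAsFactors = TRUE). A factor is stored internally as integer codes, which is why typeof() reports "integer" for it.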

str(game_data$User_Score)
# Factor w/ 97 levels "","0","0.2","0.3",..: 79 1 82 79 1 1 84 65 83 1 ...

Check that with:

table(game_data$User_Score)
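
To see why typeof() can be misleading here, the following is a minimal sketch on made-up values (the `scores` vector is hypothetical, not taken from the dataset). It shows that a factor reports an integer storage type while its class is still "factor":

```r
# Toy stand-in for the User_Score column; the values are invented for illustration.
scores <- factor(c("8", "tbd", "7.5", "", "8"))

typeof(scores)     # storage type of the underlying level codes
class(scores)      # the class R actually dispatches on
is.factor(scores)  # TRUE for a factor, regardless of typeof()
levels(scores)     # the distinct strings, including "tbd" and ""
```

Checking class() or str() rather than typeof() is usually the quicker way to spot a column that was read in as a factor.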

So you need to replace the "tbd" values. You need to decide what you want to do with them: replace with 0, replace with NA - it's up to you and depends on your insight into the dataset.
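
If you decide that "tbd" should count as missing, one option is to declare it as such at import time with the `na.strings` argument of read.csv(). The sketch below uses a small inline CSV (`csv_text` and `toy` are hypothetical names with invented values) rather than the real file:

```r
# Inline CSV standing in for the real file; the rows are made up.
csv_text <- "Name,User_Score\nA,8\nB,tbd\nC,7.5\nD,\n"

# Treat "tbd" and empty fields as NA while reading.
toy <- read.csv(text = csv_text, na.strings = c("tbd", "", "NA"))

str(toy$User_Score)                 # the column is now numeric
mean(toy$User_Score, na.rm = TRUE)  # average of the remaining scores
```

With the real file, the same `na.strings` argument would be passed alongside the file name; whether NA is the right replacement still depends on what "tbd" means for your analysis.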

If you want to use NAs, you can convert the column from factor to character and then to numeric. Values such as "tbd" that cannot be parsed as numbers become NA, and R emits a coercion warning:

game_data$User_Score = as.numeric(as.character(game_data$User_Score))
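
The intermediate as.character() step matters because calling as.numeric() directly on a factor returns the internal level codes rather than the original numbers. Here is a small sketch on invented values (`scores`, `codes`, and `values` are hypothetical names):

```r
# Toy factor standing in for User_Score; the values are invented.
scores <- factor(c("8", "tbd", "7.5", "8"))

# Direct conversion gives the integer level codes, not the scores.
codes <- as.numeric(scores)

# Going through character recovers the numbers; "tbd" becomes NA.
# suppressWarnings() only hides the expected coercion warning.
values <- suppressWarnings(as.numeric(as.character(scores)))

codes
values
mean(values, na.rm = TRUE)  # na.rm = TRUE skips the NA left by "tbd"
```

After the conversion, remember to keep `na.rm = TRUE` in mean(), since the former "tbd" entries are now NA.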