-1

I have a spreadsheet with a column name Autonumber below which has maximum values expect for a few values "*"

CityName    Autonumber
NYC         2
Atlanta     4
Nashville   8
Boston      6
LA          *
Vegas       3
Florida     *

How can I:-

Scenario a) remove rows with "* " values
Scenario b) put average value of Autonumber to replace "*"

Joe
  • 183
  • 5
  • 16

3 Answers3

1

Something like this?
First, I substitute NA for "?".

df$Autonumber <- as.character(df$Autonumber)
is.na(df$Autonumber) <- df$Autonumber == "?"
df$Autonumber <- as.numeric(df$Autonumber)

Now what the question asks for.

# 1) remove rows with NA
i <- is.na(df$Autonumber)
df[!i, ]

# 2) replace NA's by mean value
df$Autonumber[i] <- mean(df$Autonumber, na.rm = TRUE)
df

Data.

df <- read.table(text = "
CityName    Autonumber
NYC         2
Atlanta     4
Nashville   8
Boston      6
LA          ?
Vegas       3
Florida     ?                 
", header = TRUE)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
1

This question is possibly duplicate, see this link.

Filter multiple values on a string column in dplyr

Anyway:

  • Scenario A
library(dplyr)
df # your dataframe

cA <- df %>% 
  dplyr::filter(Autonumber != "?")

  • Scenario B
MAutonumber  <- mean(df$Autonumber)
MAutonumberX <- mean(df$Autonumber, na.rm = T)

cB1 <- df %>% 
  dplyr::mutate(Autonumber = ifelse(Autonumber == "?", MAutonumber, Autonumber))

cB1 <- df %>% 
  dplyr::mutate(Autonumber = ifelse(Autonumber == "?", MAutonumberX, Autonumber))

Note that there are several ways to do what you want.

https://stackoverflow.com/a/41588616/9699371

bbiasi
  • 1,549
  • 2
  • 15
  • 31
0

Scenario a and b put together:

df[df$Autonumber == "?","Autonumber"] <- mean(df[df$Autonumber != "?","Autonumber"])

Make sure that the rest of the numbers are numerical and not factors

John Nielsen
  • 184
  • 6