There are signs of "$"
and ","
in the Price
variable in my dataframe as the head(data) and str(data) show.
I tried to delete the signs using gsub()
:
data_new <- gsub("[$,]", "", data)
I also tried:
data_new <- gsub("[\\$,]", "", data)
But when I checked the data_new using head(data)
, it turned:
image
"c(59 32 60 56 52 95 4 47 32 293 353 23 25 119 280 330 172 65 73 370 22 32 383 65 14 26 172 106 43 59 297 32 315 50 315 363 25 254 353 230 383 23 76 209 17 378 37 105 365 353 17 95 69 105 59 353 52 254 94 172 331 383 330 95 353 172 341 242 280 59 25 353 131 156 49 383..."
Thanks to your ideas, what I am doing now is:
# delete "$" and "," sign
data_price <- gsub("[\\$,]", "", data$price)
# select other variables in the data and combine the price vector to create
a new data frame.
df <- data.frame(price = data_price, room_type = data$room_type,
accommodates = data$accommodates, bedrooms = data$bedrooms,
bathrooms = data$bathrooms, beds = data$beds,
review_scores_rating = data$review_scores_rating)
Though it works, I have a few questions:
Why did the previous way alter the data is changed? Is it common in data cleaning and preparation?
Any other way works better to delete
$
and , in thePrice
variable but keep all other information same as before? "Better", I mean, more concise code.
Here are the first 12 observations head(data, 12)
:
price room_type accommodates bedrooms bathrooms beds
<fctr> <fctr> <int> <int> <dbl> <int>
1 $150.00 Entire home/apt 6 2 2 4
2 $119.00 Entire home/apt 4 0 1 2
3 $151.00 Entire home/apt 4 2 2 2
4 $146.00 Entire home/apt 2 1 1 1
5 $140.00 Entire home/apt 4 1 1 2
6 $199.00 Entire home/apt 4 2 1 3
7 $1,200.00 Entire home/apt8 3 1 4
9 $135.00 Entire home/apt 8 4 3 4
11 $119.00 Entire home/apt 2 1 1 1
12 $55.00 Private room 2 1 1 1
Here is the structure:
'data.frame': 5052 obs. of 7 variables:
$ price : num 150 119 151 146 140 199 1200 135 119 55 ...
$ room_type : Factor w/ 3 levels "Entire home/apt",..: 1 1 1 1 1 1 1 1 1 2 ...
$ accommodates : int 6 4 4 2 4 4 8 8 2 2 ...
$ bedrooms : int 2 0 2 1 1 2 3 4 1 1 ...
$ bathrooms : num 2 1 2 1 1 1 1 3 1 1 ...
$ beds : int 4 2 2 1 2 3 4 4 1 1 ...
$ review_scores_rating: int 93 96 84 98 95 93 80 100 93 91 ...
Thank you.