
I am trying to follow along with a ggplot tutorial, but my data set lists dollar values with $ and percentages with %. That makes plotting impossible, because the error says the values must be numeric.

For example, my data set is named housing, and the column with home prices is labeled Home.Value. The prices are formatted as $24,895 and $25,175.

How would I go about removing the dollar sign and the percent sign?

Justin Reid

2 Answers


Suppose you have a data frame like this one:

df <- data.frame(A = c("$5,33", "$3,55"), B = c(TRUE, FALSE))

Then you could replace column A with

df$A<-gsub("\\$","",df$A)

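$ is a regular-expression metacharacter (it matches the end of the string), so you have to escape it as "\\$" or pass fixed = TRUE for gsub to treat it as a literal dollar sign. % has no special meaning in a regex and needs no escaping.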
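A small check of the escaping rule, using the two prices from the question as sample input. An unescaped "$" is the end-of-string anchor, so it matches the empty string at the end of each value and removes nothing:

```r
prices <- c("$24,895", "$25,175")

# Unescaped "$" is the regex end-of-string anchor: values come back unchanged
unescaped <- gsub("$", "", prices)

# Escaped "\\$" matches a literal dollar sign
escaped <- gsub("\\$", "", prices)

# fixed = TRUE switches off regex interpretation, so "$" is taken literally
literal <- gsub("$", "", prices, fixed = TRUE)

print(unescaped)
print(escaped)
print(literal)
```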

If you want to remove $ and % in one call, you can use the regex "OR" operator (|):

df$A<-gsub("\\$|%","",df$A)
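The same cleanup can also be written with a character class instead of alternation. Inside [ ] the $ loses its special meaning, so no escape is needed. A minimal sketch, where the "12.5%" value is made up only to show the % case:

```r
vals <- c("$5,33", "12.5%")

# Alternation: match a literal "$" OR a "%"
alt <- gsub("\\$|%", "", vals)

# Character class: any single character listed inside [...] is removed
cls <- gsub("[$%]", "", vals)

print(alt)
print(cls)
```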

UPDATE:

You may want to stop there, but note that your numbers are also formatted with commas, so R will still treat them as character strings. You will need to remove the commas as well.

To do that, remove the commas with gsub and convert the result with as.numeric. (Escaping the comma as "\\," is harmless but not required, since a comma has no special meaning in a regex.)

df$A<-as.numeric(gsub("\\,","",df$A))

df
    A     B
1 533  TRUE
2 355 FALSE

Notice that column A is now numeric:

str(df)
'data.frame':   2 obs. of  2 variables:
 $ A: num  533 355
 $ B: logi  TRUE FALSE

Again, you could do everything in one line, but two steps are probably easier to follow.
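Applied to data shaped like the question's, a one-pass version might look like the sketch below. The Home.Value rows reuse the two prices from the question; the Pct.Change column and the to_number() helper are hypothetical names used only for illustration. The pattern [$%,] strips dollar signs, percent signs and thousands separators before as.numeric() runs.

```r
# Stand-in for the question's data; Pct.Change is a hypothetical column
housing <- data.frame(
  Home.Value = c("$24,895", "$25,175"),
  Pct.Change = c("1.2%", "-0.4%"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop "$", "%" and "," in one pass, then convert
to_number <- function(x) as.numeric(gsub("[$%,]", "", x))

# Convert only the listed columns, keeping the data frame structure
cols <- c("Home.Value", "Pct.Change")
housing[cols] <- lapply(housing[cols], to_number)

str(housing)
```

The percentages stay on a 0 to 100 scale (1.2 means 1.2%), so divide by 100 if you need fractions. Any value that still contains other characters becomes NA with a warning, which is a quick way to spot stray formatting.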

Matias Andina

This answer shows a method for removing commas while reading the data into R. It can easily be modified to remove $, %, and other characters too: just change gsub(",","", from) to gsub("[,$%]","", from).
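A sketch of cleaning at read time, assuming a setAs()/colClasses approach: read.csv() can convert a column to any class that has a coercion method from "character" registered with methods::setAs(). The class name num.clean, the column names and the inline CSV text are hypothetical stand-ins for the real file.

```r
library(methods)

# Hypothetical virtual class; read.csv() converts columns to it via setAs()
setClass("num.clean")
setAs("character", "num.clean",
      function(from) as.numeric(gsub("[,$%]", "", from)))

# Inline CSV standing in for the real file; values containing commas are quoted
csv <- 'Home.Value,Pct.Change
"$24,895",1.2%
"$25,175",-0.4%'

housing <- read.csv(text = csv, colClasses = c("num.clean", "num.clean"))
str(housing)
```

With a real file you would pass the file path instead of text = csv and list "num.clean" only for the columns that need cleaning.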

Greg Snow