4

I have a dataframe that originated from an excel file. It has the usual headers above the columns but some of the columns have % signs in them which I want to remove.

Searching stackoverflow gives some nice code for removing percentages from matrices, Any way to edit values in a matrix in R?, which did not work when I tried to apply it to my dataframe

as.numeric(gsub("%", "", my.dataframe))

instead it just returns a string of "NA"s with a warning message explaining that they were introduced by coercion. When I applied,

gsub("%", "", my.dataframe))

I got the values in "c(...)" form, where the ... represent numbers followed by commas which was reproduced for every column that I had. No % was in evidence; if I could just put this back together ... I'd be cooking.

Any help greatfully received, thanks.

Community
  • 1
  • 1
DarrenRhodes
  • 1,431
  • 2
  • 15
  • 29
  • 3
    As usual, it is difficult to help much unless you provide data for us to work with. `dput(head(...))` is often useful. – joran Jan 17 '13 at 17:52
  • it sounds like you may have factors in your data.frame as well. Try reading it in setting stringsAsFactors to FALSE: read.table(file="file.csv", stringsAsFactors = FALSE) – tcash21 Jan 17 '13 at 17:52
  • 1
    @tcash21 `stringsAsFactors = FALSE` doesn't solve the problem because once you read the data and then apply Arun's solution the resulting data.frame will consist of only factors, the numeric elements will be coerced to be factors, so the solution is using Arun's suggestion and adding `as.numeric` inside `sapply` call. – Jilber Urbina Jan 17 '13 at 18:05

1 Answers1

7

Based on @Arun's comment and imaging how your data.frame looks like:

> DF <- data.frame(X = paste0(1:5,'%'), 
                   Y = paste0(2*(1:5),'%'),
                   Z = 3*(1:5), stringsAsFactors=FALSE )

> DF # this is how I imagine your data.frame looks like
   X   Y  Z
1 1%  2%  3
2 2%  4%  6
3 3%  6%  9
4 4%  8% 12
5 5% 10% 15

> # Using @Arun's suggestion
> (DF2 <- data.frame(sapply(DF, function(x) as.numeric(gsub("%", "", x)))))
  X  Y  Z
1 1  2  3
2 2  4  6
3 3  6  9
4 4  8 12
5 5 10 15

I added as.numeric in sapply call for the resulting cols to be numeric, if I don't use as.numeric the result will be factor. Check it out using sapply(DF2, class)

Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
  • Thanks to @Jiber and everyone else (I could only mention one person). The code above worked like a charm. It was said, 'as usual no data' apologies for that but the data I'm working with is proprietary; it is from a number of experiments. In fact, the cleaned up data is in triplicate. I most probably have to start another question thread but, how can I reduce my triplicate data to a third of the size by finding the mean of values 1:3, 4:6, 7:9, etc until I get up to 22:24 and save everything in a new dataframe? – DarrenRhodes Jan 18 '13 at 09:35
  • 1
    @user1945827 if this answer meets your needs, then consider accepting it by doing clic in the green tick-mark – Jilber Urbina Jan 18 '13 at 10:07