
I'm using OpenOffice Calc to prepare statistics downloaded from the internet before loading them into R for data manipulation. Some of the files have columns that are in percents, with the percent symbol present. I need to get rid of the percent sign in order to run the script I have created. Whether the numbers are converted to decimals or remain as percentages (without the actual percent symbol) does not matter. When I created the script I had access to Microsoft Office and was able to change the percentages to decimals there before loading into R, but I have since given up my Microsoft subscription and cannot find a way to do this in OpenOffice. (Writing a script for it would save me a lot of time anyhow, as I'm working daily with 30+ sheets, and manually converting some of the columns is time-consuming.) Thank you for your assistance.

  • See the accepted answer here: http://stackoverflow.com/questions/10294284/remove-all-special-characters-from-a-string-in-r – Phil Apr 17 '15 at 12:14
  • There are plenty of ways of dealing with this. It would serve best if you could simulate a small dataset that demonstrates your problem. – Roman Luštrik Apr 17 '15 at 12:41
  • Thanks all for the responses. I'm not very proficient with R, but I found some code that nearly does the trick. Here's a sample of a file I'm trying to convert – Dan TheMan Apr 18 '15 at 06:17
  • Name x y z xy xyz
    dantheman 15.2% 10.1% 15.1 16.4 19.7
    Billybob 22.1% 16.1% 14.0 20.1 18.7 – Dan TheMan Apr 18 '15 at 06:19
  • Sorry, I can't figure out how to write the above in table form, but I'm sure you all get the idea. The header row has the name column plus 5 data columns (x, y, z, xy, xyz), with two rows under it, dantheman and Billybob. This is a small replica of my data; my real tables are very long, with about 10 columns and hundreds of rows. As I said earlier, I found some code that will erase the percent signs, but my problem with it is that it turns the names into NA (e.g. dantheman becomes NA). I will show the code in the next message – Dan TheMan Apr 18 '15 at 06:28
  • Here's the code (pitchdash <- data.frame(sapply(pitchdash, function(x) as.numeric(gsub("%", "", x))))) @Jilber Thank you for this code, it's so very close to what I need. If you see this, please help me. Thank you all again for your support – Dan TheMan Apr 18 '15 at 06:29
  • You'll need to operate only on the numeric columns, not the whole data frame. Look at the duplicate question I linked to above - there they needed to preserve a date column, as you need to preserve a name column. – Sam Firke Apr 18 '15 at 14:22
  • Thank you @SamFirke. It works great for one column of names. Unfortunately I have two columns that have names in them, and if I repeat the function with the name of the second column in the second piece of code, it goes back to giving me NAs in both name columns. Is there any way to fix this? Here's the warning given: Warning message: In extract_numeric(c(273L, 94L, 333L, 362L, 114L, 392L, 165L, 71L, : NAs introduced by coercion. Thank you again very much – Dan TheMan Apr 19 '15 at 05:36
  • In order to operate solely on the numeric columns, you'll want to exempt the character columns, however many there are. In that question, the accepted answer does that by referring to the column number df[-1, ]; the answer with extract_numeric refers to the column as "Year." You can add your other character column to that argument. Either way, you'll want to read up on how to select certain columns of a data frame, like: http://stackoverflow.com/questions/10085806/extracting-specific-columns-from-a-data-frame – Sam Firke Apr 19 '15 at 21:14
  • Thanks a bunch for all the help @SamFirke, and everyone else who popped in to offer advice. This was the final code I came up with, and it works great so far. All I did was copy the first line of your post and add the second column name to it. cbind(df %>% select(Name),(df %>% select(Team), # preserve the year column as-is df %>% select(-Name) %>% mutate_each(funs(extract_numeric)) )) – Dan TheMan Apr 20 '15 at 02:41
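
The advice in the comments (convert only the percent columns and leave the name columns alone) can also be sketched in base R without naming each column. This is a minimal sketch, not the code from the linked answers: the data frame below is hypothetical, modelled on the sample posted above, with Team assumed as the second name column, and it assumes the columns were read as character rather than factor (stringsAsFactors = FALSE).

```r
# Hypothetical data modelled on the sample in the comments;
# Team is an assumed second name column.
df <- data.frame(
  Name = c("dantheman", "Billybob"),
  Team = c("A", "B"),
  x = c("15.2%", "22.1%"),
  y = c("10.1%", "16.1%"),
  z = c(15.1, 14.0),
  stringsAsFactors = FALSE
)

# Flag character columns in which every non-missing value ends in "%".
is_pct <- vapply(
  df,
  function(col) is.character(col) && all(grepl("%$", col[!is.na(col)])),
  logical(1)
)

# Strip the "%" and convert only the flagged columns; the name columns
# are never passed to as.numeric(), so they cannot turn into NA.
df[is_pct] <- lapply(
  df[is_pct],
  function(col) as.numeric(sub("%", "", col, fixed = TRUE))
)

str(df)
```

The values stay on the percent scale here (15.2 rather than 0.152); divide by 100 inside the conversion function if decimals are preferred.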

1 Answer


See these 2 answers:

How to read data when some numbers contain commas as thousand separator?

Specify custom Date format for colClasses argument in read.table/read.csv

Just modify the answers to remove the percent sign instead of the commas and optionally divide by 100.
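
As a rough illustration of that modification (a sketch under assumptions, not code taken from either linked answer), one option is to register a custom class for colClasses so the percent sign is stripped while the file is read. The class name "percent", the inline CSV text, and the column layout are all hypothetical.

```r
library(methods)

# Hypothetical class name; read.csv() looks up a conversion from
# character to this class for every column declared as "percent".
setClass("percent")
setAs("character", "percent",
      function(from) as.numeric(sub("%", "", from, fixed = TRUE)) / 100)

# Inline stand-in for a file exported from the spreadsheet (assumed layout).
csv_text <- "Name,x,y,z
dantheman,15.2%,10.1%,15.1
Billybob,22.1%,16.1%,14.0"

df <- read.csv(
  text = csv_text,
  colClasses = c("character", "percent", "percent", "numeric")
)

str(df)
```

Dropping the `/ 100` keeps the numbers on the percent scale. With a real file, replace `text = csv_text` with the file path.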

Greg Snow