
I have a table with 5 rows and 3 columns:

> temp
  data1  data2  data3
1 35.5Â 410.8Â 327.2Â
2 32.7Â 406.9Â 281.3Â
3 30.3Â 410.3Â 288.2Â
4 27.6Â 403.1Â 273.6Â
5 27.3Â 364.2Â 236.8Â

I want to get rid of those pesky "Â"s and convert these factors to numeric values. So, I tried something like:

temp[,1:3] <- apply(temp[,1:3], 2, as.numeric)

but I got warning messages and all values were coerced to NA.

So then I tried:

temp[,1:3] <- sub("Â","",temp[,1:3])

but this did not work.

I can do this:

temp[,1] <- sub("Â","",temp[,1])

which does work, but I need to do this over a large range of columns in a larger data set. Is there a way to apply sub() to a range of columns, instead of to individual vectors?

  • What is the source of your data? It is sometimes easier to fix the problem when importing data. –  Oct 30 '15 at 02:43
  • `temp[] <- lapply(temp, sub, pattern = "\\D$", replacement = "")` would do it, but you can probably remedy this situation when you read the data into R. What did you use to read it? – Rich Scriven Oct 30 '15 at 02:45
  • It's an encoding issue - see [this explanation of how a `&nbsp;` (non-breaking space) is being mapped to `"Â"`](http://stackoverflow.com/questions/1461907/html-encoding-issues-Â-character-showing-up-instead-of-nbsp). Try specifying an encoding when you read the data; it looks like ISO-8859-1. – Ken Benoit Oct 30 '15 at 02:49
  • I am scraping the data using `temp <- readHTMLTable(url, colClasses = "character")`. It looks like `&nbsp;` is being mapped to "Â". How can I fix this during import? – Paul Smith Oct 30 '15 at 03:24
  • You can set the coding. `readHTMLTable()` accepts the `encoding` argument from `read.table()`. See `?read.table` for the explanation. It says it only accepts latin-1 and utf-8 but hopefully one will work. Otherwise my `lapply()` comment will likely work fine. – Rich Scriven Oct 30 '15 at 03:30
  • I did that before I posted here but saw no mention of encoding in the help page. I also tried `example(readHTMLTable)` but did not see anything about setting the encoding there. – Paul Smith Oct 30 '15 at 03:35
  • Thank you. Didn't quite solve the problem but I learned from your answer and help – Paul Smith Oct 30 '15 at 16:46
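
To illustrate the encoding point raised in the comments, here is a minimal base R sketch of how a non-breaking space can turn into "Â". It assumes the page was served as UTF-8 and parsed as latin1 (ISO-8859-1); the variable names are only for illustration.

```r
# The two bytes that UTF-8 uses for a non-breaking space (U+00A0).
nbsp_bytes <- as.raw(c(0xc2, 0xa0))

# Deliberately label those bytes as latin1, which is roughly what happens
# when a UTF-8 page is parsed with the wrong encoding (an assumption here).
misread <- rawToChar(nbsp_bytes)
Encoding(misread) <- "latin1"

# Code points after the misreading: 194 is "Â", 160 is the non-breaking space.
utf8ToInt(enc2utf8(misread))
```

If that is what happened, each cell ends in two unwanted characters, "Â" plus a non-breaking space, so removing the "Â" alone may still leave a character that `as.numeric()` cannot parse.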

1 Answer

strsplit(temp$data1,"Â")

strsplit(temp$data2,"Â")

strsplit(temp$data3,"Â")
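
Note that `strsplit()` returns a list and has to be called once per column. As a possible alternative for a whole range of columns, here is a minimal sketch using `lapply()` and `gsub()`. The data frame is rebuilt by hand, and the assumption that each cell ends in "Â" plus a non-breaking space is mine, not confirmed by the question.

```r
# Rebuild the example table. junk is "Â" followed by a non-breaking space,
# which is an assumption about what the scraped cells really contain.
junk <- "\u00c2\u00a0"
temp <- data.frame(
  data1 = paste0(c("35.5", "32.7", "30.3", "27.6", "27.3"), junk),
  data2 = paste0(c("410.8", "406.9", "410.3", "403.1", "364.2"), junk),
  data3 = paste0(c("327.2", "281.3", "288.2", "273.6", "236.8"), junk),
  stringsAsFactors = TRUE
)

cols <- 1:3  # the range of columns to clean

# lapply() visits each selected column; as.character() guards against factors,
# gsub() drops everything except digits, the decimal point and a minus sign,
# and as.numeric() converts what is left.
temp[cols] <- lapply(temp[cols], function(x) {
  as.numeric(gsub("[^0-9.-]", "", as.character(x)))
})

str(temp)
```

Changing `cols` to another index or name vector applies the same cleaning to a different set of columns. The whitelist pattern is a design choice: it removes any stray character, not only "Â", at the cost of also stripping things like exponent notation.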

Hope this helps.

Pankaj Sharma