How to remove certain characters from each row of a column

Question

I have this data table:

Year    GDP
1998–99 <U+20B9>1,668,739
1999–00 <U+20B9>1,858,205
2000–01 <U+20B9>2,000,743
2001–02 <U+20B9>2,175,260
2002–03 <U+20B9>2,343,864
2003–04 <U+20B9>2,625,819
2004–05 <U+20B9>2,971,464
2005–06 <U+20B9>3,390,503
2006–07 <U+20B9>3,953,276
2007–08 <U+20B9>4,582,086
2008–09 <U+20B9>5,303,567
2009–10 <U+20B9>6,108,903
2010–11 <U+20B9>7,248,860
2011–12 <U+20B9>8,391,691
2012–13 <U+20B9>9,388,876

What I want to do is remove "<U+20B9>" from all of the rows. How can I do it?

I tried grepl and grep, but they did not work for me:

df[!grepl("<U+20B9>", df$GDP),]

df[ grep("REVERSE", df$Name, invert = TRUE) , ]

These do not work for me...

What I want is something like this:

Year    GDP
1998–99 1,668,739
1999–00 1,858,205
2000–01 2,000,743
2001–02 2,175,260
2002–03 2,343,864
2003–04 2,625,819
2004–05 2,971,464
2005–06 3,390,503
2006–07 3,953,276
2007–08 4,582,086
2008–09 5,303,567
2009–10 6,108,903
2010–11 7,248,860
2011–12 8,391,691
2012–13 9,388,876

I also tried the solution from the question below, but it did not work for me either: How to identify/delete non-UTF-8 characters in R

x <- "<U+20B9>"
Encoding(x) <- "UTF-8"
iconv(x, "UTF-8", "UTF-8",sub='')

This returns "<U+20B9>" unchanged.

Possible duplicate of [How to identify/delete non-UTF-8 characters in R](http://stackoverflow.com/questions/17291287/how-to-identify-delete-non-utf-8-characters-in-r) — r2evans, Mar 24 '17 at 20:14

score 1 · Answer 1 · answered Mar 24 '17 at 20:50

A data.table attempt with some example data:

library(data.table)

data <- setDT(data.frame(
 Year=c('1998–99', 
     '1999–00', 
     '2000–01', 
     '2001–02', 
     '2002–03', 
     '2003–04', 
     '2004–05', 
     '2005–06', 
     '2006–07', 
     '2007–08'),
 GDP=c('<U+20B9>1,668,739',
    '<U+20B9>1,858,205',
    '<U+20B9>2,000,743',
    '<U+20B9>2,175,260',
    '<U+20B9>2,343,864',
    '<U+20B9>2,625,819',
    '<U+20B9>2,971,464',
    '<U+20B9>3,390,503',
    '<U+20B9>3,953,276',
    '<U+20B9>4,582,086')))

data[, GDP := sub("^\\s*<U\\+\\w+>\\s*", "", GDP)]

The regular expression pattern can be read as follows:

^\\s* matches any whitespace at the start of the string
<U\\+ matches the literal text <U+ (the + is escaped so it is not treated as a quantifier)
\\w+ matches one or more word characters (letters, digits, or underscore)
> closes the bracket, and the trailing \\s* matches any whitespace that follows
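As a quick check of the pattern described above, here is a minimal sketch on a plain character vector. It assumes the column really contains the literal text <U+20B9> (eight ordinary characters); the vector name gdp and its values are made up for illustration.

```r
# Sketch: apply the pattern to literal "<U+20B9>" text (assumed input).
gdp <- c("<U+20B9>1,668,739", " <U+20B9> 1,858,205", "2,000,743")

# ^\\s*   leading whitespace, if any
# <U\\+   the literal text "<U+"
# \\w+>   the code point digits/letters followed by ">"
# \\s*    whitespace after the marker, if any
pattern <- "^\\s*<U\\+\\w+>\\s*"

# sub() replaces only the first match in each element;
# elements without a match are returned unchanged.
cleaned <- sub(pattern, "", gdp)
print(cleaned)
```

Because the pattern is anchored with ^, values that do not start with the marker pass through untouched.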

This doesn't work. Your sample data is using the literal string `<U+20B9>`, which is how R *represents* (but does not *store*) a Unicode character. (Example: type in `"\u20b9"`.) As such, `sub`ing for the literal `<U+20B9>` does not work. — r2evans, Mar 24 '17 at 21:04
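If the column holds the actual rupee sign (U+20B9) rather than the literal text, one possible approach is to match the character itself. This is only a sketch under that assumption; the names gdp, no_symbol and gdp_num are illustrative and not taken from the question's data frame.

```r
# Assumption: the values start with the real rupee sign (U+20B9),
# which R may merely *print* as <U+20B9> on some platforms.
gdp <- paste0("\u20b9", c("1,668,739", "1,858,205"))

# Remove the character itself; fixed = TRUE treats the pattern as plain text.
no_symbol <- gsub("\u20b9", "", gdp, fixed = TRUE)

# Optionally drop the thousands separators and convert to numbers.
gdp_num <- as.numeric(gsub(",", "", no_symbol, fixed = TRUE))

print(no_symbol)
print(gdp_num)
```

On a data frame, the same call would presumably be applied to the column, for example df$GDP <- gsub("\u20b9", "", df$GDP, fixed = TRUE).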

score 0 · Answer 2 · answered Mar 30 '17 at 08:29

The shortest answer to the above is:

df$GDP <- substring(df$GDP, 2)
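A small sketch of what substring(x, 2) does in each case may help here. It drops exactly one leading character, so it only gives the desired result when the symbol is stored as a single character. Both example vectors below are made up for illustration.

```r
# Case 1 (assumed): the real rupee sign, a single character.
gdp_char <- paste0("\u20b9", c("1,668,739", "1,858,205"))

# Case 2 (assumed): the literal eight-character text "<U+20B9>".
gdp_text <- c("<U+20B9>1,668,739", "<U+20B9>1,858,205")

# substring(x, 2) keeps everything from the second character onward.
from_char <- substring(gdp_char, 2)  # symbol removed
from_text <- substring(gdp_text, 2)  # only the "<" is removed

print(from_char)
print(from_text)
```

Note that this also removes the first character of any value that has no symbol at all, so it is safest when every row is known to start with it.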

Madhu Sareen
