
The dataset used in this question is "Wage" from the ISLR package.

    library(ISLR)

    head(Wage)

   year age           maritl     race       education             region       jobclass         health
1 2006  18 1. Never Married 1. White    1. < HS Grad 2. Middle Atlantic  1. Industrial      1. <=Good
2 2004  24 1. Never Married 1. White 4. College Grad 2. Middle Atlantic 2. Information 2. >=Very Good
3 2003  45       2. Married 1. White 3. Some College 2. Middle Atlantic  1. Industrial      1. <=Good
  health_ins  logwage      wage
1      2. No 4.318063  75.04315
2      2. No 4.255273  70.47602
3     1. Yes 4.875061 130.98218

Columns 3 to 9 contain an unwanted prefix at the start of each value, such as "1. " or "2. ".

How can I remove these unwanted numbers and characters from all of the columns mentioned?

Jaap
Tuyen
  • Hi Tuyen, have a look here https://stackoverflow.com/help/how-to-ask and here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and revise your question. Also have a look at http://tidyverse.org/ for your direct problem – Jan Aug 27 '17 at 10:18

1 Answer


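Use `mutate_at()` to strip the leading "[1-9]. " prefix from columns 3 to 9:

library(ISLR)   # provides the Wage data set
library(dplyr)
temp <- Wage
# "^\\d\\. " matches a leading digit, a literal dot and a space;
# inside funs(), `.` stands for the column being transformed
ans <- temp %>% 
         mutate_at(3:9, funs(sub("^\\d\\. ", "", .)))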

Output

head(ans)

  year age        maritl  race    education          region    jobclass      health
1 2006  18 Never Married White    < HS Grad Middle Atlantic  Industrial      <=Good
2 2004  24 Never Married White College Grad Middle Atlantic Information >=Very Good
3 2003  45       Married White Some College Middle Atlantic  Industrial      <=Good
4 2003  43       Married Asian College Grad Middle Atlantic Information >=Very Good
5 2005  50      Divorced White      HS Grad Middle Atlantic Information      <=Good
6 2008  54       Married White College Grad Middle Atlantic Information >=Very Good
  health_ins  logwage      wage
1         No 4.318063  75.04315
2         No 4.255273  70.47602
3        Yes 4.875061 130.98218
4        Yes 5.041393 154.68529
5        Yes 4.318063  75.04315
6        Yes 4.845098 127.11574
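
In dplyr 0.8.0 and later `funs()` is deprecated, so the same idea may be easier to maintain with `across()`. The following is only a sketch, assuming dplyr 1.0.0 or newer; `toy` is a hypothetical miniature of `Wage` (not the real data) so that the example runs without ISLR, and `cleaned` is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for Wage: two prefixed factor columns, one numeric column
toy <- data.frame(
  year   = c(2006, 2004),
  maritl = factor(c("1. Never Married", "2. Married")),
  health = factor(c("1. <=Good", "2. >=Very Good"))
)

# across() applies one function to every selected column.
# "^\\d+\\. " matches one or more digits, a literal dot and a space
# at the start of the string; .x is the column being transformed.
cleaned <- toy %>%
  mutate(across(where(is.factor), ~ factor(sub("^\\d+\\. ", "", .x))))

as.character(cleaned$maritl)
```

Note that rebuilding the factor with `factor()` sorts the levels alphabetically, so the original level order is not preserved in this sketch.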
CPak
  • Thanks @Chi Pak. But is there any other way to remove 4., 3., or many others without writing too many `mutate_at(3:9, funs(sub("1. ", "", .)))` calls? – Tuyen Aug 27 '17 at 10:28
  • @Tuyen You could also do it like this: `temp %>% mutate_at(3:9, funs(sub("[12]. ", "", .)))` – Jaap Aug 27 '17 at 10:33
  • Hi @ChiPak, what does the `.` after `""` mean in `temp %>% mutate_at(3:9, funs(sub("[12]. ", "", .)))`? – Tuyen Aug 27 '17 at 10:44
  • [piping](https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html) – CPak Aug 27 '17 at 10:48
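
To make the role of the `.` in the comments above concrete: inside `funs()`, the dot is a placeholder for each selected column in turn, so the dplyr call amounts to applying one function to every chosen column. A rough base R sketch of that idea follows; `toy` and `strip_prefix` are hypothetical names introduced for illustration, and the data frame is a made-up stand-in for `Wage`.

```r
# Hypothetical stand-in for Wage with two prefixed factor columns
toy <- data.frame(
  age    = c(18, 24, 45),
  maritl = factor(c("1. Never Married", "1. Never Married", "2. Married")),
  race   = factor(c("1. White", "1. White", "3. Asian"))
)

# Hypothetical helper: strip a leading "<digits>. " from the factor levels.
# Editing the levels (rather than the values) keeps the original level order.
strip_prefix <- function(col) {
  levels(col) <- sub("^\\d+\\. ", "", levels(col))
  col
}

# Each column takes the place of the `.` placeholder, one at a time
toy[2:3] <- lapply(toy[2:3], strip_prefix)

levels(toy$maritl)
```

Because only the level labels are rewritten, the columns stay factors, which may matter if they are later used in a model.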