Address <- c("#20 W Irving ST","@1 East Street",
"%222 Rockfard Avenue","-145 W Locust","& 99 East Locus")
Number <- c("A-1","A-2","A-3","A-4","A-5")
DF <- data.frame(Address,Number)

- And what have you tried first? – thelatemail May 26 '16 at 22:20
- Is the first element of each string a special character? If so, perhaps just delete the first element, i.e., delete the first 'space'. – Mark Miller May 26 '16 at 22:22
- @MarkMiller The data set has around 70,000 records and I need to match the address column for destination and origin, but the data has some bad entries like the ones above, which start with a special character. So I was trying to get rid of only the special characters at the start of the address. – KGarg May 27 '16 at 00:37
- Are you saying some of the addresses have a special character at the beginning and some of the addresses do not? – Mark Miller May 27 '16 at 00:43
- `gsub("^[[:punct:][:space:]]+","",DF$Address)` – thelatemail May 27 '16 at 00:52
- @MarkMiller Yes, that's right. – KGarg May 27 '16 at 00:54
- @thelatemail I'll try that, thanks! – KGarg May 27 '16 at 00:55
- @thelatemail It worked, thank you very much. – KGarg May 27 '16 at 01:10
2 Answers
Just remove any repeated punctuation or space characters immediately following the start of the string. In regex speak:
gsub("^[[:punct:][:space:]]+","",DF$Address)
#[1] "20 W Irving ST" "1 East Street" "222 Rockfard Avenue" "145 W Locust"
#[5] "99 East Locus"
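Because the pattern is anchored with ^, it should only touch a leading run of punctuation or spaces. The sketch below illustrates that on a small vector; the entries "5 East Street" and "12-B Oak St." are made up for illustration, and sub() is used instead of gsub() only because an anchored pattern can match at most once per string.

```r
# Hypothetical sample: two "bad" addresses, one already clean, and one with
# punctuation in the middle (the last two are invented for illustration).
addr <- c("#20 W Irving ST", "& 99 East Locus", "5 East Street", "12-B Oak St.")

# ^ anchors the match to the start of the string, so only a leading run of
# punctuation/space characters is removed; interior characters are kept.
cleaned <- sub("^[[:punct:][:space:]]+", "", addr)
print(cleaned)
```

Addresses that already start with a digit pass through unchanged, so the same call can be applied to the whole column.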

Will this do what you want? This assumes the first character of every Address is a special character. Note also that for this code to work, the left-hand end of my.data$Address must be flush with the left edge of the R GUI: there cannot be any blank characters at the start of Address.
my.data <- read.csv(text = '
Address, Number
#20 W Irving ST, A-1
@1 East Street, A-2
%222 Rockfard Avenue, A-3
-145 W Locust, A-4
& 99 East Locus, A-5
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
my.data
my.data$Address <- substr(my.data$Address, 2, nchar(my.data$Address))
my.data
If the special characters can occur anywhere in Address and you want to remove all of them, you can try one of the functions presented here: Replace multiple arguments with gsub.
I used the function written by Theodore Lytras with this line:
mgsub(c('#','@','%','-','&'), c('','','','',''), my.data$Address)
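mgsub() is not part of base R, so the line above only runs once such a helper has been defined. The following is a minimal sketch of what a helper like that might look like. It is a hypothetical stand-in, not necessarily the linked function, and the address "12-B Oak St" is invented to show a side effect.

```r
# Hypothetical stand-in for the linked mgsub(); the original may differ.
# Applies gsub() once per pattern/replacement pair, in order.
mgsub <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  for (i in seq_along(pattern)) {
    x <- gsub(pattern[i], replacement[i], x, ...)
  }
  x
}

addr <- c("#20 W Irving ST", "-145 W Locust", "& 99 East Locus", "12-B Oak St")

# fixed = TRUE treats each pattern as a literal string, not a regex.
stripped <- mgsub(c("#", "@", "%", "-", "&"), rep("", 5), addr, fixed = TRUE)
print(stripped)
```

Unlike the substr() approach, this removes the listed characters wherever they occur, so the hyphen inside "12-B Oak St" is dropped as well.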
Note that with both approaches the address 99 East Locus now begins with a space.
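One possible way to drop that leftover space is base R's trimws() (available since R 3.2.0). This is a small sketch on a stand-alone vector rather than on my.data:

```r
# Result of stripping the first character: one entry keeps a leading space.
x <- c(" 99 East Locus", "20 W Irving ST")

# which = "left" removes leading whitespace only.
trimmed <- trimws(x, which = "left")
print(trimmed)
```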
If some of the addresses have a special character as their first character and some do not, this might work:
my.data <- read.csv(text = '
Address, Number
#20 W Irving ST, A-1
@1 East Street, A-2
222 W Locust, A-4
%222 Rockfard Avenue, A-3
-145 W Locust, A-4
5 East Street, A-2
& 99 East Locus, A-5
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
first.char <- substr(my.data$Address, 1, 1)
my.data$Address <- ifelse(first.char %in% c('#','@','%','-','&'), substr(my.data$Address, 2, nchar(my.data$Address)), my.data$Address)
my.data
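The earlier caveat about blank characters at the start of Address applies to this version too: if a value begins with a space, its first character is the space, not the special character, and the value is returned unchanged. The sketch below shows that on a made-up vector and one possible workaround, trimming first with trimws(). The helper name strip_first is hypothetical.

```r
# Made-up sample: the first entry has a leading blank before the '#'.
addr <- c(" #20 W Irving ST", "@1 East Street")
specials <- c("#", "@", "%", "-", "&")

# Hypothetical helper wrapping the ifelse() logic shown above.
strip_first <- function(x) {
  ifelse(substr(x, 1, 1) %in% specials, substr(x, 2, nchar(x)), x)
}

unchanged <- strip_first(addr)                        # first entry is left as is
fixed <- strip_first(trimws(addr, which = "left"))    # trim, then strip
print(unchanged)
print(fixed)
```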

- @MarkMiller I tried it, but the output is the same as the input; it is not getting rid of the special characters. – KGarg May 27 '16 at 00:57
- All of the examples I presented are working on my computer. I do not know what the problem is. Perhaps put your code in your post and I can take a look. – Mark Miller May 27 '16 at 01:01