I have been working on a data cleaning program in R and have run into a problem. I am trying to replace special characters such as "@" with their word equivalents, e.g. "at".

I have tried sub, gsub, setNames and even replace. All of them give the same result: a ton of NAs in my data. Here is a sample of what my data looks like, for reference.

(Screenshot of the sample data)

Assume I cannot see where all of the @ signs are: I want to search the entire data set and replace all of them. My actual data set has 50 columns, so going column by column won't work.
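A minimal base-R sketch of replacing every "@" across all columns at once, without naming any column. The toy data frame `df` and the helper `replace_at` are hypothetical stand-ins for the real 50-column data set. The helper only touches character and factor columns, so numeric columns keep their type, and it converts factors to character before calling `gsub`. Assigning a value that is not an existing level into a factor produces NA, which may account for some of the NAs if the columns were read as factors (the `read.csv` default before R 4.0.0).

```r
# Hypothetical toy data standing in for the real data set
df <- data.frame(
  name  = c("joe@home", "ann", NA),
  email = c("a@b.com", "c@d.org", "e@f"),
  n     = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace a literal pattern in every character/factor column
replace_at <- function(d, pattern = "@", replacement = "at") {
  for (col in names(d)) {
    if (is.character(d[[col]]) || is.factor(d[[col]])) {
      # fixed = TRUE treats the pattern as a literal string, not a regex;
      # NA values stay NA
      d[[col]] <- gsub(pattern, replacement, as.character(d[[col]]), fixed = TRUE)
    }
  }
  d
}

clean <- replace_at(df)
clean
```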

################## EDIT ##################

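# Read the raw data, treating "", " " and "NA" as missing values
aa <- read.csv("C:/Users/Zander Kent/Documents/Data Cleaning/sample dataset.csv", header = TRUE, na.strings = c("", " ", "NA"))
aaa <- data.frame(aa)

# Replace " @ " (an @ with a space on each side) with "at" in every column, then save
abc <- as.data.frame(apply(aaa, 2, function(x) gsub(" @ ", "at", x)))
write.csv(abc, file = "C:/Users/Zander Kent/Documents/Data Cleaning/clean_2.csv")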

Link to the data on Google Drive: Sample Data

I tried one of the answers, and it worked on a very small 10x10 data set, but when I ran it on my entire data set it didn't do anything: all of the special characters were still there. There were no error messages; the code ran through without any problems.
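Judging only from the code in the edit, one possible reason nothing changed on the full data set is the pattern `" @ "`: it matches an @ only when a space sits on each side, and it consumes those spaces too. The sample strings below are made up to show the difference between that pattern and a bare `"@"`.

```r
x <- c("me@site.com", "meet @ noon", "@start")

# Pattern from the edit: only an "@" surrounded by spaces is replaced
with_spaces <- gsub(" @ ", "at", x)
with_spaces  # "me@site.com" "meetatnoon" "@start"

# Bare "@": every occurrence is replaced; fixed = TRUE skips regex parsing
bare <- gsub("@", "at", x, fixed = TRUE)
bare         # "meatsite.com" "meet at noon" "atstart"
```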

Zander
  • This would be a great question if you made a reproducible example with code to produce the data frame and show what the desired result should look like. See [How to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Rich Scriven Apr 21 '16 at 01:54
  • The example data frame, not all 50 columns :) Also, what terms exactly should be changed? Do you have a list or is it just `@` and `-`? – Rich Scriven Apr 21 '16 at 02:01
  • We don't have access to your hard drive. Still not reproducible – Rich Scriven Apr 21 '16 at 03:02
  • I don't have access to your drive, so I can't load your document and see what happened. "didn't do anything" is quite unspecific... Why don't you try to better explain any error or warning messages, the structure of your data or just anything else we can use to help you? – PavoDive Apr 21 '16 at 03:19
  • When I ran the gsub function it completed with no errors, but when I saved the data and loaded it in Excel, the "@" signs were still there. I am trying to figure out how to link a document so you will be able to download it. @PavoDive – Zander Apr 21 '16 at 03:23
  • That seems quite normal to me. In the code you posted you didn't save the result of the `apply` function to any variable. So if you pushed the `aaa` (or even the `aa`) variable to Excel, it won't have the replacement, will it? – PavoDive Apr 21 '16 at 03:29
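Following PavoDive's point, a minimal sketch of keeping the replaced result in a variable, writing that object, and reading the file back to confirm the "@" signs are gone before opening it in Excel. The toy `aaa` and the use of `tempfile()` in place of the real output path are assumptions for illustration.

```r
# Hypothetical toy data in place of the real file
aaa <- data.frame(x = c("1", "a", "3@"), y = c("5@", "2", "b"))

# Store the result of the replacement and write *that* object, not aaa
abc <- as.data.frame(apply(aaa, 2, function(x) gsub("@", "at", x, fixed = TRUE)))
out <- tempfile(fileext = ".csv")  # stand-in for the real output path
write.csv(abc, file = out, row.names = FALSE)

# Read the file back and check that no "@" survived the round trip
check <- read.csv(out, stringsAsFactors = FALSE)
any(grepl("@", as.matrix(check), fixed = TRUE))
```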

1 Answer

Lacking a reproducible example...

Let's create a small data frame containing the undesired symbol:

a <- data.frame(x = c("1", "a", "3@"), y = c("5@", "2", "b"))

Now we can use gsub:

as.data.frame(apply(a, 2, function(x) gsub("@", "at", x)))

and obtain:

##    x   y
## 1   1 5at
## 2   a   2
## 3 3at   b
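A base-R variant of the same replacement keeps `a` as a data frame by assigning into `a[]` with `lapply`, instead of going through `apply`, which first converts the whole data frame to a character matrix. Like the `apply` version, it coerces every column it touches to character, so numeric columns would need to be skipped if they should stay numeric.

```r
a <- data.frame(x = c("1", "a", "3@"), y = c("5@", "2", "b"),
                stringsAsFactors = FALSE)

# Assigning into a[] keeps the data frame structure and column names
a[] <- lapply(a, function(col) gsub("@", "at", col, fixed = TRUE))
a
```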

####### EDIT #####

If you also want to replace "-" with "dash", the qdap package has a nice function for it, `multigsub`. Let's re-create the data frame with the two bad guys:

a <- data.frame(x = c("1-", "a", "3@"), y = c("5@", "2", "b-"))

Then we do:

library(qdap)  # provides multigsub()
as.data.frame(apply(a, 2, function(x) multigsub(c("@", "-"),
                                                c("at", "dash"),
                                                x)))
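If installing qdap is not an option, a base-R sketch can chain `gsub` calls with `fixed = TRUE`. `multi_replace` is a hypothetical helper, not part of any package. The replacements run in order, so if a replacement string contained a later pattern, it would be replaced again.

```r
# Hypothetical helper: apply several literal replacements one after another
multi_replace <- function(text, patterns, replacements) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    text <- gsub(patterns[i], replacements[i], text, fixed = TRUE)
  }
  text
}

a <- data.frame(x = c("1-", "a", "3@"), y = c("5@", "2", "b-"),
                stringsAsFactors = FALSE)

# Each column is passed to multi_replace as its `text` argument
a[] <- lapply(a, multi_replace,
              patterns = c("@", "-"), replacements = c("at", "dash"))
a
```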

####### EDIT 2 #######

This also works on a fairly large data set (1e6 strings spread over 50 columns):

x <- sample(LETTERS, 1e6, TRUE)
y <- sample(c("", "", "", "@", "-"), 1e6, TRUE)
a <- data.frame(x, y)
b <- apply(a, 1, function(x) paste(x, collapse = ""))

df <- as.data.frame(matrix(b, ncol=50))
df[1:4, 1:10]
##   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1  T V-  H  L  T K@  M T@ M-   I
## 2  G  W F@ K@ W@ T@  R X-  G  G-
## 3  R E@  V O@  R  D  L  L C-   B
## 4  T G@  J  U  X H@  Q  Q  T  Z@

df2 <- apply(df, 2, function(x) multigsub(c("@", "-"), c("at", "dash"), x))

grep("-", df2)
## integer(0)

grep("@", df2)
## integer(0)

df2[1:4, 1:10]
##      V1  V2      V3    V4    V5    V6    V7  V8      V9      V10
## [1,] "T" "Vdash" "H"   "L"   "T"   "Kat" "M" "Tat"   "Mdash" "I"
## [2,] "G" "W"     "Fat" "Kat" "Wat" "Tat" "R" "Xdash" "G"     "Gdash"
## [3,] "R" "Eat"   "V"   "Oat" "R"   "D"   "L" "L"     "Cdash" "B"
## [4,] "T" "Gat"   "J"   "U"   "X"   "Hat" "Q" "Q"     "T"     "Zat"
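A base-R sketch of the same large test, for comparison. It builds the strings with the vectorized `paste0` instead of a row-wise `apply(..., paste)`, which should be noticeably faster at this size, replaces both symbols column by column, and then counts any remaining "@" or "-" per column. All-zero counts mean every column was cleaned.

```r
x <- sample(LETTERS, 1e6, TRUE)
y <- sample(c("", "", "", "@", "-"), 1e6, TRUE)

# 1e6 strings laid out as 20000 rows x 50 columns
df <- as.data.frame(matrix(paste0(x, y), ncol = 50), stringsAsFactors = FALSE)

# Replace both symbols in every column with literal (fixed) matching
df[] <- lapply(df, function(col) {
  col <- gsub("@", "at", col, fixed = TRUE)
  gsub("-", "dash", col, fixed = TRUE)
})

# "[@-]" matches either symbol (a trailing "-" in a class is literal)
colSums(sapply(df, grepl, pattern = "[@-]"))
```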
PavoDive
  • If more than the two symbols in `c("@", "-")` need to be changed, both the `pattern` and `replacement` vectors in `multigsub` can be as long as needed. – PavoDive Apr 21 '16 at 02:09