I have a CSV data file called test_20171122.
Often, datasets that I work with were originally in Accounting or Currency format in Excel and later converted to a CSV file.
I am looking into the optimal way to clean data from an accounting format "$##,###" to a number "####" in R using gsub().
My trouble is in iterating gsub() across all columns of a dataset. My first instinct was to run gsub() on the whole data frame (below), but that seems to alter the data in a counterproductive way.
gsub("\\$", "", test_20171122)
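To make the problem reproducible, here is a minimal sketch using a made-up data frame (the name toy and its values are hypothetical, not my real data). As far as I can tell, gsub() coerces its input with as.character(), so a data frame is collapsed to one string per column and the result is no longer a data frame:

```r
# Hypothetical toy data standing in for test_20171122
toy <- data.frame(a = c("$1,000", "$2,500"),
                  b = c("$30", "$4,200"),
                  stringsAsFactors = FALSE)

# gsub() on the whole data frame returns a plain character vector,
# one element per column, rather than a cleaned data frame
res <- gsub("\\$", "", toy)

str(res)            # character vector of length ncol(toy)
is.data.frame(res)  # FALSE
```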
The following code is a for loop that seems to get the job done.
for (i in seq_along(test_20171122)) {
  # strip dollar signs from column i, then the thousands separators
  clean1 <- gsub("\\$", "", test_20171122[[i]])
  clean2 <- gsub(",", "", clean1)
  test_20171122[[i]] <- clean2
}
Is there a cleaner way to apply gsub() to every column of a data frame? I feel like sapply() would work, but it seems to break the structure of the data frame when I run the following code:
test_20171122 <- sapply(test_20171122,function(x) gsub("\\$","",x))
test_20171122 <- sapply(test_20171122,function(x) gsub("\\,","",x))
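For comparison, here is a sketch on a made-up data frame (toy and its values are hypothetical). It shows that sapply() simplifies its result to a character matrix, and it tries one alternative I am considering: assigning the result of lapply() back with toy[] <- ..., which seems to keep the data frame structure. The single pattern "[$,]" and the as.numeric() conversion are my own assumptions about what the cleaned columns should look like:

```r
# Hypothetical toy data standing in for test_20171122
toy <- data.frame(a = c("$1,000", "$2,500"),
                  b = c("$30", "$4,200"),
                  stringsAsFactors = FALSE)

# sapply() simplifies the per-column results into a character matrix
m <- sapply(toy, function(x) gsub("[$,]", "", x))
class(m)

# lapply() returns a list; assigning into toy[] replaces the columns
# in place, so toy stays a data frame
toy[] <- lapply(toy, function(x) as.numeric(gsub("[$,]", "", x)))
str(toy)
```

If this is a reasonable approach, the same two lines should apply to test_20171122, but I have not confirmed that it handles columns that are not currency strings.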