I have a large dataset with participants from all over the world. Some of these participants entered data using dots/periods or commas as thousand separators, but R reads the dots as decimal points, which totally skews my data, e.g. 1234 entered as 1.234 is read as the decimal 1.234.
I want to remove all dots/periods/commas. My data consists entirely of whole numbers, so there shouldn't be any decimals anywhere.
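(A quick illustration of the misparse, using as.numeric() just as a stand-in for however the file actually got read in:)

as.numeric("1.234")   # 1.234: the dot is read as a decimal point, not a separator
as.numeric("1,234")   # NA: the comma version can't be parsed at all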
I tried using stringr, but can't quite figure it out. Here is a (hopefully) reproducible example with a small sample of my data:
prob <- structure(
  list(
    chnb = c(10L, 35L, 55L),
    B1_1_77 = c(117.586, 4022, 4.921),
    C1_1_88 = c(NA, 2206, 1.111),
    C1_1_99 = c(6.172, 1884, 0),
    C1_3_99 = c(5.62, 129, 0)
  ),
  row.names = c(NA, -3L),
  class = c("tbl_df", "tbl", "data.frame")
)
I tried this:
prob1 <- prob %>% str_replace_all('\\.', '')
which gives me this:
> prob1
[1] "c(10, 35, 55)" "c(117586, 4022, 4921)" "c(NA, 2206, 1111)"
[4] "c(6172, 1884, 0)" "c(562, 129, 0)"
It did indeed remove the dots, but it returned a plain character vector and completely lost my data frame structure.
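My guess (possibly wrong) is that str_replace_all() is a string function, so passing it a whole data frame coerces each column down to a single string first, which would explain the output above:

library(stringr)
as.character(prob)   # one string per column, e.g. "c(10, 35, 55)"
str_replace_all(as.character(prob), '\\.', '')

An online search suggested I try this instead: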
prob1 <- prob %>% mutate_all(list(str_replace(., '\\.', '')))
but I get an error message:
Error: `.fn` must be a length 1 string
Call `rlang::last_error()` to see a backtrace
In addition: Warning message:
In stri_replace_first_regex(string, pattern, fix_replacement(replacement),  :
  argument is not an atomic vector; coercing
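From the mutate_all() docs, I wonder if the problem is that I'm passing an already-evaluated str_replace() call instead of an actual function. Here is a sketch of what I think it might need (an untested guess on my part; the as.character()/as.numeric() round trip is my own addition, so the columns come back as numbers):

library(dplyr)
library(stringr)

prob1 <- prob %>%
  # guess: use a ~ lambda so each column is handled in turn, coerce to
  # character, strip the dots, then convert back to numeric
  mutate_all(~ as.numeric(str_replace_all(as.character(.x), '\\.', '')))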
Am I approaching the whole thing wrong? Any help would be greatly appreciated. I hope my question is clear enough; my apologies if it isn't (I'm new to this).