The function you have transforms a sparse matrix into a full character matrix. With a large document term matrix this will result in long running times and a good chance of a memory error. Replacing values in a sparse matrix can be done quickly if you make use of how the matrix is built. Only the non-zero values of a sparse matrix are stored, and they sit in the v (values) part of the matrix. See ?slam::simple_triplet_matrix.
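To make the triplet layout concrete, here is a minimal sketch using only the slam package. The toy matrix, its dimensions, and the document and term names are made up for illustration and are not taken from your data.

```r
library(slam)

# Hypothetical toy data: 3 documents x 4 terms, five non-zero counts.
m <- simple_triplet_matrix(
  i = c(1, 1, 2, 3, 3),   # row (document) index of each stored entry
  j = c(1, 3, 2, 1, 4),   # column (term) index of each stored entry
  v = c(2, 5, 1, 3, 4),   # the stored (non-zero) values themselves
  nrow = 3, ncol = 4,
  dimnames = list(Docs = c("d1", "d2", "d3"), Terms = c("a", "b", "c", "d"))
)

str(unclass(m))   # a plain list with i, j, v, nrow, ncol and dimnames
length(m$v)       # number of stored entries, not nrow * ncol
```

Because zeros are never stored, anything you do to m$v touches only the entries that actually occur.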
Using any of the apply family on a sparse matrix, rather than functions designed to work with sparse matrices, will turn it into a normal (dense) matrix, with accordingly long run times and memory issues.
To set all non-zero values to 1 in your case, just use the following:
data_dtm$v[data_dtm$v > 0] <- 1
inspect(data_dtm) # prints a sample of up to 10 documents and terms
This sets all non-zero values to 1 and keeps the data as a document term matrix (aka nice and sparse).
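If you want to convince yourself that the object stays sparse, the same replacement can be tried on a small stand-in. This is only a sketch: toy is a hypothetical matrix built with slam, not your data_dtm, and the dense conversion at the end is acceptable only because the example is tiny.

```r
library(slam)

# Hypothetical toy matrix standing in for data_dtm.
toy <- simple_triplet_matrix(
  i = c(1, 1, 2, 3),
  j = c(1, 2, 2, 3),
  v = c(4, 2, 7, 1),
  nrow = 3, ncol = 3
)

toy$v[toy$v > 0] <- 1   # overwrite only the stored values

class(toy)              # the sparse class is kept
as.matrix(toy)          # dense view, for checking the small example only
```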
Depending on your follow-up analysis, you really should make use of sparse matrix functions. Transforming a large document term matrix into a data.frame or data.table gives you a good chance of running out of memory.
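As an illustration of what "sparse matrix functions" can look like, slam provides row_sums() and col_sums(), which work on the triplet representation directly. The sketch below assumes a binarised matrix like the one above; the object bin and its document and term names are invented for the example.

```r
library(slam)

# Hypothetical binarised document term matrix (1 = term occurs in document).
bin <- simple_triplet_matrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 3, 2, 1, 3),
  v = rep(1, 5),
  nrow = 3, ncol = 3,
  dimnames = list(Docs  = c("d1", "d2", "d3"),
                  Terms = c("apple", "pear", "plum"))
)

doc_freq      <- col_sums(bin)   # in how many documents each term occurs
terms_per_doc <- row_sums(bin)   # how many distinct terms each document has

doc_freq
terms_per_doc
```

Both calls return ordinary vectors of length ncol or nrow, so the full matrix is never converted to a dense one.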
For any follow up questions, please include a reproducible example and an expected output.