I have a CSV file with several columns: Tweet, date, etc. The spaces in some Tweets are causing blank lines and undesired truncated lines.
What works:
1. Using Notepad++'s function "Line Operations>Remove Empty Lines (Containing Blank Characters)"
2. Search and replace: \r
with nothing.
However, I need to do this for a large number of files, and I can't find a regular expression for gsub() in R that does what the Notepad++ function does.
Note that replacing ^[ \t]*$\r?\n
with nothing and then \r
with nothing, as suggested here, works in Notepad++ but not with gsub() in R.
I have tried the following code:
tx <- readLines("tweets.csv")
subbed <-gsub(pattern = "^[ \\t]*$\\r?\\n", replace = "", x = tx)
subbed <-gsub(pattern = "\r", replace = "", x = subbed)
writeLines(subbed, "output.csv")
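A likely reason the attempt above has no effect: readLines() returns one element per line with the line terminator already removed, so a pattern that requires \n can never match inside an element. A minimal sketch that filters the vector instead of substituting within it is shown below. The names clean_lines and clean_file are hypothetical, and the sample data is made up for illustration.

```r
# Hypothetical helper: strip stray carriage returns, then drop
# elements that are empty or contain only spaces/tabs.
clean_lines <- function(lines) {
  # Remove any carriage returns left inside the elements
  lines <- gsub("\r", "", lines, fixed = TRUE)
  # Keep only elements with at least one non-blank character
  lines[grepl("[^ \t]", lines)]
}

# Hypothetical wrapper for processing one file at a time
clean_file <- function(infile, outfile) {
  writeLines(clean_lines(readLines(infile)), outfile)
}

# Made-up sample standing in for the result of readLines()
tx <- c("tweet one,2020-01-01", "", "  \t", "tweet two,2020-01-02\r")
print(clean_lines(tx))
```

For many files, clean_file could be called in a loop over list.files(pattern = "\\.csv$"). This only mirrors the two Notepad++ steps (dropping blank lines and removing \r); it does not rejoin a Tweet that was split across lines.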
This is the input:
This is the desired output: