I have some .txt files that I would like to clean up before using them. While I'm not new to regexes, I am new to Scala. I've written a short method that is supposed to replace every "\n" newline marker with a space, yet when I run it all of the "\n" are still there. Can anyone spot what I'm doing wrong?
// Scala 2.11.8
import scala.io.Source
import scala.util.matching.Regex

def cleanText(filename: String): Unit = {
  val pattern = "\\n".r
  for (line <- Source.fromFile(filename).getLines())
    println(pattern.replaceAllIn(line, " "))
  // println(line.getClass) // String
}

cleanText("22453117_1.txt")
As you can see, I am looping through the lines of the file and replacing "\n" with a space in each one. Here is a snippet from my text file:
['\n Mucormycoses are fungal infections caused by the ancient Mucorales. They are rare, but\nincreasingly reported. Predisposing conditions supporting and favoring mucormycoses in\nhumans and animals include diabetic ketoacidosis', ....
When I println(line), with or without the regex replaceAllIn, I get this same result.
One thing I suspect may be getting in the way is whether Scala is reading this file as one String or as many Strings. As you can see, I tried to test this with println(line.getClass), which simply returns "String", but I'm still unsure whether I'm dealing with multiple Strings or one big String. Either way, my regex replaceAllIn() should work, no? Is there a better way to find out whether I am dealing with many Strings or just one? Would that even matter here?
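For reference, here is a minimal sketch of one way to check what a line actually contains. It assumes the "\n" in the snippet above is stored in the file as two characters (a backslash followed by n) rather than as a real line feed. The object NewlineCheck and its helper names are hypothetical and not part of my code above. Note that getLines() already yields one String per line with the line terminator stripped, so counting its elements (for example with .size) would show how many Strings there are.

```scala
// Hypothetical helper, not part of the original code:
// reports which kind of "\n" a string really contains.
object NewlineCheck {
  // True if the string holds the two characters '\' and 'n' in sequence.
  def hasLiteralBackslashN(s: String): Boolean = s.contains("\\n")

  // True if the string holds an actual line-feed character.
  def hasRealNewline(s: String): Boolean = s.contains("\n")

  def main(args: Array[String]): Unit = {
    // Assumed sample mimicking the file snippet: backslash + 'n', not a line feed.
    val sample = "rare, but\\nincreasingly reported"

    println(hasLiteralBackslashN(sample))
    println(hasRealNewline(sample))

    // As a regex, "\\n" means "line feed", so it finds nothing to replace here.
    println("\\n".r.replaceAllIn(sample, " ") == sample)

    // String.replace is a plain (non-regex) replacement of the two-character sequence.
    println(sample.replace("\\n", " "))
  }
}
```

If the first check is true and the second is false for lines from the real file, the regex is looking for a character that is not there, which would be consistent with the output being unchanged.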
Also, in case it helps: my coding background is Python and I don't know any Java, so I'm finding Scala pretty difficult to pick up, since most tutorials explain Scala concepts in terms of Java.