I have some .txt files that I would like to clean up before using them. While I'm not new to regexes, I am new to Scala. I've written a short method that is supposed to replace every "\n" newline marker with a space, yet when I run my function all of the "\n" are still there. Can anyone spot what I'm doing wrong?

// Scala 2.11.8
import scala.io.Source
import scala.util.matching.Regex


def cleanText(filename: String) {
  val pattern = "\\n".r
  for(line <- Source.fromFile(filename).getLines())
    println(pattern replaceAllIn(line, " "))
    //println (line.getClass) //String
}

cleanText("22453117_1.txt")  

As you can see I am looping through the lines of the file and asking it to replace "\n" with a space. Here is a snippet from my text file:

['\n Mucormycoses are fungal infections caused by the ancient Mucorales. They are rare, but\nincreasingly reported. Predisposing conditions supporting and favoring mucormycoses in\nhumans and animals include diabetic ketoacidosis', ....

When I println(line), with or without the regex replaceAllIn, I get this same result.

One thing that may be getting in the way is whether Scala reads this file as one String or as many Strings. As you can see, I tried to test this with

println(line.getClass)

which simply returns "String", but I'm still unsure whether I'm dealing with multiple Strings or one big string here. Either way, my regex replaceAllIn() should work, no? Is there a better way to find out if I am dealing with many Strings or just one? Would that even matter here?

Also if it helps you to know, my coding background is Python and I don't know any Java. Thus I'm finding Scala pretty difficult to pick up, since most tutorials try to explain Scala concepts to me in terms of Java.

ntalbs
SnarkShark
  • Isn't it obvious that you cannot have a newline inside a line? Read the file in as a single string, and then replace newlines. Or read line by line, and then write them back adding a space after each line. – Wiktor Stribiżew Jun 07 '16 at 20:40
  • Actually, no, it isn't obvious to me or else I wouldn't be asking... I'm unsure if the Source iterator is reading my file in as a single string or as a series of strings, which is why I asked... – SnarkShark Jun 07 '16 at 20:48
  • Reopening b/c this is about Scala not Java, and I can confirm that "regex" intuitions don't always apply to "lines" style API, see https://github.com/scala/scala/pull/5160. – som-snytt Jun 08 '16 at 03:57

1 Answer

If you want to replace the string "\n" (not a newline character, but the two characters '\' and 'n'), you need to escape the backslash in the regex, too.

scala> val text = "\\nHello\\nWorld!"
text: String = \nHello\nWorld!

scala> val pattern = "\\n".r
pattern: scala.util.matching.Regex = \n

scala> pattern replaceAllIn(text, " ")
res0: String = \nHello\nWorld!

The above code (which is logically the same as your code) doesn't replace anything, because the regex \n matches a newline character, and text contains no newline characters, only the two characters '\' and 'n'.

If you also escape the backslash in your pattern, as in the code below, the "\n" text in your string is replaced.

scala> val pattern2 = "\\\\n".r
pattern2: scala.util.matching.Regex = \\n

scala> pattern2 replaceAllIn(text, " ")
res1: String = " Hello World!"
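To make the difference between the two kinds of "\n" concrete, here is a minimal sketch. The object and method names (EscapeDemo, replaceNewlines, replaceLiteral) are made up for illustration; only the string and regex behaviour comes from the standard library.

```scala
object EscapeDemo {
  val newline: String = "\n"          // one character: a real line feed
  val literal: String = "\\n"         // two characters: backslash, then 'n'
  val tripleQuoted: String = """\n""" // triple quotes skip escape processing: backslash + 'n'

  // The regex \n (written "\\n" in Scala source) matches a real line feed only.
  def replaceNewlines(s: String): String = "\\n".r.replaceAllIn(s, " ")

  // The regex \\n (written "\\\\n" in Scala source) matches a backslash followed by 'n'.
  def replaceLiteral(s: String): String = "\\\\n".r.replaceAllIn(s, " ")

  def main(args: Array[String]): Unit = {
    println(newline.length)              // 1
    println(literal.length)              // 2
    println(replaceNewlines("a\\nb"))    // unchanged: there is no line feed to match
    println(replaceLiteral("a\\nb"))     // a b
  }
}
```

The text in your file looks like the second case: the characters '\' and 'n' are stored in the file itself, so only the pattern with the doubled escape matches them.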

If you use the replaceAll method, you don't need to define a pattern separately (its first argument is still treated as a regex).

scala> text.replaceAll("\\\\n", " ")
res2: String = " Hello World!"
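If counting backslashes gets error-prone, one possible alternative is to let the library quote the literal text for you. This is only a sketch: QuoteSketch and its field names are hypothetical, while Regex.quote is part of scala.util.matching.

```scala
import scala.util.matching.Regex

object QuoteSketch {
  // Same sample text as above: backslash + 'n' stored as two characters.
  val text: String = "\\nHello\\nWorld!"

  // Regex.quote wraps the text so that it is matched literally,
  // which means the backslash needs escaping once (for Scala), not twice.
  val literalPattern: String = Regex.quote("\\n")

  val cleaned: String = text.replaceAll(literalPattern, " ")

  def main(args: Array[String]): Unit =
    println(cleaned)
}
```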

Or, as @som-snytt mentioned, you can use text.replaceAllLiterally, which treats its first argument as literal text rather than as a regex.

scala> text.replaceAllLiterally("""\n""", " ")
res3: String = " Hello World!"
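Putting it together for a file, a rough sketch of cleanText might look like the following. It assumes you want the cleaned text back as one String rather than printed line by line; CleanTextSketch and cleanLine are names invented here, not part of your original code.

```scala
import scala.io.Source

object CleanTextSketch {
  // Replace the literal two-character sequence backslash + 'n' with a space.
  // String.replace takes plain text, not a regex, so one level of escaping is enough.
  def cleanLine(line: String): String = line.replace("\\n", " ")

  // Assumed variant of cleanText: returns the cleaned text instead of printing it.
  def cleanText(filename: String): String = {
    val source = Source.fromFile(filename)
    try {
      // getLines() yields one String per line with the real line terminator removed,
      // so no element contains a newline character; joining with a space
      // replaces the real line breaks as well.
      source.getLines().map(cleanLine).mkString(" ")
    } finally source.close()
  }
}
```

Calling println(CleanTextSketch.cleanText("22453117_1.txt")) would then print the whole file as a single line. This also answers the "one String or many" question: the iterator hands you many Strings, one per physical line of the file.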
ntalbs
  • The pattern2 worked! Thank you :) ! Just to make sure that I understand properly, pattern = "\\n" doesn't work because it's looking for '\'+'n' instead of the cohesive newline unit '\n'? I honestly didn't know there was a difference between '\'+'n' and '\n'. – SnarkShark Jun 07 '16 at 20:54
  • Kudos for figuring out what the OP wanted to do. But the correct answer is `"""\n\n\n""".replaceAllLiterally("""\n""", "X")`. I'm hypersensitized b/c https://github.com/scala/scala/pull/5194/commits/bff824a9f5c27a19cb0454a56a16a27d39beec46 Sanity check, note `"""\n""".length` – som-snytt Jun 08 '16 at 04:00