I have a big HTML file that I want to parse in Kotlin or Java. As a first step, I'm trying to match everything between <body and </body> using a simple regex like

<body(.|\n)+</body>

but, unsurprisingly, I get a StackOverflowError. Here's the code in Kotlin:

// original HTML
val file = File("""/home/yazan/Documents/books.xml""")

// empty output file
val file2 = File("""/home/yazan/Documents/books2.xml""")

val reg = """<body(.|\n)+</body>""".toRegex()
val text = reg.find(file.readText())
text?.value?.let { file2.writeText(it) }

How can I run a regex over large files in a memory-efficient way?
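
For the memory side of the question, one option is to skip the regex and stream the file line by line, copying only the part between the two markers. The following is a minimal sketch, not a tested solution for real-world HTML: `copyBody` is a hypothetical helper name, the match is case-sensitive, it assumes neither `<body` nor `</body>` is split across a line break, and it rewrites line endings with the platform separator.

```kotlin
import java.io.File

// Hypothetical helper (not from the original post): copies the text from the
// first "<body" through the first "</body>" after it, one line at a time,
// so the whole file is never held in memory as a single String.
// Returns true if the closing "</body>" was found.
fun copyBody(source: File, target: File): Boolean {
    var inBody = false
    var done = false
    target.bufferedWriter().use { out ->
        source.bufferedReader().use { reader ->
            for (rawLine in reader.lineSequence()) {
                var line = rawLine
                if (!inBody) {
                    val start = line.indexOf("<body")
                    if (start < 0) continue       // still before the body
                    line = line.substring(start)  // drop text before "<body"
                    inBody = true
                }
                val end = line.indexOf("</body>")
                if (end >= 0) {
                    // write up to and including the closing tag, then stop
                    out.write(line, 0, end + "</body>".length)
                    done = true
                    break
                }
                out.write(line)
                out.newLine()
            }
        }
    }
    return done
}
```

With the files from the question this would be called as `copyBody(file, file2)`. Memory use is bounded by the longest line, so a minified file that sits on a single line would gain nothing from this approach.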

yazan sayed
  • See https://stackoverflow.com/questions/677038/how-to-use-regular-expressions-to-parse-html-in-java ... parsing html/xml with regular expressions ... isn't a good idea to begin with. And it doesn't get better when doing that with **large** input strings. – GhostCat Jun 24 '19 at 10:42
  • `""""""` or `"""(?s)"""` can be used to match a string between two strings. Or unroll the pattern as [described here](https://stackoverflow.com/a/38883498/3832970). – Wiktor Stribiżew Jun 24 '19 at 10:43
  • Yeah, I'm planning to use a DOM parser, but I was experimenting with regex. Wouldn't regex be faster anyway? – yazan sayed Jun 24 '19 at 10:46
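
The `(?s)` flag mentioned in the comments can be sketched as follows. This is an illustration under assumptions, not a recommendation to parse HTML with regex: `bodyRegex` and `extractBody` are made-up names, and the input is still read into memory as one String. As far as I understand `java.util.regex`, a quantified group with an alternation such as `(.|\n)+` recurses once per matched character, which is what exhausts the stack on large input, while a quantified single-character pattern such as `.*?` is matched in a loop.

```kotlin
// Sketch only: "(?s)" (DOTALL) lets "." match line breaks as well, so the
// "(.|\n)" group from the question is no longer needed. The lazy ".*?"
// stops at the first "</body>" instead of the last one.
val bodyRegex = Regex("""(?s)<body.*?</body>""")

// Hypothetical helper: returns the matched "<body ... </body>" text,
// or null when the input contains no such span.
fun extractBody(html: String): String? = bodyRegex.find(html)?.value
```

Applied to the code in the question, that would be `extractBody(file.readText())?.let { file2.writeText(it) }`.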

0 Answers