
I'm trying to use Re2 to replace a regex match in a file. A quick test on a simple string passed:

# module Re2 = Re2.Std.Re2;;
# let re = Re2.create_exn "<key>Tags.*<\\/array>" ;;
# let orig =  "abc <key>Tags</key><array><string>OCaml</string></array> end";;
# Re2.replace_exn ~f:(fun _ -> "<key>Tags</key><array/>") re orig;;
- : string = "abc <key>Tags</key><array/> end"

However, when I put the contents into a file, ss.xml:

<key>Starred</key>
<false/>
<key>Tags</key>
<array>
    <string>Think</string>
    <string>Performance Test</string>
    <string>Racket</string>
    <string>OCaml</string>
</array>
<key>Time Zone</key>
<string>Asia/Shanghai</string>

The OCaml source code:

open Core.Std
open Async.Std

module Re2 = Re2.Std.Re2

let trans_reg (input: string) : string =
  let re = Re2.create_exn "<key>Tags.*<\\/array>" in
  let target = "<key>Tags</key><array/>" in
  Re2.replace_exn ~f:(fun _ -> target) re input

let handle_file (filename: string) =
  let%bind text = Reader.file_contents filename in
  Writer.save (filename ^ ".xml") ~contents:(trans_reg text)

let () =
  Command.(run (async ~summary:"" Spec.empty (fun _ -> handle_file "ss.xml")))

Nothing changes in the new file, ss.xml.xml: the Tags block is written out unmodified.

I was wondering:

  1. How do I get the regex to match in this case?
  2. When should the Match.t parameter of replace's ~f:(Match.t -> string) be used?
alwaysday1

2 Answers


Re2 has an option, dot_nl, which controls whether . matches \n. By default, dot_nl is false. You can set it to true either with the flag syntax, (?s)<key>Tags.*<\\/array>, as documented here, or in OCaml by calling

Re2.create ~options:[ `Dot_nl true ]
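To make the difference concrete, here is a minimal sketch using the inline (?s) flag. It assumes the same Re2 release as the question (hence the module alias; on newer releases the alias can likely be dropped), and the input string is a made-up, shortened stand-in for ss.xml.

```ocaml
(* Assumption: same Re2 release as in the question. *)
module Re2 = Re2.Std.Re2

(* Hypothetical multi-line input modelled on ss.xml. *)
let input =
  "<key>Tags</key>\n<array>\n    <string>OCaml</string>\n</array>\n<key>Time Zone</key>"

let target = "<key>Tags</key><array/>"

(* Without (?s), '.' stops at the first newline, so nothing matches. *)
let plain = Re2.create_exn "<key>Tags.*<\\/array>"

(* With (?s), '.' also matches '\n', so the pattern can span lines. *)
let dotall = Re2.create_exn "(?s)<key>Tags.*<\\/array>"

(* No match: the input comes back as it was. *)
let unchanged = Re2.replace_exn ~f:(fun _ -> target) plain input

(* Match: the whole Tags block collapses to the target string. *)
let replaced = Re2.replace_exn ~f:(fun _ -> target) dotall input
```

One caveat: .* is greedy, so if the file contains another </array> further down, the match will run all the way to the last one. Writing .*? instead stops at the first closing tag.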

I do not believe the m flag is relevant here because m controls the interpretation of ^ and $. Your pattern does not use ^ or $.
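A small sketch of that distinction, again assuming the question's Re2 release; the sample text is invented for illustration. It checks that (?m) changes where ^ and $ can anchor, but does not let . cross a newline.

```ocaml
(* Assumption: same Re2 release as in the question. *)
module Re2 = Re2.Std.Re2

(* Hypothetical three-line text. *)
let text = "first\nTags\nlast"

(* By default ^ and $ anchor only at the start and end of the whole text. *)
let whole_text_only = Re2.matches (Re2.create_exn "^Tags$") text

(* With (?m) they also anchor at the start and end of each line. *)
let per_line = Re2.matches (Re2.create_exn "(?m)^Tags$") text

(* (?m) leaves '.' alone, so this still cannot span the newlines. *)
let still_no_span = Re2.matches (Re2.create_exn "(?m)first.*last") text
```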

Also, obligatory warning: You cannot parse XML with regex
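As for the Match.t argument of ~f: it is useful when the replacement depends on what was actually matched, for example on a capture group. The following is only a sketch under assumptions: it relies on Re2.Match.get_exn with ~sub:(`Index n) to read group n, and the pattern and the helper name empty_arrays are made up for illustration.

```ocaml
(* Assumption: same Re2 release as in the question. *)
module Re2 = Re2.Std.Re2

(* Group 1 captures the key name; (?s) lets '.' cross newlines and
   .*? stops at the first closing </array>. *)
let re = Re2.create_exn "(?s)<key>(\\w+)</key>\\s*<array>.*?</array>"

(* Hypothetical helper: empty every <array> that directly follows a <key>,
   reusing the matched key name in the replacement. *)
let empty_arrays (input : string) : string =
  Re2.replace_exn re input ~f:(fun m ->
    let key = Re2.Match.get_exn ~sub:(`Index 1) m in
    "<key>" ^ key ^ "</key><array/>")
```

When the replacement is a constant, as in the question, the argument can simply be ignored with fun _ -> ....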

Wang
  • Thanks, you helped me a lot. I just want to modify the data of a DayOne journal; the XML format is simple, so regex works in this case. – alwaysday1 Mar 29 '17 at 02:28

Re2 matches your regular expression line-by-line by default, that's why your expression never matches.

The documentation mentions an m flag for multiline matching. I don't know if Jane Street's binding for re2 lets you set such a flag, but this is definitely what you're looking for here.

I'll keep an eye out and update my answer should I find something useful.

Richard-Degenne