How do I get from this: "{\n \"DNAHeader\": {},\n \"ItemsSaleable\": []\n}\n"

to this: "\"DNAHeader\":{},\"ItemsSaleable\":[]"

For the initiator I have this regex:

"<OWSP>{<OWSP>"

and for the terminator this:

"<OWSP>}<OWSP>"

where <OWSP> is optional whitespace, the same as \s* in a Swift regex.

I convert them to the Swift equivalent:

if let group = groupOrItem as? Group,
   let initiator = group.typeSyntax?.initiator?.literal.literalValue?.replacingOccurrences(of: "<OWSP>", with: "\\s*"),
   let terminator = group.typeSyntax?.terminator?.literal.literalValue?.replacingOccurrences(of: "<OWSP>", with: "\\s*")
{
    let captureString = "(.*?)"
    let regexString = initiator + captureString + terminator
    let regexPattern = "#" + regexString + "#"

The regex pattern then looks like this:

(lldb) po regexString
"\\s*{\\s*(.*?)\\s*}\\s*"

My question: how do I apply it to extract the meaningful inner text? I tried this:

 var childText = text.replacingOccurrences(of: regexPattern, with: "$1", options: .regularExpression).filter { !$0.isWhitespace }

but it does not remove the initiator and terminator text, such as the { and } here:

(lldb) po text
"{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"

(lldb) po childText
"{\"DNAHeader\":{},\"ItemsSaleable\":[]}"
János
  • See https://www.advancedswift.com/regex-capture-groups/ and SO https://stackoverflow.com/questions/42789953/swift-3-how-do-i-extract-captured-groups-in-regular-expressions – Wiktor Stribiżew Feb 17 '23 at 12:23
  • Actually, the regex won't work since the `{}` after `"DNAHeader":` will make it stop there. Regex does not work with JSON. – Wiktor Stribiżew Feb 17 '23 at 12:28
  • You want `"\"DNAHeader\":{},\"ItemsSaleable\":[]"` instead of having real JSON `"{\n \"DNAHeader\": {},\n \"ItemsSaleable\": []\n}\n"`? Are you sure about that? I'm guessing you have more or less "TAG-JSON-TAG-JSON-TAG-JSON-TAG-JSON-TAG" as a stream, where the beginning of the string could be a partial tag or partial JSON, and the same for the end, and you want to get all the intermediate JSON? – Larme Feb 17 '23 at 12:32
  • Please forget about JSON. This example is JSON, yes, but it should work for any arbitrary text; only the initiator and terminator patterns are fixed. – János Feb 17 '23 at 12:35
  • @WiktorStribiżew, does it stop working because the `?` in `(.*?)` sets a non-greedy flag? – János Feb 17 '23 at 12:37
  • But is it a "stream", as I described it? I did that a long time ago for an MJPEG stream, which is more or less "PartialJPEG-JPEG-JPEG-JPEG-PartialJPEG", etc. I've seen https://gist.github.com/cybrox/96a487fad05def624c6fcbf57578cb65, which seems to do the job and could give you some inspiration. – Larme Feb 17 '23 at 12:40
  • What worries me is not the start and end, but the removal of white space around the commas in between the key/value pairs of the dictionary. What if you have a JSON value that is a string which has a comma inside the string? At first glance, regex seems so appealing, but it really is not well suited for complex parsing tasks. I'd suggest using a proper parser, e.g. `JSONDecoder` or `JSONSerialization`, and then output the results however you want from there. Perhaps describe the broader challenge you are trying to solve, rather than focusing on regex. – Rob Feb 17 '23 at 12:43
  • Then it must be `let regexString = initiator + "((?:(?!(?:" + initiator + "|" + terminator + "))(?s:.))*)" + terminator`. And you will still need the capturing code. Note that the problem with nested subpatterns will still bug you. Regex won't help here unless you can be more specific with your pattern requirements. – Wiktor Stribiżew Feb 17 '23 at 12:44
  • Can you use `RegexBuilder` (available from iOS 16 and macOS 13)? It provides a more declarative type of regex and helps you manage these cases. – HunterLion Feb 17 '23 at 13:36
  • Is this the same question as your previous one at https://stackoverflow.com/questions/75470319/how-to-find-relevant-text-between-and-in-swift, or an additional one? Or is it the same question you deleted, where the answer mentioned `replacingOccurrences`? – workingdog support Ukraine Feb 17 '23 at 14:04
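
The "capturing code" that the comments refer to could look roughly like the sketch below. It is not from the original post: `innerText(of:initiator:terminator:)` is a hypothetical helper name, and it assumes the initiator and terminator are literal templates containing `<OWSP>` placeholders, as in the question. It differs from the question's code in three ways: the literal parts are escaped (`{` is a regex metacharacter), the pattern is not wrapped in `#`, and `.dotMatchesLineSeparators` lets `.` cross newlines. The capture is greedy, so it runs to the last terminator; with the non-greedy `(.*?)` it would stop at the first `}`, as noted in the comments.

```swift
import Foundation

/// Hypothetical helper: returns the text between the first initiator and the
/// last terminator, or nil if the pattern is invalid or nothing matches.
func innerText(of text: String, initiator: String, terminator: String) -> String? {
    // Escape the literal parts, then put \s* where the <OWSP> placeholders were.
    func pattern(from template: String) -> String {
        template
            .components(separatedBy: "<OWSP>")
            .map { NSRegularExpression.escapedPattern(for: $0) }
            .joined(separator: "\\s*")
    }

    // Greedy capture: runs up to the LAST terminator in the text.
    let regexString = pattern(from: initiator) + "(.*)" + pattern(from: terminator)
    guard let regex = try? NSRegularExpression(pattern: regexString,
                                               options: [.dotMatchesLineSeparators]) else {
        return nil
    }

    let fullRange = NSRange(text.startIndex..., in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          let innerRange = Range(match.range(at: 1), in: text) else {
        return nil
    }
    return String(text[innerRange])
}

let text = "{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"
let inner = innerText(of: text, initiator: "<OWSP>{<OWSP>", terminator: "<OWSP>}<OWSP>")
// Strip the remaining whitespace, as the question does.
let childText = inner.map { s in s.filter { !$0.isWhitespace } }
print(childText ?? "no match")   // prints "DNAHeader":{},"ItemsSaleable":[]
```

Reading capture group 1 from the match avoids `replacingOccurrences` altogether. Stripping every whitespace character afterwards is only safe when the inner text has no meaningful whitespace, for example inside string values.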

2 Answers

As said in the comments, your example is JSON (you say not to focus on that, but...), which makes the regex construction quite hard.

Since I suspect a stream, I created fake test values for that, though it's not necessarily your case:

let startSeparator = "<OWS>{<OWS>"
let endSeparator = "<OWS>}<OWS>"

//Fake structure
struct Object: Codable {
    let id: Int
    let NDAHeader: Header
    let ItemsSaleable: [Saleable]
}
struct Header: Codable {}
struct Saleable: Codable {}

let encoder = JSONEncoder()
encoder.outputFormatting = .prettyPrinted
let str0 = embedStr(codable: Object(id: 0, NDAHeader: Header(), ItemsSaleable: []), with: encoder)
let str1 = embedStr(codable: Object(id: 1, NDAHeader: Header(), ItemsSaleable: []), with: encoder)
let str2 = embedStr(codable: Object(id: 2, NDAHeader: Header(), ItemsSaleable: []), with: encoder)
let str3 = embedStr(codable: Object(id: 3, NDAHeader: Header(), ItemsSaleable: []), with: encoder)

//Replace the starting `{` & closing `}` of the JSON with separators surrounded by <OWS>
func embedStr(codable: Codable, with encoder: JSONEncoder) -> String {
    let jsonData = try! encoder.encode(codable)
    var value = String(data: jsonData, encoding: .utf8)!
    value = startSeparator + String(value.dropFirst())
    value = String(value.dropLast()) + endSeparator
    return value
}

//Create a fake stream, by joining multiple JSON values, and "cut it"
func concate(strs: [String], dropStart: Int, dropEnd: Int) -> String {
    var value = strs.joined()
    value = String(value.dropFirst(dropStart))
    value = String(value.dropLast(dropEnd))
    return value
}

//Fake Streams
let concate0 = concate(strs: [str0], dropStart: 0, dropEnd: 0)
let concate1 = concate(strs: [str0, str1, str2], dropStart: 13, dropEnd: 13)
let concate2 = concate(strs: [str0, str1, str2, str3], dropStart: 20, dropEnd: 13)

The "extract/find" code:

//Here, if it's a stream, you could return the rest of the string, because it might be the start of a message to concatenate with the next part of the stream
//Side note: if it's a `Data`, `range(of:in:)` can be called on `Data` directly, avoiding stringification where possible (like going back to JSON to remove the pretty-printed format)
func analyze(str: String, found: (String) -> Void) {
    //It's more a proof of concept, but you should be able to grasp the logic:
    // Search from START to the next END, return that captured part with the closure `found`
    // Keep searching in the rest of the string.
    var searchRange = str.startIndex..<str.endIndex

    while let start = str.range(of: startSeparator, range: searchRange),
          let end = str.range(of: endSeparator, range: start.upperBound..<str.endIndex) {
        let sub = str[start.upperBound..<end.lowerBound]
        found("{" + String(sub) + "}") //The braces that sit inside the <OWS> separators are hard-coded here
        searchRange = end.upperBound..<str.endIndex
    }
}

To test:

func test(str: String) {
    print("In \(str.debugDescription)")
    analyze(str: str) { match in
        print("Found \(match.debugDescription)")

        //The next part isn't beautiful, but it's one of the safest ways to get rid of the spaces and "\n" that come from pretty printing
        let withoutPrettyPrintedData = try! (JSONSerialization.data(withJSONObject: try! JSONSerialization.jsonObject(with: Data(match.utf8))))
        print("Cleaned: \(String(data: withoutPrettyPrintedData, encoding: .utf8)!.debugDescription)")
    }
    print("")
}

//Test the fake streams
[concate0, concate1, concate2].forEach {
    test(str: $0)
}

I used debugDescription so that the "\n" characters are visible in the console.

Output:

$>In "<OWS>{<OWS>\n  \"id\" : 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS>"
$>Found "{\n  \"id\" : 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":0,\"ItemsSaleable\":[],\"NDAHeader\":{}}"

$>In " \"id\" : 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 2,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  "
$>Found "{\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":1,\"ItemsSaleable\":[],\"NDAHeader\":{}}"

$>In " 0,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 2,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n<OWS>}<OWS><OWS>{<OWS>\n  \"id\" : 3,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  "
$>Found "{\n  \"id\" : 1,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":1,\"ItemsSaleable\":[],\"NDAHeader\":{}}"
$>Found "{\n  \"id\" : 2,\n  \"ItemsSaleable\" : [\n\n  ],\n  \"NDAHeader\" : {\n\n  }\n}"
$>Cleaned: "{\"id\":2,\"ItemsSaleable\":[],\"NDAHeader\":{}}"
Larme

If you can use RegexBuilder, available from iOS 16 and macOS 13, the following code works (see function extracted()):

import RegexBuilder
import SwiftUI


struct MyView: View {
    let text = "{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"
    
    var body: some View {
        VStack {
            Text(text)
            Text(extracted(text) ?? "Not found")
        }
    }
    
    private func extracted(_ text: String) -> String? {
        
        let initiator = Regex {
            ZeroOrMore(.horizontalWhitespace)
            "{"
            ZeroOrMore(.horizontalWhitespace)
        }

        let terminator = Regex {
            ZeroOrMore(.horizontalWhitespace)
            "}"
            ZeroOrMore(.horizontalWhitespace)
        }

        // Read the text and capture only what's in between the
        // initiator and terminator
        let searchJSON = Regex {
            initiator
            Capture { OneOrMore(.any) }     // The real contents
            terminator
        }

        // Extract the whole string and the extracted string
        if let match = text.firstMatch(of: searchJSON) {
            let (wholeMatch, extractedString) = match.output
            
            print(wholeMatch)
            
            // Replace whitespaces and line feeds before returning
            return String(extractedString
                .replacing("\n", with: "")
                .replacing(" ", with: ""))
            
        } else {
            return nil
        }

    }
}
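
For use outside a view, the same idea can be sketched as a plain function. This is a variant under stated assumptions, not the answer's code: `extractInner(_:)` is a hypothetical name, it still needs the Swift 5.7 runtime (iOS 16 / macOS 13), and it uses `.whitespace` instead of `.horizontalWhitespace` on the assumption that `<OWSP>` should behave like `\s*` and therefore also cover line breaks.

```swift
import RegexBuilder

/// Hypothetical helper: returns the text between the first "{" and the last "}",
/// with all whitespace removed, or nil if there is no match.
func extractInner(_ text: String) -> String? {
    // <OWSP> "{" <OWSP>, where <OWSP> is assumed to include newlines.
    let initiator = Regex {
        ZeroOrMore(.whitespace)
        "{"
        ZeroOrMore(.whitespace)
    }

    // <OWSP> "}" <OWSP>
    let terminator = Regex {
        ZeroOrMore(.whitespace)
        "}"
        ZeroOrMore(.whitespace)
    }

    // Greedy capture of everything in between, newlines included.
    let search = Regex {
        initiator
        Capture { OneOrMore(.any) }
        terminator
    }

    guard let match = text.firstMatch(of: search) else { return nil }

    // Drop the remaining whitespace, as the question does.
    return String(match.output.1).filter { !$0.isWhitespace }
}

let sample = "{\n    \"DNAHeader\": {},\n    \"ItemsSaleable\": []\n}\n"
print(extractInner(sample) ?? "Not found")
```

Because the capture is greedy, nested `{}` pairs stay inside the captured text; the trade-off is that a text containing several initiator/terminator pairs is captured as one block.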

HunterLion