I am trying to convert XML that contains special characters (tabs) to JSON. The input XML is:

<Request>
 <HEADER>
    <uniqueID>2019111855545921230</uniqueID>
 </HEADER>
 <DETAIL>
<cmnmGrp>
  <coNm>IS XYZ INC.</coNm>
  <embossedNm>ANNA ST       UART</embossedNm>
  <cMNm>ST      UART/ANNA K</cMNm>
  <cmfirstNm>ANNA</cmfirstNm>
  <cmmiddleNm>K</cmmiddleNm>
  <cm2NdLastNm>ST       UART</cm2NdLastNm>
</cmnmGrp>
</DETAIL>
</Request>

I am getting the following JSON output:

{
  "Request": {
    "HEADER": { "uniqueID": "2019111855545921230" },
    "DETAIL": {
      "cmnmGrp": {
        "coNm": "IS XYZ INC.",
        "embossedNm": "ANNA ST\t\tUART",
        "cMNm": "ST\t\tUART/ANNA K",
        "cmfirstNm": "ANNA",
        "cmmiddleNm": "K",
        "cm2NdLastNm": "ST\t\tUART"
      }
    }
  }
}

The response above contains special characters. How can I remove the \t sequences that appear in place of the tabs? I am using the following code for the XML-to-JSON conversion:

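using System.Xml;
using Newtonsoft.Json;

var xml = @"Input xml";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);

// Serialize the whole document to a compact (non-indented) JSON string.
string json = JsonConvert.SerializeXmlNode(xmlDoc, Newtonsoft.Json.Formatting.None);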

I expect the final JSON output to be:

{
  "Request": {
    "HEADER": { "uniqueID": "2019111855545921230" },
    "DETAIL": {
      "cmnmGrp": {
        "coNm": "IS XYZ INC.",
        "embossedNm": "ANNA ST        UART",
        "cMNm": "ST       UART/ANNA K",
        "cmfirstNm": "ANNA",
        "cmmiddleNm": "K",
        "cm2NdLastNm": "ST        UART"
      }
    }
  }
}

Can anyone help with this? Thanks.

Amit Sinha
  • `\t` is how JSON escapes tabs... I see nothing wrong here. See https://stackoverflow.com/a/19176131/2957232 – Broots Waymb Nov 18 '19 at 13:55
  • It's not clear what you want the end result to be. The tabs entirely removed? It's probably simplest to modify the XML document before you convert it – Jon Skeet Nov 18 '19 at 13:55
  • If you want the tabs back in (which I can't see why) you can do something like `json.Replace("\\t", "\t");` – Steve Nov 18 '19 at 13:56
  • Please edit the question with your expectations rather than using comments. But fundamentally, having a literal tab in the JSON and having `\t` should be equivalent under all parsers. See https://tools.ietf.org/html/rfc8259#section-7 – Jon Skeet Nov 18 '19 at 14:01

2 Answers

Don't confuse data and representation!

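"ANNA ST\t\tUART" is the JSON representation of a string in which "ANNA ST" and "UART" are separated by two actual tab characters.

Parse the JSON and you will get a string that contains real tabs instead of the \t escape sequences.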

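using Newtonsoft.Json.Linq;

// json is the string produced by JsonConvert.SerializeXmlNode in the question.
var obj = JObject.Parse(json);
var value = obj["Request"]["DETAIL"]["cmnmGrp"]["embossedNm"];
Console.WriteLine(value); // prints ANNA ST and UART separated by two real tab characters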
Alexander Petrov

A tab (\t) is not a fixed number of spaces: its width depends on its position from the start of the line and on the reader's tab size setting (usually 8 spaces). If you expect the values in the JSON to look the way they look in the XML, you have to read the XML file as text and programmatically replace the tabs with spaces according to their position before converting to JSON. This assumes you know the reader's tab size, which could also be 4.

Below are two lines with the same "abc\t" value, shown with a tab size of 8 spaces. The tab expands to a different number of spaces on each line because it starts at a different column:

<value>abc      </value>
   <value>abc   </value>
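
The position-dependent replacement described above could be sketched roughly as follows. This is only an illustration, not part of any library: the class name TabExpander, the method ExpandTabs, and the default tab size of 8 are all assumptions made for the example, and the method handles a single line at a time.

```csharp
using System;
using System.Text;

public static class TabExpander
{
    // Hypothetical helper: replaces every tab in one line with spaces,
    // padding up to the next tab stop. tabSize is an assumed reader
    // setting (8 by default); adjust it if your viewer uses 4.
    public static string ExpandTabs(string line, int tabSize = 8)
    {
        var sb = new StringBuilder();
        foreach (char c in line)
        {
            if (c == '\t')
            {
                // Distance from the current column to the next multiple
                // of tabSize; this is always between 1 and tabSize.
                int spaces = tabSize - (sb.Length % tabSize);
                sb.Append(' ', spaces);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Same "abc\t" value at two different starting columns:
        // the first tab becomes 6 spaces, the second becomes 3.
        Console.WriteLine(ExpandTabs("<value>abc\t</value>"));
        Console.WriteLine(ExpandTabs("   <value>abc\t</value>"));
    }
}
```

To apply this to a whole document, one option is to split the raw XML text into lines, run each line through ExpandTabs, join the lines again, and pass the result to xmlDoc.LoadXml before calling JsonConvert.SerializeXmlNode.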

Generally, keeping the tabs is correct, even though it is not the result you want here.

The JSON spec represents a tab inside a string as the two-character escape sequence \t, so your output is correct. When you retrieve a value containing \t, the JSON parser replaces each \t with a real tab character. It depends on what you need: if you don't care about the original tab positions in the XML file, your current output may already be fine.