
I have an XML export of a Charles proxy recording from which I am trying to extract a couple of values using XPath, but I'm not sure how to extract them without hardcoding the XPath. Below is the XML of a single transaction:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE charles-session SYSTEM "https://www.charlesproxy.com/dtd/charles-session-1_2.dtd">

<charles-session>
<transaction status="COMPLETE" method="POST" protocolVersion="HTTP/1.1" protocol="https" host="awcmex.ssdevrd.com" actualPort="443" path="/awcm" remoteAddress="aw.ss.com/12.70.19.19" clientAddress="/127.0.0.1" clientPort="50581" startTime="2020-02-09T12:01:23.043+05:30" startTimeMillis="1581229883043" requestBeginTime="2020-02-09T12:01:26.417+05:30" requestBeginTimeMillis="1581229886417" requestTime="2020-02-09T12:01:26.422+05:30" requestTimeMillis="1581229886422" responseTime="2020-02-09T12:02:59.516+05:30" responseTimeMillis="1581229979516" endTime="2020-02-09T12:02:59.517+05:30" endTimeMillis="1581229979517" duration="96084" dnsDuration="30" connectDuration="704" sslDuration="2250" requestDuration="5" responseDuration="1" latency="93094" overallSpeed="251" requestSpeed="376600" responseSpeed="22323000" totalSize="24206">
<ssl protocol="TLSv1.2" cipherSuite="TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384" />
<request handshake="1334" headers="324" body="225" mime-type="application/json">
<headers>
<first-line><![CDATA[POST /awcm HTTP/1.1]]></first-line>
<header>
<name>Host</name>
<value>aw.ss.com:443</value></header>
<header>
<name>Content-Type</name>
<value>application/json</value></header>
<header>
<name>User-Agent</name>
<value>awc/1.0.0 CFNetwork/978.2 Darwin/18.7.0 (x86_64)</value></header>
<header>
<name>Connection</name>
<value>keep-alive</value></header>
<header>
<name>Accept</name>
<value>*/*</value></header>
<header>
<name>Accept-Language</name>
<value>en-us</value></header>
<header>
<name>Content-Length</name>
<value>225</value></header>
<header>
<name>Accept-Encoding</name>
<value>br, gzip, deflate</value></header>
<header>
<name>awsession</name>
<value>8594797088EBFD8EC21948508AC910F6F06FB2DF</value></header></headers>
<body><![CDATA[{
  "messageid" : "WSzJUUoYsr9YcP2mYuVP",
  "destinationuuid" : "",
  "durable" : "true",
  "priority" : "0",
  "life" : "0",
  "originapplicationid" : "",
  "originuuid" : "9AEC3865782F5849B21B34C0516F515E",
  "type" : "1"
}]]></body></request>
<response status="200" handshake="22308" headers="0" body="15" mime-type="text/plain" charset="UTF-8">
<headers>
<first-line><![CDATA[HTTP/1.1 200 OK]]></first-line>
<header>
<name>Content-Type</name>
<value>text/plain; charset=UTF-8</value></header>
<header>
<name>Content-Length</name>
<value>15</value></header>
<header>
<name>aw-host</name>
<value>AWC201 [ AWC201.AWSS.DEV ]</value></header>
<header>
<name>Connection</name>
<value>keep-alive</value></header></headers>
<body><![CDATA[{"messages":[]}]]></body></response></transaction></charles-session>

Basically, I want to get the number of transactions in a session and, for each transaction, the User-Agent header and its value. I tried:

  PROCESS1=$(xmllint --xpath 'string(/charles-session/transaction[1]/request/headers/header[1])' $XML_FILE )

but I am not able to fetch both the header name and its value for each transaction without hardcoding the index in header[]. Any suggestions?

macrbu

1 Answer


You can use this to group and count the User-Agent headers:

  xmllint --xpath '//request/headers/header[name/text()[contains(.,"User-Agent")]]/value/text()' sample.txt | sort | uniq -c

The result would be something like this:

  5 awc/1.0.0 CFNetwork/978.2 Darwin/18.7.0 (x86_64)

Note: you need a fairly recent version of xmllint. Older versions don't add a newline after each matched node in the output; see "How to append a newline after every match using xmlint --xpath" for a possible solution.
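If you need the value per transaction rather than a grouped count, you can select the header by its name instead of its position (header[name='User-Agent'] instead of header[1]) and call xmllint once per transaction with string(). Each call then returns exactly one value, so the missing-newline problem of older versions doesn't matter. The exact match also avoids contains() picking up a header such as X-User-Agent. Below is a minimal sketch that assumes bash and the Charles export layout shown in the question. The function names are hypothetical, chosen for illustration:

```shell
#!/usr/bin/env bash
# Sketch (assumes bash + xmllint and the Charles session layout from the question):
# print the number of transactions and the User-Agent request header of each one,
# selecting the header by its <name> instead of a hardcoded position.

# Hypothetical helper: number of <transaction> elements in file $1.
# string() makes xmllint print a plain number such as "1".
count_transactions() {
  xmllint --xpath 'string(count(/charles-session/transaction))' "$1"
}

# Hypothetical helper: User-Agent value of transaction $2 (1-based) in file $1.
# Prints an empty string if that transaction has no User-Agent header.
user_agent_of() {
  xmllint --xpath "string(/charles-session/transaction[$2]/request/headers/header[name='User-Agent']/value)" "$1"
}

# Hypothetical helper: print the transaction count, then "index<TAB>user-agent" per transaction.
list_user_agents() {
  local file=$1 n i
  n=$(count_transactions "$file")
  echo "transactions: $n"
  for ((i = 1; i <= n; i++)); do
    printf '%s\t%s\n' "$i" "$(user_agent_of "$file" "$i")"
  done
}

# Run only when executed directly with a file argument, e.g. ./user-agents.sh session.xml
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  list_user_agents "$1"
fi
```

This runs xmllint once per transaction, so it can be slow for very large sessions; for those, the grouped one-liner above is cheaper.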

Sorin