I'm writing a script to build a few tables for my Unicode library. One of the tables I need is a list of all the numeric codepoints in the Unicode standard, along with their values.
To do that, I'm using xmllint in a shell script. I'm stuck on building the list of codepoints and their values because my XPath query isn't working.
I've tried adapting a number of XPath query strings from other questions here on Stack Overflow.
Here's the current query string I'm trying to use: ucd/repertoire/char/@nv[.!='NaN']
and xmllint --xpath /*[local-name()='ucd']/*[local-name()='repertoire']/*[local-name()='char']/@nv
Here's an example codepoint so you can see its layout.
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
<description>Unicode 10.0.0</description>
<repertoire>
<char cp="0000" age="1.1" na="" JSN="" gc="Cc" ccc="0" dt="none" dm="#" nt="None" nv="NaN" bc="BN" bpt="n" bpb="#" Bidi_M="N" bmg="" suc="#" slc="#" stc="#" uc="#" lc="#" tc="#" scf="#" cf="#" jt="U" jg="No_Joining_Group" ea="N" lb="CM" sc="Zyyy" scx="Zyyy" Dash="N" WSpace="N" Hyphen="N" QMark="N" Radical="N" Ideo="N" UIdeo="N" IDSB="N" IDST="N" hst="NA" DI="N" ODI="N" Alpha="N" OAlpha="N" Upper="N" OUpper="N" Lower="N" OLower="N" Math="N" OMath="N" Hex="N" AHex="N" NChar="N" VS="N" Bidi_C="N" Join_C="N" Gr_Base="N" Gr_Ext="N" OGr_Ext="N" Gr_Link="N" STerm="N" Ext="N" Term="N" Dia="N" Dep="N" IDS="N" OIDS="N" XIDS="N" IDC="N" OIDC="N" XIDC="N" SD="N" LOE="N" Pat_WS="N" Pat_Syn="N" GCB="CN" WB="XX" SB="XX" CE="N" Comp_Ex="N" NFC_QC="Y" NFD_QC="Y" NFKC_QC="Y" NFKD_QC="Y" XO_NFC="N" XO_NFD="N" XO_NFKC="N" XO_NFKD="N" FC_NFKC="#" CI="N" Cased="N" CWCF="N" CWCM="N" CWKCF="N" CWL="N" CWT="N" CWU="N" NFKC_CF="#" InSC="Other" InPC="NA" PCM="N" vo="R" RI="N" blk="ASCII" isc="" na1="NULL">
<name-alias alias="NUL" type="abbreviation"/>
<name-alias alias="NULL" type="control"/>
</char>
</repertoire>
</ucd>
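One detail in this layout that may matter: the root element declares a default namespace, so every element, including char, lives in http://www.unicode.org/ns/2003/ucd/1.0, and an unprefixed XPath 1.0 step such as char only matches elements in no namespace. Below is a minimal sketch of that difference. It uses Python's xml.etree.ElementTree purely for illustration rather than xmllint, and the trimmed SAMPLE document and the "ucd" prefix are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical sample in the same shape as the UCD file.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
<repertoire>
<char cp="0000" nv="NaN"/>
<char cp="0031" nv="1"/>
<char cp="00BD" nv="1/2"/>
</repertoire>
</ucd>"""

# The prefix "ucd" is arbitrary; only the namespace URI has to match the file.
NS = {"ucd": "http://www.unicode.org/ns/2003/ucd/1.0"}

root = ET.fromstring(SAMPLE)

# Unprefixed names refer to "no namespace", so nothing matches.
unqualified = root.findall("repertoire/char")

# Namespace-qualified names match the actual elements.
qualified = root.findall("ucd:repertoire/ucd:char", NS)

# Keep only the codepoints whose nv attribute is not "NaN".
numeric = [(c.get("cp"), c.get("nv")) for c in qualified if c.get("nv") != "NaN"]

print(len(unqualified), len(qualified), numeric)
```

If the same effect shows up in xmllint, the unprefixed queries would return nothing for the same reason, while the local-name() form sidesteps the namespace entirely.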
I'm trying to check each char to see whether its nv attribute is a valid number by ignoring "NaN". If the attribute is anything but NaN, I assume it's valid, grab the cp value and its nv value, and put them into a C table. I haven't really gotten to the higher-level scripting parts yet; I'm stuck on the XPath portion.
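The table-building step could be sketched roughly as follows, again in Python purely for illustration. The c_table helper, the NumericTable name, and the three-column {codepoint, numerator, denominator} layout are all assumptions invented for this example, not a settled design; fractional values such as 1/2 are split so the table can stay integer-only.

```python
def c_table(pairs, name="NumericTable"):
    """Render (cp, nv) string pairs as a C array initializer (hypothetical layout)."""
    rows = []
    for cp, nv in pairs:
        # nv is either an integer like "1" or a fraction like "1/2".
        numerator, _, denominator = nv.partition("/")
        rows.append("    {0x%s, %s, %s}," % (cp, numerator, denominator or "1"))
    header = "static const int64_t %s[%d][3] = {" % (name, len(rows))
    return "\n".join([header] + rows + ["};"])


# Example input: (cp, nv) pairs already filtered to exclude "NaN".
table = c_table([("0031", "1"), ("00BD", "1/2")])
print(table)
```

A 64-bit element type is assumed here because some nv values in the database are too large for 32 bits.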
So, where am I going wrong? I've tried all kinds of variations that search by document structure (by that I mean //ucd/repertoire/char, ucd/repertoire/char, just //char@nv, and a bunch of other versions I don't even remember).