0

I am trying to write an XQuery on the dataset given by https://clinicaltrials.gov/ct2/resources/download .

The query is supposed to do the following. Find the clinical studies of which the brief title contains a given phrase, from the results get those who have intervention type Drug, group them and keep the 10 most used drugs.

I am extremely new to XQuery and so far I have managed to write the following:

/clinical_study/brief_title[contains(.,'Heart')] 

I know that I should utilize the for loop in some way but I can't figure out how.

  • You need to provide some clarifications. (1) **"...group them..."**, by what? (2) **"...keep the 10 most used drugs..."**, what XML element signifies most used drugs? – Yitzhak Khabinsky Jul 23 '20 at 01:34
  • Grouping is usually done by the `group by` clause in a FLOWR expression https://www.w3.org/TR/xquery-31/#id-group-by. – Martin Honnen Jul 23 '20 at 12:37

1 Answers1

1

I am using BaseX XML database.

I downloaded XML files by using the following URL: https://clinicaltrials.gov/ct2/download_studies?term=heart

It provided me with a search_result.zip file. I unzipped all 10,000 XML files into the following directory: e:\Temp\clinicaltrials.gov

After that I created a new BaseX database and named it 'clinicaltrials'. You can check it here: Reading Multiple XML Files with BaseX

While waiting for the clarifications on grouping, here is a relevant XQuery as a starting point. It is applying your first two requirements: "...Find the clinical studies of which the brief title contains a given phrase, from the results get those who have intervention type Drug..." via where clause.

xquery version "3.1";
declare option output:omit-xml-declaration "no";
declare option output:encoding "UTF-8";

for $doc in collection('clinicaltrials')
let $t := $doc/clinical_study/brief_title
where $doc/clinical_study/brief_title[contains(.,'Heart')]
  and $doc/clinical_study/intervention/intervention_type = 'Drug'
return $t

XQuery with grouping and ranking (top 10)

xquery version "3.1";
declare option output:omit-xml-declaration "no";
declare option output:encoding "UTF-8";

<root>
{
  for $drug in collection('clinicaltrials')/clinical_study[brief_title[contains(.,'Heart')]]/intervention[intervention_type eq 'Drug']
  let $drugName := fn:lower-case($drug/intervention_name)
  group by $drugName
  order by count($drug) descending, $drugName
  count $rank
  where $rank <= 10
  return <drug>
      <drug_name>{$drugName}</drug_name>
      <counter>{count($drug)}</counter>
    </drug>
}
</root>

Output

    <?xml version="1.0" encoding="UTF-8"?>
    <root>
      <drug>
        <drug_name>placebo</drug_name>
        <counter>592</counter>
      </drug>
      <drug>
        <drug_name>ivabradine</drug_name>
        <counter>60</counter>
      </drug>
      <drug>
        <drug_name>nesiritide</drug_name>
        <counter>46</counter>
      </drug>
      <drug>
        <drug_name>furosemide</drug_name>
        <counter>40</counter>
      </drug>
      <drug>
      ...
   </root>
Yitzhak Khabinsky
  • 18,471
  • 2
  • 15
  • 21
  • Yes after some thinking I ended up using BaseX to insert the XML files. I understand I was not very clear about grouping. The grouping I suppose would be to group them by the number of results so we can get the 10 most used drugs. For example if we have 10 clinical studies using Drug X1. Then group those studies and get the number 10. Do the same for drugs X2,X3,....,Xn and keep the 10 most used –  Jul 23 '20 at 11:42
  • Ps. Sorry for the delay of my answer, I posted the question at like 4.30am local hour and fell asleep. –  Jul 23 '20 at 11:47
  • @thriteenchess, I updated the answer with grouping and count by the drug name. – Yitzhak Khabinsky Jul 23 '20 at 19:03