
I'm trying to convert data from RDF sources into the dictionary format expected by the @note2 biomedical text-mining application.

Specifically, I'm trying to collapse all synonyms for a concept onto one line.

@note2 uses dictionaries in this format:

+----------+-----------+---------------+-------------------------+
|  class   |   term    |   synonyms    |      external IDs       |
+----------+-----------+---------------+-------------------------+
| food     | bread     | pan|brot      | source1;idA             |
| nutrient | vitamin C | ascorbic acid | source1;idC|source2;idD |
+----------+-----------+---------------+-------------------------+

I can get one synonym per line with a query like this against BioPortal:

SELECT ?term ?syn ?extid
FROM <http://bioportal.bioontology.org/ontologies/BTO>
WHERE
{
  ?extid <http://bioportal.bioontology.org/metadata/def/prefLabel> ?term .
  ?extid <http://www.geneontology.org/formats/oboInOWL#hasRelatedSynonym> ?syn .
}

Returning something like this:

+-------------------------+-------------------------+-----------------+
|          term           |           syn           |      extid      |
+-------------------------+-------------------------+-----------------+
| "stomach smooth muscle" | "gastric muscle"        | bto:BTO_0001818 |
| "stomach smooth muscle" | "gastric smooth muscle" | bto:BTO_0001818 |
| "stomach smooth muscle" | "stomach muscle"        | bto:BTO_0001818 |
+-------------------------+-------------------------+-----------------+

So, is it possible, WITHIN SPARQL, to concatenate the synonyms and end up with something like this?

+-----------------------+-----------------------------------------------------+-----------------+
|         term          |                         syn                         |      extid      |
+-----------------------+-----------------------------------------------------+-----------------+
| stomach smooth muscle | gastric muscle|gastric smooth muscle|stomach muscle | bto:BTO_0001818 |
+-----------------------+-----------------------------------------------------+-----------------+

I'll be using Virtuoso Open Source, if that makes any difference.

Mark Miller
  • I haven't tried it out, but "group_concat" looks like it would be promising: http://stackoverflow.com/a/18214142/4154134 – jdussault Jul 17 '15 at 21:06
  • @jdussault that looks promising. Do you want to submit an answer, or should I submit what I was able to do with your suggestion? – Mark Miller Jul 17 '15 at 21:14
  • Go ahead and do it! I'll be interested to see what it looks like. :) – jdussault Jul 17 '15 at 21:17

1 Answer


Thanks, @jdussault! group_concat does the job:

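# Collapse all distinct synonyms of a concept into one "|"-separated string.
SELECT ?term (GROUP_CONCAT(DISTINCT ?syn ; SEPARATOR = "|") AS ?synset) ?extid
FROM <http://bioportal.bioontology.org/ontologies/BTO>
WHERE
{
  ?extid <http://bioportal.bioontology.org/metadata/def/prefLabel> ?term .
  ?extid <http://www.geneontology.org/formats/oboInOWL#hasRelatedSynonym> ?syn .
}
# Every non-aggregated variable in the SELECT must also appear in GROUP BY.
GROUP BY ?term ?extid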

Returning something like this:

+-----------------------+-----------------------------------------------------------------------------+-----------------+
|         term          |                                   synset                                    |      extid      |
+-----------------------+-----------------------------------------------------------------------------+-----------------+
| "3T3-F442A cell"      | "F442A cell|3T3-442A cell"                                                  | bto:BTO_0001169 |
| "stria terminalis"    | "terminal stria|Tarins tenia|tenia semicircularis|Fovilles fasciculus"      | bto:BTO_0004616 |
| "intervertebral disc" | "spinal disk|spinal disc|intervertebral fibrocartilage|intervertebral disk" | bto:BTO_0003625 |
+-----------------------+-----------------------------------------------------------------------------+-----------------+
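
To make the grouping semantics concrete, here is a rough Python sketch of what the aggregation does, applied to the one-synonym-per-row results from the question. It is only an illustration of the logic, not something SPARQL or Virtuoso runs; the function name `collapse_synonyms` is made up for this example, and it keeps synonyms in first-seen order, whereas SPARQL does not guarantee any particular order inside a GROUP_CONCAT.

```python
def collapse_synonyms(rows, separator="|"):
    """Group (term, syn, extid) rows by term and ID, joining distinct synonyms.

    Hypothetical helper that mimics GROUP_CONCAT(DISTINCT ?syn; SEPARATOR="|").
    """
    grouped = {}  # (term, extid) -> list of distinct synonyms, first-seen order
    for term, syn, extid in rows:
        synonyms = grouped.setdefault((term, extid), [])
        if syn not in synonyms:  # DISTINCT: skip repeated synonyms
            synonyms.append(syn)
    return [
        (term, separator.join(synonyms), extid)
        for (term, extid), synonyms in grouped.items()
    ]


# The three rows returned by the original one-synonym-per-line query.
rows = [
    ("stomach smooth muscle", "gastric muscle", "bto:BTO_0001818"),
    ("stomach smooth muscle", "gastric smooth muscle", "bto:BTO_0001818"),
    ("stomach smooth muscle", "stomach muscle", "bto:BTO_0001818"),
]

collapsed = collapse_synonyms(rows)
for term, synset, extid in collapsed:
    print(term, synset, extid, sep="\t")
```

Each distinct combination of term and ID ends up as a single row whose synonym column is the "|"-joined list, which is the shape the @note2 synonyms column needs.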
Mark Miller