Do I use NetBeans or SPARQL in Protégé?

Question

I have a question about my project. I do not know whether I need to work in NetBeans or not. My work is a recommendation system for library books, and as input I need a book classification ontology that classifies the library's books. This classification has 14 categories, alongside the sibling classes Author, Book, and ISBN. The individuals in the Book class are the books' subjects (about 600 subjects), the individuals in the Author class are the authors' names, and likewise for the ISBN class.

I have also collected, manually, which categories some of the books belong to. An object property named "hasSubject" relates individuals of the Book class to categories; for example, book "A" hasSubject categories "S" and "F" and so on. As the final result, I want to apply this formula:

sim(X, Y) = C11 / (C10 + C01 + C11)

where C11 is the number of categories that both book X and book Y belong to, C10 is the number of categories that book X belongs to but book Y does not, and C01 is the number of categories that book Y belongs to but book X does not. This gives the similarity between two books (A and B). I then apply the formula again to books A and C, and so on, until the similarity between every pair of books has been obtained. In your opinion, should this be done in NetBeans or with SPARQL in Protégé?
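The formula C11 / (C10 + C01 + C11) is the Jaccard coefficient of the two category sets (shared categories divided by all categories either book has). A minimal Python sketch of the formula and of applying it to every pair of books follows; it is not SPARQL, and the book names and categories in `books` are hypothetical, for illustration only. Treating two books with no categories as similarity 0.0 is also an assumption.

```python
from itertools import combinations


def similarity(x: set, y: set) -> float:
    """sim(x, y) = C11 / (C10 + C01 + C11) for two category sets."""
    c11 = len(x & y)  # categories both books belong to
    c10 = len(x - y)  # categories only x belongs to
    c01 = len(y - x)  # categories only y belongs to
    denominator = c10 + c01 + c11
    # Assumption: two books with no categories at all get similarity 0.0
    return c11 / denominator if denominator else 0.0


def all_pairs(categories: dict) -> dict:
    """Apply the formula to every unordered pair of books (A-B, A-C, B-C, ...)."""
    return {
        (a, b): similarity(categories[a], categories[b])
        for a, b in combinations(sorted(categories), 2)
    }


# Hypothetical hasSubject data, for illustration only.
books = {"A": {"S", "F"}, "B": {"S", "G"}, "C": {"H"}}
print(all_pairs(books))
```

Because the formula is symmetric, unordered pairs are enough: n books need n(n-1)/2 comparisons.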

I also thought that maybe I could make a hasSibling property that, for every book, represents the group of books that share categories with it. What do you think?

It's not really clear what you're asking. You can use Protégé and SPARQL without using Netbeans. If I understand what you're asking about the similarity metric between X and Y, you can probably do that in SPARQL, because SPARQL includes some mathematical operations. For evaluating mathematical formulae, there's an [answer](http://stackoverflow.com/a/17319546/1281433) that involves evaluating some formulae on sequences that might be some help, if you need to do some tricky manipulations. Your case might be simpler, though. — Joshua Taylor, Jul 20 '13 at 18:06
I don't know when you would want to use NetBeans. I tend to use Eclipse, myself, but there are plenty of command line tools for working with SPARQL, so I can actually do a lot without using any IDE, to be honest. I've added an answer that shows how you can compute these similarity metrics using just SPARQL queries (which I've executed by using Jena's command line tools). — Joshua Taylor, Jul 20 '13 at 19:17
Yes, I would like to calculate the similarity between X and Y, but I also need to apply this formula to all books: X and Y, then X and Z, then X and S, ..., then Y and Z, and so on. So I do not know how to implement the pairwise calculation in SPARQL. — sima412, Jul 20 '13 at 19:20
The answer that I've posted shows how to compute the similarity using SPARQL. — Joshua Taylor, Jul 20 '13 at 19:21
I'm sorry, I'm not sure what you're asking. I believe that the formula being computed is the one that you described; I don't think that the values are too high. Can you clarify what you mean? — Joshua Taylor, Jul 20 '13 at 19:41

score 1 · Accepted Answer · answered Jul 20 '13 at 19:15

You can compute this kind of metric using SPARQL, though it's a bit ugly. Let's assume some data like this:

prefix dcterms: <http://purl.org/dc/terms/>
prefix : <http://example.org/books/>

:book1 a :Book ; dcterms:subject :subject1 , :subject2, :subject3 .
:book2 a :Book ; dcterms:subject :subject2 , :subject3, :subject4 .
:book3 a :Book ; dcterms:subject :subject4 , :subject5 .

There are three books. Books 1 and 2 have two subjects in common, and each has one that the other does not have. Books 2 and 3 have one subject in common, but Book 2 has two that Book 3 does not have, while Book 3 has only one that Book 2 does not have. Books 1 and 3 have no subjects in common.
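The following is a quick plain-Python check of the counts just described. It is not part of the SPARQL approach. The subject sets are copied from the Turtle data, and `counts` is a hypothetical helper that returns (C11, C10, C01) for an ordered pair.

```python
# Subjects of each book, copied from the example Turtle data.
subjects = {
    "book1": {"subject1", "subject2", "subject3"},
    "book2": {"subject2", "subject3", "subject4"},
    "book3": {"subject4", "subject5"},
}


def counts(left: str, right: str) -> tuple:
    """Return (C11, C10, C01) for an ordered pair of books."""
    a, b = subjects[left], subjects[right]
    return len(a & b), len(a - b), len(b - a)


for pair in [("book1", "book2"), ("book2", "book3"), ("book1", "book3")]:
    print(pair, counts(*pair))
```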

The trick here is to use some nested subqueries, and to grab the different values (C10, C01, and C11) at different levels in the nesting. The innermost query is

select ?book1 ?book2 (count(?left) as ?c10) where {
  :Book ^a ?book1, ?book2 .
  FILTER( !sameTerm(?book1,?book2) )
  OPTIONAL { 
    ?book1 dcterms:subject ?left .
    FILTER NOT EXISTS { ?book2 dcterms:subject ?left }
  }
}
group by ?book1 ?book2

which grabs each pair of distinct books and computes the number of subjects that the left book has that the right doesn't. By wrapping this in another query, we can then grab the number of subjects that the right book has that the left doesn't. This makes the query:
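The innermost query maps fairly directly onto a dictionary comprehension. The Python analogue below (an illustration, not a replacement for the query) produces one entry per ordered pair of distinct books. The `OPTIONAL` in the query is what keeps pairs with no left-only subjects in the result, with `count(?left)` giving 0. Here, `len` of an empty set difference gives the same 0.

```python
# Subjects of each book, copied from the example Turtle data.
subjects = {
    "book1": {"subject1", "subject2", "subject3"},
    "book2": {"subject2", "subject3", "subject4"},
    "book3": {"subject4", "subject5"},
}

# Analogue of the innermost query: every ordered pair of distinct books
# (the FILTER !sameTerm line), counting subjects the left book has and the
# right book lacks (the FILTER NOT EXISTS line). Pairs with nothing to
# count still appear with 0, as with OPTIONAL + count(?left).
c10 = {
    (b1, b2): len(subjects[b1] - subjects[b2])
    for b1 in subjects
    for b2 in subjects
    if b1 != b2
}
print(c10)
```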

select ?book1 ?book2 (count(?right) as ?c01x) (sample(?c10) as ?c10x) where {
  {
    select ?book1 ?book2 (count(?left) as ?c10) where {
      :Book ^a ?book1, ?book2 .
      FILTER( !sameTerm(?book1,?book2) )
      OPTIONAL { 
        ?book1 dcterms:subject ?left .
        FILTER NOT EXISTS { ?book2 dcterms:subject ?left }
      }
    }
    group by ?book1 ?book2
  }

  OPTIONAL { 
    ?book2 dcterms:subject ?right .
    FILTER NOT EXISTS { ?book1 dcterms:subject ?right }
  }
}
group by ?book1 ?book2

Note that we still have to select ?book1 and ?book2, and sample(?c10) as ?c10x, in order to pass the values outward. (We have to use ?c10x because the name ?c10 has already been used at this scope.) Finally, we wrap this in one more query to get the common subjects and to do the computation, which gives us:

prefix dcterms: <http://purl.org/dc/terms/> 
prefix : <http://example.org/books/> 

select ?book1 ?book2 
       (count(?both) as ?c11)
       (sample(?c10x) as ?c10)
       (sample(?c01x) as ?c01)
       (count(?both) / (count(?both) + sample(?c10x) + sample(?c01x)) as ?sim)
where {
  {
    select ?book1 ?book2 (count(?right) as ?c01x) (sample(?c10) as ?c10x) where {
      {
        select ?book1 ?book2 (count(?left) as ?c10) where {
          :Book ^a ?book1, ?book2 .
          FILTER( !sameTerm(?book1,?book2) )
          OPTIONAL { 
            ?book1 dcterms:subject ?left .
            FILTER NOT EXISTS { ?book2 dcterms:subject ?left }
          }
        }
        group by ?book1 ?book2
      }

      OPTIONAL { 
        ?book2 dcterms:subject ?right .
        FILTER NOT EXISTS { ?book1 dcterms:subject ?right }
      }
    }
    group by ?book1 ?book2 
  }

  OPTIONAL { 
    ?both ^dcterms:subject ?book1, ?book2 .
  }
}
group by ?book1 ?book2
order by ?book1 ?book2

This rather monstrous query, applied to our data, computes these results:

$ arq --data data.n3 --query similarity.sparql
--------------------------------------------
| book1  | book2  | c11 | c10 | c01 | sim  |
============================================
| :book1 | :book2 | 2   | 1   | 1   | 0.5  |
| :book1 | :book3 | 0   | 3   | 2   | 0.0  |
| :book2 | :book1 | 2   | 1   | 1   | 0.5  |
| :book2 | :book3 | 1   | 2   | 1   | 0.25 |
| :book3 | :book1 | 0   | 2   | 3   | 0.0  |
| :book3 | :book2 | 1   | 1   | 2   | 0.25 |
--------------------------------------------
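To sanity-check the table, here is a plain-Python recomputation of each row from the example data, independent of the SPARQL engine. The `include_self` flag is a hypothetical parameter that plays the role of keeping or dropping the `FILTER( !sameTerm(?book1,?book2) )` line. The sketch assumes every book has at least one subject, so the denominator is never zero.

```python
# Subjects of each book, copied from the example Turtle data.
subjects = {
    "book1": {"subject1", "subject2", "subject3"},
    "book2": {"subject2", "subject3", "subject4"},
    "book3": {"subject4", "subject5"},
}


def similarity_table(include_self: bool = False) -> list:
    """One row (book1, book2, c11, c10, c01, sim) per ordered pair, sorted like the query."""
    rows = []
    for b1 in sorted(subjects):
        for b2 in sorted(subjects):
            if b1 == b2 and not include_self:
                continue  # plays the role of FILTER( !sameTerm(?book1,?book2) )
            left, right = subjects[b1], subjects[b2]
            c11 = len(left & right)
            c10 = len(left - right)
            c01 = len(right - left)
            # Assumes every book has at least one subject, so the denominator is > 0.
            rows.append((b1, b2, c11, c10, c01, c11 / (c11 + c10 + c01)))
    return rows


for row in similarity_table():
    print(row)
```

Calling `similarity_table(include_self=True)` should also produce the diagonal rows with similarity 1.0.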

If the FILTER( !sameTerm(?book1,?book2) ) line is removed, so that similarity of each book to itself is also computed, we see the correct value (1.0):

$ arq --data data.n3 --query similarity.sparql
--------------------------------------------
| book1  | book2  | c11 | c10 | c01 | sim  |
============================================
| :book1 | :book1 | 3   | 0   | 0   | 1.0  |
| :book1 | :book2 | 2   | 1   | 1   | 0.5  |
| :book1 | :book3 | 0   | 3   | 2   | 0.0  |
| :book2 | :book1 | 2   | 1   | 1   | 0.5  |
| :book2 | :book2 | 3   | 0   | 0   | 1.0  |
| :book2 | :book3 | 1   | 2   | 1   | 0.25 |
| :book3 | :book1 | 0   | 2   | 3   | 0.0  |
| :book3 | :book2 | 1   | 1   | 2   | 0.25 |
| :book3 | :book3 | 2   | 0   | 0   | 1.0  |
--------------------------------------------

If you don't need to preserve the individual Cmn values, you might be able to optimize this: compute C01 in the innermost query and C10 in the middle query, but instead of projecting both outward individually, project just their sum (C10 + C01). The outermost query, where you compute C11, then only needs C11 / (C11 + (C10 + C01)).
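The optimization works because C10 + C01 is just the number of subjects that belong to exactly one of the two books, i.e. the size of the symmetric difference. The Python sketch below, an illustration rather than the optimized SPARQL query, carries only that combined count and checks on the example data that it gives the same similarity as keeping all three counts. `sim_full` and `sim_from_sum` are hypothetical helper names.

```python
# Subjects of each book, copied from the example Turtle data.
subjects = {
    "book1": {"subject1", "subject2", "subject3"},
    "book2": {"subject2", "subject3", "subject4"},
    "book3": {"subject4", "subject5"},
}


def sim_full(a: set, b: set) -> float:
    """C11 / (C11 + C10 + C01), with the three counts kept separately."""
    return len(a & b) / (len(a & b) + len(a - b) + len(b - a))


def sim_from_sum(a: set, b: set) -> float:
    """Same value, but only the combined count C10 + C01 is carried along."""
    c11 = len(a & b)
    c10_plus_c01 = len(a ^ b)  # symmetric difference: subjects in exactly one book
    return c11 / (c11 + c10_plus_c01)


for x in sorted(subjects):
    for y in sorted(subjects):
        # Same integer numerator and denominator, so the floats match exactly.
        assert sim_full(subjects[x], subjects[y]) == sim_from_sum(subjects[x], subjects[y])
```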
