Finding authors with no coäuthors
If I understand your question correctly, you're asking for authors (and their papers) who have never coauthored a paper with someone else. You don't actually need to match against the author list string to do this, as long as the papers are related to their authors by the :author property. These problems are always much easier if we have some data to work with, so consider this data:
@prefix : <http://stackoverflow.com/q/21391444/1281433/> .
:p1 :author :a, :b .
:p2 :author :a .
:p3 :author :b, :c .
:p4 :author :d .
A has written a paper with B, and also alone. B has written a paper with A, and also with C. C has written a paper with B. D has written a paper alone.
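As a quick sanity check on that description, a query along these lines should list every pair of distinct authors who share a paper. This is only a sketch against the sample data above; the variable name ?coauthor is my own choice.

```sparql
prefix : <http://stackoverflow.com/q/21391444/1281433/>

# List every ordered pair of distinct authors who appear on the same paper.
select distinct ?author ?coauthor where {
  ?paper :author ?author, ?coauthor .
  filter ( ?author != ?coauthor )
}
order by ?author ?coauthor
```

On the sample data this should give the pairs (:a, :b), (:b, :a), (:b, :c), and (:c, :b). :d never shows up, because :d has no coauthors.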
We can use a query like this to find all the authors who have never coauthored a paper (in this case, D):
prefix : <http://stackoverflow.com/q/21391444/1281433/>

select ?author ?paper where {
  ?paper :author ?author .
  filter not exists {
    ?paper2 :author ?author, ?otherAuthor .
    filter ( ?author != ?otherAuthor )
  }
}
This corresponds to the English:
Find papers with authors such that there is no paper by that author with another author.
We get the expected results:
------------------
| author | paper |
==================
| :d | :p4 |
------------------
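If you prefer, the same check can probably be written with minus instead of filter not exists. This is a sketch of an alternative formulation, not something the question requires. It relies on ?author appearing on both sides, which is what lets minus remove the rows for authors who have a coauthor.

```sparql
prefix : <http://stackoverflow.com/q/21391444/1281433/>

select ?author ?paper where {
  ?paper :author ?author .
  # Remove every row whose ?author also appears on a paper
  # together with some different author.
  minus {
    ?paper2 :author ?author, ?otherAuthor .
    filter ( ?author != ?otherAuthor )
  }
}
```

On the sample data I'd expect the same single row as before: :d with :p4.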
If you still want to select and exclude papers using regular expressions on the author list string (assuming each paper also has a :listAuthor string), you can do that with a query like this:
prefix : <http://stackoverflow.com/q/21391444/1281433/>

select ?author ?paper where {
  # find authors of papers with no coauthors
  ?paper :author ?author ; :listAuthor ?list .
  filter(!regex(?list," and "))
  # and remove those that coauthored some paper
  filter not exists {
    ?paper2 :author ?author ; :listAuthor ?list2 .
    filter(regex(?list2," and "))
  }
}
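The sample data above has no :listAuthor values, so the regex version has nothing to match against as it stands. If your store accepts SPARQL Update, something like the following could add some. The author list strings here are hypothetical; I made them up to mirror the :author triples, and your real data will look different.

```sparql
prefix : <http://stackoverflow.com/q/21391444/1281433/>

# Hypothetical author list strings, one per paper, with multiple
# names joined by " and " so the regex filters have something to test.
insert data {
  :p1 :listAuthor "A and B" .
  :p2 :listAuthor "A" .
  :p3 :listAuthor "B and C" .
  :p4 :listAuthor "D" .
}
```

With those strings in place, the regex query should again return just :d with :p4: :p2 and :p4 are the only papers whose list has no " and ", and :a is ruled out because :p1's list contains it.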
Debugging the original query
The original query can be abbreviated as the following, which is exactly the same, except for some syntactic sugar.
SELECT DISTINCT ?x ?y WHERE {
  ?x swrc:listAuthor ?y ; swrc:author ?w.
  FILTER (!regex(?y, " and ")).
  ?a swrc:listAuthor ?b ; swrc:author ?c.
  FILTER regex(?b, " and ").
  FILTER(?c != ?w).
}
Aside from the filter at the end, the pattern on ?x, ?y, and ?w is completely separate from the pattern on ?a, ?b, and ?c. From the first pattern, you'll get one binding for each author of each paper with just one author (which means one binding for each paper with just one author). From the second pattern, you'll get one binding for each author of each paper with multiple authors. Then you're essentially taking the Cartesian product of these two sets of (paper, author) pairs to get bindings of the form (paper1, author1, paper2, author2), and the final filter says "remove any bindings where author1 is the same as author2."
Consider what this means for the data I gave above, but let's look just at papers :p1 and :p2. Since :a authored :p2 alone, we'll have :a as ?w and :p2 as ?x:
?x ?w
-------
:p2 :a
However, since :a also authored paper :p1 with :b, we'll have some rows for ?a and ?c:
?a ?c
-------
:p1 :a
:p1 :b
Now the Cartesian product is:
?x ?w ?a ?c
--------------
:p2 :a :p1 :a
:p2 :a :p1 :b
The filter removes the first of these rows, leaving us with
?x ?w ?a ?c
--------------
:p2 :a :p1 :b
and this has :a as ?w, even though :a coäuthored a paper with someone. In general:
- Each ?x is a paper with a single author (?w).
- Each ?w is an author who has written a paper alone (?x).
- Each ?a is a paper with multiple authors, one of which is ?c.
- Each ?c is an author of a multi-author paper (?a) and is someone other than ?w. Nothing in the query requires ?w to be an author of ?a.
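To see this on the full sample data, here is a rough diagnostic that mimics the shape of the original query while projecting all four variables. Since the sample data has no author list strings, I'm assuming it's fair to stand in for the two regex tests with filter not exists and filter exists on :author; ?other1 and ?other2 are names I made up for the sketch.

```sparql
prefix : <http://stackoverflow.com/q/21391444/1281433/>

select ?x ?w ?a ?c where {
  # First half: papers (?x) whose only author is ?w.
  ?x :author ?w .
  filter not exists { ?x :author ?other1 . filter ( ?other1 != ?w ) }

  # Second half: papers (?a) with an author ?c and at least one other author.
  ?a :author ?c .
  filter exists { ?a :author ?other2 . filter ( ?other2 != ?c ) }

  # The only thing connecting the two halves.
  filter ( ?c != ?w )
}
order by ?x ?a ?c
```

On the sample data the first half matches (:p2, :a) and (:p4, :d), and the second half matches four (paper, author) pairs from :p1 and :p3. Of the eight combinations, only the one with :a on both sides is filtered out, so :a still appears as ?w in three rows even though :a has a coauthor.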