
I have written a custom crawler to index all the data from the connections seedlists

https:///forums/seedlist/myserver

When we started using subcommunities, I double-checked that they behave practically the same as communities. They seem to: they have all the same properties in the Connections DB, except that subcommunities also have a parent UUID.

I expected my crawler, which basically just iterates through the Atom feed with a Java XML parser and pulls out the relevant information, to find the subcommunities' discussions as well, but it does not. Are subcommunities not published to this seedlist? If not, there does not seem to be a subcommunity-specific seedlist.

We are currently on Connections 4.5.

Thank you.

M. A. Kishawy
darethas

1 Answer


I have found the answer here.

http://www-10.lotus.com/ldd/appdevwiki.nsf/xpDocViewer.xsp?lookupName=IBM+Connections+4.5+API+Documentation#action=openDocument&res_title=Community_entry_content_ic45&content=pdcontent

There seems to be an additional element that links to the sub-community feed from within the community. A crawler will need to send a GET request to that link.
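As a rough sketch of that step, the crawler can parse the community's Atom entry and collect the `href` of every `link` element with a given `rel`, then issue a GET for each one. The class name `AtomLinks`, the method `hrefsByRel`, and the `rel` value passed in are illustrative assumptions; the actual `rel` that marks the sub-community feed should be taken from the API documentation linked above.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any Connections library.
class AtomLinks {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns the href of every atom:link whose rel attribute equals the given value. */
    static List<String> hrefsByRel(byte[] atomXml, String rel) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Parsing raw bytes lets the parser honour the declared encoding.
        Document doc = builder.parse(new ByteArrayInputStream(atomXml));

        List<String> hrefs = new ArrayList<>();
        NodeList links = doc.getElementsByTagNameNS(ATOM_NS, "link");
        for (int i = 0; i < links.getLength(); i++) {
            Element link = (Element) links.item(i);
            // The rel value is supplied by the caller; it is an assumption here.
            if (rel.equals(link.getAttribute("rel"))) {
                hrefs.add(link.getAttribute("href"));
            }
        }
        return hrefs;
    }
}
```

Each returned URL would then be fetched and parsed the same way, so the same method works whether the root element is a feed or a single entry.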

darethas
  • that's right. you get the parent, and pull in the subcommunities – Paul Bastide Oct 02 '14 at 01:36
  • @PaulBastide actually, this didn't answer it 100%. It explains how to get it from the my/all communities feed, but not from the seedlist feed. There is an issue here because the atom apisource link is encoded in utf-8, and the seedlists are in utf-16 – darethas Oct 06 '14 at 18:04
  • I believe it's because connections supports double byte character sets, and therefore requires UTF16. http://www-10.lotus.com/ldd/appdevwiki.nsf/xpDocViewer.xsp?lookupName=IBM+Connections+5.0+API+Documentation#action=openDocument&res_title=Seedlist_response_ic50&content=pdcontent might provide more details. I've asked a colleague to confirm. – Paul Bastide Oct 06 '14 at 18:21
  • @PaulBastide another difference I noticed: when you follow the link and drill down into the sub community, the opening tag of the document after the XML declaration suddenly goes from <feed> to <entry>. Not sure if this has implications on parsing, but my parser is throwing an unexpected EOF. I appreciate you asking a colleague for me. – darethas Oct 06 '14 at 19:22
  • my colleague says that the seedlist lists subcommunities and communities. Feeds are for collections, and Entries are for the individual descriptions. You'll want to look specifically at the communities API, look here: https://greenhouse.lotus.com/sbt/SBTPlayground.nsf/Explorer.xsp# For UTF16/UTF8 I don't yet have an answer – Paul Bastide Oct 06 '14 at 19:39
  • @PaulBastide okay thank you. This is turning into a real nightmare. (The Greenhouse link is under maintenance it seems, lol) – darethas Oct 07 '14 at 19:09
  • @PaulBastide I am now led to believe something is going on with WAS and the response it sends for the different seedlists vs a single instance of a community's atom feed. The UrlConnection gives back an EmptyBuffer, which is why the parser is giving the unexpected EOF – darethas Oct 13 '14 at 13:23
  • You might want to check to see if your indexer is interrupting the process, and maybe it's causing the blip. You should contact support if that's the case. – Paul Bastide Oct 13 '14 at 13:47
  • @PaulBastide thank you, we filed a PMR. I appreciate you trying to help – darethas Oct 13 '14 at 15:07
  • @PaulBastide I managed to get around my issue. Once you arrive at the subcommunity atom entry, I see a way to get to the members list and the forum posts, but how do you get the news items (blog posts) and events? – darethas Oct 14 '14 at 04:23
  • you can get the instance of your community, and then iterate over the links to services in the instance - http://www-10.lotus.com/ldd/appdevwiki.nsf/xpDocViewer.xsp?lookupName=IBM+Connections+4.5+API+Documentation#action=openDocument&res_title=Retrieving_a_community_ic45&content=pdcontent – Paul Bastide Oct 14 '14 at 13:02
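
On the UTF-8 versus UTF-16 mismatch and the empty-buffer EOF raised in the comments above, one defensive pattern is to hand the parser raw bytes instead of a pre-decoded Reader, and to check for an empty body before parsing. This is a minimal sketch and not a confirmed fix for the problem described in the thread; the class name `SeedlistXml` and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of any Connections library.
class SeedlistXml {
    /** Reads the whole stream into memory so an empty body can be detected. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    /** Parses raw bytes; the parser picks the encoding from the BOM or XML declaration. */
    static Document parse(InputStream in) throws Exception {
        byte[] body = readAll(in);
        if (body.length == 0) {
            // Fail with a clear message instead of an "unexpected EOF" from the parser.
            throw new IOException("Empty response body; nothing to parse");
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(body));
    }
}
```

With this shape, the stream from a URLConnection's `getInputStream()` can be passed straight to `parse`, and the same code path handles a UTF-16 seedlist and a UTF-8 community entry.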