I have a Spark app that inserts data into Titan using Goblin, but it inserts duplicate vertices with the same name. The check 'if not result:' does not work as expected: the lookup finds nothing even for a name that was already inserted, although I am using the same session.

def savePartition(p):
    print ('savePartition', p)
    from goblin import element, properties

    class Brand(element.Vertex):
        name = properties.Property(properties.String)

    import asyncio

    loop = asyncio.get_event_loop()

    from goblin.app import Goblin
    app = loop.run_until_complete(Goblin.open(loop))
    app.register(Brand)

    async def go(app):
        session = await app.session()

        for i in p:
            if i['brand']:
                traversal = session.traversal(Brand)
                result = await traversal.has(Brand.name, i['brand']).oneOrNone()

                if not result:  # TODO: Remove Duplicates
                    print(i)
                    brand = Brand()
                    brand.name = i['brand']
                    session.add(brand)
                    session.flush()

        await app.close()

    loop.run_until_complete(go(app))

rdd.foreachPartition(savePartition)  # foreachPartition returns None, so do not reassign rdd

How can I fix this? Thanks a lot.

Akmal
softwarevamp

1 Answer

I am not sure how this would work with Goblin, but if you want Titan to prevent duplicates based on a vertex property, you can use a Titan composite index and declare it unique. For example:

mgmt = graph.openManagement()
name = mgmt.makePropertyKey('name').dataType(String.class).make()
mgmt.buildIndex('byNameUnique', Vertex.class).addKey(name).unique().buildCompositeIndex()
mgmt.commit()

The above will specify that the name property on vertices must be unique.
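
A unique index only guards the graph side; the Spark job can also avoid sending the same name twice from one partition. As a minimal sketch in plain Python (no Goblin or Titan calls; `distinct_brands` is a hypothetical helper, not part of any library), the rows of a partition could be reduced to distinct, non-empty brand names before the lookup-and-insert loop runs:

```python
def distinct_brands(rows):
    """Yield each non-empty brand name once, in first-seen order.

    Hypothetical helper: `rows` is any iterable of mappings with a
    'brand' key, like the partition iterator `p` in the question.
    """
    seen = set()
    for row in rows:
        name = row['brand']
        # Skip empty values and names already seen in this partition.
        if name and name not in seen:
            seen.add(name)
            yield name


rows = [{'brand': 'Acme'}, {'brand': None}, {'brand': 'Acme'}, {'brand': 'Globex'}]
for name in distinct_brands(rows):
    print(name)
```

This only removes duplicates within a single partition. The same name arriving in two different partitions would still need the unique index, or a de-duplication step on the RDD itself, to be caught.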

Filipe Teixeira
  • Thanks! I have one more question: if I want the name to be unique per label, how can I achieve that? – softwarevamp Dec 08 '16 at 10:21
  • If you are asking how to ensure labels are unique, then Titan can't help you there. Labels are not meant to be unique. Check out [this](http://stackoverflow.com/a/36295205/1457059) answer for more info. – Filipe Teixeira Dec 08 '16 at 10:23
  • Also, I use Elasticsearch as an external index, but Titan reports `An external index cannot be unique`. Any thoughts? – softwarevamp Dec 08 '16 at 11:52
  • Elasticsearch is used for mixed indexes and, as stated [here](http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html): `**Note** Unlike composite indexes, mixed indexes do not support uniqueness.` For uniqueness you are better off using a simple composite index as I defined above. – Filipe Teixeira Dec 08 '16 at 11:56