
Hi, I have two CSV data sources. Each CSV contains the names of employees.

csv1 with heading: Name

Contents: Jack, Tom, Andy, Jim, Stella.

csv2 with heading: EmployeeName

Contents: Bella, Stefan, Jim, Cathy, Jack

Now I need a SPARQL query that searches both CSVs and returns a single variable combining the data from both without duplicates (for example, Jim and Jack appear in both files in this instance, but each name should be returned only once).

Rahul
  • The next step is to transform your CSV data into RDF and load that into an RDF database. You could write some code to iterate over your CSV, using an open source library such as https://rdf4j.org/ or https://jena.apache.org/ or https://rdflib.readthedocs.io/en/stable/index.html. There is a tutorial in the Amazon Neptune graph-notebook GitHub project for learning the basics of SPARQL here: https://github.com/aws/graph-notebook/tree/main/src/graph_notebook/notebooks/06-Language-Tutorials/01-SPARQL. You can run these open source notebooks locally or on AWS. – Charles Jan 25 '23 at 11:52
  • @Charles `?bgnTeamInstance a bgn:BgnTeamInstance ; bgn:employeeName ?Name . ?docupediaDataInstance a doc:docupediaDataInstance ; doc:teamMemberName ?employeeName .` Now I want the names from both without duplicates. What statement do I need? – Rahul Jan 25 '23 at 13:30
  • This answer may help here @rahul - https://stackoverflow.com/a/75234963/19461455 – HES Jan 25 '23 at 15:56
  • @Rahul I don't fully understand your query, given that you didn't provide the RDF data or the RDF schema, but you should obviously join on both names if that is really your identifier for entities from different CSV files. To get all names from two different triple patterns, use `UNION` and simply use the same variable name in both parts, i.e. `SELECT DISTINCT ?Name WHERE { { ?bgnTeamInstance a bgn:BgnTeamInstance ; bgn:employeeName ?Name . } UNION { ?docupediaDataInstance a doc:docupediaDataInstance ; doc:teamMemberName ?Name . } }`; otherwise your current query returns the Cartesian product. – UninformedUser Jan 26 '23 at 07:23
  • @UninformedUser thank you for the code, this is useful. The only problem I see is that in bgn the variable is Name and in Docupedia it is employeeName, so in this case how do I join them and get the distinct names? – Rahul Jan 26 '23 at 10:31
  • So the query is: `SELECT DISTINCT ?employeeName WHERE { { ?bgnTeamInstance a orgai:BgnTeamInstance ; orgai:employeeName ?employeeName. } UNION { ?docupediaDataInstance a orgai:docupediaDataInstance ; orgai:teamMemberName ?teamMemberName . } }` I want the distinct names from the union of both ?employeeName and ?teamMemberName. How do I do it? @UninformedUser – Rahul Jan 26 '23 at 10:37
  • That's why I used the same variable name, `?Name`, in both parts. Note that a variable is just a placeholder for the bindings: in each part of the `UNION` you bind the names from your different data sources to the same variable. – UninformedUser Jan 27 '23 at 12:00
  • @UninformedUser `SELECT DISTINCT ?name WHERE { { ?bgnTeamInstance a bgn:BgnTeamInstance ; bgn:employeeName ?employeeName . } UNION { ?teamMemberInstance rdfs:subClassOf ?teamInstance ; a orgai:TeamMember ; orgai:teamMemberName ?teamMemberName . } BIND(?employeeName AS ?name) BIND(?teamMemberName AS ?name) }` I tried this query, and I only get the names from the second CSV (teamMemberName), not from both CSVs. – Rahul Jan 27 '23 at 13:22
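
For readers following the thread, the suggestion in the comments (bind the name in each `UNION` branch to the same variable) can be sketched as below. This is only a sketch under assumptions: the `bgn:` and `orgai:` prefix IRIs are placeholders, because the actual namespaces are not given in the question, and the triple patterns are copied from the last query in the comments without knowing the underlying schema.

```sparql
# Placeholder namespaces (assumptions): replace with the IRIs used in your data.
PREFIX bgn:   <http://example.org/bgn#>
PREFIX orgai: <http://example.org/orgai#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?name
WHERE {
  {
    # Names that came from the first CSV
    ?bgnTeamInstance a bgn:BgnTeamInstance ;
                     bgn:employeeName ?name .
  }
  UNION
  {
    # Names that came from the second CSV, bound to the same variable
    ?teamMemberInstance rdfs:subClassOf ?teamInstance ;
                        a orgai:TeamMember ;
                        orgai:teamMemberName ?name .
  }
}
```

Because both branches bind `?name` directly, no `BIND` is needed after the `UNION`. If the separate variables `?employeeName` and `?teamMemberName` have to be kept, one alternative is a single `BIND(COALESCE(?employeeName, ?teamMemberName) AS ?name)` after the `UNION`, since each solution has only one of the two bound. Note that `DISTINCT` compares RDF terms, so a name such as Jim is collapsed into one row only if both sources store it as the same literal (same spelling, datatype, and language tag).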

0 Answers