How can I use the bleve text-indexing library, https://github.com/blevesearch/bleve, to index XML content?

I thought about using code like this XML parser in Go: https://github.com/dps/go-xml-parse, but then how do I pass what is parsed to Bleve to be indexed?

Update: my XML looks like the following:

<page>
    <title>Title here</title>
    <image>image url here</image>
    <text>A sentence or two about the topic</text>
    <facts>
        <fact>Fact 1</fact>
        <fact>Fact 2</fact>
        <fact>Fact 3</fact>
    </facts>
</page>
themihai
wordSmith
  • For trivial cases, it looks like you load the XML into a struct with [`encoding/xml`](http://godoc.org/encoding/xml) and let bleve do the rest; see http://godoc.org/github.com/blevesearch/bleve . For complex docs I don't know if pointing bleve at the "root" object of each document is sufficient or if you have to "flatten" it into a simple object first. bleve looks like a cool project, so I'm interested in seeing what the answer turns out to be. – twotwotwo Sep 16 '14 at 04:47
  • Perhaps you could show some example XML that you've got? – topskip Sep 16 '14 at 10:24
  • @topskip yeah, I don't know why I didn't do that. I have updated the question with my XML. – wordSmith Sep 16 '14 at 13:31
  • @twotwotwo can you show me some examples of how to do this? I've posted what my XML looks like. There are multiple `<page>` elements that look like that. – wordSmith Sep 16 '14 at 13:32
  • Sorry, I've never really used either `encoding/xml` or `bleve`; if no one else shows up to answer, maybe start playing around with them based on examples from their docs, etc., and if you get stuck, post a question with the code/expected behavior/actual behavior. – twotwotwo Sep 16 '14 at 20:01

1 Answer

Define a struct that mirrors the structure of your XML, use the standard `encoding/xml` package to unmarshal the XML into that struct, and then index the struct with Bleve as normal.

http://play.golang.org/p/IZP4nrOotW

package main

import (
    "encoding/xml"
    "fmt"
)

// Page mirrors a single <page> element.
type Page struct {
    Title string `xml:"title"`
    Image string `xml:"image"`
    Text  string `xml:"text"`
    // "facts>fact" collects every <fact> nested inside <facts>.
    Facts []string `xml:"facts>fact"`
}

func main() {
    xmlData := []byte(`<page>
    <title>Title here</title>
    <image>image url here</image>
    <text>A sentence or two about the topic</text>
    <facts>
        <fact>Fact 1</fact>
        <fact>Fact 2</fact>
        <fact>Fact 3</fact>
    </facts>
</page>`)

    inputStruct := &Page{}
    if err := xml.Unmarshal(xmlData, inputStruct); err != nil {
        fmt.Println("Error unmarshalling from XML.", err)
        return
    }

    // inputStruct is now a plain Go value that can be handed to Bleve.
    fmt.Printf("%+v\n", inputStruct)
}
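
To make the "index the struct with Bleve" step a little more concrete, here is a sketch of the hand-off that uses only the standard library. The `Indexer` interface is a stand-in I am assuming for illustration: it copies the `Index(id string, data interface{}) error` method that bleve's `Index` type exposes. `memIndex` and `indexPage` are invented names, and `memIndex` is a toy that only stores documents in a map. With the real library you would pass in an index created with `bleve.New` or `bleve.NewMemOnly` and a mapping from `bleve.NewIndexMapping()` instead.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Page mirrors one <page> element from the question.
type Page struct {
	Title string   `xml:"title"`
	Image string   `xml:"image"`
	Text  string   `xml:"text"`
	Facts []string `xml:"facts>fact"`
}

// Indexer is a hypothetical stand-in for the single method of bleve's
// Index that this sketch needs.
type Indexer interface {
	Index(id string, data interface{}) error
}

// memIndex is a toy Indexer for this sketch only; it stores documents by ID
// and does no text analysis at all.
type memIndex map[string]interface{}

func (m memIndex) Index(id string, data interface{}) error {
	m[id] = data
	return nil
}

// indexPage unmarshals one <page> document and hands the resulting struct
// to idx under the given document ID.
func indexPage(idx Indexer, id string, xmlData []byte) error {
	var p Page
	if err := xml.Unmarshal(xmlData, &p); err != nil {
		return fmt.Errorf("unmarshal %s: %w", id, err)
	}
	return idx.Index(id, p)
}

func main() {
	idx := memIndex{}
	doc := []byte(`<page>
    <title>Title here</title>
    <text>A sentence or two about the topic</text>
    <facts><fact>Fact 1</fact><fact>Fact 2</fact></facts>
</page>`)

	if err := indexPage(idx, "page-1", doc); err != nil {
		fmt.Println("indexing failed:", err)
		return
	}
	fmt.Printf("%+v\n", idx["page-1"])
}
```

Keeping the XML decoding behind a small interface like this also means the parsing can be tested without creating an index on disk.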
MDrollette
  • So according to the playground result, the struct looks good except it doesn't have all the facts. It has only fact 3, like so: `Facts:[{Fact:Fact 3}]}]` At first, I want it to have all facts. But, later on and in reality, I want it to have certain facts based on an if condition. Can you show me how to do: if x == 1 show the first fact or if x == 2 show the second fact in the struct. – wordSmith Sep 17 '14 at 21:42