
I have an array that stores a huge amount of data, and I need to insert that data into MongoDB.

I am able to achieve this using the code below, but it takes 1.5 minutes. I need the push to complete within a fraction of a second. Is there another way to push huge array data into MongoDB?

HeadDet is an array with 3 million records.

session, err := mgo.Dial("localhost")
if err != nil {
  panic(err)
}
defer session.Close()
// Optional. Switch the session to a monotonic behavior.
session.SetMode(mgo.Monotonic, true)

c := session.DB("Test").C("Indicators")

for i := 0; i < len(HeadDet); i++ {
  err = c.Insert(HeadDet[i])
  if err != nil {
    log.Fatal(err)
  }
}

I have referred to this link

Alex Roz
  • @alexrox I think it's better to ask the MongoDB authors by filing a GitHub issue. Usually drivers connect to the DB over the network, and transferring can be pretty slow; some drivers need to convert data to a native format, it depends on the implementation. – nilsocket Oct 04 '18 at 09:55
  • Thanks, I will check. – Alex Roz Oct 04 '18 at 10:13
  • I doubt the authors of mgo2 could help; it's been unmaintained for quite a long time already. Not sure how large 1 lakh is, but inserting "huge data" takes time. Just transferring 100 MB of BSON over a 1 Gb network cable with a single hop between client and server can't take less than a second, so be realistic about aiming for "fraction of a second" performance. – Alex Blex Oct 04 '18 at 11:43
  • Thanks Alex Blex. Right now it inserts within 1.5 minutes. Is it possible to reduce that to 20-30 seconds? – Alex Roz Oct 04 '18 at 12:13

1 Answer


First, drop labix.org/mgo (aka gopkg.in/mgo.v2); it's obsolete and unmaintained. Instead, use the community-supported fork: github.com/globalsign/mgo.

Next, to perform inserts or updates en masse, use the Bulk API introduced in MongoDB 2.6. The mgo driver supports bulk operations via the mgo.Bulk type.

You want to insert "30 lakhs records". For those who don't know, "lakh" is a unit in the Indian numbering system equal to one hundred thousand (100,000). So 30 lakhs is equal to 3 million.

Using the Bulk API, this is how you can insert all those efficiently:

c := session.DB("Test").C("Indicators")

// BULK, ORDERED
bulk := c.Bulk()
for i := 0; i < len(HeadDet); i++ {
    bulk.Insert(HeadDet[i])
}
res, err := bulk.Run()
if err != nil {
    log.Fatal(err)
}

Note that if you don't care about the order of inserts, you may put the bulk operation in unordered mode which may speed things up:

// BULK, UNORDERED
bulk := c.Bulk()
bulk.Unordered()
for i := 0; i < len(HeadDet); i++ {
    bulk.Insert(HeadDet[i])
}
res, err := bulk.Run()
if err != nil {
    log.Fatal(err)
}

For comparison, on my computer (client and server are the same machine, so there is no network latency) a loop with 3 million individual inserts takes 5 minutes and 43 seconds.

The ordered Bulk operation to insert 3 million documents takes 18.6 seconds!

The unordered Bulk operation to insert 3 million documents takes 18.22 seconds!

icza
  • Hi András, what happened to the [Each group of operations can have at most 1000 operations](https://docs.mongodb.com/manual/reference/method/Bulk/#ordered-operations) limit? Does the driver handle it internally? Just curious, since I always chunk my bulks explicitly. – Alex Blex Oct 04 '18 at 14:26
  • @AlexBlex Yes, the `mgo` driver handles that internally (actually, as I see it, MongoDB itself also takes care of it). You don't have to worry about that. Please test the bulk solution and report back your test results. Note, though, that if all the documents are not yet available in memory, you may "chunk" the work up manually so you don't have to keep everything in memory. – icza Oct 04 '18 at 14:27
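  A minimal sketch of the manual chunking mentioned in the comment above (the `chunks` helper and the batch size of 1000 are illustrative, not part of the driver); each batch would get its own `bulk.Run()` call:

```go
package main

import "fmt"

// chunks splits docs into batches of at most size elements.
// Each batch can then be sent with its own c.Bulk() /
// bulk.Insert(batch...) / bulk.Run() round, so the whole
// data set never has to sit in a single bulk operation.
func chunks(docs []interface{}, size int) [][]interface{} {
	var out [][]interface{}
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		out = append(out, docs[start:end])
	}
	return out
}

func main() {
	docs := make([]interface{}, 2500)
	for _, batch := range chunks(docs, 1000) {
		// In real code: bulk := c.Bulk(); bulk.Insert(batch...); bulk.Run()
		fmt.Println(len(batch)) // prints 1000, 1000, 500 on separate lines
	}
}
```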
  • Thanks. I tried with globalsign also; it takes the same time. I could try this with Go channels and goroutines. I got input to create small chunks of the input data using make and spread the data over multiple instances. How do I achieve this? I haven't used Go channels and goroutines much. – Alex Roz Oct 09 '18 at 15:31
  • @AlexRoz Making the insertion of a large number of documents concurrent will most likely not yield a significant performance boost, if any; the bottleneck is most likely the network bandwidth, or the MongoDB CPU or disk I/O. – icza Oct 09 '18 at 15:37
  • Thanks for your input – Alex Roz Oct 09 '18 at 15:53
  • Just to try it. If you have any example of goroutines and channels with MongoDB, please share. – Alex Roz Oct 09 '18 at 15:59
  • @AlexRoz See: [Is this an idiomatic worker thread pool in Go?](https://stackoverflow.com/questions/38170852/is-this-an-idiomatic-worker-thread-pool-in-go/38172204#38172204) – icza Oct 09 '18 at 16:06
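  The worker-pool pattern from that linked answer, applied to batches of documents, could look roughly like the sketch below. All names are illustrative; `insertBatch` stands in for an actual per-batch `bulk.Run()` against the collection:

```go
package main

import (
	"fmt"
	"sync"
)

// insertBatch stands in for a real per-batch bulk insert
// (c.Bulk() + bulk.Insert(batch...) + bulk.Run()); here it
// just reports how many documents the batch contained.
func insertBatch(batch []int) int {
	return len(batch)
}

func main() {
	batches := [][]int{make([]int, 1000), make([]int, 1000), make([]int, 500)}

	jobs := make(chan []int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	total := 0

	// Start a small fixed pool of workers consuming batches.
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for batch := range jobs {
				n := insertBatch(batch)
				mu.Lock()
				total += n
				mu.Unlock()
			}
		}()
	}

	for _, b := range batches {
		jobs <- b
	}
	close(jobs)
	wg.Wait()
	fmt.Println("inserted", total, "documents") // inserted 2500 documents
}
```

  Note that, per the comment above, this is unlikely to help much here, since the bottleneck is the network or the server rather than the client-side loop.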
  • Thanks for your comment. – Alex Roz Oct 09 '18 at 16:10
  • My problem is that my variable is `data interface{}` and I can't insert this type into Mongo; `bulk.Insert(data[i])` gives me this message: invalid operation: data[i] (type interface {} does not support indexing). Any idea how I can solve this? – Hernan Humaña Apr 09 '19 at 19:56