
I am working on a statistics analysis project for Apache logs, using MongoDB and Java. The Apache log lines look like this:

Tue, 13 Feb 2018 11:39:26.081 ;; ProcessId = 28889 ;; IPRequest = 10.160.74.43 ;; IPLocal = 10.160.85.46 ;; SizeResponseBytes = 2968 ;; TimeResponse = 14213 ;; Protocol = HTTP/1.1 ;; Port = 80 ;; Method = GET ;; Url = /login/ ;; Query =  ;; HTTPstatus = 200 ;; BytesReceived = 479 ;; ByteSend = 3509 ;; Referer = - ;; ServerName = www.managercapture.com ;; UseCanonicalServerName = 10.160.85.46 ;; User-Agent = Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 ;; SessionID = -

This part:

BytesReceived = 479 ;; ByteSend = 3509

holds the number of bytes received and sent in one HTTP request.
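For illustration, here is a minimal sketch of how these two values could be pulled out of a line in plain Java. `LogLineParser` and its methods are hypothetical names of mine, not part of any library, and the sketch assumes that `;;` never occurs inside a field value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a library class): splits one log line into its fields.
final class LogLineParser {

    // Returns the "key = value" fields of a line. The leading timestamp has no key,
    // so it is stored under the made-up key "Timestamp".
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] parts = line.split(";;");
        fields.put("Timestamp", parts[0].trim());
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // skip fragments without a key
            }
            // Split at the first '=' only, so values may themselves contain '='.
            fields.put(parts[i].substring(0, eq).trim(), parts[i].substring(eq + 1).trim());
        }
        return fields;
    }

    // Field names are taken verbatim from the sample line above.
    static long bytesReceived(Map<String, String> fields) {
        return Long.parseLong(fields.get("BytesReceived"));
    }

    static long bytesSent(Map<String, String> fields) {
        return Long.parseLong(fields.get("ByteSend"));
    }
}
```

Calling `LogLineParser.parse(line)` on the sample line and passing the result to `bytesReceived` and `bytesSent` would give the two numbers that go into the collection.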

In Mongo, I have a collection like:

{
    date: yyyy/MM/dd HH:00:00
    data: [
        {second: 1, byteSent: 100, bytesReceived: 200},
        {second: 44, byteSent: 322, bytesReceived: 150},
        ...
    ]
}

Now another log line arrives, giving {second: X, byteSent: 555, bytesReceived: 300}.

I wonder if I can do this in one query:

  • Search for the document by date and data.second, for example second 1 or 3.
  • If an entry for that second is found, add the new values to it, so that it holds the total bytes for that second (there may be more than one request in the same second). Second 1 already has data, so the values are summed: {second: 1, byteSent: 555+100, bytesReceived: 300+200}.
  • If no entry is found, append the new one to the list. Second 3 has no previous data, so the entry is added as is: {second: 3, byteSent: 555, bytesReceived: 300}.
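To make the intended semantics concrete, here is a minimal sketch in plain Java of what the single operation should compute for one hourly document. It is only an in-memory model, not a MongoDB query; `HourBucket` and its methods are hypothetical names I made up for this illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one hourly document (not a MongoDB API).
final class HourBucket {

    // second -> {byteSent, bytesReceived}
    private final Map<Integer, long[]> data = new TreeMap<>();

    // Sums into the existing entry for this second, or adds a new entry if none exists.
    void add(int second, long byteSent, long bytesReceived) {
        data.merge(second, new long[] {byteSent, bytesReceived},
                (old, inc) -> new long[] {old[0] + inc[0], old[1] + inc[1]});
    }

    // Total bytes sent in the given second, or 0 if the second has no entry.
    long byteSent(int second) {
        long[] totals = data.get(second);
        return totals == null ? 0L : totals[0];
    }

    // Total bytes received in the given second, or 0 if the second has no entry.
    long bytesReceived(int second) {
        long[] totals = data.get(second);
        return totals == null ? 0L : totals[1];
    }

    // Number of distinct seconds stored in this bucket.
    int size() {
        return data.size();
    }
}
```

Starting from the two entries shown in the collection, `add(1, 555, 300)` should leave second 1 at byteSent 655 and bytesReceived 500, while `add(3, 555, 300)` should create a third entry.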

Answers like "It is not possible because ..." are also welcome, ideally with references.

WesternGun
  • In order to get the sum that you're looking for you would need to use the aggregation framework which, however, cannot update data as of MongoDB 3.6. So you will probably need to split the two parts into two separate queries (one to find the information you are looking for and then a second one to insert) which should be pretty feasible. – dnickless Feb 19 '18 at 16:28
  • I imagine so... if not I would get an answer in less time.. thanks anyway. I am checking the doc and noticed that in the navigation sidebar, the volume of captions dedicated to aggregations is much more than those about `$update`.... Hope that they can enable some aggregators in the `$update` phase. – WesternGun Feb 19 '18 at 16:33
  • But, if I have to insert/update millions of records every hour, will the query speed become a performance bottleneck? – WesternGun Feb 19 '18 at 16:38
  • Query performance is largely driven by your indexing strategy and MongoDB has been designed for both lots of records and high performance. So, no, not necessarily but it depends on your implementation... – dnickless Feb 19 '18 at 16:54
  • OK, it depends on my implementation, but Mongo on its own has done its best. So setting indexes on the fields I search by is necessary. – WesternGun Feb 20 '18 at 08:58
  • I have found [this question](https://stackoverflow.com/questions/4669178/how-to-update-multiple-array-elements-in-mongodb?rq=1), which has an answer using `aggregate` and javascript code. So if I want to do all this, I have to wrap js code into query and send them into Mongo directly, with a role with all access to all resources; it should be created on admin db. So, maybe going the hard way works?? – WesternGun Feb 20 '18 at 09:17
