
I am running Neo4j 2.0.1 Community Edition on an AWS EC2 instance. After some read requests, the Neo4j server gets stuck at close to 100% CPU.

CPU usage stays close to 100% even when there are no reads or writes.

The Ubuntu 'top' command just shows a java process consuming the CPU. How do I debug this? How can I find out what Neo4j is doing to keep the CPU close to 100%?

Update: I see the GC log entries below continuously:

70356.833: [GC 485305K->421306K(590488K), 0.0023720 secs]
70356.873: [GC 485498K->421273K(590488K), 0.0023950 secs]
70356.917: [GC 485465K->421152K(590488K), 0.0027120 secs]
70356.961: [GC 485344K->421407K(590488K), 0.0023500 secs]
70357.004: [GC 485599K->421205K(590488K), 0.0034150 secs]
70357.049: [GC 485397K->421174K(590488K), 0.0027470 secs]
70357.097: [GC 485366K->421335K(590488K), 0.0022430 secs]
70357.140: [GC 485527K->421615K(590488K), 0.0024140 secs]
70357.189: [GC 485807K->421826K(590488K), 0.0025360 secs]
70357.237: [GC 486018K->422124K(590488K), 0.0031070 secs]
70357.285: [GC 486316K->421844K(590488K), 0.0024500 secs]
70357.325: [GC 486036K->421985K(590488K), 0.0024550 secs]
70357.365: [GC 486177K->422020K(590488K), 0.0028860 secs]
70357.411: [GC 486212K->421787K(590488K), 0.0025340 secs]
70357.457: [GC 485979K->421863K(590488K), 0.0027430 secs]
70357.505: [GC 486055K->422085K(590488K), 0.0023570 secs]
70357.553: [GC 486277K->422297K(590488K), 0.0024670 secs]
70357.601: [GC 486489K->422474K(590488K), 0.0023700 secs]

These GC log entries keep appearing for a very long time even though no queries are hitting the server. I think GC is consuming close to 100% CPU (or is it something else?).

Java-neo4j thread dump when CPU is close to 100%: https://onedrive.live.com/redir?resid=49F6403CD7EC37D4!107&authkey=!AM_esZ8nS-iPRCQ&ithint=file%2clog

Krishna Shetty
  • One possible reason is a lack of memory for your operations. Do you get any errors like Out of Memory or GC overhead limit exceeded? – Mirko Ebert Sep 09 '14 at 07:42
  • I don't see Out of Memory or GC overhead limit exceeded errors. – Krishna Shetty Sep 09 '14 at 08:49
  • So you have a 1Gb Heap, how much data do you have and what is/was running when you see this? – JohnMark13 Sep 09 '14 at 08:56
  • Thanks. We have around 300 MB of data. Some read queries were run, but the GC logs continue even after the read queries complete. – Krishna Shetty Sep 09 '14 at 09:33
  • Are you sure that there is no query running? If you get a timeout message using the browser interface, the query hasn't stopped. You won't get a result in the browser, but the server is continuing to process the query. – Jim Biard Sep 09 '14 at 13:36
  • I think so, I don't see any requests in data/log/http.log but, I still see GC logs. I think my read queries are not timing out, functionality is working fine, but slow. – Krishna Shetty Sep 09 '14 at 14:52
  • I think this is confirmed: I see GC logs for a very long time even though no queries are hitting the server. I think GC (or something else?) is consuming the CPU. Any hints on how to fix this? – Krishna Shetty Sep 10 '14 at 05:01
  • @Harali It will be hard to offer a solution without understanding more about the environment. Obviously something is doing something, but it could be a background process in Neo. Can you produce a thread dump then so we can see what is running? – JohnMark13 Sep 10 '14 at 07:41
  • I have got the thread dump: https://onedrive.live.com/redir?resid=49F6403CD7EC37D4!107&authkey=!AM_esZ8nS-iPRCQ&ithint=file%2clog . Please let me know if you need any other info. Thanks – Krishna Shetty Sep 10 '14 at 10:25

1 Answer


Looking at the thread dump you provided, I can see 6 queries still running for requests that came in over the REST endpoint (at least that is how I interpret the frames at org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:83), all of which occur in threads in the RUNNABLE state).
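
For anyone who wants to repeat that check on their own dump, here is a minimal sketch in Python. It assumes the usual HotSpot text layout (thread entries separated by blank lines, a quoted thread name on the first line, and a `java.lang.Thread.State:` line below it). The function name and the sample dump are made up for illustration; they are not taken from the linked file. The `nid` value, converted to decimal, should correspond to the per-thread IDs shown by `top -H -p <pid>`, which can help tie a hot thread to its stack.

```python
import re

# Hypothetical two-thread dump in the usual HotSpot layout (illustrative only).
SAMPLE_DUMP = '''"qtp-1" prio=10 tid=0x00007f0001 nid=0x1a2b runnable [0x00007f01]
   java.lang.Thread.State: RUNNABLE
\tat org.neo4j.server.rest.repr.CypherResultRepresentation.serialize(CypherResultRepresentation.java:83)

"qtp-2" prio=10 tid=0x00007f0002 nid=0x1a2c waiting on condition [0x00007f02]
   java.lang.Thread.State: TIMED_WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
'''

NID_PATTERN = re.compile(r"nid=(0x[0-9a-fA-F]+)")


def runnable_threads_with_frame(dump_text, frame_substring):
    """Return (thread name, native thread id) for every RUNNABLE thread
    whose stack contains frame_substring."""
    matches = []
    # Thread entries are separated by blank lines in the dump.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith('"'):
            continue  # not a thread header (e.g. dump preamble)
        name = lines[0].split('"')[1]
        runnable = any("java.lang.Thread.State: RUNNABLE" in line for line in lines)
        has_frame = any(frame_substring in line for line in lines[1:])
        if runnable and has_frame:
            nid = NID_PATTERN.search(lines[0])
            # nid is hexadecimal in the dump; convert it to decimal.
            native_id = int(nid.group(1), 16) if nid else None
            matches.append((name, native_id))
    return matches


if __name__ == "__main__":
    hits = runnable_threads_with_frame(
        SAMPLE_DUMP, "CypherResultRepresentation.serialize")
    for name, native_id in hits:
        print(name, native_id)
```

Taking two or three dumps a few seconds apart and running the same filter on each gives a rough idea of whether the same threads stay busy in the same place.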

As @JimBiard says, I think you probably have some queries that you thought had finished but that are really still running in the background and thrashing your machine.
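
The GC log in the question seems consistent with this. A rough way to read it is to compute how often collections happen, what share of wall-clock time the reported pauses take, and how fast the heap fills between collections. The sketch below assumes the simple `-verbose:gc` line format shown in the question; the function name is my own, and the sample is the first three lines of that log.

```python
import re

# First three lines of the GC log quoted in the question.
SAMPLE_LOG = """70356.833: [GC 485305K->421306K(590488K), 0.0023720 secs]
70356.873: [GC 485498K->421273K(590488K), 0.0023950 secs]
70356.917: [GC 485465K->421152K(590488K), 0.0027120 secs]
"""

# timestamp: [GC beforeK->afterK(totalK), pause secs]
GC_LINE = re.compile(
    r"^(\d+\.\d+): \[GC (\d+)K->(\d+)K\((\d+)K\), (\d+\.\d+) secs\]")


def summarize_gc(log_text):
    """Summarize collection rate, pause share and allocation rate."""
    events = []
    for line in log_text.splitlines():
        m = GC_LINE.match(line.strip())
        if m:
            events.append((float(m.group(1)),   # timestamp in seconds
                           int(m.group(2)),     # heap used before GC, KB
                           int(m.group(3)),     # heap used after GC, KB
                           float(m.group(5))))  # pause in seconds
    if len(events) < 2:
        raise ValueError("need at least two GC events to compute rates")
    span = events[-1][0] - events[0][0]
    # Pauses that fall inside the measured window (all but the first event).
    pause_total = sum(e[3] for e in events[1:])
    # Allocation between collections: heap before this GC minus heap
    # left over after the previous GC.
    allocated_kb = sum(cur[1] - prev[2]
                       for prev, cur in zip(events, events[1:]))
    return {
        "collections_per_sec": (len(events) - 1) / span,
        "gc_time_fraction": pause_total / span,
        "alloc_mb_per_sec": allocated_kb / 1024 / span,
    }


if __name__ == "__main__":
    for key, value in summarize_gc(SAMPLE_LOG).items():
        print(key, round(value, 3))
```

In the quoted log the collections are roughly 40 to 50 ms apart, each pause is 2 to 3 ms, and each one reclaims about 64 MB. So the pauses themselves look like a small share of wall-clock time, and the bulk of the CPU is more likely going to whatever keeps allocating that memory, which fits the picture of queries still running. The pause figures are wall-clock times, so this is only an approximation of the CPU actually spent in GC.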

Unfortunately, I do not think you can kill a slow query, so you might need to restart the Neo4j server.

JohnMark13
  • Thank you, I think this is what is happening. I am trying to fix the queries that are taking a long time. Is there any way to find out which query is slow (maybe something similar to the MySQL slow query log)? – Krishna Shetty Sep 10 '14 at 14:11
  • That is a separate question, but see here for a clue http://stackoverflow.com/questions/21262004/cypher-profile-via-neo4j-rest-api . Of course if you're running the queries it should be obvious which is slow and if you're not sure why, raise another question. – JohnMark13 Sep 10 '14 at 14:19
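
If you control the client that sends the queries, one workaround is to time each request on the client side and keep a record of the slow ones. The sketch below is only that, a client-side stand-in and not a Neo4j feature. It assumes the transactional Cypher endpoint at the default local address; the URL, the function names, and the threshold are assumptions to adapt to your setup.

```python
import json
import time
import urllib.request

# Assumed default address of the transactional Cypher endpoint.
NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"


def run_cypher(statement, parameters=None, url=NEO4J_URL):
    """POST a single Cypher statement and return the decoded JSON response."""
    payload = json.dumps({
        "statements": [{"statement": statement,
                        "parameters": parameters or {}}]
    }).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def timed_query(statement, parameters=None, threshold_secs=1.0,
                runner=run_cypher, slow_log=None):
    """Run a query through runner and record it in slow_log (a list)
    if it took at least threshold_secs, measured on the client."""
    start = time.monotonic()
    result = runner(statement, parameters)
    elapsed = time.monotonic() - start
    if slow_log is not None and elapsed >= threshold_secs:
        slow_log.append((elapsed, statement))
    return result


if __name__ == "__main__":
    slow = []
    # Requires a running server at NEO4J_URL.
    timed_query("MATCH (n) RETURN count(n)", slow_log=slow)
    for elapsed, statement in slow:
        print("%.3fs  %s" % (elapsed, statement))
```

The `runner` parameter is there so the timing wrapper can be reused with whatever client library you already have. Note that the measured time includes network transfer and result serialization, not just query execution.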