4

I am doing server-side rendering with wkhtmltoimage and each run is locking up at "88% loading" for 1-2 minutes. I decided to debug what was happening through strace but in a completely bizarre twist the program did NOT lock up. I've found this to be completely repeatable and consistent. Why on earth would strace make a program faster when by all rights the program should be slower ?!

Run without strace:

user@server:~/public_html/shapes$ time wkhtmltoimage --disable-smart-width --width 970 --format jpg '[THE URL]' '[THE PATH].jpg'
Loading page (1/2)
Rendering (2/2)
Done

real    1m45.724s
user    1m42.887s
sys     0m0.623s

Run WITH strace:

user@server:~/public_html/shapes$ time strace wkhtmltoimage --disable-smart-width --width 970 --format jpg '[THE URL]' '[THE PATH].jpg'
execve("/usr/local/bin/wkhtmltoimage", ["wkhtmltoimage", "--disable-smart-width", "--width", "970", "--format", "jpg", "[THE URL]"..., "[THE PATH]"...], [/* 21 vars */]) = 0
brk(0)                                  = 0x311a000
...
exit_group(0)                           = ?
+++ exited with 0 +++

real    0m6.526s
user    0m0.582s
sys     0m0.377s

Server is private so I've redacted the URL and PATH but otherwise the output is correct. Also both runs create the correct output and I've cleared up temporary files to ensure it's not a caching issue. I've done these runs 10 times each to ensure it's not a random artifact but it happens consistently, THEREFORE the only logical conclusion is that strace is somehow changing the behaviour of wkhtmltoimage and I was really hoping somebody could tell me what. If I knew why strace makes the program not lockup I could probably find a solution.

Here is the hung process:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  772 sigapp    20   0 1485600  45760  14524 R  99.8  2.2   0:57.02 wkhtmltoimage

As root I can attach to the hung image using strace -p $(pidof wkhtmltoimage) and the result is:

gettimeofday({1436692542, 446246}, NULL) = 0
gettimeofday({1436692542, 556958}, NULL) = 0
gettimeofday({1436692542, 557161}, NULL) = 0
gettimeofday({1436692542, 659238}, NULL) = 0
gettimeofday({1436692542, 771381}, NULL) = 0
gettimeofday({1436692542, 771686}, NULL) = 0
gettimeofday({1436692542, 875783}, NULL) = 0
gettimeofday({1436692542, 987490}, NULL) = 0
gettimeofday({1436692542, 987781}, NULL) = 0
gettimeofday({1436692543, 84764}, NULL) = 0
...

System: Linux server 3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux (Running on 2 CPU Xen Guest)

Software: wkhtmltoimage 0.12.2.1 (with patched qt) <- installed from official website via .deb file

I've read this but not sure if relevant: Hung processes resume if attached to strace

UPDATE:

Tried with another program webkit2pdf and seeing lockups as well. This time strace output is different. Both programs use webkit.:

root@server:~# strace -p `pidof webkit2pdf`
Process 4699 attached
restart_syscall(<... resuming interrupted call ...>

On a successful run of strace wkhtmltoimage ... I can see similar calls but not NEARLY as many preceeded by an mmap:

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9dbbbc1000
gettimeofday({1436696839, 361942}, NULL) = 0
gettimeofday({1436696839, 362254}, NULL) = 0
gettimeofday({1436696839, 362505}, NULL) = 0
gettimeofday({1436696839, 362787}, NULL) = 0
gettimeofday({1436696839, 363172}, NULL) = 0
gettimeofday({1436696839, 363568}, NULL) = 0
gettimeofday({1436696839, 363913}, NULL) = 0
mmap(NULL, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9db5000000
gettimeofday({1436696839, 364701}, NULL) = 0
mmap(NULL, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9db4ff9000
gettimeofday({1436696839, 365612}, NULL) = 0
gettimeofday({1436696839, 365956}, NULL) = 0

So per pvg's comment below this is most probably the locking mechanism for setAttribute working correctly under strace vs. not working normally.

Community
  • 1
  • 1
SpliFF
  • 38,186
  • 16
  • 91
  • 120
  • Try using the `-c` option with `strace`. It might give you a better idea of what might be happening. – l'L'l Jul 12 '15 at 09:18
  • Stack trace might tell you more than `strace`. Attach `gdb` to that process and do `bt`. See: https://wiki.debian.org/HowToGetABacktrace – Piotr Praszmo Jul 12 '15 at 09:21
  • 3
    This suggests some sort or sync/race condition, the repeated calls to gettimeofday kind of smell of someone re-implementing some concurrency primitive and failing. stracing it from the get-go slows the whole thing down enough that you don't get the race. Is there a lot of JS on the page you're trying to render? – pvg Jul 12 '15 at 09:23
  • I will try those suggestions. There is a lot of JS and interaction with SVG so that might be the issue. I just tried another program `webkit2pdf` and it also seems to hang. Of course they both use webkit. – SpliFF Jul 12 '15 at 09:46
  • @pvg It appears webkit has SERIOUS performance problems with updating attributes on SVG objects. I turned on javascript tracing `--debug-javascript` and huge delays appear when I call `svg_el.setAttribute('data-color', to_color_id)`. That solves my problem so I'm guess you are close to the mark about this being a race condition or something. If you add your comment as an answer I'll accept it since your recommendation to check JS put me on the right track. – SpliFF Jul 12 '15 at 10:24
  • 1
    @SpliFF glad it worked out. My thing is definitely in the comment rather than answer category so I'll leave it as is. If I had to guess, I'd lean towards it being in the WK wrapper than WK itself (or the version of WK being used? Or both? Who knows). If you're feeling adventurous, you could try it in PhantomJS (http://phantomjs.org/screen-capture.html), if _that_ works, I'll totally write it as an answer and collect my fat loot of internet points. – pvg Jul 12 '15 at 10:53

0 Answers0