I am doing server-side rendering with wkhtmltoimage and each run is locking up at "88% loading" for 1-2 minutes. I decided to debug what was happening through strace but in a completely bizarre twist the program did NOT lock up. I've found this to be completely repeatable and consistent. Why on earth would strace make a program faster when by all rights the program should be slower ?!
Run without strace:
user@server:~/public_html/shapes$ time wkhtmltoimage --disable-smart-width --width 970 --format jpg '[THE URL]' '[THE PATH].jpg'
Loading page (1/2)
Rendering (2/2)
Done
real 1m45.724s
user 1m42.887s
sys 0m0.623s
Run WITH strace:
user@server:~/public_html/shapes$ time strace wkhtmltoimage --disable-smart-width --width 970 --format jpg '[THE URL]' '[THE PATH].jpg'
execve("/usr/local/bin/wkhtmltoimage", ["wkhtmltoimage", "--disable-smart-width", "--width", "970", "--format", "jpg", "[THE URL]"..., "[THE PATH]"...], [/* 21 vars */]) = 0
brk(0) = 0x311a000
...
exit_group(0) = ?
+++ exited with 0 +++
real 0m6.526s
user 0m0.582s
sys 0m0.377s
Server is private so I've redacted the URL and PATH but otherwise the output is correct. Also both runs create the correct output and I've cleared up temporary files to ensure it's not a caching issue. I've done these runs 10 times each to ensure it's not a random artifact but it happens consistently, THEREFORE the only logical conclusion is that strace is somehow changing the behaviour of wkhtmltoimage and I was really hoping somebody could tell me what. If I knew why strace makes the program not lockup I could probably find a solution.
Here is the hung process:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
772 sigapp 20 0 1485600 45760 14524 R 99.8 2.2 0:57.02 wkhtmltoimage
As root I can attach to the hung image using strace -p $(pidof wkhtmltoimage)
and the result is:
gettimeofday({1436692542, 446246}, NULL) = 0
gettimeofday({1436692542, 556958}, NULL) = 0
gettimeofday({1436692542, 557161}, NULL) = 0
gettimeofday({1436692542, 659238}, NULL) = 0
gettimeofday({1436692542, 771381}, NULL) = 0
gettimeofday({1436692542, 771686}, NULL) = 0
gettimeofday({1436692542, 875783}, NULL) = 0
gettimeofday({1436692542, 987490}, NULL) = 0
gettimeofday({1436692542, 987781}, NULL) = 0
gettimeofday({1436692543, 84764}, NULL) = 0
...
System: Linux server 3.13.0-30-generic #55-Ubuntu SMP Fri Jul 4 21:40:53 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux (Running on 2 CPU Xen Guest)
Software: wkhtmltoimage 0.12.2.1 (with patched qt) <- installed from official website via .deb file
I've read this but not sure if relevant: Hung processes resume if attached to strace
UPDATE:
Tried with another program webkit2pdf
and seeing lockups as well. This time strace output is different. Both programs use webkit.:
root@server:~# strace -p `pidof webkit2pdf`
Process 4699 attached
restart_syscall(<... resuming interrupted call ...>
On a successful run of strace wkhtmltoimage ...
I can see similar calls but not NEARLY as many preceeded by an mmap:
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9dbbbc1000
gettimeofday({1436696839, 361942}, NULL) = 0
gettimeofday({1436696839, 362254}, NULL) = 0
gettimeofday({1436696839, 362505}, NULL) = 0
gettimeofday({1436696839, 362787}, NULL) = 0
gettimeofday({1436696839, 363172}, NULL) = 0
gettimeofday({1436696839, 363568}, NULL) = 0
gettimeofday({1436696839, 363913}, NULL) = 0
mmap(NULL, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9db5000000
gettimeofday({1436696839, 364701}, NULL) = 0
mmap(NULL, 28672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9db4ff9000
gettimeofday({1436696839, 365612}, NULL) = 0
gettimeofday({1436696839, 365956}, NULL) = 0
So per pvg's comment below this is most probably the locking mechanism for setAttribute
working correctly under strace
vs. not working normally.