IBM BPM abnormal time gap between saving changes and availability of these changes

Question

For about a week, our IBM BPM DEV environment has had issues that we can neither understand nor resolve. Could you please take a look and advise?

We see an abnormal delay between saving changes to Process Applications (their process flows and services) in Process Center and those changes becoming available in Process Portal. The delay varies but can be as long as 40 minutes. Until the changes arrive, users (developers, testers) keep seeing the old coaches, services, processes, etc., even though they are working with the Tip version of the application/process. It is as if the developer had changed nothing at all, which makes development and technical testing very inefficient and frustrating.
Since Jan 13th, dashboards and task forms from Tip versions have also been taking significantly longer to load than they used to. The problem only occurs with Tip versions; snapshot versions are not affected.

I suspect this is somehow related to how IBM BPM uses its database internally, but our DBAs do not see any critical changes or performance issues on the DB side, so I have no idea how to resolve these issues.

Our configuration:
BPM: 8.6.0.201803
Server: 2 CPU, 16GB RAM
$ df -h
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   90G   61G   26G  71% /
devtmpfs               7.8G     0  7.8G   0% /dev
tmpfs                  7.8G   84K  7.8G   1% /dev/shm
tmpfs                  7.8G  8.9M  7.8G   1% /run
tmpfs                  7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sda1              488M  185M  268M  41% /boot
tmpfs                  1.6G   16K  1.6G   1% /run/user/42
tmpfs                  1.6G     0  1.6G   0% /run/user/0
tmpfs                  1.6G     0  1.6G   0% /run/user/1006
tmpfs                  1.6G     0  1.6G   0% /run/user/1008
tmpfs                  1.6G     0  1.6G   0% /run/user/1007
tmpfs                  1.6G     0  1.6G   0% /run/user/1005

DB: Oracle, run in a supercluster.

Thanks in advance for Your help!

We recently ran into the same problem, and it had two causes. 1. We had too many snapshots of toolkits and process apps (more than 100) because we had been working for months without cleanup; you'll need to start archiving and deleting snapshots that are no longer needed. 2. Our core dump directory was full because of some errors, although that does not seem to be the issue in your case. — AbdelRahman Badr, Jan 20 '20 at 16:26
Hi, The Wizard Of Code! 1. I came to the same conclusion about archiving snapshots of our process apps and toolkits. 2. To be sure we are on the same page, could you please specify the path (maybe a relative path) to BPM's core dump directory? The server still has some spare disk space, but it would be nice to monitor the directory you mentioned to avoid a disk space shortage in the future. When I finish archiving old snapshots, I'll update the post with the results on the performance boost I hope to achieve. Thanks! — Dmytro, Jan 21 '20 at 09:46
On our dev server, it's in /IBM/BPM86PS/profiles/BPMSrv01/, and it's a .dmp file. In our case, it reached 12 GB at one point, and we never found out what caused it. — AbdelRahman Badr, Jan 21 '20 at 16:17

score 1 · Answer 1 · answered Feb 01 '20 at 06:17

I currently maintain an IBM BPM system across multiple environments, and I have seen this type of performance degradation appear after a certain period of time. In most cases, the cause is that the BPM system accumulates a lot of data over time that is not cleaned up regularly. I cannot be sure that the performance issues in your case have the same cause, but I still recommend starting here.

This IBM developerWorks article is a good starting point for this activity: Purging data in IBM Business Process Manager.

On your Development environment, you would have a Process Center. A Process Center primarily accumulates snapshots of applications. Named snapshots are one thing, but the Process Center also keeps a delta-type of snapshot each time a Process Application is saved (from the Web Process Designer). These are called unnamed snapshots, and they can quickly accumulate to extremely large numbers.

The cleanup approach I use for a Process Center is as follows. I remove all the process instances first. Then, I remove unnamed snapshots beyond a certain count (100, to be specific). Then, I remove named snapshots which are archived. This task is scripted, and I run this on a weekly basis.
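The selection rule described above (keep the newest 100 unnamed snapshots, drop older unnamed ones and any archived named ones) can be sketched in Python as follows. This is only an illustration of the rule, not my actual script: the `Snapshot` record and `select_for_deletion` function are hypothetical names, and the real listing and deletion would go through the administrative commands that your BPM version provides, which are not shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical record for illustration; not an IBM BPM API type."""
    snapshot_id: str
    created: datetime
    name: Optional[str] = None  # None means an unnamed snapshot
    archived: bool = False


def select_for_deletion(snapshots: List[Snapshot],
                        keep_unnamed: int = 100) -> List[Snapshot]:
    """Return the snapshots that the cleanup rule would remove."""
    # Newest unnamed snapshots first, so the slice below drops the oldest.
    unnamed = sorted(
        (s for s in snapshots if s.name is None),
        key=lambda s: s.created,
        reverse=True,
    )
    stale_unnamed = unnamed[keep_unnamed:]

    # Named snapshots are only removed once someone has archived them.
    archived_named = [s for s in snapshots if s.name is not None and s.archived]

    return stale_unnamed + archived_named
```

Named snapshots that are still active are never selected, which is why the periodic reminder to archive old ones matters.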

I have already communicated the effects of these actions to my development team. They are aware that they will lose process instances, but the value of these instances is limited on a Process Center anyway. I periodically remind them to archive old named snapshots, so that these are cleaned up as well.

I would also recommend that you investigate the disk usage on your system. IBM BPM primarily writes all of its data to its database, so there's really no reason for the filesystem to grow significantly. If your BPM instance has a tendency to crash, then you'll likely find a number of dump files (core dump/heap dump/thread dump) under your profile directory. You can remove these dump files to recover the space, but you should resolve the issue that causes the crash in the first place.
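To check for accumulated dump files, a small script along these lines could walk the profile directory and list them by size. It is a minimal sketch that assumes the JVM writes dumps under its default file names (`core.*.dmp`, `heapdump.*.phd`, `javacore.*.txt`, `Snap.*.trc`); if your installation uses custom dump options, the patterns need adjusting. The function name `find_dump_files` is my own.

```python
import fnmatch
import os
from typing import List, Tuple

# Assumed default dump file names; adjust if the JVM is configured differently.
DUMP_PATTERNS = ("core.*.dmp", "heapdump.*.phd", "javacore.*.txt", "Snap.*.trc")


def find_dump_files(profile_dir: str) -> List[Tuple[str, int]]:
    """Return (path, size_in_bytes) for dump files, largest first."""
    found = []
    for root, _dirs, files in os.walk(profile_dir):
        for name in files:
            if any(fnmatch.fnmatchcase(name, p) for p in DUMP_PATTERNS):
                path = os.path.join(root, name)
                try:
                    found.append((path, os.path.getsize(path)))
                except OSError:
                    # The file vanished or is unreadable; skip it.
                    continue
    return sorted(found, key=lambda item: item[1], reverse=True)
```

Calling `find_dump_files` on your profile directory and summing the sizes shows how much space the dumps occupy. The script only reports files and deletes nothing, so you can inspect a dump before removing it.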

If you do find evidence of crashing, I recommend looking at your heap sizes as well as the branch and snapshot caches within BPM. These caches load the most recent versions of your process applications and their snapshots into memory, so that developers can work on them more quickly. While this sounds fine in theory, the default size of each cache is 64: 64 branches, and 64 snapshots per branch. That's potentially 4096 process snapshots loaded into memory at once, which can easily cause an OutOfMemoryError and a crash.
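The worst-case figure is simply the product of the two cache sizes. The snippet below works that out for the defaults and for a smaller pair of values; the 16/16 pair is purely illustrative and not a recommendation, and the function name is my own.

```python
def worst_case_cached_snapshots(branch_cache_size: int,
                                snapshots_per_branch: int) -> int:
    """Upper bound on snapshots held in memory if both caches fill up."""
    return branch_cache_size * snapshots_per_branch


default_bound = worst_case_cached_snapshots(64, 64)  # 64 * 64 = 4096
reduced_bound = worst_case_cached_snapshots(16, 16)  # 16 * 16 = 256 (illustrative sizes)
```

Because the bound is a product, shrinking both caches reduces it quadratically, although actual memory use depends on how large each snapshot is.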

You can tune the size of this cache by using a 100Custom.xml file. See this article for more details: Tuning branch and snapshot cache sizes in IBM Business Process Manager. Reducing the cache size will allow you to save on memory and avoid crashing. The tradeoff is that more database calls will be needed in case of a cache miss.

Hopefully, this information helps you narrow down the problems with your IBM BPM Process Center and restore your earlier levels of performance. Good luck!
