
I have absolutely no knowledge of servers, but I have to fix this problem myself. If anyone can point me to where to start, I would really appreciate it.

So here is what happened. On my WebLogic server, several apps are deployed as WAR files. I will call these apps A, B, C, and D. They all use JNDI to connect to a database.

Let's say A and B use a JNDI name "123" to connect to Oracle DB "hello", and C and D use a JNDI name "456" to connect to Oracle DB "world".


When the DB "hello" went offline for maintenance, A and B failed because they had no connection, but C and D were not affected. This was expected, and I was OK with it.

But when the DB "world" went down for maintenance, all of the apps (A through D) failed and the JVM went into an "unreachable" state. I was not able to resume the JVM (it said the JVM was in an incomplete state), so I had to bounce the JVM to bring it back. After I bounced it, I was able to get apps A and B working while the DB "world" was still under maintenance.

I am guessing that some extra configuration was done on the server for the DB "world", such as retrying the connection indefinitely, which ended up taking down the whole JVM. But I have very minimal knowledge of how the server works, and I have no clue where to start looking to prevent this from happening in the future.


Any help is very much appreciated. Thank you.

RedA
  • This question is way too broad. One first hint: get yourself a copy of 'Release It!' by Michael Nygard. That book talks extensively about patterns that prevent whole systems from going down when subsystems fail. And then you will probably have to spend months digging into application code and fixing things. Or you get yourself a new job which better fits your skills. – GhostCat Jun 29 '17 at 03:48
  • Hi. Thanks for the book recommendation. I will definitely take the time to look into it. I know this question is broad, and I was hoping to get some insight into where to start. And yes, I was definitely not hired as a server admin, but things do not always go as planned ;) – RedA Jun 29 '17 at 03:55
  • It is about improving the service, so in the end it is a development position. You might learn a lot there, but you are in a difficult position, because the people who put you there must either a) not know what they are doing or b) be desperate. They will probably give you a few weeks and then start blaming you for whatever goes wrong. Just saying: be prepared for a lot of problems on various levels. – GhostCat Jun 29 '17 at 03:59

1 Answer


Usually, it takes a master-master, master-slave, or clustered database architecture to build such a backup (a distributed database model) or to provide high availability and scalability. It really depends on what you or your organization needs. This SO answer might help: Master-master vs master-slave database architecture?

The simplest way might be to use a cloud service, since the provider handles this kind of problem on your behalf.
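Separately from the database architecture, it may be worth checking whether anything waits on the "world" database without a time limit, since the question suspects endless connection attempts. The sketch below is only an illustration of the general idea of bounding a blocking call, not WebLogic's own mechanism: the class `BoundedCall` and the method `callWithTimeout` are hypothetical names, and the slow task merely simulates an unreachable database using plain `java.util.concurrent` classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long a caller waits for a blocking task.
public class BoundedCall {

    // Runs the task on a worker thread and waits at most timeoutMillis.
    // Returns the fallback value if the task times out or throws.
    public static <T> T callWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker instead of waiting forever
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A task that answers quickly, standing in for a healthy database.
        String fast = callWithTimeout(() -> "connected", 1000, "unavailable");

        // A task that blocks for 5 seconds, standing in for a database that is down.
        String slow = callWithTimeout(() -> {
            Thread.sleep(5000);
            return "connected";
        }, 200, "unavailable");

        System.out.println(fast);
        System.out.println(slow);
    }
}
```

Inside an application server you would normally rely on the data source's own connection pool and timeout settings rather than creating threads yourself, so treat this only as a picture of why an unbounded wait on one database can tie up work meant for the others.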