I would like to see everything that has happened to a Kubernetes cluster on a single timeline: when nodes were found to be dead, when new nodes were added, when pods crashed, and when they were restarted.
So far, the best option we have found is kubectl get event
but it seems to have a few limitations:
- it doesn't go very far back in time (I'm not sure how far. A day?)
- it combines similar events and orders the resulting list by the time of the latest event in each group. This makes it impossible to know what happened during a given time range, since events in that range may have been combined with later events outside the range.
One idea I have is to run a pod that uses the API to watch the stream of events and logs each one to a file. This would let us control retention, and it seems that events that occur while we are watching would not be combined, which would solve the second problem as well.
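To make the idea concrete, here is a rough sketch of what I have in mind, not something we run today. It assumes the API server is reachable without authentication through kubectl proxy on http://127.0.0.1:8001 (inside a pod you would authenticate with the service account token instead), and it streams GET /api/v1/events?watch=true, which returns one JSON object per line. The function names, the chosen log fields, and the default log file name are all made up for illustration.

```python
import json
import sys
import urllib.request


def format_watch_line(line):
    """Turn one line of the watch stream into a compact JSON log record."""
    update = json.loads(line)
    event = update.get("object", {})
    involved = event.get("involvedObject", {})
    record = {
        "watch_type": update.get("type"),  # ADDED, MODIFIED or DELETED
        "first_seen": event.get("firstTimestamp"),
        "last_seen": event.get("lastTimestamp"),
        "count": event.get("count"),
        "namespace": event.get("metadata", {}).get("namespace"),
        "kind": involved.get("kind"),
        "name": involved.get("name"),
        "reason": event.get("reason"),
        "message": event.get("message"),
    }
    return json.dumps(record, sort_keys=True)


def watch_events(base_url, log_path):
    """Append every update from the event watch stream to log_path."""
    # Assumption: base_url points at an unauthenticated proxy such as
    # the one started by `kubectl proxy`.
    url = base_url + "/api/v1/events?watch=true"
    with urllib.request.urlopen(url) as stream, open(log_path, "a") as log:
        for raw in stream:  # the response yields one JSON document per line
            if raw.strip():
                log.write(format_watch_line(raw.decode("utf-8")) + "\n")
                log.flush()


if __name__ == "__main__":
    # Hypothetical default file name; pass a path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "cluster-events.log"
    watch_events("http://127.0.0.1:8001", path)
```

Because each repeat of a combined event should arrive as its own MODIFIED update with a new count and lastTimestamp, the log would keep one line per occurrence. A real version would also need to reconnect when the watch connection closes, presumably resuming from the last seen resourceVersion, and to rotate the log file.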
What are other people doing about this?