
Is there a way to disable WAL replay on crash for Prometheus?

It takes a while for a pod to come back up because of WAL replay. We can afford to lose some metrics if it means faster recovery after a crash.

The replay progress in the logs:

level=info ts=2021-04-22T20:13:42.568Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=449 maxSegment=513
level=info ts=2021-04-22T20:13:57.555Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=450 maxSegment=513
level=info ts=2021-04-22T20:14:12.222Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=451 maxSegment=513
level=info ts=2021-04-22T20:14:25.491Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=452 maxSegment=513
level=info ts=2021-04-22T20:14:39.258Z caller=head.go:714 component=tsdb msg="WAL segment loaded" segment=453 maxSegment=513
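To put a number on "a while", the timestamps in the log lines above can be turned into a replay-speed estimate. This is a minimal sketch that assumes every line has the same logfmt shape shown (ts=..., segment=N maxSegment=M), that lines are in order, and that each line marks one finished segment; estimate_remaining is a hypothetical helper, not part of Prometheus.

```python
import re
from datetime import datetime

# Pulls ts, segment and maxSegment out of a "WAL segment loaded" line.
# "maxSegment=" has a capital S, so it never matches "segment=".
LINE_RE = re.compile(r"ts=(?P<ts>\S+) .*segment=(?P<seg>\d+) maxSegment=(?P<max>\d+)")


def estimate_remaining(lines):
    """Return (average seconds per segment, estimated seconds still to replay).

    Hypothetical helper: assumes replay speed stays roughly constant.
    """
    points = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            # fromisoformat before Python 3.11 rejects a trailing "Z".
            ts = datetime.fromisoformat(m["ts"].replace("Z", "+00:00"))
            points.append((ts, int(m["seg"]), int(m["max"])))
    if len(points) < 2:
        raise ValueError("need at least two 'WAL segment loaded' lines")
    (t0, s0, _), (t1, s1, max_seg) = points[0], points[-1]
    per_segment = (t1 - t0).total_seconds() / (s1 - s0)
    # Segments s1+1 .. max_seg (inclusive) are still to be loaded.
    return per_segment, per_segment * (max_seg - s1)
```

On the five lines above this works out to about 14.2 seconds per segment and roughly 850 seconds (a little over 14 minutes) for the 60 segments still to replay, provided the rate holds.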
Steve
  • Another option is to try VictoriaMetrics instead of Prometheus; it doesn't use a WAL and doesn't corrupt data on crashes. See https://valyala.medium.com/wal-usage-looks-broken-in-modern-time-series-databases-b62a627ab704 – valyala May 02 '21 at 22:47

1 Answer


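Not that I'm aware of. You would have to delete the WAL directory (rm -rf wal/ inside the TSDB data directory) before starting Prometheus, which throws away any samples not yet persisted to a block. It's usually better to run multiple Prometheus replicas with Thanos or Cortex than to go down this path.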
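A minimal sketch of the rm -rf wal/ approach, written as a small Python wrapper a container could use as its entrypoint. It assumes the WAL lives at <data_dir>/wal under the directory passed to --storage.tsdb.path and that the prometheus binary is on PATH; drop_wal and start_prometheus are hypothetical helper names, not part of Prometheus.

```python
import os
import shutil


def drop_wal(data_dir):
    """Delete <data_dir>/wal so the next Prometheus start has no WAL to replay.

    Samples that existed only in the WAL are lost. Returns True if a WAL
    directory was found and removed, False otherwise.
    """
    wal_dir = os.path.join(data_dir, "wal")
    if os.path.isdir(wal_dir):
        shutil.rmtree(wal_dir)
        return True
    return False


def start_prometheus(data_dir, args):
    """Hypothetical entrypoint: drop the WAL, then exec Prometheus.

    `args` is assumed to include --storage.tsdb.path pointing at the same
    data_dir; the prometheus binary is assumed to be on PATH.
    """
    drop_wal(data_dir)
    # execvp does not return on success: this process is replaced by
    # Prometheus, so signals from the container runtime reach it directly.
    os.execvp("prometheus", ["prometheus", *args])
```

An entrypoint script could call start_prometheus("/prometheus", sys.argv[1:]), where "/prometheus" stands in for whatever data directory you actually use. As written it wipes the WAL on every start, not only after a crash, so every restart trades the unpersisted head data for a fast startup.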

coderanger