For a very simple beat tracker you probably want to use librosa's built-in beat tracking:
import librosa
y, sr = librosa.load(librosa.util.example_audio_file())
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)
# beats now contains the beat *frame positions*
# convert to timestamps like this:
beat_times = librosa.frames_to_time(beats, sr=sr)
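Under the hood, frames_to_time is just a linear conversion: frame index times hop length, divided by the sample rate. A minimal NumPy sketch of that relationship, assuming librosa's default hop_length of 512 samples (frames_to_seconds here is a hypothetical helper, not a librosa function):

```python
import numpy as np

def frames_to_seconds(frames, sr, hop_length=512):
    """Convert frame indices to timestamps in seconds.

    Mirrors what librosa.frames_to_time computes for the
    default parameters: time = frame * hop_length / sr.
    """
    return np.asarray(frames) * hop_length / sr

# three frames at librosa's default sample rate of 22050 Hz
times = frames_to_seconds([0, 43, 86], sr=22050)
print(times)  # roughly [0.0, 0.998, 1.997]
```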
That gives you the beat positions. But you were actually asking about downbeat estimation. Your idea of picking the beat with the highest energy is a good start, but you may want to incorporate some additional knowledge and average over corresponding beats. For example, if you know the track is in 4/4 time, you can sum the energy of every fourth beat and then conclude that the beat position with the highest energy sum is the downbeat.
Roughly like this:
import librosa
import numpy as np
y, sr = librosa.load('my file.wav')
# get onset envelope
onset_env = librosa.onset.onset_strength(y=y, sr=sr, aggregate=np.median)
# get tempo and beats
tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
# we assume 4/4 time
meter = 4
# calculate number of full measures
measures = len(beats) // meter
# get onset strengths for the known beat positions
# Note: this is somewhat naive, as the main strength may be *around*
# rather than *on* the detected beat position.
beat_strengths = onset_env[beats]
# make sure we only consider full measures
# and convert to 2d array with indices for measure and beatpos
measure_beat_strengths = beat_strengths[:measures * meter].reshape(-1, meter)
# add up strengths per beat position
beat_pos_strength = np.sum(measure_beat_strengths, axis=0)
# find the beat position with max strength
downbeat_pos = np.argmax(beat_pos_strength)
# convert the beat positions to the same 2d measure format
full_measure_beats = beats[:measures * meter].reshape(-1, meter)
# and select the beat position we want: downbeat_pos
downbeat_frames = full_measure_beats[:, downbeat_pos]
print('Downbeat frames: {}'.format(downbeat_frames))
# print times
downbeat_times = librosa.frames_to_time(downbeat_frames, sr=sr)
print('Downbeat times in s: {}'.format(downbeat_times))
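To see what the reshape/sum/argmax aggregation does, here is the same logic on a toy example with hard-coded onset strengths (the numbers are fabricated; they describe 16 beats in 4/4 where the first beat of each measure is deliberately the loudest):

```python
import numpy as np

meter = 4
# fake onset strengths for 16 detected beats (4 full measures)
beat_strengths = np.array([5, 1, 1, 1,
                           6, 1, 2, 1,
                           5, 2, 1, 1,
                           7, 1, 1, 2])
measures = len(beat_strengths) // meter
# one row per measure, one column per position within the measure
per_measure = beat_strengths[:measures * meter].reshape(-1, meter)
# total strength for each of the four beat positions
beat_pos_strength = np.sum(per_measure, axis=0)
print(beat_pos_strength)   # [23  5  5  5]
downbeat_pos = np.argmax(beat_pos_strength)
print(downbeat_pos)        # 0 -> position 0 is the downbeat
```

With real onset strengths the column sums are of course far less clear-cut, which is exactly why summing over many measures helps: random fluctuations average out, while a consistent accent pattern accumulates.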
Your mileage with code like this will vary. Success depends on the kind of music, the genre, the meter, the quality of the beat detection, and so on. In short, it's not a trivial problem.
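If the meter is unknown, one naive heuristic (my own assumption here, not something librosa provides) is to run the same aggregation for several candidate meters and pick the one whose per-position sums are most "peaky", i.e. where the strongest position stands out most from the average:

```python
import numpy as np

def best_meter(beat_strengths, candidates=(3, 4)):
    """Very naive meter guess: for each candidate meter, sum the
    onset strengths per beat position and score the result by its
    peak-to-mean ratio; return the highest-scoring meter."""
    best, best_score = None, -np.inf
    for meter in candidates:
        measures = len(beat_strengths) // meter
        if measures == 0:
            continue
        sums = beat_strengths[:measures * meter].reshape(-1, meter).sum(axis=0)
        score = sums.max() / sums.mean()
        if score > best_score:
            best, best_score = meter, score
    return best

# fabricated strengths with a clear accent every fourth beat
strengths = np.array([5, 1, 1, 1, 6, 1, 2, 1, 5, 2, 1, 1, 7, 1, 1, 2])
print(best_meter(strengths))  # 4
```

This breaks down easily (e.g. for music that accents beat positions other than the downbeat), so treat it as a rough sanity check rather than real meter detection.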
In fact, downbeat estimation is an active Music Information Retrieval (MIR) research topic and not entirely solved. For a recent review of deep-learning-based automatic downbeat tracking, you might want to check out this article.