Only the field polynomial has to be prime or primitive, not the generating polynomial. If you wanted a RS(10000, 8000, 2000) (n = 10000, k = 8000, n-k = 2000) code, GF(2^16) with primitive reducing polynomial x^16 + x^12 + x^3 + x^1 + 1 could be used. The generating polynomial would be of degree 2000. Assuming first consecutive root is 2, then generating polynomial = (x-2)(x-4)(x-8) ... (x-2^2000) (all of this math done in GF(2^16), + and - are both xor). Correction would involve generating 2000 syndromes and using Berlekamp Massey or Sugiyama's extended Euclid decoder. I don't know if there are Python libraries that support GF(2^16).
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Berlekamp%E2%80%93Massey_decoder
https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction#Euclidean_decoder
Large n and k can be avoided by interleaving. Tape drives like LTO treat large data blocks as matrices interleaved across rows (called C1) and down columns (called C2) using GF(2^8). LTO-8 uses RS(249,237,13), which I assume is the ECC used down columns to correct rows. With 32 read|write heads, there's an interleave of 32, probably across rows. I don't know what the RS() code is across rows, or what the interleave down columns is.