I am currently extracting a BigQuery table into sharded .csv's in Google Cloud Storage -- is there any way to shuffle/randomize the rows for the extract? The GCS .csv's will be used as training data for a GCMLE model, and the current exports are in a non-random order as they are bunched up by similar "labels".
This causes issues when training a GCMLE model as you must hand the model random examples of all labels within each batch. While GCMLE/TF has the ability to randomize the order of rows WITHIN individual .csv's, but there is not (to my knowledge) any way to randomize the rows selected within multiple .csv's. So, I am looking for a way to ensure that the rows being output to the .csv are indeed random.