
I am working through this blog post on fine-tuning XLS-R Wav2Vec2 for speech recognition: https://huggingface.co/blog/fine-tune-xlsr-wav2vec2. When I run the code in Google Colab it works fine and the WER is lower than 0.5. But when I run the same code on a MacBook Pro (Apple M2 Max), I get a WER of 1.0: the predictions are all empty strings. It seems to have something to do with how the numbers are computed on this hardware, but I cannot work out how to get the same results. The only change I make is moving the model to the MPS device:

mps_device = torch.device("mps")
model.to(mps_device)
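One way to check whether the MPS backend itself is producing different numbers is to run the same forward pass on CPU and on MPS and compare the logits. This is a minimal diagnostic sketch using a toy `nn.Linear` as a stand-in for the real wav2vec2 model (the toy model, shapes, and tolerance are illustrative assumptions, not from the blog post):

```python
import torch
import torch.nn as nn

# Toy model standing in for the fine-tuned wav2vec2 model.
torch.manual_seed(0)
toy = nn.Linear(16, 8)
x = torch.randn(4, 16)

# Forward pass on CPU first, as the reference.
cpu_logits = toy(x)

# If MPS is available, repeat the same forward pass there and compare.
if torch.backends.mps.is_available():
    mps_logits = toy.to("mps")(x.to("mps")).cpu()
    # A large difference here points at an MPS numerical/kernel issue.
    print(torch.allclose(cpu_logits, mps_logits, atol=1e-4))
```

If the logits diverge noticeably for the real model, the empty CTC predictions on MPS are a numerical-backend problem rather than a preprocessing bug.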
  • If the *predictions are empty strings*, then the WER calculation is correct and the error rate is 100%. Are the predictions the same if you run the **same** code without `.to("mps")`? – doneforaiur Aug 03 '23 at 09:16
  • Yes, I run: `logits = model(input_dict.input_values.to("mps")).logits`. So yes, the WER is computed correctly, but the predictions are not the same as on Colab – user1680859 Aug 03 '23 at 09:20
  • I suppose the versions are the same. Are the audio files identical? – doneforaiur Aug 03 '23 at 09:24
  • Yes, everything is the same. I downloaded the notebook from Colab and ran it unchanged on the MacBook Pro. Maybe it is in this part: `def prepare_dataset(batch): audio = batch["audio"]; batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]; batch["input_length"] = len(batch["input_values"]); batch["labels"] = processor(text=batch["sentence"]).input_ids; return batch` — does something there need `.to(device)` as well? – user1680859 Aug 03 '23 at 09:39
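The comment above about empty predictions giving exactly WER 1.0 can be verified directly. Here is a minimal word-error-rate sketch (Levenshtein distance over words, a standard formulation, not code from the blog post) showing that an all-empty hypothesis scores 1.0 because every reference word counts as a deletion:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edits to turn hyp[:j] into ref[:i].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", ""))             # 1.0 (every word deleted)
```

So WER 1.0 here is not a metric bug; it is the expected score when the model emits nothing.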

1 Answer


I moved everything to the CPU (the torch device, the model, and the training arguments) and it worked fine. It is much slower, but the WER is correct. It seems Apple's GPU/MPS backend does not compute the training correctly here.
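A minimal sketch of that workaround, assuming a hypothetical `force_cpu` switch so you can re-enable MPS later once it produces matching results:

```python
import torch

# Hypothetical flag: prefer CPU while MPS yields empty predictions;
# flip to False to try MPS again after a PyTorch upgrade.
force_cpu = True

if not force_cpu and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# model.to(device) and input_values.to(device) would follow here.
print(device)
```

With the Trainer from the blog post, you would also need to keep training on CPU (e.g. by not moving the model to MPS), since moving only some tensors leads to device mismatches.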
