
I have a machine with a Quadro P5000 graphics card, running Windows 10. I'd like to train a TTS voice on this system. What do I need to install to make this work?

GuyPaddock

1 Answer


Here's what to install/do:

  1. Download and install Python 3.8 (not 3.9+) for Windows. During the installation, ensure that you:
       • Opt to install it for all users.
       • Opt to add Python to the PATH.
  2. Download and install CUDA Toolkit 10.1 (not 11.0+).
  3. Download "cuDNN v7.6.5 (November 5th, 2019), for CUDA 10.1" (not cuDNN v8+) and extract it. Then copy the contents of its cuda folder (the bin, include, and lib folders) into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1, merging them with the folders already there.
  4. Download and install the latest 64-bit version of eSpeak NG (no version constraints :-) ).
  5. Download and install the latest 64-bit version of Git for Windows (no version constraints :-) ).
  6. Open a PowerShell prompt in the folder where you'd like to install Coqui TTS.
  7. Run `git clone https://github.com/coqui-ai/TTS.git`.
  8. Run `cd TTS`.
  9. Run `python -m venv .` (note the single dot, meaning the current folder) to create a virtual environment inside the TTS folder.
  10. Run `.\Scripts\pip install -e .` to install Coqui TTS from the cloned source in editable mode.
  11. Run the following command (this differs from the command you get from the PyTorch website because of a known issue):
        .\Scripts\pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 torchaudio===0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
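  12. Put the following into a script called "test_cuda.py" in the TTS folder:
        import torch
        # Printing a random 5x3 tensor shows that PyTorch itself works.
        x = torch.rand(5, 3)
        print(x)
        # Must print True, otherwise PyTorch cannot use the GPU.
        print(torch.cuda.is_available())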
  13. Run the script via `.\Scripts\python ./test_cuda.py` and confirm that the output looks like the following. The tensor values are random and will differ from run to run, but the last line must read True; if it does not, CUDA is not installed properly.
        tensor([[0.2141, 0.7808, 0.9298],
                [0.3107, 0.8569, 0.9562],
                [0.2878, 0.7515, 0.5547],
                [0.5007, 0.6904, 0.4136],
                [0.2443, 0.4158, 0.4245]])
        True
  14. Put the following into a script called "train.bat" in the TTS folder, and then change the --config_path value to point at your own configuration file:
        set PYTHONIOENCODING=UTF-8
        set PYTHONLEGACYWINDOWSSTDIO=UTF-8
        set PHONEMIZER_ESPEAK_PATH=C:/Program Files/eSpeak NG/espeak-ng.exe

        .\Scripts\python.exe ./TTS/bin/train_tacotron.py --config_path "C:/path/to/your/config.json"
  15. Run the script via `.\train.bat`.
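If the last line of test_cuda.py prints False, it helps to know which of the version constraints above is not met. Below is a minimal sketch of a more thorough check. The file name check_env.py and its helper functions are hypothetical (not part of Coqui TTS or PyTorch), and the script assumes the exact versions and default install paths used in the steps above, plus the cuDNN 7 DLL name cudnn64_7.dll for the copy step. Run it from the TTS folder the same way as test_cuda.py: `.\Scripts\python ./check_env.py`.

```python
"""check_env.py: hypothetical pre-flight check for the setup above (a sketch, not part of Coqui TTS)."""
import os
import sys

# Versions and default paths assumed from the steps above; adjust them if yours differ.
EXPECTED_TORCH = "1.8.0+cu101"
EXPECTED_CUDA = "10.1"
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin"
ESPEAK_EXE = r"C:\Program Files\eSpeak NG\espeak-ng.exe"


def find_version_problems(py_version, torch_version, cuda_version, cudnn_available, cuda_available):
    """Return a list of problems; an empty list means every version check passed."""
    problems = []
    if tuple(py_version[:2]) != (3, 8):
        problems.append(f"Python {py_version[0]}.{py_version[1]} found; expected 3.8")
    if torch_version != EXPECTED_TORCH:
        problems.append(f"torch {torch_version} found; expected {EXPECTED_TORCH}")
    if cuda_version != EXPECTED_CUDA:
        problems.append(f"torch was built for CUDA {cuda_version}; expected {EXPECTED_CUDA}")
    if not cudnn_available:
        problems.append("torch.backends.cudnn.is_available() returned False")
    if not cuda_available:
        problems.append("torch.cuda.is_available() returned False")
    return problems


def find_missing_files(cuda_bin=CUDA_BIN, espeak_exe=ESPEAK_EXE):
    """Return the expected files (copied cuDNN 7 DLL, eSpeak NG executable) that do not exist."""
    expected = [os.path.join(cuda_bin, "cudnn64_7.dll"), espeak_exe]
    return [path for path in expected if not os.path.isfile(path)]


def main():
    """Print every problem found and return the list of problems."""
    problems = [f"missing file: {path}" for path in find_missing_files()]
    try:
        import torch  # imported here so the helpers above work without torch installed
    except ImportError:
        problems.append("torch is not installed in this environment")
    else:
        problems += find_version_problems(
            sys.version_info,
            torch.__version__,
            torch.version.cuda,
            torch.backends.cudnn.is_available(),
            torch.cuda.is_available(),
        )
        if torch.cuda.is_available():
            print("GPU:", torch.cuda.get_device_name(0))
    for problem in problems:
        print("PROBLEM:", problem)
    if not problems:
        print("All checks passed.")
    return problems


if __name__ == "__main__":
    main()
```

The version logic lives in plain functions that take values as arguments, so it can be exercised without a GPU; only main() imports torch.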

If you are using a different model than Tacotron or need to pass other parameters into the training script, feel free to further customize train.bat.
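If you would rather not edit a batch file for each model, the same launch can be sketched in Python. This is a hypothetical helper (run_training.py is not part of Coqui TTS). It sets the same three environment variables as train.bat, but only for the child training process, and it assumes that the other training scripts also live under TTS/bin and accept --config_path the way train_tacotron.py does.

```python
"""run_training.py: hypothetical Python stand-in for train.bat (a sketch, not part of Coqui TTS)."""
import os
import subprocess
import sys

# Same eSpeak NG path that train.bat assumes; change it if you installed elsewhere.
ESPEAK_EXE = "C:/Program Files/eSpeak NG/espeak-ng.exe"
DEFAULT_SCRIPT = "train_tacotron.py"


def build_training_env(base_env, espeak_exe=ESPEAK_EXE):
    """Return a copy of base_env with the variables train.bat sets (base_env is not modified)."""
    env = dict(base_env)
    env["PYTHONIOENCODING"] = "UTF-8"
    env["PYTHONLEGACYWINDOWSSTDIO"] = "UTF-8"
    env["PHONEMIZER_ESPEAK_PATH"] = espeak_exe
    return env


def build_training_command(config_path, script=DEFAULT_SCRIPT, extra_args=(), python=sys.executable):
    """Return the argument list for a training script under TTS/bin (assumed to take --config_path)."""
    script_path = os.path.join("TTS", "bin", script)
    return [python, script_path, "--config_path", config_path, *extra_args]


def run_training(config_path, script=DEFAULT_SCRIPT, extra_args=()):
    """Run one training script in a child process and return its exit code."""
    command = build_training_command(config_path, script, extra_args)
    # The variables are set only for the child process, not for this interpreter.
    return subprocess.call(command, env=build_training_env(os.environ))


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: run_training.py CONFIG_PATH [SCRIPT_IN_TTS_BIN] [EXTRA_ARGS...]")
    else:
        script = sys.argv[2] if len(sys.argv) > 2 else DEFAULT_SCRIPT
        sys.exit(run_training(sys.argv[1], script, sys.argv[3:]))
```

Run it from the TTS folder with the virtual environment's Python, for example `.\Scripts\python ./run_training.py "C:/path/to/your/config.json"`. To pass extra arguments, name the script in TTS/bin explicitly as the second argument and put the extra arguments after it.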

If you are just getting started with TTS training in general, take a peek at the question "How do I get started training a custom voice model with Mozilla TTS on Ubuntu 20.04?".

GuyPaddock
  • If you get "UnicodeEncodeError: 'charmap' codec can't encode characters in position : character maps to <undefined>" during training, you may need to apply the changes from https://github.com/coqui-ai/TTS/pull/394 – GuyPaddock Mar 20 '21 at 20:52
  • How are you supposed to get this working on RTX cards, then, which require CUDA 11? – Skyler Feb 13 '23 at 07:01
  • I additionally had to run `.\Scripts\pip install networkx==2.8.8`, because gruut requires networkx < 3, and the commands above installed networkx 3 by default. – para Mar 18 '23 at 14:02
  • I have followed these steps very carefully, but the CUDA availability check still prints False. I've moved the folder and installed the CUDA toolkit. My GPU is a 1070 Ti, if that matters. Any ideas? – Tessa Painter Mar 26 '23 at 01:29
  • Any chance this could get an update? There are too many conflicting package versions now. – BrunoLM Jul 08 '23 at 12:21
  • I haven't been doing anything with TTS since 2021 and I have an older card so I don't have the context for the updates, but if someone has updates to suggest (or even a video) I would be happy to update these steps. – GuyPaddock Jul 21 '23 at 14:23