voice_kal_diphone and voice_ral_diphone work correctly in singing mode: there's vocal output, and the pitches are correct for the specified notes.
voice_cmu_us_ahw_cg and the other CMU voices do not work correctly: there's vocal output, but the pitch does not change to follow the specified notes.
Is it possible to get correctly pitched output from the higher-quality CMU voices?
The command line for working (pitch-affected) output is:
text2wave -mode singing -eval "(voice_kal_diphone)" -o song.wav song.xml
The command line for non-working (pitch-unaffected) output is:
text2wave -mode singing -eval "(voice_cmu_us_ahw_cg)" -o song.wav song.xml
Here's song.xml:
<?xml version="1.0"?>
<!DOCTYPE SINGING PUBLIC "-//SINGING//DTD SINGING mark up//EN" "Singing.v0_1.dtd" []>
<SINGING BPM="60">
<PITCH NOTE="A4,C4,C4"><DURATION BEATS="0.3,0.3,0.3">nationwide</DURATION></PITCH>
<PITCH NOTE="C4"><DURATION BEATS="0.3">is</DURATION></PITCH>
<PITCH NOTE="D4"><DURATION BEATS="0.3">on</DURATION></PITCH>
<PITCH NOTE="F4"><DURATION BEATS="0.3">your</DURATION></PITCH>
<PITCH NOTE="F4"><DURATION BEATS="0.3">side</DURATION></PITCH>
</SINGING>
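To judge whether a voice actually follows the markup (by ear or with a pitch tracker), it helps to know the target pitch and length of each note. The helper below is a hypothetical sketch of my own, not part of Festival: note_to_hz and beats_to_seconds are made-up names, and it assumes 12-tone equal temperament with A4 = 440 Hz and that one beat lasts 60/BPM seconds. Festival's singing-mode.scm may compute these values somewhat differently.

```python
import re

# Semitone offset of each natural note from C within an octave.
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(note: str, a4: float = 440.0) -> float:
    """Convert a note name such as 'A4', 'C#4' or 'Bb3' to Hz.

    Assumes 12-tone equal temperament tuned so that A4 = a4 Hz.
    """
    m = re.fullmatch(r"([A-G])([#b]?)(-?\d+)", note.strip())
    if m is None:
        raise ValueError(f"unrecognised note: {note!r}")
    letter, accidental, octave = m.groups()
    semitone = NOTE_OFFSETS[letter] + {"#": 1, "b": -1, "": 0}[accidental]
    midi = 12 * (int(octave) + 1) + semitone  # MIDI numbering: C4 -> 60, A4 -> 69
    return a4 * 2 ** ((midi - 69) / 12)


def beats_to_seconds(beats: float, bpm: float) -> float:
    """Assumes one beat lasts 60 / bpm seconds."""
    return beats * 60.0 / bpm


if __name__ == "__main__":
    bpm = 60  # the BPM attribute of <SINGING> in song.xml
    # (word, NOTE, BEATS) copied from song.xml
    song = [
        ("nationwide", "A4,C4,C4", "0.3,0.3,0.3"),
        ("is", "C4", "0.3"),
        ("on", "D4", "0.3"),
        ("your", "F4", "0.3"),
        ("side", "F4", "0.3"),
    ]
    for word, notes, beats in song:
        # One NOTE/BEATS pair per syllable, as in the markup.
        for n, b in zip(notes.split(","), beats.split(",")):
            hz = note_to_hz(n)
            secs = beats_to_seconds(float(b), bpm)
            print(f"{word:<12}{n:<4}{hz:8.2f} Hz {secs:.2f} s")
```

Under these assumptions, at BPM="60" each 0.3-beat syllable should last about 0.3 s, and the notes span roughly 262 Hz (C4) to 440 Hz (A4). Comparing those targets against song.wav from each voice shows how far the CMU output drifts from the requested notes.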
You may also need this patch to singing-mode.scm, which stops singing_do_initial from calling item.name on a nil token:
@@ -339,7 +339,9 @@
(defvar singing-max-short-vowel-length 0.11)
(define (singing_do_initial utt token)
- (if (equal? (item.name token) "")
+ (if (and
+ (not (equal? nil token))
+ (equal? (item.name token) ""))
(let ((restlen (car (item.feat token 'rest))))
(if singing-debug
(format t "restlen %l\n" restlen))
To set up my environment, I used the festvox fest_build script. You can also download voice_cmu_us_ahw_cg separately.