
I am running the AudioLM implementation from GitHub and I am getting an error with the following code:

from audiolm_pytorch import AudioLM, SemanticTransformer, CoarseTransformer, FineTransformer

audiolm = AudioLM(
    wav2vec = wav2vec,
    codec = soundstream,
    semantic_transformer = semantic_transformer,
    coarse_transformer = coarse_transformer,
    fine_transformer = fine_transformer
)

text = "The sound of a violin playing a sad melody"
generated_wav = audiolm(text=text, batch_size=1)

I have tried changing the dimensions in the transformers, but the issue is still there. These are my transformer definitions:

fine_transformer = FineTransformer(
    num_coarse_quantizers = 3,
    num_fine_quantizers = 5,
    codebook_size = 1024,
    dim = 1024,
    depth = 6,
    audio_text_condition = True      # this must be set to True (same for SemanticTransformer and CoarseTransformer)
)

coarse_transformer = CoarseTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    codebook_size = 1024,
    num_coarse_quantizers = 3,
    dim = 1024,
    depth = 6,
    audio_text_condition = True      # this must be set to True (same for SemanticTransformer and FineTransformer)
)

semantic_transformer = SemanticTransformer(
    num_semantic_tokens = wav2vec.codebook_size,
    dim = 1024,
    depth = 6,
    audio_text_condition = True      # this must be set to True (same for CoarseTransformer and FineTransformer)
).cuda()

but I still get the following error:

AssertionError: you had specified a conditioning dimension of 1024, yet what was received by the transformer has dimension of 768
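
For reference, a quick check of where the 768 might come from, assuming the text conditioning goes through a T5 encoder and that the default checkpoint is 'google/t5-v1_1-base' (I have not confirmed the default name, so this is only a guess):

# Hypothetical check, not part of my original script: print the hidden size of
# the assumed default T5 text encoder used for text conditioning.
from transformers import T5Config

t5_config = T5Config.from_pretrained('google/t5-v1_1-base')   # assumed default checkpoint
print(t5_config.d_model)   # prints 768, which matches the dimension reported in the error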
