
I am trying to write some code that prints a list of beams, but it prints something different from what I expect. Below are the code, what it prints, and what I want it to print.

def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        beam = beam2*len(bigrams[phoneme])
        for value in bigrams[phoneme]:
            beam[i][0].append(value)
            if i == len(beam)-1:
                i = 0
            else:
                i += 1
            print(beam)

from collections import defaultdict
bigrams = defaultdict(dict, {'AH': {'u': 0.4, 'l': 0.2, 'ous': 0.2, 'e': 0.2}, 'IH': {'y': 0.16666666666666666, 'i': 0.6666666666666666, 'e': 0.16666666666666666}, 'AE': {'a': 1.0}, 'K': {'c': 0.4, 'x': 0.2, 'q': 0.2, 'ch': 0.2}, 'H': {}, 'G': {'g': 1.0}, 'SH': {'sh': 1.0}, 'Z': {'se': 1.0}, 'AA': {'o': 1.0}, 'JH': {'ge': 1.0}, 'W': {'u': 0.5, 'w': 0.5}, 'V': {'v': 1.0}, 'M': {'me': 0.2, 'm': 0.8}, 'N': {'ne': 0.2, 'n': 0.8}, 'F': {'f': 1.0}, 'B': {'b': 1.0}, 'D': {'de': 0.16666666666666666, 'dd': 0.16666666666666666, 'd': 0.6666666666666666}, 'OW': {'o': 1.0}, 'L': {'l': 0.8333333333333334, 'e': 0.16666666666666666}, 'T': {'te': 0.16666666666666666, 'tt': 0.08333333333333333, 't': 0.75}, 'EH': {'ea': 0.3333333333333333, 'a': 0.3333333333333333, 'e': 0.3333333333333333}, 'S': {'ss': 0.125, '_': 0.25, 's': 0.625}, 'R': {'re': 0.16666666666666666, 'r': 0.8333333333333334}, 'ER': {'or': 0.25, 'er': 0.75}, 'EY': {'ai': 0.2, 'a': 0.8}, 'P': {'p': 1.0}, 'IY': {'y': 0.5, 'e': 0.5}, 'AY': {'i': 1.0}})
trigrams = defaultdict(dict, {('T', 'u'): {'tt': 1.0}, ('S', '^'): {'s': 1.0}, ('D', '^'): {'d': 1.0}, ('K', 'e'): {'x': 1.0}, ('M', '^'): {'m': 1.0}, ('T', 'a'): {'te': 1.0}, ('S', 'x'): {'_': 1.0}, ('T', 'o'): {'t': 1.0}, ('T', 's'): {'t': 1.0}, ('AA', 'm'): {'o': 1.0}, ('IH', '^'): {'i': 0.6666666666666666, 'e': 0.3333333333333333}, ('D', 'n'): {'d': 1.0}, ('B', 'o'): {'b': 1.0}, ('IY', 'f'): {'e': 1.0}, ('K', 'i'): {'c': 1.0}, ('K', '^'): {'c': 0.3333333333333333, 'ch': 0.3333333333333333, 'q': 0.3333333333333333}, ('IH', 't'): {'i': 1.0}, ('S', 'or'): {'s': 1.0}, ('R', 'ch'): {'r': 1.0}, ('D', 'l'): {'d': 1.0}, ('IY', 'r'): {'y': 0.5, 'e': 0.5}, ('IH', 'm'): {'y': 1.0}, ('L', 'c'): {'l': 1.0}, ('EH', 'd'): {'a': 0.5, 'e': 0.5}, ('G', 'o'): {'g': 1.0}, ('V', 'n'): {'v': 1.0}, ('AE', 's'): {'a': 1.0}, ('S', 'y'): {'s': 1.0}, ('OW', 'r'): {'o': 1.0}, ('L', 'e'): {'l': 1.0}, ('N', 'i'): {'ne': 0.3333333333333333, 'n': 0.6666666666666666}, ('OW', 'l'): {'o': 1.0}, ('Z', 'n'): {'se': 1.0}, ('ER', 'm'): {'er': 1.0}, ('P', '^'): {'p': 1.0}, ('IH', 'u'): {'i': 1.0}, ('R', 'a'): {'re': 1.0}, ('R', '^'): {'r': 1.0}, ('T', 'e'): {'t': 1.0}, ('L', 'l'): {'e': 1.0}, ('EY', 't'): {'ai': 0.5, 'a': 0.5}, ('AY', 'l'): {'i': 1.0}, ('EY', 'b'): {'a': 1.0}, ('IY', 't'): {'y': 1.0}, ('ER', 'n'): {'er': 1.0}, ('OW', '^'): {'o': 1.0}, ('M', 'o'): {'me': 1.0}, ('S', 'u'): {'s': 1.0}, ('OW', 'g'): {'o': 1.0}, ('W', 'q'): {'u': 1.0}, ('T', '^'): {'t': 1.0}, ('S', 'ous'): {'_': 1.0}, ('AH', 'b'): {'u': 1.0}, ('EH', 'l'): {'ea': 1.0}, ('OW', 'm'): {'o': 1.0}, ('M', 'e'): {'m': 1.0}, ('EY', 'v'): {'a': 1.0}, ('EY', 'p'): {'a': 1.0}, ('AH', 'er'): {'ous': 1.0}, ('JH', 'er'): {'ge': 1.0}, ('ER', 'tt'): {'er': 1.0}, ('R', 't'): {'r': 1.0}, ('L', '^'): {'l': 1.0}, ('B', 'e'): {'b': 1.0}, ('SH', '^'): {'sh': 1.0}, ('ER', 'w'): {'or': 1.0}, ('W', '^'): {'w': 1.0}, ('T', 'i'): {'t': 1.0}, ('L', 'o'): {'l': 1.0}, ('B', '^'): {'b': 1.0}, ('F', '^'): {'f': 1.0}, ('AH', 'r'): {'u': 1.0}, ('L', 'ai'): {'l': 1.0}, ('N', 'ea'): {'n': 1.0}, ('AH', 'dd'): {'l': 1.0}, ('S', 'a'): {'ss': 0.5, 's': 0.5}, ('AH', 'd'): {'e': 1.0}, ('N', 'o'): {'n': 1.0}, ('AE', 'b'): {'a': 1.0}, ('AA', 'sh'): {'o': 1.0}, ('D', 'a'): {'de': 0.5, 'dd': 0.5}})
speech2text("M IH T", bigrams, trigrams, alpha=0.5)

Here is what it prints

[[['^', 'm'], 1.0], [['^', 'm'], 1.0]]
[[['^', 'm', 'me'], 1.0], [['^', 'm', 'me'], 1.0]]
...... and so on

Here is what I want it to print

[[['^', 'm'], 1.0], [['^', 'me'], 1.0]]
...... and so on

Basically, why is it appending the value to both lists? I thought it had something to do with beam and beam2 'pointing' to the same list, so I tried setting beam2 = beam2*len(bigrams[phoneme]) and then beam = list(beam2). I believed that would make them point to two separate lists in memory, but maybe not?
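
For reference, here is a small check of the copying rules involved, using a stand-in for beam2 (the values are illustrative, not taken from a real run of the question's code). Both `*` and `list()` build a new outer list whose elements are the very same inner objects; only a deep copy made separately for each slot gives independent nested lists.

```python
import copy

beam2 = [[['^'], 1.0]]  # one hypothesis: [list of spellings, score]

repeated = beam2 * 2      # new outer list, but both slots refer to the SAME inner list
shallow = list(repeated)  # list() copies only the outer level
# One separate deepcopy call per slot, so every slot gets its own nested lists.
independent = [copy.deepcopy(beam2[0]) for _ in range(2)]

print(repeated[0] is repeated[1])        # True
print(shallow[0] is repeated[0])         # True
print(independent[0] is independent[1])  # False

shallow[0][0].append('m')  # mutate through the "copy"
print(repeated)     # [[['^', 'm'], 1.0], [['^', 'm'], 1.0]]
print(beam2)        # [[['^', 'm'], 1.0]]
print(independent)  # [[['^'], 1.0], [['^'], 1.0]]
```

One subtlety: a single `copy.deepcopy(repeated)` would not fix the aliasing, because deepcopy remembers objects it has already copied and turns the two aliased slots into two references to one new list. Calling it once per slot, as above, avoids that.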

Thanks for your help

EDIT:

After some help from Gassa, my code now looks like this, but I have a new problem:

def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        beam = [[['^'],1.0] for k in range(len(bigrams[phoneme]))]
        for value in bigrams[phoneme]:
            beam[i][0].append(value)
            if i == len(beam)-1:
                i = 0
            else:
                i += 1
        beam2 = beam
        print(beam2)

Here it prints beam2 containing two entries, then three, then three, when I really need it to contain two, then six (2 × 3), then 18 (6 × 3). That would work with this code:

def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'],1.0]]
    i = 0
    for phoneme in phoneme_list:
        beam = [beam2 for k in range(len(bigrams[phoneme]))]
        for value in bigrams[phoneme]:
            beam[i][0].append(value)
            if i == len(beam)-1:
                i = 0
            else:
                i += 1
        beam2 = beam
        print(beam2)

But then of course we are back to the original problem.
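
The target counts 2, 6 and 18 are the number of ways to pick one spelling per phoneme so far, i.e. the size of a Cartesian product. A quick sanity check with `itertools.product`, using only the question's `bigrams` rows for M, IH and T (and assuming Python 3.7+, where dicts keep insertion order):

```python
from itertools import product

# Rows copied from the question's bigrams for the phonemes in "M IH T".
bigrams = {
    'M': {'me': 0.2, 'm': 0.8},
    'IH': {'y': 0.16666666666666666, 'i': 0.6666666666666666, 'e': 0.16666666666666666},
    'T': {'te': 0.16666666666666666, 'tt': 0.08333333333333333, 't': 0.75},
}

phonemes = "M IH T".split()
for n in range(1, len(phonemes) + 1):
    # Every way of choosing one spelling (dict key) for each of the first n phonemes.
    combos = list(product(*(bigrams[p] for p in phonemes[:n])))
    print(len(combos), combos[0])
# 2 ('me',)
# 6 ('me', 'y')
# 18 ('me', 'y', 'te')
```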

Thanks again for your help!

Joseph Farah
Tehmo3

1 Answer

The line

beam = beam2*len(bigrams[phoneme])

creates the list beam as len(bigrams[phoneme]) references to one and the same list beam2[0].
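
To see the shared reference directly, here is a reduced version of the question's first loop that uses only the 'M' row of `bigrams` and `enumerate` in place of the manual counter `i` (assuming Python 3.7+ dict ordering, so 'me' comes first):

```python
# Only the 'M' row of the question's bigrams is needed here.
bigrams = {'M': {'me': 0.2, 'm': 0.8}}

beam2 = [[['^'], 1.0]]
beam = beam2 * len(bigrams['M'])  # two references to the single list beam2[0]

print(all(entry is beam2[0] for entry in beam))  # True

for i, value in enumerate(bigrams['M']):
    beam[i][0].append(value)  # both appends land in the one shared spelling list

print(beam)  # [[['^', 'me', 'm'], 1.0], [['^', 'me', 'm'], 1.0]]
```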

You can instead use a line like

beam = [[['^'],1.0] for k in range(len(bigrams[phoneme]))]

Note that beam2 is no longer used. This way, you get the output

[[['^', 'me'], 1.0], [['^'], 1.0]]
[[['^', 'me'], 1.0], [['^', 'm'], 1.0]]
...

This is not exactly what you want, but at least the elements of beam are now different lists.
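
A minimal sketch of why this is "not exactly" the goal, run over the first two phonemes with a subset of the question's `bigrams` (again using `enumerate` instead of the counter `i`, and assuming Python 3.7+ dict ordering): the slots are now independent, but each phoneme starts over from `'^'`, so earlier spellings are not carried forward.

```python
# Subset of the question's bigrams.
bigrams = {
    'M': {'me': 0.2, 'm': 0.8},
    'IH': {'y': 0.16666666666666666, 'i': 0.6666666666666666, 'e': 0.16666666666666666},
}

for phoneme in "M IH".split():
    # The comprehension builds a brand-new [['^'], 1.0] on every pass,
    # so the slots no longer share a spelling list...
    beam = [[['^'], 1.0] for _ in range(len(bigrams[phoneme]))]
    for i, value in enumerate(bigrams[phoneme]):
        beam[i][0].append(value)
    print(beam)
# ...but every phoneme starts again from '^', so nothing is carried over:
# [[['^', 'me'], 1.0], [['^', 'm'], 1.0]]
# [[['^', 'y'], 1.0], [['^', 'i'], 1.0], [['^', 'e'], 1.0]]
```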


EDIT: As for the second part of your problem, this code seems to do what you want:

import copy

def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    phoneme_list = phonemes.split()
    beam2 = [[['^'], 1.0]]
    i = 0
    for phoneme in phoneme_list:
        # One independent deep copy of every current hypothesis per spelling option.
        beam = [copy.deepcopy(j) for j in beam2 for k in range(len(bigrams[phoneme]))]
        # Give each copy of each hypothesis a different spelling.
        for j in range(len(beam2)):
            for value in bigrams[phoneme]:
                beam[i][0].append(value)
                if i == len(beam) - 1:
                    i = 0
                else:
                    i += 1
        beam2 = beam
        print(beam2)

  1. The copy.deepcopy part ensures that all lists inside lists are copied properly, and you don't have to deal with the copying yourself.

  2. The for j in beam2 for k in range part is to put all the contents into the same list, not as a list of lists.

  3. The new for j in range (len (beam2)): part is to apply your changes to the whole beam, not only to its prefix.
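
As an alternative (not what the answer above uses), the same expansion can be written without the counter `i` or `copy.deepcopy`: `prefix + [value]` always builds a new list, so no two hypotheses share state. In this sketch the helper name `expand_beam` is hypothetical, the score stays at 1.0 because the question's code never updates it, and `trigrams`, `alpha` and `topn` remain unused as in the original. The two `for` clauses run like nested loops, in the same order as `for j in beam2 for k in range(...)`.

```python
def expand_beam(beam, options):
    """Hypothetical helper: extend every hypothesis in beam by every spelling in options."""
    # Outer clause: existing hypotheses; inner clause: spellings for this phoneme.
    # prefix + [value] creates a fresh list, so nothing is shared or mutated.
    return [[prefix + [value], score]
            for prefix, score in beam
            for value in options]


def speech2text(phonemes, bigrams, trigrams, alpha, topn=10):
    # trigrams, alpha and topn are unused, as in the question's code.
    beam = [[['^'], 1.0]]
    for phoneme in phonemes.split():
        beam = expand_beam(beam, bigrams[phoneme])
    return beam


# Usage with the question's rows for M, IH and T (Python 3.7+ dict ordering).
bigrams = {
    'M': {'me': 0.2, 'm': 0.8},
    'IH': {'y': 0.16666666666666666, 'i': 0.6666666666666666, 'e': 0.16666666666666666},
    'T': {'te': 0.16666666666666666, 'tt': 0.08333333333333333, 't': 0.75},
}
result = speech2text("M IH T", bigrams, trigrams={}, alpha=0.5)
print(len(result))  # 18
print(result[0])    # [['^', 'me', 'y', 'te'], 1.0]
```

Note that a phoneme with no spellings (such as 'H' in the question's data) would leave the beam empty, since there is nothing to extend it with.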

Gassa
  • Super helpful, but I am still running into one problem. My beam2 starts with 2 lists, as it should when [['^'],1.0] is multiplied by two, but on the next iteration I need 6 lists (the 2 lists multiplied by len(bigrams[phoneme]), which is now 3), and I don't get that this way. I will update the main body with the new problem now :) – Tehmo3 Apr 29 '16 at 10:58
  • Thank you so much for your help! – Tehmo3 Apr 30 '16 at 00:08