Solved

I have a string containing a conversation between two people, with each part labelled by a speaker tag.

I want to split the string into two substrings: one containing only speaker 1's part of the conversation and one containing only speaker 2's.

This is the code I am using to obtain the transcript.

operation = client.long_running_recognize(config, audio)
response = operation.result(timeout=10000)
result = response.results[-1]
words_info = result.alternatives[0].words
transcript = ''
tag=1
speaker=""
for word_info in words_info:
    if word_info.speaker_tag==tag:
        speaker=speaker+" "+word_info.word
    else:
        transcript += "speaker {}: {}".format(tag,speaker) + '\n'
        tag=word_info.speaker_tag
        speaker=""+word_info.word
transcript += "speaker {}: {}".format(tag,speaker)

This transcribes both speaker 1 and speaker 2 in the same file.

Solved: The Solution was much simpler. Thanks for the help.

transcript_1 = ''
transcript_2 = ''

for word_info in words_info:
    if word_info.speaker_tag == 1:
        transcript_1 += " " + word_info.word
    elif word_info.speaker_tag == 2:
        transcript_2 += " " + word_info.word
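The same idea can be generalized so it does not hard-code two speakers. The following is only a sketch: split_by_speaker and sample_words are hypothetical names, and the SimpleNamespace objects are stand-ins for the API's word objects, assuming only that each one has .word and .speaker_tag attributes as in the code above.

```python
from collections import defaultdict
from types import SimpleNamespace


def split_by_speaker(words_info):
    """Group words by speaker tag and return {tag: "joined words"}."""
    words_by_speaker = defaultdict(list)
    for word_info in words_info:
        words_by_speaker[word_info.speaker_tag].append(word_info.word)
    # Joining once at the end avoids the leading space that
    # repeated `+= " " + word` leaves at the start of each transcript.
    return {tag: " ".join(words) for tag, words in words_by_speaker.items()}


# Hypothetical stand-ins for the API's word objects, for illustration only.
sample_words = [
    SimpleNamespace(word="hello", speaker_tag=1),
    SimpleNamespace(word="hi", speaker_tag=2),
    SimpleNamespace(word="there", speaker_tag=1),
]

transcripts = split_by_speaker(sample_words)
print(transcripts)  # {1: 'hello there', 2: 'hi'}
```

With the real response, passing words_info instead of sample_words should give one entry per speaker tag that appears in the audio.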
  • What have you tried so far? Can you show us some of your code? And a sample of your input data? – chefhose Nov 12 '19 at 11:35
  • Your question is very unclear :( Both the "current result" and "expected result" don't make much sense. As already said, to give a better answer this really needs some example input data. Try to split up the code that does TTS and the code splitting the data. That way it will be easier for you to provide example data. – exhuma Nov 12 '19 at 12:58
  • Thanks a lot, chefhose and exhuma. The solution was much simpler than I thought; I just had not looked into it when transcribing the original text. – Yasir Ahmed Pirkani Nov 12 '19 at 15:51

1 Answer

How to approach this depends on how you get the data: as a single raw string with all the messages from both speakers, or as separate messages for each speaker.

A basic approach is to use the string "speaker N:" (where N is the speaker number) as the tag for each speaker. You can then extract each speaker's messages using tools like NLTK and/or built-in string methods like find().

Note: by "tag" I mean any expression that lets us determine whether a message comes from a certain speaker.

Example: you get the whole text, which includes all the interventions of both speakers.

  • Steps to follow:

1) Define a tag for each speaker to distinguish their interventions in the whole text. Example: the tag for the first speaker could be "speaker 1:"

2) Find all the interventions of a speaker using text.find(speaker_tag)

3) Add the interventions of each speaker to a separate data structure. A list of interventions per speaker works well; if you later want all of a speaker's interventions as one text again, you can combine them with str.join().

Another option is to use a tool like NLTK (I think it is great for classifying text).

It has useful features such as tokenization, which I think would help with your problem.

In the following example, I use find() and slicing for a basic example of text tokenization:

Text data:

text = "speaker 1: hello everyone, I am Thomas speaker 2: Hello friends, I am John speaker 1: How are you? I am great being here speaker 2: It's the same for me"

Code example:

from itertools import islice, tee

FIRST_SPEAKER_TAG = "speaker 1:"
SECOND_SPEAKER_TAG = "speaker 2:"

def get_speaker_positions(text, speaker_tag):

    positions = []
    position = text.find(speaker_tag)
    while position != -1:
        positions.append(position)
        # resume the search right after the tag we just found so that
        # the next call to find() reaches the following occurrence
        position = text.find(speaker_tag, position + len(speaker_tag))

    return positions

def slices(iterable, n):
    return zip(*(islice(it, i, None) for i, it in enumerate(tee(iterable, n))))

def get_text_interventions(text, speaker_tags):

    # speakers' interventions of the text
    interventions = { speaker_tag: "" for speaker_tag in speaker_tags }

    # positions where each intervention starts in the text
    # (len(text) is appended as the end position of the last
    # intervention)
    # (we need to sort the positions to get the interventions in the correct
    # order)
    speaker_positions = [
        get_speaker_positions(text, speaker) for speaker in speaker_tags
    ]
    all_positions = [
        position for sublist in speaker_positions for position in sublist
    ]
    all_positions.append(len(text))
    all_positions.sort()

    # generate the list of pairs that match a certain intervention
    # the pairs are formed by the initial and the end position of the 
    # intervention
    text_chunks = list(slices(all_positions, 2))

    for chunk in text_chunks:

        # we assign the intervention to the speaker whose list of
        # positions contains the chunk's start position
        # when slicing we add the speaker tag's length to exclude
        # the speaker tag from the intervention itself
        if chunk[0] in speaker_positions[0]:
            intervention = text[chunk[0]+len(speaker_tags[0]):chunk[1]]
            interventions[speaker_tags[0]] += intervention

        elif chunk[0] in speaker_positions[1]:
            intervention = text[chunk[0]+len(speaker_tags[1]):chunk[1]]
            interventions[speaker_tags[1]] += intervention

    return interventions

text_interventions = get_text_interventions(text, [ FIRST_SPEAKER_TAG, SECOND_SPEAKER_TAG ])
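For comparison, here is a more compact sketch of the same split using re.split from the standard library. It is not part of the approach above: split_interventions is a hypothetical name, and it assumes every tag has the exact form "speaker N:" and that this pattern never occurs inside the spoken text itself.

```python
import re
from collections import defaultdict

text = (
    "speaker 1: hello everyone, I am Thomas "
    "speaker 2: Hello friends, I am John "
    "speaker 1: How are you? I am great being here "
    "speaker 2: It's the same for me"
)


def split_interventions(text):
    # The capturing group makes re.split keep the tags in the result:
    # ['', 'speaker 1', ' hello ... ', 'speaker 2', ' Hello ... ', ...]
    parts = re.split(r"(speaker \d+):", text)
    interventions = defaultdict(list)
    # Odd indexes hold the tags, even indexes (from 2) hold the messages.
    for tag, message in zip(parts[1::2], parts[2::2]):
        interventions[tag].append(message.strip())
    return {tag: " ".join(messages) for tag, messages in interventions.items()}


result = split_interventions(text)
print(result["speaker 1"])
# hello everyone, I am Thomas How are you? I am great being here
print(result["speaker 2"])
# Hello friends, I am John It's the same for me
```

Unlike the find() version, the keys here are "speaker 1" and "speaker 2" without the trailing colon, and each intervention is stripped of surrounding whitespace before joining.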

Notes:

If anything is unclear, the itertools documentation covers itertools.islice and itertools.tee in more detail.

Feel free to ask me anything you didn't understand about the example. I hope you find it useful! =)

Rafael VC
  • I am very new to Python, so bear with me. The function returns first_speaker_interventions, second_speaker_interventions, right? Also, find_speaker_position needs to be first_speaker_position, if I'm right? text = text[first_speaker_position:] Can you explain this after the if statements? I tried running it with your sample text and it's not loading. – Yasir Ahmed Pirkani Nov 12 '19 at 13:20
  • @YasirAhmedPirkani I have fixed the misspelling. As for the return value, it was a modifier method, so the intention was only to modify the text, not to return the interventions. Give me some time to review it and give you a more explanatory example. – Rafael VC Nov 12 '19 at 16:14
  • @YasirAhmedPirkani Hi Yasir, I am glad you solved it. I'll leave my approach here in case you want to review it =) – Rafael VC Nov 13 '19 at 10:38