
I have a webpage with a web form built with Flask. Currently, users need to type their information into the form manually. When they click submit, they are redirected to a table where their answers are appended. The setup is basically: a video autoplays and asks the user questions, the user types their answers, and after clicking submit they see their answers appended to a table.

I want to reduce the clutter on the page and let the user answer the video's questions verbally. I've read about getUserMedia, WebSockets, and WebRTC, but I'm getting confused about them. I've looked all over here, YouTube, Reddit, and the like (specifically here, here, here, and here) without much luck.

I'm thinking of a simple for loop with a speech recognizer, storing each answer under a different key in a dict and then passing the data along as is, but I'm not sure how to connect that microphone action to the frontend in particular. Isn't the frontend where all of the data resides, so we need an HTTP request to obtain it and analyze it? Here's my code:

main.py:

from flask import render_template, Flask, request, redirect, url_for
import os
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer as SIA
import nltk
from nltk.corpus import stopwords
import speech_recognition as sr

app = Flask(__name__, static_folder = 'static')

# # set the stopwords to be the english version
# stop_words = set(stopwords.words("english"))
# # create the recognizer
# r = sr.Recognizer()
# # define the microphone
# mic = sr.Microphone(device_index=0)
# r.energy_threshold = 300
# # vader sentiment analyzer for analyzing the sentiment of the text
# sid = SIA()
# user = []
# location = []
# state = []
# info = [user, location, state]
# # patient.name?


@app.route("/", methods=["GET", "POST"])
def home():
    user = request.values.get('name')
    location = request.values.get('location')
    state = request.values.get('state')
    # if request.method == "POST":
        # with mic as source:
        #     holder = []
        #     for x in info:
        #         audio_data = r.listen(source)
        #         r.adjust_for_ambient_noise(source)
        #         text = r.recognize_google(audio_data, language = 'en-IN')
        #         holder.append(text.lower())
        #         if x == "state":
        #             ss = sid.polarity_scores(holder)
        #             if ss == "neg":
        #                 x.append(str("sad"))
        #             else:
        #                 x.append(str("not sad"))
        #         else:
        #             filtered_words = [words for words in holder if not words in stop_words] # this filters out the stopwords
        #             x.append(filtered_words.lower())

        # return redirect(url_for('care', user = user))

    return render_template('index.html', user = user, location=location, state=state)

@app.route("/care", methods=["POST"])
def care():
    user = request.values.get('name')
    location = request.values.get('location')
    state = request.values.get('state')
    return render_template('list.html', user = user, location=location, state=state)


if __name__ == "__main__":
    #app.run(debug=True)    
    app.run(debug=True, threaded=True)
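Apart from the fact that sr.Microphone listens on the machine running Flask rather than in the visitor's browser, the commented-out processing in home() has a few bugs: info holds the lists themselves, so x == "state" is never true; polarity_scores() expects a string, not a list, and returns a dict of scores, so ss == "neg" never matches; and filtered_words.lower() calls .lower() on a list. Below is a minimal sketch of the same post-processing, assuming each answer has already been transcribed to a string (for example in the browser). The scorer is passed in so VADER's SIA().polarity_scores can be plugged in; the small stopword set is a hypothetical stand-in for NLTK's English list, and the -0.05 cut-off on VADER's compound score is the conventional "negative" threshold from the VADER documentation, not something from the original code.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for set(stopwords.words("english")) from NLTK.
DEFAULT_STOP_WORDS = {"i", "am", "in", "the", "a", "my", "is", "me"}

# A scorer takes text and returns a dict with a "compound" key (like VADER).
Scorer = Callable[[str], Dict[str, float]]


def filter_stopwords(text: str, stop_words: Iterable[str]) -> List[str]:
    """Lower-case a transcript, split it on whitespace and drop stopwords."""
    stop = set(stop_words)
    return [word for word in text.lower().split() if word not in stop]


def label_state(text: str, polarity_scores: Scorer) -> str:
    """Return "sad" when the compound sentiment score is negative."""
    compound = polarity_scores(text)["compound"]
    # -0.05 is the conventional VADER threshold for "negative"
    return "sad" if compound <= -0.05 else "not sad"


def process_answers(
    answers: Dict[str, str],
    polarity_scores: Scorer,
    stop_words: Iterable[str] = DEFAULT_STOP_WORDS,
) -> Dict[str, object]:
    """Clean each transcribed answer; "state" gets a sentiment label instead."""
    result: Dict[str, object] = {}
    for field, text in answers.items():
        if field == "state":
            result[field] = label_state(text, polarity_scores)
        else:
            result[field] = filter_stopwords(text, stop_words)
    return result
```

In main.py this could be called as process_answers(answers, SIA().polarity_scores, set(stopwords.words("english"))), where answers maps "name", "location" and "state" to the transcribed text.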

index.html:

{% extends "base.html" %}
{% block content %}

<!---------Therapist Section--------->
    <section id="therapist">
        <div class="container" id="therapist_container">
            <div id="button">
              <button type="button" class="btn btn-primary" id="therapist-button" data-toggle="modal" data-target="#myModal">Talk with Delphi</button>
            </div>
            
            <!-- Modal -->
            <div class="modal fade" id="myModal" tabindex="-1" role="dialog" aria-labelledby="vid1Title" aria-hidden="true">
              <div class="modal-dialog modal-dialog-centered" role="document">
                <div class="modal-content">
                  <div class="modal-body">
                    <video width="100%" id="video1">
                      <source src="./static/movie.mp4" type="video/mp4">
                    </video>
                    <form action="/care" method="POST">
                      <input type="text" name="name" placeholder="what's your name?" id="name">
                      <input type="text" name="location" placeholder="Where are you?" id="location">
                      <input type="text" name="state" placeholder="how can I help?" id="state">
                      <input id="buttonInput" class="btn btn-success form-control" type="submit" value="Send">
                    </form>
                  </div>
                </div>
              </div>
            </div>
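            <script>
              // Play the video when the modal opens and pause it when it closes
              $('#myModal').on('shown.bs.modal', function () {
                $('#video1')[0].play();
              });
              $('#myModal').on('hidden.bs.modal', function () {
                $('#video1')[0].pause();
              });

              // When the video ends, submit the form. Setting window.location
              // sends a GET request, but /care only accepts POST (405 error).
              var video = document.getElementById('video1');
              video.addEventListener('ended', function () {
                $('#myModal form')[0].submit();
              });

              // Connect the microphone stream to the Web Audio API
              function callback(stream) {
                var AudioCtx = window.AudioContext || window.webkitAudioContext;
                var context = new AudioCtx();
                var mediaStreamSource = context.createMediaStreamSource(stream);
              }

              $(document).ready(function () {
                // navigator.webkitGetUserMedia is deprecated; use the promise-based API
                // (it only works in a secure context: HTTPS or localhost)
                navigator.mediaDevices.getUserMedia({ audio: true })
                  .then(callback)
                  .catch(function (err) { console.error('Microphone access failed:', err); });
              });
            </script>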
        </div>
    </section>
{% endblock content %}

list.html:

{% extends "base.html" %}
{% block content %}

<!----LIST------>
<section id="care_list">
    <div class="container" id="care_list_container">
        <h1 class="jumbotron text-center" id="care_list_title">{{ user }} Care Record</h1>
        <div class="container">
            <table class="table table-hover"> 
                <thead>
                  <tr>
                    <th scope="col">Session #</th>
                    <th scope="col">Length</th>
                    <th scope="col">Location</th>
                    <th scope="col">State</th> 
                  </tr>
                </thead>
                <tbody>
                  <tr>
                    <th scope="row">1</th>
                    <td>{{ length }}</td>
                    <td>{{ location }}</td>
                    <td>{{ state }}</td>
                  </tr>
                  <tr>
                    <th scope="row">2</th>
                    <td></td>
                    <td></td>
                    <td></td>
                  </tr>
                  <tr>
                    <th scope="row">3</th>
                    <td colspan="2"></td>
                    <td></td>
                  </tr>
                </tbody>
              </table>
        </div>
        <ul class="list-group list-group-flush" id="care_list">
            <li class="list-group-item">Please email tom@vrifyhealth.com for help.</li>
        </ul>
    </div> 
</section> 
{% endblock content %}

Tom

2 Answers


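It's easier than you'd think: just as you create an array with new Array(), you can create a voice-to-text converter with new SpeechRecognition(). No external library is needed, although browser support varies (Chrome, for example, exposes it only as webkitSpeechRecognition). Here is the code: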

            /* JS comes here */
            function SpeechRecog() {
                var output = document.getElementById("output");
                var action = document.getElementById("action");
                // Read both names from window: a local `var SpeechRecognition` would
                // shadow the global and always be undefined on the right-hand side.
                var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
                if (!Recognition) {
                    action.innerHTML = "<small>speech recognition is not supported in this browser</small>";
                    return;
                }
                var recognition = new Recognition();
            
                // This runs when the speech recognition service starts
                recognition.onstart = function() {
                    action.innerHTML = "<small>listening, please speak...</small>";
                };
                
                // Stop listening once the user stops speaking
                recognition.onspeechend = function() {
                    action.innerHTML = "<small>stopped listening, hope you are done...</small>";
                    recognition.stop();
                };
              
                // This runs when the speech recognition service returns a result
                recognition.onresult = function(event) {
                    var transcript = event.results[0][0].transcript;
                    var confidence = event.results[0][0].confidence; // 0 to 1, unused here
                    output.value = transcript;
                };
              
                // start recognition
                recognition.start();
            }
button{
  color:white;
  background:blue;
  border:none;
  padding:10px;margin:5px;
  border-radius:1em;
}
input{
  padding:.5em;margin:.5em;
}
<p>I'm Aakash1282.<br> Are you lazy? Here is a voice writer for you.</p>
<p><button type="button" onclick="SpeechRecog()">Write By Voice</button> &nbsp; <span id="action"></span></p>
        <input type="text" id="output">

This code has some problems running as a Stack Overflow snippet but works perfectly from local files. Here is the working code on CodePen: https://codepen.io/aakash1282/pen/xxqeQyM

Use it as a reference and adapt your form as needed.

  • If I wanted to output the speech recognition to a specific input box, then do I just augment the output.value code? – Tom Jun 22 '21 at 03:46
  • See the JS -> var output = document.getElementById("output"); and the HTML -> <input type="text" id="output">. I think you've got it. –  Jun 22 '21 at 05:15

Actually, you can't use Flask to capture the user's speech. Flask is a backend framework and runs on the server you host it on. Since you want to recognize what the user says, you need something that runs on the client side, i.e., JavaScript. You could use this tutorial to complete your task.

  • Yeah, I was expecting to use JS to access the microphone and actually gather the user's data, while cleaning and preprocessing the data in Python... can I do that? – Tom Jun 22 '21 at 02:13
  • Actually, I have never tried doing that, so I'm not sure if it's possible. – MananGandhi1810 Jun 22 '21 at 03:50
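On the question in the comments: this split does work. The browser does the listening (for example with the SpeechRecognition code in the other answer) and sends only the resulting text to Flask, where Python can clean it. The simplest route is to write each transcript into the existing name, location and state form fields, so the current /care handler receives them through request.values unchanged. If you would rather send them without reloading the page, here is a minimal sketch of a JSON endpoint; the /answers route, the field list and the small stopword set are assumptions for illustration. The browser would call it with fetch('/answers', {method: 'POST', headers: {'Content-Type': 'application/json'}, body: JSON.stringify(answers)}).

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stopword set for illustration; NLTK's English list would go here.
STOP_WORDS = {"i", "am", "in", "the", "a", "my", "is"}
FIELDS = ("name", "location", "state")


@app.route("/answers", methods=["POST"])
def answers():
    """Receive transcripts produced in the browser and clean them in Python.

    Expects a JSON body such as {"name": "...", "location": "...", "state": "..."}.
    """
    data = request.get_json(silent=True)
    if not isinstance(data, dict):
        data = {}  # missing or malformed JSON: treat every answer as empty
    cleaned = {}
    for field in FIELDS:
        words = str(data.get(field, "")).lower().split()
        cleaned[field] = " ".join(w for w in words if w not in STOP_WORDS)
    return jsonify(cleaned)


if __name__ == "__main__":
    app.run(debug=True)
```

Keeping recognition in the browser means no raw audio has to be streamed to the server, so neither WebSockets nor WebRTC is needed for this form.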