Assignment – 5: Voice Assistant
100 points
In this assignment, you will design your own customized voice assistant (like “Hey
Google” or “Hey Siri”, but triggered by a creative key phrase of your own). The
implementation is simple, and you already have all the commands you need to
accomplish this task.
Training your voice with Key Phrase [30 points]
Script File: run_record_voices.m
Begin with the starter code “run_record_voices.m” provided as part of this
assignment. The code is designed to record five different modulations of the same
phrase (for instance, “Hey Jarvis”), so make sure to repeat the same phrase 5 times.
Each recording lasts only 1.5 seconds; make sure your phrase does not exceed this
duration. Visualize the key phrase each time you record and inspect the plot. If you
find any disturbances, redo the recording, so that none of the 5 recordings contains
disturbances.
Within the loop in this code, apply the function “voice_to_envelopes.m” (provided as
part of this assignment) with the input arguments trigger_word and 100. This function
acts like a filter: it converts your voice recording to an envelope. Store the
envelope returned by the function in a column of the matrix training_envelopes, one
column per recording, and visualize the envelope for each key phrase. After five
recordings, training_envelopes should be of size 12000 x 5 (1.5 seconds at 8000
samples per second gives 12000 samples per recording). Please make sure that your
envelopes meet this requirement. Now, save the envelopes in a MAT file as shown below.
“save training_words training_envelopes”
Check the comments in “run_record_voices.m” and Recording Voice Sample.mov
(present in resources) for further information.
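The column-by-column storage described above can be sketched as follows. This is only an illustration under stated assumptions: synthetic signals stand in for the microphone recordings made by the starter code, and stand_in_envelope is a hypothetical moving-average smoother used in place of the provided envelope function, whose internals are not described here. Treating the second argument (100) as a smoothing window length is also a guess.

```matlab
% Sketch only: no microphone is used and the provided envelope function is
% NOT called. Replace the synthetic parts with the starter code's own steps.
fs = 8000;                            % sampling rate in Hz (value used in the testing phase)
duration = 1.5;                       % recording length in seconds
num_words = 5;                        % number of training recordings
num_samples = round(fs * duration);   % 1.5 s * 8000 Hz = 12000 samples

% Hypothetical stand-in for the provided function: rectify, then smooth with
% a moving average of w samples (w as a window length is an assumption).
stand_in_envelope = @(x, w) filter(ones(w, 1) / w, 1, abs(x(:)));

training_envelopes = zeros(num_samples, num_words);   % preallocate 12000 x 5
t = (0:num_samples - 1)' / fs;                        % time axis in seconds

for k = 1:num_words
    % In the real script, trigger_word comes from the recording code.
    trigger_word = sin(2 * pi * 200 * t) .* exp(-((t - 0.7) / 0.2) .^ 2) ...
                   + 0.01 * randn(num_samples, 1);

    % Store the envelope of recording k in column k.
    training_envelopes(:, k) = stand_in_envelope(trigger_word, 100);
end

% Check the size requirement before saving.
assert(isequal(size(training_envelopes), [12000 5]));
```

Preallocating the matrix and checking its size before saving makes a recording of the wrong length show up immediately, rather than later in the testing script.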
Testing Phase [40 points]
Script file: run_test_voices.m
Create a new script file named “run_test_voices.m”. Make sure to clear the workspace
before you proceed. Load the MAT file training_words as shown below:
“load training_words”
This command loads the envelopes you computed in your previous script (the training
samples). Now, start a while loop that lets the user utter phrases for a maximum of
three attempts or until they say the exact key phrase [whichever comes first]. You
can copy and paste the code in “run_record_voices.m” to record the test words [make
sure to set ‘fs’ to 8000].
Apply the provided envelope function, with 100 as the second argument, to each phrase
that the user utters. (This handout refers to the function both as
“voice_to_envelopes.m” and as voice_to_envelope; call it by the name of the file you
were given.) Save the result as testing_envelope:
testing_envelope=voice_to_envelope(test_audio,100)
Measure the similarity by computing the cross-correlation of testing_envelope with
each column of training_envelopes. The maximum value returned by xcorr() provides the
similarity measure between two envelopes. The input arguments to xcorr() are
testing_envelope and one column of training_envelopes at a time.
Hint: You can initiate a ‘for’ loop (for the number of training words) within the
‘while’ loop to compute the similarity between the test envelope and each training
envelope.
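The inner ‘for’ loop from the hint might look like the sketch below. It uses synthetic envelopes so that it runs without recordings, and it assumes the ‘coeff’ scaling option of xcorr(), which keeps the values within [-1, 1] so that they can be compared with a threshold such as 0.9. Whether the assignment intends this scaling, or expects the envelopes to be normalized some other way, is not stated, so treat that choice as an assumption. Note that xcorr() is part of the Signal Processing Toolbox.

```matlab
% Sketch only: synthetic envelopes replace the real training and test data.
num_samples = 12000;
num_words = 5;
t = (0:num_samples - 1)' / 8000;      % time axis in seconds at fs = 8000

% Five synthetic "training" envelopes: the same bump with small time shifts.
training_envelopes = zeros(num_samples, num_words);
for k = 1:num_words
    training_envelopes(:, k) = exp(-((t - 0.6 - 0.02 * k) / 0.15) .^ 2);
end

% Synthetic "test" envelope: the same bump at a different position.
testing_envelope = exp(-((t - 0.8) / 0.15) .^ 2);

similarity = zeros(1, num_words);     % one value per training word
for k = 1:num_words
    % 'coeff' scaling is an assumption (see the note above this block).
    c = xcorr(testing_envelope, training_envelopes(:, k), 'coeff');
    similarity(k) = max(c);           % peak over all lags
end

disp(similarity);
```

Because the synthetic envelopes differ only by a time shift, every value should come out close to 1. Taking the maximum over all lags is what makes the measure insensitive to where the phrase starts within the 1.5 second window.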
For a given test phrase, you will have one similarity value per training word [5
values in total]. If any of those values exceeds the threshold (0.9), display to the
user that they have decoded the password; otherwise, ask the user to repeat the
phrase until they reach the maximum number of attempts.
Check Test Result Sample #1.mov, Test Result Sample #2.mov and Test Result
Sample #3.mov (in resources) for some of the possible results.
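The attempt logic described in this section can be sketched as follows. To keep the example runnable without a microphone, the similarity values are simulated: the matrix simulated_similarity and its numbers are hypothetical placeholders, not measured results, and the message texts are only suggestions.

```matlab
% Sketch only: each row holds the 5 similarity values of one simulated attempt.
simulated_similarity = [0.41 0.38 0.52 0.47 0.35;    % hypothetical attempt 1
                        0.93 0.88 0.91 0.86 0.87];   % hypothetical attempt 2
threshold = 0.9;
max_attempts = 3;
attempt = 0;
matched = false;

while attempt < max_attempts && ~matched
    attempt = attempt + 1;

    % In the real script: record the test word, compute testing_envelope,
    % and fill 'similarity' with the xcorr-based loop over training words.
    row = min(attempt, size(simulated_similarity, 1));
    similarity = simulated_similarity(row, :);

    if any(similarity > threshold)
        matched = true;
        fprintf('Attempt %d: you have decoded the password.\n', attempt);
    elseif attempt < max_attempts
        fprintf('Attempt %d: no match, please repeat the phrase.\n', attempt);
    else
        fprintf('Attempt %d: no match, maximum number of attempts reached.\n', attempt);
    end
end
```

With these placeholder values the loop stops on the second attempt. Keeping both conditions in the while statement covers the two exits named above: a successful match or three failed attempts.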
Report [30 points]
Now, write a report on your work. You may include the plots of your key phrase
recordings along with their envelopes. You can also include plots of the envelope for
a case where the user said something different from the key phrase and for a case
where the user said exactly the key phrase. You may also study the performance of
this algorithm by trying threshold values other than 0.9 and report the results.
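One possible way to organize the threshold study is sketched below. The two vectors of similarity values are hypothetical placeholders, not measured data; replace them with the largest similarity value you obtained for each test utterance. The variable names are also only illustrative.

```matlab
% Placeholder values only; replace with values measured from your recordings.
best_similarity_same  = [0.95 0.92 0.88 0.97];   % hypothetical: key phrase spoken
best_similarity_other = [0.62 0.91 0.45 0.72];   % hypothetical: other phrases spoken
thresholds = 0.5:0.1:0.9;

accept_rate_same  = zeros(size(thresholds));
accept_rate_other = zeros(size(thresholds));
for i = 1:numel(thresholds)
    % Fraction of utterances that would be accepted at this threshold.
    accept_rate_same(i)  = mean(best_similarity_same  > thresholds(i));
    accept_rate_other(i) = mean(best_similarity_other > thresholds(i));
end

% Rows: threshold, acceptance rate for the key phrase, acceptance rate for other phrases.
disp([thresholds; accept_rate_same; accept_rate_other]);
```

A good threshold accepts most utterances of the key phrase while rejecting most other phrases, so reporting both rates side by side shows the trade-off.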
Things to be submitted:
run_record_voices.m
run_test_voices.m
training_words.mat
Report