How It Works
Voice to Time-Frequency Spectrogram
Now, this problem can be considered as an image classification problem of spectrograms.
Rather than height and width pixels, we simply have time and frequence pixels in the image.
Neural Network Architecture
Feed the spectrogram into convolutional neural network as you would do so in an image problem.
Note that this model is chosen because it's comparatively simple and quick to train, rather than being state of the art.
Related Work: Sainath, T., & Parada, C. (2015). Convolutional neural networks for small-footprint keyword spotting.
PAC-MAN implementation is borrowed from Google LLC
PAC-MAN™ © BANDAI NAMCO Entertainment Inc.