Yuan Gong, Yu-An Chung, James Glass. MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA
Prompt: “usage of AST: Audio Spectrogram Transformer”
After attaching the PDF above to ChatGPT, I prompted it with `usage of AST: Audio Spectrogram Transformer`.
A spectrogram is a visual representation of the spectrum of frequencies in a sound or other signal as they vary with time.
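To make the definition concrete, here is a minimal sketch that computes a magnitude spectrogram using only the Python standard library. The function name `spectrogram`, the frame length of 64, the hop of 32, and the 1 kHz test tone are all illustrative assumptions, not settings from the AST paper (which works on log Mel filterbank features). A real pipeline would use an FFT library rather than this naive DFT.

```python
import cmath
import math


def spectrogram(signal, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2."""
    # Hann window (periodic form) reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k: slow, but keeps the sketch dependency-free.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Hypothetical test signal: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(1024)]

spec = spectrogram(signal)

# Each bin spans sample_rate / frame_len = 125 Hz, so the tone should peak in bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
print(len(spec), "frames x", len(spec[0]), "bins; peak at", peak_hz, "Hz")
```

Each row of `spec` is one time slice and each column one frequency bin, so plotting it as an image gives the familiar time-versus-frequency picture.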
For Visual Learners ♥
Being kind is always a good idea, especially to a machine. :)
vGPT4: How AST Works
The visualization captures the sequence from raw audio through to the output after Transformer processing. Top: raw audio is converted into a spectrogram. Middle: the Transformer processes the spectrogram through multiple layers, attending to different features in it and learning complex patterns and relationships. Bottom: the output can be used for various purposes, such as classifying audio into different categories (e.g., music, speech, environmental sounds) or detecting specific events within the audio.
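The step between the spectrogram and the Transformer can be sketched as follows: the spectrogram is cut into small overlapping patches, and each patch becomes one input token. The patch size of 16 and stride of 10 (an overlap of 6) are the values recalled from the AST paper and should be checked against it. The helper names `count_patches` and `split_into_patches` and the toy spectrogram are hypothetical. The linear projection, positional embeddings, and the Transformer layers themselves are not shown.

```python
def count_patches(n_freq, n_time, patch=16, stride=10):
    """Number of patch positions along frequency, along time, and in total."""
    f_pos = (n_freq - patch) // stride + 1
    t_pos = (n_time - patch) // stride + 1
    return f_pos, t_pos, f_pos * t_pos


def split_into_patches(spec, patch=16, stride=10):
    """Split a spectrogram (list of frequency rows, each a list of time frames)
    into flattened, overlapping patch vectors."""
    n_freq, n_time = len(spec), len(spec[0])
    f_pos, t_pos, _ = count_patches(n_freq, n_time, patch, stride)
    patches = []
    for fi in range(f_pos):
        for ti in range(t_pos):
            f0, t0 = fi * stride, ti * stride
            # Flatten the patch row by row into one vector of patch * patch values.
            flat = [spec[f0 + df][t0 + dt] for df in range(patch) for dt in range(patch)]
            patches.append(flat)
    return patches


# Toy spectrogram (assumed shape): 128 frequency bins x 100 time frames,
# filled with f * 100 + t so every cell is easy to identify.
toy_spec = [[float(f * 100 + t) for t in range(100)] for f in range(128)]

patches = split_into_patches(toy_spec)
print(count_patches(128, 100))        # patch positions: (frequency, time, total)
print(len(patches), len(patches[0]))  # number of tokens, values per token
```

In the full model, each flattened patch would then be projected to an embedding vector and the resulting sequence passed through the Transformer layers, whose output feeds the classification head described above.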