Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection

Jianwei Zhang, Julie Liss, Suren Jayasuriya, and Visar Berisha

Supplemental - Interactive UMAP Projection of SVD In-corpus Validation Dataset

Introduction
This website provides an interactive version of the UMAP projection of the SVD in-corpus validation dataset. It includes raw audio from some subjects, which can be played by clicking the corresponding points in the figure. In general, voices located in the top-right corner present with marked dysphonia. Voice samples located on the top-left and bottom-right present with healthy voice quality or mild dysphonia. We mark two axes on the embedding distribution: (1) from the bottom-left to the top-right and (2) from the bottom-left to the top-left, which are related to vocal quality and voice pitch, respectively. These two axes are also consistent with our training design: (1) a contrastive loss is used to separate dysphonic and healthy voices, and (2) each training batch contains samples from either male or female subjects only. This result further illustrates that the proposed acoustic feature embeddings are sensitive to vocal quality and voice characteristics.
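The two training design choices described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the margin-based pairwise form of the contrastive loss, the margin value, the function names (contrastive_loss, single_sex_batch, batch_loss), and the toy two-dimensional embeddings are all assumptions made for illustration only.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Margin-based pairwise contrastive loss (assumed form, not from the source).

    Pairs with the same label are pulled together (squared distance);
    pairs with different labels are pushed apart until they are at
    least `margin` away from each other.
    """
    d = euclidean(emb_a, emb_b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


def single_sex_batch(samples, batch_size, rng):
    """Draw one batch in which all subjects share the same sex."""
    sex = rng.choice(sorted({s["sex"] for s in samples}))
    pool = [s for s in samples if s["sex"] == sex]
    return rng.sample(pool, min(batch_size, len(pool)))


def batch_loss(batch, margin=1.0):
    """Average contrastive loss over all pairs in a batch."""
    total, n_pairs = 0.0, 0
    for i in range(len(batch)):
        for j in range(i + 1, len(batch)):
            same = batch[i]["label"] == batch[j]["label"]
            total += contrastive_loss(batch[i]["emb"], batch[j]["emb"], same, margin)
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy samples with made-up embeddings, for illustration only.
TOY_SAMPLES = [
    {"sex": "F", "label": "healthy", "emb": [0.0, 0.0]},
    {"sex": "F", "label": "dysphonic", "emb": [0.5, 0.0]},
    {"sex": "M", "label": "healthy", "emb": [0.0, 1.0]},
    {"sex": "M", "label": "dysphonic", "emb": [3.0, 5.0]},
]
```

Under these assumptions, calling single_sex_batch(TOY_SAMPLES, 2, random.Random(0)) returns two samples from subjects of the same sex, and batch_loss on that batch is small when the healthy and dysphonic embeddings are already separated by at least the margin. Keeping each batch single-sex means the loss compares voices of broadly similar pitch range, so the separation it learns is driven by vocal quality rather than by pitch differences between sexes.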

Usage
When the mouse pointer hovers over a point, the icon of that point is enlarged so that it is easy to identify. Click the left mouse button to hear the /a/ phonation audio corresponding to that point.