On this page, I list results, example waveforms, and other details for some of the projects I have worked on. Click on one of the projects listed below.
Below are examples of a lead vocalist or lead guitar separated from music clips using probabilistic latent decomposition. The different attempts correspond to different combinations of the number of basis vectors used to model the foreground and the background.
Song clip - from the song "Raise my Rent" by David Gilmour.
Attempt 1: Lead Guitar, Background Music
Attempt 2: Lead Guitar, Background Music
Song clip - from the song "Bande" from the soundtrack of Hindi Movie "Black Friday".
Attempt 1: Lead Singer, Background Music
Attempt 2: Lead Singer, Background Music
Attempt 3: Lead Singer, Background Music
Song clip - from the song "Sayonee" by the band Junoon.
Attempt 1: Lead Singer, Background Music
Song clip - from the song "Sunrise" by Norah Jones.
Attempt 1: Lead Singer, Background Music
Song clip - from the song "Super Freak" by Rick James.
Attempt 1: Lead Singer, Background Music
Below are example reconstructions using a compact code of 100 basis functions (denoted CC) and a sparse-distributed code of 1000 basis functions (denoted SC). Two metrics that indicate the quality of separation are given in brackets: SNR, the signal-to-noise ratio improvement, and SER, the speaker energy ratio (refer to the paper for details). Notice the improvement obtained by using the sparse code instead of the compact code.
A set of utterances from the TIMIT database comprising approximately 25 seconds of speech was used as training data for each speaker. All signals were normalized to zero mean and unit variance to ensure a uniform signal level. Signals were analyzed in 64 ms windows with 32 ms overlap between consecutive windows. Spectral vectors were modelled by a mixture of 25 multinomial distributions. The training samples are given below.
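The preprocessing described above (normalization followed by windowing) can be sketched as follows. This is a minimal illustration rather than the code used for these experiments: the function names `normalize` and `frame` are hypothetical, and the 16 kHz sample rate is an assumption based on the standard TIMIT release, since the text does not state it. The window function and the spectral transform are left out because they are not specified here.

```python
import math


def normalize(signal):
    """Scale a signal to zero mean and unit variance."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    std = math.sqrt(sum(x * x for x in centered) / n)
    if std == 0.0:
        raise ValueError("a constant signal cannot be scaled to unit variance")
    return [x / std for x in centered]


def frame(signal, sample_rate, window_ms=64, overlap_ms=32):
    """Split a signal into overlapping analysis windows.

    Trailing samples that do not fill a whole window are dropped
    (an assumption; the text does not say how the tail is handled).
    """
    win = int(round(sample_rate * window_ms / 1000))
    hop = win - int(round(sample_rate * overlap_ms / 1000))
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]


# Assumed sample rate of 16 kHz: 64 ms -> 1024 samples, 32 ms hop -> 512 samples.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) + 0.5 for t in range(4096)]
frames = frame(normalize(tone), SAMPLE_RATE)
```

Under the 16 kHz assumption, each window holds 1024 samples and consecutive windows start 512 samples apart, so a 25-second training set yields several hundred spectral vectors per speaker.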
Mixed signals were obtained by digitally adding the test signals of the two speakers. The length of the mixed signal was set to that of the shorter of the two signals. The component signals were all normalized to zero mean and unit variance prior to addition, resulting in mixed signals with 0 dB SNR for each speaker. The results of separating the mixed signals are given below. The average SNR improvement over the mixed signal is given next to each reconstruction. NOTE: The SNR improvements reported in Figure 4 of the paper are erroneous due to a bug in the calculation; please disregard them.
Mixture12: Reconstructed 1 (5.0648 dB), Reconstructed 2 (4.8282 dB)
Mixture23: Reconstructed 2 (6.1495 dB), Reconstructed 3 (4.1943 dB)
Mixture31: Reconstructed 3 (5.3696 dB), Reconstructed 1 (4.4613 dB)
Mixture14: Reconstructed 1 (5.2382 dB), Reconstructed 4 (5.5066 dB)
Mixture35: Reconstructed 3 (4.7242 dB), Reconstructed 5 (4.9549 dB)
Mixture45: Reconstructed 4 (4.1662 dB), Reconstructed 5 (6.6353 dB)
Mixture25: Reconstructed 2 (4.6745 dB), Reconstructed 5 (4.5936 dB)
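For readers who want to reproduce this kind of figure, the mixing procedure and an SNR improvement measure can be sketched as follows. This is a hedged illustration, not the calculation used for the numbers above: the function names are hypothetical, truncating to the shorter length before normalizing is an assumed ordering (it makes the 0 dB condition hold exactly), and the SNR definition used here (target energy over error energy, in dB) is one common choice that may differ from the one in the paper.

```python
import math


def normalize(signal):
    """Scale a signal to zero mean and unit variance."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    std = math.sqrt(sum(x * x for x in centered) / n)
    return [x / std for x in centered]


def mix(a, b):
    """Truncate both signals to the shorter length, normalize, and add them.

    Returns the mixture together with the two normalized components.
    """
    n = min(len(a), len(b))
    a_n, b_n = normalize(a[:n]), normalize(b[:n])
    return [x + y for x, y in zip(a_n, b_n)], a_n, b_n


def snr_db(target, estimate):
    """SNR of an estimate with respect to the target signal, in dB."""
    signal_energy = sum(t * t for t in target)
    error_energy = sum((t - e) ** 2 for t, e in zip(target, estimate))
    if error_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


def snr_improvement_db(target, reconstruction, mixture):
    """Gain in SNR of the reconstruction over the unprocessed mixture."""
    return snr_db(target, reconstruction) - snr_db(target, mixture)


# Synthetic stand-ins for two speakers (not real speech).
speaker_a = [math.sin(0.1 * i) for i in range(200)]
speaker_b = [math.cos(0.37 * i) for i in range(150)]
mixture, target_a, target_b = mix(speaker_a, speaker_b)

# A toy "reconstruction" that removes half of the interfering signal.
reconstruction_a = [t + 0.5 * (m - t) for t, m in zip(target_a, mixture)]
improvement = snr_improvement_db(target_a, reconstruction_a, mixture)
```

Because both components have unit variance over the same number of samples, the mixture sits at 0 dB SNR with respect to either speaker, so in this setup the improvement equals the SNR of the reconstruction itself.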