Vocal removal rant

I strongly believe that if a brain (a “trained ear”) can filter out different tracks from a song, a computer can too. (This follows from the more controversial belief that there’s nothing too special about a brain: it’s an advanced computer, the best parallel processor we know. We will have a technological singularity one day, but we can probably do this separation with today’s technology anyway.)

Philosophy aside, someone else is writing this post now.

I have no idea what this is for. Something about brains?

Goodness. The original author has been gone for so long.

^^ that was a member of Team Panda LOL. Back to the point:

We can actively listen to the guitars in a song, or the vocals – it’s the cocktail party problem all over again. On the other hand, the vocal extraction ideas proposed in an online community can be a little misleading:

Note on the analogy in #4. It’s hard to predict the background information because it was lost. Is that applicable to a music file? Even with images, Photoshop has a content-aware tool that can guess what’s missing from the image. Still, the image doesn’t contain the original information.

It’s like projecting a vector (x, y, z) onto the xy plane and then using (x, y) to predict z. Photoshop looks around and makes an educated guess. On the other hand, for the song – can we say that information was actually lost in the first place?
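To make the projection analogy concrete, here is a tiny sketch in plain Python. The function name and the two vectors are made up for illustration. It only shows why z cannot be recovered from (x, y) alone; it says nothing yet about whether a song mix behaves the same way.

```python
def project_to_xy(v):
    """Drop the z component: (x, y, z) -> (x, y)."""
    x, y, _z = v
    return (x, y)

# Two different vectors (illustrative values only).
a = (1.0, 2.0, 3.0)
b = (1.0, 2.0, -7.5)

# Both collapse to the same point on the xy plane, so nothing in (x, y)
# tells us which z we started from. Any reconstruction is a guess.
same_projection = project_to_xy(a) == project_to_xy(b)
print(project_to_xy(a), project_to_xy(b), same_projection)
```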

There must have been some compression (and resulting noise), but I can still hear the different instruments. Maybe our brain also analyzes slices of the song, and we only hear a set of frequencies at a time – but we should still be able to analyze these slices (and interpolate the gaps, if need be).

On the whole, I don’t like the claim that the information has been completely lost. Then, another problem is overlap in frequencies:

"You can't separate vocals from the music without losing the frequencies in the vocals because many of the frequencies that are in the music are the same frequencies in the vocals."

I think the problem with using an equalizer or any other non-adaptive filter to isolate the vocals is that a fixed filter won’t shoot down a moving target! Adaptive filters would (given two versions of the song, at least)! I am tempted to say that a neural net or independent component analysis (ICA) would do the job with one file too – but I have to read up on that.
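As a rough illustration of the "two versions of the song" case, here is a minimal normalized LMS sketch in plain Python. It is not a tested vocal remover. The function name `nlms_cancel`, the tap count, the step size and the synthetic signals are all assumptions made for this example. The instrumental version serves as the reference, and whatever the adapted filter cannot explain is kept as the vocal estimate.

```python
import math
import random

def nlms_cancel(mix, reference, taps=8, mu=0.05, eps=1e-8):
    """Normalized LMS: adapt an FIR filter so the filtered `reference`
    (the instrumental) tracks `mix`, and return what is left over."""
    weights = [0.0] * taps
    history = [0.0] * taps  # latest reference samples, newest first
    residual = []
    for d, x in zip(mix, reference):
        history = [x] + history[:-1]
        predicted = sum(w * h for w, h in zip(weights, history))
        error = d - predicted  # the part the instrumental cannot explain
        power = sum(h * h for h in history) + eps
        step = mu * error / power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual

def mean_square(values):
    return sum(v * v for v in values) / len(values)

# Synthetic stand-ins (hypothetical data, not real audio).
random.seed(0)
n = 4000
instrumental = [random.uniform(-1.0, 1.0) for _ in range(n)]
vocal = [0.5 * math.sin(2 * math.pi * k / 50) for k in range(n)]

# In the full mix the instrumental has a different gain and a slight echo,
# so plainly subtracting the reference would not cancel it.
mix = [
    0.8 * instrumental[k] + 0.3 * (instrumental[k - 1] if k > 0 else 0.0) + vocal[k]
    for k in range(n)
]

estimate = nlms_cancel(mix, instrumental)
naive = [m - r for m, r in zip(mix, instrumental)]

# Compare errors over the last 1000 samples, after the filter has settled.
tail = slice(n - 1000, n)
adaptive_error = mean_square([e - v for e, v in zip(estimate[tail], vocal[tail])])
naive_error = mean_square([e - v for e, v in zip(naive[tail], vocal[tail])])
vocal_power = mean_square(vocal[tail])
print(adaptive_error, naive_error, vocal_power)
```

The point of the design is that the filter weights keep adjusting to the gain and delay of the instrumental inside the mix, which a fixed equalizer cannot do. Real recordings would be far messier than this toy signal.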

So, lacking sufficient evidence, but knowing that my brain often can tell the difference, I disagree with:

"You can try all sorts of tricks but you can never achieve true layer separation."

I will soon have the voice track I am working on (I hope!)

And here’s a post with the old subtraction idea:

-(instrumental) + (instrumental + voice) = voice.
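A minimal sketch of that subtraction in plain Python, using short made-up sample lists instead of real audio. The function name is hypothetical. It only works this cleanly when both versions are sample-aligned, at the same gain and from the same master; lossy compression or a different mix leaves a residue.

```python
def subtract_instrumental(mix, instrumental):
    """voice = (instrumental + voice) - instrumental, sample by sample."""
    if len(mix) != len(instrumental):
        raise ValueError("both tracks must have the same number of samples")
    return [m - i for m, i in zip(mix, instrumental)]

# Made-up integer samples so the arithmetic is exact.
instrumental = [3, -1, 4, 1, -5]
voice = [0, 2, -2, 1, 0]
mix = [i + v for i, v in zip(instrumental, voice)]

recovered = subtract_instrumental(mix, instrumental)
print(recovered)
```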


Center panned stuff:
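Assuming this refers to the common trick of cancelling whatever is panned dead center, here is a small sketch in plain Python with made-up samples and a hypothetical function name. Subtracting the right channel from the left removes anything identical in both channels, which is often the lead vocal, but also any bass or drums mixed to the center. Note that this removes the center content rather than extracting it, and the result is mono.

```python
def remove_center(left, right):
    """Return the side signal (L - R) / 2; content equal in both channels cancels."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    return [(l - r) / 2 for l, r in zip(left, right)]

# Made-up samples: the vocal sits in the center, the guitar only on the
# left, the keys only on the right.
vocal = [4, -2, 6, 0]
guitar = [2, 2, -4, 8]
keys = [0, 6, 2, -2]

left = [v + g for v, g in zip(vocal, guitar)]
right = [v + k for v, k in zip(vocal, keys)]

side = remove_center(left, right)
print(side)  # the vocal is gone; what remains is (guitar - keys) / 2
```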


