The Ceiling of a Browser BPM Detector I've Maintained for Ten Years

June 1, 2026 · 6 min read
open-source · web-audio · signal-processing · machine-learning

I wanted harmonic playlists. Tracks that flow from one tempo to the next, key-compatible, energy rising and falling on purpose instead of by accident. To build that, I needed one number for every track: its BPM.

There was a library on npm that could give it to me — but it swallowed the entire audio file, then analyzed it. That's fine if you have the file. I didn't. I listened to most of my music through YouTube, which doesn't hand you a file — it hands you an audio node: a live stream you can read but never fully download. I needed a tempo from that stream, chunk by chunk, as it played.

Nothing did that in the browser. So I built it. That was over ten years ago, and I'm still maintaining it today — 301 stars, 8,000 downloads a month. This is the story of the one thing about it I've never been able to fix: its ceiling.

Learning the Signal From Scratch

I had no signal-processing background. Some half-remembered high-school physics, and that was it. FFT, low-pass filters, amplitude thresholding — I learned all of it by reading papers and blog posts at night, one confused Google search at a time. Two resources carried me further than the rest: Tornqvist's bpm-detective and Joe Sullivan's Beat Detection Using Web Audio. (I also tried to detect musical key with HPCP and ran out of weekend; the tempo half I finished.)

The method I landed on is amplitude thresholding. Filter the signal down to its low frequencies, where the kick drum and bass live. Mark every moment the filtered signal crosses a threshold. The rhythm of those crossings is the rhythm of the beat, and each interval between two crossings gives you a BPM candidate: 60 divided by the interval in seconds. Collect enough candidates and the most frequent one is your tempo.
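A minimal sketch of that idea in TypeScript, assuming the samples have already been low-pass filtered. The function names, the minimum gap between crossings, and the folding of candidates into a 90 to 180 BPM range are illustrative assumptions, not the library's actual API or defaults.

```typescript
// Hypothetical helpers for illustration only; not the library's API.

/** Indices where the signal rises through `threshold`, at least `minGap` samples apart. */
function findCrossings(samples: Float32Array, threshold: number, minGap: number): number[] {
  const crossings: number[] = [];
  let last = -Infinity;
  for (let i = 1; i < samples.length; i++) {
    const rising = samples[i - 1] < threshold && samples[i] >= threshold;
    if (rising && i - last >= minGap) {
      crossings.push(i);
      last = i;
    }
  }
  return crossings;
}

/** Turn the gaps between consecutive crossings into a histogram of rounded BPM candidates. */
function tallyCandidates(
  crossings: number[],
  sampleRate: number,
  tally: Map<number, number> = new Map<number, number>(),
): Map<number, number> {
  for (let i = 1; i < crossings.length; i++) {
    const seconds = (crossings[i] - crossings[i - 1]) / sampleRate;
    let bpm = 60 / seconds;
    // Assumed convention: fold half-time and double-time readings into 90..180 BPM.
    while (bpm < 90) bpm *= 2;
    while (bpm >= 180) bpm /= 2;
    const key = Math.round(bpm);
    tally.set(key, (tally.get(key) ?? 0) + 1);
  }
  return tally;
}

/** The candidate seen most often, or undefined if the tally is empty. */
function mostFrequent(tally: Map<number, number>): number | undefined {
  let best: number | undefined;
  let bestCount = 0;
  tally.forEach((count, bpm) => {
    if (count > bestCount) {
      best = bpm;
      bestCount = count;
    }
  });
  return best;
}
```

Calling `mostFrequent(tallyCandidates(findCrossings(samples, threshold, minGap), sampleRate))` gives one tempo estimate for a buffer. The minimum gap is there so a single loud kick that wobbles around the threshold is not counted several times.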

The same math, one chunk at a time

Here's the part I'm quietly proud of, and it's almost embarrassingly simple: the algorithm never needed the whole file.

Counting threshold crossings is the exact same computation whether you run it across a four-minute track or across one small block of samples that just landed. So I don't run it once over a file — I run it continuously over the stream. Each chunk of PCM arrives, gets filtered, gets its crossings counted, and drops its candidates into a running tally. The tempo isn't computed in one shot; it emerges from the accumulation. Nothing about the math changed — only when it runs.

That's the whole trick. And it's why it works on a YouTube node it can never fully download: it never needed the full file, only the next chunk.
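Here is one way the chunk-by-chunk version could look, as a sketch under stated assumptions rather than the library's real code. The class name, its parameters, and the 90 to 180 BPM folding range are hypothetical; the point is only that the state carried between chunks is tiny: the previous sample, the position of the last crossing, and the running tally.

```typescript
// Hypothetical sketch; names and defaults are illustrative, not the library's API.
class StreamingTempoTally {
  private tally = new Map<number, number>();
  private samplesSeen = 0; // absolute index of the next chunk's first sample
  private prevSample = 0; // last sample of the previous chunk
  private lastCrossing: number | null = null; // absolute index of the last accepted crossing
  private sampleRate: number;
  private threshold: number;
  private minGapSamples: number;

  constructor(sampleRate: number, threshold: number, minGapSamples: number) {
    this.sampleRate = sampleRate;
    this.threshold = threshold;
    this.minGapSamples = minGapSamples;
  }

  /** Feed one chunk of PCM that has already been low-pass filtered. */
  push(chunk: Float32Array): void {
    for (let i = 0; i < chunk.length; i++) {
      const current = chunk[i];
      const rising = this.prevSample < this.threshold && current >= this.threshold;
      if (rising) {
        const position = this.samplesSeen + i;
        if (this.lastCrossing === null) {
          this.lastCrossing = position;
        } else if (position - this.lastCrossing >= this.minGapSamples) {
          this.record(position - this.lastCrossing);
          this.lastCrossing = position;
        }
      }
      this.prevSample = current;
    }
    this.samplesSeen += chunk.length;
  }

  /** Most frequent candidate so far, or undefined before the first interval. */
  currentTempo(): number | undefined {
    let best: number | undefined;
    let bestCount = 0;
    this.tally.forEach((count, bpm) => {
      if (count > bestCount) {
        best = bpm;
        bestCount = count;
      }
    });
    return best;
  }

  private record(intervalSamples: number): void {
    let bpm = (60 * this.sampleRate) / intervalSamples;
    // Assumed convention: fold candidates into 90..180 BPM.
    while (bpm < 90) bpm *= 2;
    while (bpm >= 180) bpm /= 2;
    const key = Math.round(bpm);
    this.tally.set(key, (this.tally.get(key) ?? 0) + 1);
  }
}
```

Because positions are tracked as absolute sample indices, an interval that starts in one chunk and ends in the next is measured exactly as it would be in a single pass over the whole file.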

It's a clean idea. It also has a ceiling baked in from the first line of code, and it took me years of maintaining the thing to fully respect where that ceiling sits.

Where the Ceiling Is

Amplitude thresholding lives and dies by the bass. When low frequencies are loud, regular, and sit on the grid, it's excellent. Electronic music is the perfect case — kicks are quantized, repetitive, and impossible to miss. On that genre the library holds around 85–90% accuracy, fully in the browser, which I still think is a good result for counting threshold crossings.

Push outside that and it degrades. Drum and bass has fast, syncopated kick patterns that don't land cleanly on the beat; the crossing intervals come out irregular and the BPM candidates get noisy. Some DnB tracks work, some don't, and it depends entirely on the track. Jazz, ambient, anything without a consistent kick signature — the method has almost nothing to grab onto. The tempo is in there, but not in a form threshold-counting can extract.

The important thing I had to internalize: this isn't a bug. It's the architecture. Amplitude thresholding is a statistical tempo estimator. It doesn't know where a beat is — it infers, over time, how often beats tend to happen.

The Request That Proved It

A couple of users asked for the same feature: fire an event on every kick drum so they could drive a stroboscope, flash lights in sync with the music. It's a great idea. I had to say no, and explaining why is the cleanest way I know to describe the ceiling.

The library can't do per-kick events because it isn't detecting kicks. It's accumulating crossings and waiting until it's confident about a rate. By the time it's sure the tempo is 128 BPM, the individual hits that led to that conclusion are long gone, and the timing it produces was never precise enough to flash a light on the downbeat. Asking it to fire on each kick is asking a statistical estimator to behave like a real-time detector. Different tool, different architecture.

Saying no to the stroboscope taught me something I value: knowing exactly where your tool stops, and saying it out loud, beats pretending it can do everything.

Engineering Everything Except the Ceiling

For ten years I've improved every part of this library that isn't the ceiling. Those chunks have to come from somewhere, and at first they came through ScriptProcessorNode, which runs on the main thread, fighting the UI for time and jittering the sampling under load. When a user opened an issue asking for AudioWorklet, I migrated: the worklet delivers raw PCM in 128-sample blocks on a dedicated audio thread, isolated from rendering, and the sampling steadied the moment it shipped.
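A 128-sample block is very short, so an analyzer will usually want to group several of them before doing any work. The sketch below shows one way to do that grouping; the `ChunkBuffer` name and the chunk size are hypothetical and say nothing about how the library buffers internally.

```typescript
// Hypothetical helper, not the library's code: collects small fixed-size
// audio blocks into larger chunks and hands each full chunk to a callback.
class ChunkBuffer {
  private buffer: Float32Array;
  private filled = 0;
  private onChunk: (chunk: Float32Array) => void;

  constructor(chunkSize: number, onChunk: (chunk: Float32Array) => void) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new RangeError("chunkSize must be a positive integer");
    }
    this.buffer = new Float32Array(chunkSize);
    this.onChunk = onChunk;
  }

  /** Append one block; emits a copy of the buffer each time it fills up. */
  write(block: Float32Array): void {
    let offset = 0;
    while (offset < block.length) {
      const room = this.buffer.length - this.filled;
      const take = Math.min(block.length - offset, room);
      this.buffer.set(block.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.buffer.length) {
        // Copy so the receiver can keep the chunk while the buffer is reused.
        this.onChunk(this.buffer.slice());
        this.filled = 0;
      }
    }
  }
}
```

Inside an `AudioWorkletProcessor`, the `process(inputs)` callback would be the natural place to call `write(inputs[0][0])` for the first channel of the first input, with the emitted chunks posted to the analysis code.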

Then there was the release that didn't work at all — not degraded, broken — which I found out about from user reports. That one rewired how I read my own diffs before pushing to npm. The cost of breaking a stranger's project lands differently than breaking your own.

The biggest jump came from documentation. For years the docs were auto-generated JSDoc: correct, and useless for getting started. When I rebuilt them with real examples and an honest account of what the library can and can't do, stars and downloads climbed. Nobody wants to struggle just to try your library: they have a need, they found you, and your only job is to make the path to working code frictionless.

But none of this raised the ceiling. It made the library trustworthy up to the ceiling. Two different kinds of work, and it took me a while to stop confusing them.

The Only Way Through Is a Model

For a while I imagined I could break the ceiling with more classical signal processing: combine several algorithms and cross-check their outputs. In a browser, in real time, that isn't sustainable, because those approaches demand more CPU than a client-side library can count on.

The only path I believe in now is a learned one. I'm building a browser-inference pipeline alongside the heuristic engine: a Beat This!–derived transformer trained on log-mel spectrograms, exported through PyTorch → ONNX so it runs entirely client-side and keeps the philosophy the whole project was founded on — your audio never leaves your machine.

Where it stands, honestly: preparation stage. I have around 1,000 tracks with high-quality BPM metadata — enough for a real prototype — and I plan to generate derivatives from them to widen the set. The wall isn't the architecture, it's the data. Accurate, human-verified BPM labels across genres are rare, often private, and valuable precisely because they're so hard to produce — the same reason good commercial detectors stay closed. It's a hobby; it'll get there when motivation and time line up. But for the first time the ceiling has a door in it.

What the Ceiling Taught Me

Corporate work rarely makes you live with one decision for ten years. You build, you ship, you move on. You don't often sit with a single method long enough to map its boundary precisely, defend that boundary to users who want more, and feel the weight of strangers depending on you to be right.

That's what a decade of maintaining one library in public gave me. Not the satisfaction of a thing that works — the discipline of knowing exactly where it stops, saying so plainly, and treating that edge not as a failure but as the brief for what to build next.