Building a Real-Time BPM Analyzer That Runs Entirely in the Browser

December 10, 2025 · 5 min read

open-source · web-audio · typescript · tensorflow · audio-engineering

In 2015, I was building LoveLiveMusic, a music event aggregation platform, and I needed a way to analyze audio tracks in the browser. Specifically, I wanted to detect tempo — beats per minute — without sending audio to a server. The use case was simple: let DJs and music enthusiasts preview tracks and get instant BPM readings.

The problem was that nothing existed for this. Server-side audio analysis had mature tools. Python had librosa. Command-line tools like aubio worked well. But in the browser? The Web Audio API was still relatively new, and nobody had built a reliable real-time BPM detection library that worked entirely client-side. So I built one.

That library, realtime-bpm-analyzer, is now at 244+ GitHub stars and over 1,000 monthly downloads on npm. It's been integrated into audio toolboxes, DJ applications, and music production workflows I never anticipated. Here's what I've learned building and maintaining it over the years.

How Tempo Detection Works in a Browser

The Web Audio API gives you access to raw audio data through AnalyserNode and, more recently, AudioWorklet. The fundamental approach to BPM detection is peak detection in the frequency domain: you analyze the audio signal, identify rhythmic peaks that correspond to beats, measure the intervals between them, and calculate the tempo.

The first version used a straightforward approach. I'd connect an audio source to an AnalyserNode, pull frequency data at regular intervals, apply a low-pass filter to isolate the kick drum frequencies (which carry the strongest rhythmic signal in most music), and then run peak detection on the filtered signal. When you find consistent intervals between peaks, you have a BPM candidate.
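The interval-to-BPM step can be sketched as a pure function. This is an illustrative reconstruction of the idea, not the library's actual code; the peak timestamps are assumed to have already been extracted from the low-pass-filtered signal, and the octave-folding range (80–160 BPM) is a hypothetical choice.

```typescript
// Estimate BPM from peak timestamps (in seconds). Illustrative sketch only.
function estimateBpm(peakTimes: number[]): number | null {
  if (peakTimes.length < 2) return null;

  // Measure the intervals between consecutive peaks.
  const intervals: number[] = [];
  for (let i = 1; i < peakTimes.length; i++) {
    intervals.push(peakTimes[i] - peakTimes[i - 1]);
  }

  // The median interval resists outliers from missed or spurious peaks.
  intervals.sort((a, b) => a - b);
  const median = intervals[Math.floor(intervals.length / 2)];

  // 60 seconds divided by the beat interval gives beats per minute.
  let bpm = 60 / median;

  // Fold into a conventional range: the same peak pattern can be read
  // at half or double tempo, so octave ambiguity has to be resolved.
  while (bpm < 80) bpm *= 2;
  while (bpm > 160) bpm /= 2;
  return bpm;
}
```

Using the median rather than the mean matters here: a single missed beat produces one doubled interval, which would skew an average but leaves the median untouched.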

It sounds simple in theory. In practice, it's full of edge cases. Music with complex polyrhythms produces multiple competing BPM candidates. Tracks with breakdowns or tempo changes confuse interval-based detection. And the quality of the audio source — compressed streaming versus lossless files — affects the accuracy of the frequency analysis significantly.

The Evolution: From AnalyserNode to AudioWorklet

The biggest technical leap came with migrating to AudioWorklet. The original AnalyserNode approach had a fundamental limitation: it runs on the main thread. If the UI is doing anything — rendering visualizations, handling user input — the audio analysis competes for resources. You get dropped frames and inconsistent sampling, which degrades BPM accuracy.

AudioWorklet solves this by processing audio on a dedicated audio thread. You write an AudioWorkletProcessor class that runs in the worklet's global scope, exposed to the main thread through an AudioWorkletNode and completely isolated from it. The processor receives raw PCM audio data in blocks of 128 samples, which you can accumulate and analyze without worrying about UI jank.
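The accumulation pattern those 128-sample blocks imply can be sketched as a standalone class. In the real library this logic would sit inside an AudioWorkletProcessor's process() callback; it is pulled out here (with a hypothetical 4096-sample window size) so the buffering is easy to follow.

```typescript
const QUANTUM_SIZE = 128;  // fixed block size AudioWorklet delivers
const WINDOW_SIZE = 4096;  // hypothetical analysis window (32 quanta)

// Accumulates fixed-size render quanta into larger analysis windows.
class BlockAccumulator {
  private window = new Float32Array(WINDOW_SIZE);
  private filled = 0;

  // Returns a full analysis window once enough quanta have arrived,
  // otherwise null. Assumes each quantum is QUANTUM_SIZE samples.
  push(quantum: Float32Array): Float32Array | null {
    this.window.set(quantum, this.filled);
    this.filled += quantum.length;
    if (this.filled < WINDOW_SIZE) return null;
    const full = this.window.slice(); // hand out a copy for analysis
    this.filled = 0;                  // start the next window
    return full;
  }
}
```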

The migration wasn't trivial. The AudioWorklet API has a different programming model — you're working with Float32Array buffers, managing your own state, and communicating with the main thread via MessagePort. Error handling is tricky because exceptions in the worklet don't propagate naturally. I had to build a message protocol between the worklet processor and the main thread to report results, handle errors, and manage the lifecycle.
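One way to structure such a protocol is a tagged union of message shapes with a main-thread router. The message names and fields below are illustrative assumptions, not the library's actual protocol; in the worklet, the error variant would be posted from a try/catch around the analysis step, since the exception cannot escape on its own.

```typescript
// Hypothetical message shapes sent from the worklet over MessagePort.
type WorkletMessage =
  | { type: "result"; bpm: number; confidence: number }
  | { type: "error"; message: string };

// Main-thread side: route each message to the right handler. With a
// discriminated union, TypeScript narrows msg inside each case.
function handleWorkletMessage(
  msg: WorkletMessage,
  onResult: (bpm: number, confidence: number) => void,
  onError: (message: string) => void
): void {
  switch (msg.type) {
    case "result":
      onResult(msg.bpm, msg.confidence);
      break;
    case "error":
      onError(msg.message);
      break;
  }
}
```

The worklet side would call `this.port.postMessage({ type: "error", message: String(err) })` in its catch block, which is what makes errors observable from the main thread at all.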

But the payoff was real. Analysis became more consistent, the library could handle longer audio streams without degrading UI performance, and the separation of concerns made the codebase easier to reason about.

What Open Source Teaches You

Reaching 244+ stars and consistent monthly downloads is gratifying, but the real value of maintaining an open-source library is what it teaches you about API design, backwards compatibility, and documentation.

API design is a commitment. Every public method signature, every configuration option, every event name — once people depend on it, changing it has a cost. I learned to be conservative about what I expose. Early versions of the library had too many configuration knobs because I thought flexibility was always good. It isn't. Most users want sensible defaults. Power users want escape hatches. Nobody wants to read 40 configuration options to get started.

Backwards compatibility is harder than new features. When I migrated from AnalyserNode to AudioWorklet, I had to maintain both code paths because not all browsers supported AudioWorklet at the time. That meant running two parallel implementations, with a feature detection layer that chose the right one. It would have been cleaner to drop the old approach. But real users had real code depending on it.
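A feature-detection layer like the one described can be sketched in a few lines. Checking for `audioWorklet` on the context, rather than sniffing user agents, keeps the choice robust; the argument type is narrowed to a structural shape here so the function can be exercised outside a browser, and the dispatcher name is hypothetical.

```typescript
// True when the context exposes an AudioWorklet (modern browsers).
function hasAudioWorklet(ctx: { audioWorklet?: unknown }): boolean {
  return typeof ctx.audioWorklet === "object" && ctx.audioWorklet !== null;
}

// Hypothetical dispatcher: prefer the worklet path, fall back to
// the legacy AnalyserNode implementation.
function chooseAnalysisPath(ctx: { audioWorklet?: unknown }): "worklet" | "analyser" {
  return hasAudioWorklet(ctx) ? "worklet" : "analyser";
}
```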

Documentation is product work. The changes that reduced issue volume the most over time weren't bug fixes; they were documentation improvements. A clear "Getting Started" section with a working code example eliminated more support requests than any code change. I learned to treat README updates with the same seriousness as code changes.

The library also gained visibility through integration with the allegro-project.com audio toolbox, which introduced it to a wider audience of audio developers and producers. That kind of organic adoption — someone finds your tool useful enough to include in their own project — is the best validation open source can offer.

Current R&D: Teaching a Neural Network to Hear Tempo

The peak detection approach works well for most electronic music, but it struggles with genres where the rhythmic structure is less obvious — jazz, ambient, progressive rock. The intervals between peaks are inconsistent, and the algorithm produces noisy BPM candidates.

I'm currently experimenting with a TensorFlow.js model trained to detect tempo from spectrograms. The idea is straightforward: convert audio to a mel spectrogram, feed it to a convolutional neural network, and predict BPM directly. The model should learn rhythmic patterns that are too subtle for rule-based peak detection.
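To make "mel spectrogram" concrete: the mel scale is just a logarithmic warp of frequency that mirrors human pitch perception. This is the standard HTK-style formula; the spectrogram construction and the CNN itself are out of scope here.

```typescript
// Convert frequency in Hz to the mel scale (HTK formula).
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

// Inverse mapping, used when laying out mel filterbank edges.
function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}
```

Low frequencies, where kick drums live, get proportionally more resolution on this scale, which is part of why mel spectrograms suit rhythm-related tasks.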

The hardest part isn't the model architecture — it's the training data. I need thousands of audio samples with accurate BPM labels across diverse genres. Existing datasets are either too small, genre-limited, or have licensing restrictions. So I'm generating synthetic training data: programmatically creating audio samples with known tempos, varying the instrumentation, time signatures, and audio quality to build a diverse dataset.
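The core "known tempo in, labeled audio out" idea can be sketched as a click-track generator. A real pipeline would layer instrumentation, vary time signatures, and degrade audio quality on top of this; the function below is a deliberately minimal assumption of what the first stage looks like.

```typescript
// Generate a click track at a known BPM: each beat is a short decaying
// click. Returns mono PCM samples in [0, 1]. Illustrative sketch only.
function clickTrack(bpm: number, seconds: number, sampleRate = 44100): Float32Array {
  const out = new Float32Array(Math.floor(seconds * sampleRate));
  const samplesPerBeat = Math.round((60 / bpm) * sampleRate);
  for (let beat = 0; beat * samplesPerBeat < out.length; beat++) {
    const start = beat * samplesPerBeat;
    // 64-sample linearly decaying click at the beat position.
    for (let i = 0; i < 64 && start + i < out.length; i++) {
      out[start + i] = 1 - i / 64;
    }
  }
  return out;
}
```

Because the tempo is chosen up front, the BPM label is exact by construction, which sidesteps the labeling-accuracy problem that plagues scraped datasets.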

It's early days. The model shows promise on electronic music (where the existing algorithm already works well) but hasn't yet cracked the harder genres reliably. The goal is to run inference entirely in the browser using TensorFlow.js, keeping the same client-side philosophy that makes the library useful.

What Corporate Work Doesn't Teach You

Maintaining an open-source library has made me a better engineer in ways that professional work alone wouldn't have. In a corporate setting, your users are colleagues or internal stakeholders. They have context. They can ask you questions in Slack. They read the same Confluence pages. They tolerate rough edges because they know you'll fix them next sprint.

Open-source users have none of that context. They find your library through npm search, read your README for 30 seconds, try the quickstart example, and either adopt it or move on. If your API is confusing, they won't file an issue — they'll find an alternative. That brutal feedback loop teaches you to think about developer experience with a clarity that's hard to develop otherwise.

It also teaches patience. Issues come in at random times, from people with varying levels of expertise, often without enough information to reproduce the problem. Learning to respond helpfully, ask the right diagnostic questions, and close issues gracefully is a skill that transfers directly to engineering leadership.

If you're an engineer who hasn't tried maintaining an open-source project, I'd encourage it. Pick something small that solves a problem you actually have. The code is the easy part. The maintenance — documentation, issue triage, API evolution, community — is where the real learning happens.