Lessons from Migrating 2 Million Assets to Cloudinary

February 15, 2026 · 5 min read
cloudinary · migration · node.js · azure · architecture

When someone tells you they need to migrate "a lot of images," your first question should be "how many?" When the answer is two million assets totaling several terabytes, the follow-up is: "and what's the API rate limit?"

I spent a significant stretch of my career working with FIBA — the International Basketball Federation. They manage competitions worldwide, and with that comes an enormous volume of digital assets: game photos, team logos, player headshots, competition branding, broadcast stills. All of it lived in a legacy system that wasn't scaling. The plan was to move everything to Cloudinary.

What I learned during that migration applies to any large-scale data move, not just media assets.

The Scale Problem

Two million assets sounds like a number you can brute-force. Write a script, loop through the files, upload them. Except Cloudinary, like every managed service, has rate limits. And quota limits. And bandwidth limits. And when you're pushing terabytes through an API, every one of those limits becomes a wall you hit at 3 AM.

The first version of the migration script was naive. It pulled assets from the source, uploaded them to Cloudinary, and logged the result. It worked fine for the first ten thousand. Then it started hitting 429s. Then it started timing out. Then it started losing track of which assets had been uploaded and which hadn't.

Lesson one: a migration at this scale is not a script. It's a system.

Building the Migration Pipeline

I rewrote the migration as a proper pipeline in Node.js. The key design decisions:

  • Chunked processing with configurable concurrency. Instead of firing requests as fast as possible, the pipeline processed assets in batches with tunable parallelism. Too aggressive and you hit rate limits. Too conservative and the migration takes months.
  • Idempotent operations with a state store. Every asset got tracked in a database. If the script crashed or hit a limit, it could resume exactly where it left off. No duplicates, no gaps.
  • Exponential backoff with jitter. When Cloudinary pushed back with a 429, the pipeline backed off, waited, and retried. The jitter prevented thundering herds when multiple workers resumed simultaneously.
```javascript
async function uploadWithRetry(asset, attempt = 0) {
  try {
    const result = await cloudinary.uploader.upload(asset.url, {
      public_id: asset.targetPath,
      resource_type: 'auto',
      overwrite: false, // idempotency: never clobber an already-migrated asset
    });
    await markAsCompleted(asset.id, result);
  } catch (err) {
    if (err.http_code === 429 && attempt < MAX_RETRIES) {
      // Exponential backoff plus up to a second of jitter.
      const delay = BASE_DELAY * Math.pow(2, attempt) + Math.random() * 1000;
      await sleep(delay);
      return uploadWithRetry(asset, attempt + 1);
    }
    await markAsFailed(asset.id, err);
  }
}
```
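The chunked-processing loop from the first bullet can be sketched like this. The helper names and the default concurrency value are illustrative, not the production code; `uploadFn` stands in for `uploadWithRetry`:

```javascript
// Split the full asset list into fixed-size batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches sequentially, but run the uploads inside a batch in
// parallel. Tuning `concurrency` is the aggressive/conservative trade-off.
async function migrate(assets, uploadFn, concurrency = 10) {
  for (const batch of chunk(assets, concurrency)) {
    // allSettled: one failed upload must not abort the whole batch.
    await Promise.allSettled(batch.map(uploadFn));
  }
}
```

Because each upload is idempotent and tracked in the state store, re-running `migrate` over the full asset list after a crash simply skips completed work.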

One thing I underestimated: background transformations eat into your quota. Cloudinary lets you trigger things like background removal asynchronously during upload. Useful, but each transformation counts against your plan's limits. We had to carefully schedule heavy transformations during off-peak windows and monitor quota consumption in near-real-time to avoid blowing through monthly limits mid-migration.
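The quota guard can be reduced to a threshold check. The `used` and `limit` numbers would come from Cloudinary's Admin API usage report (`cloudinary.api.usage()`); the 80% threshold is an illustrative choice, not a recommendation:

```javascript
// Decide whether the pipeline should pause heavy work based on how much
// of the monthly quota has been consumed.
function shouldPause(used, limit, threshold = 0.8) {
  if (limit <= 0) return true; // unknown headroom: be conservative
  return used / limit >= threshold;
}
```

We polled consumption on a schedule and paused transformation-heavy batches when the check tripped, resuming them in off-peak windows.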

Automatic Categorization With the Strategy Pattern

The migration wasn't just about moving files from A to B. FIBA needed assets to land in the right place and get tagged correctly. A photo from a EuroBasket quarterfinal needed to end up in the right competition folder, tagged with the right teams, linked to the right game.

I built an automatic categorization system using Azure Functions and Cloudinary webhooks. The flow:

  1. An asset gets uploaded or moved in Cloudinary.
  2. Cloudinary fires a webhook notification.
  3. An Azure Function receives the notification and determines the asset type based on metadata, folder path, and naming conventions.
  4. A Strategy pattern selects the right categorization logic — competition assets, team assets, player assets, and editorial assets each had different rules.
  5. The function updates the asset's metadata and structured tags in Cloudinary.
Upload/Move → Cloudinary Webhook → Azure Function → Strategy Selection → Metadata Update

The Strategy pattern was the right call here. Each asset category had wildly different rules for tagging and folder placement, and new competition formats got added regularly. Adding a new strategy was a single file — no changes to the routing logic.
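A minimal version of that routing looks like this. The four categories match the ones above; the rules inside each strategy are invented for illustration:

```javascript
// One strategy per asset category; each returns the metadata update to apply.
const strategies = {
  competition: (asset) => ({ tags: ['competition'], folder: `competitions/${asset.key}` }),
  team:        (asset) => ({ tags: ['team'],        folder: `teams/${asset.key}` }),
  player:      (asset) => ({ tags: ['player'],      folder: `players/${asset.key}` }),
  editorial:   (asset) => ({ tags: ['editorial'],   folder: `editorial/${asset.key}` }),
};

// The router only picks a strategy. Supporting a new competition format
// means adding one entry above; the routing logic never changes.
function categorize(asset) {
  const strategy = strategies[asset.type];
  if (!strategy) throw new Error(`No strategy for asset type: ${asset.type}`);
  return strategy(asset);
}
```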

We also automated folder structure creation. When a new competition was registered in the system, a background job would scaffold the entire folder hierarchy in Cloudinary: folders for each team, each game, each round. By the time photographers started uploading, the structure was waiting for them.
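The scaffolding job boils down to generating every required path, then creating each one through the Admin API. A sketch, assuming a simple competition/teams/rounds/games hierarchy (the real structure was richer):

```javascript
// Build the full folder hierarchy for a newly registered competition.
function scaffoldPaths(competition, teams, rounds, gamesPerRound) {
  const root = `competitions/${competition}`;
  const paths = [root];
  for (const team of teams) paths.push(`${root}/teams/${team}`);
  for (const round of rounds) {
    paths.push(`${root}/${round}`);
    for (let g = 1; g <= gamesPerRound; g++) {
      paths.push(`${root}/${round}/game-${g}`);
    }
  }
  return paths;
}
// Each path would then be created with the Admin API, e.g.
// cloudinary.api.create_folder(path), before photographers start uploading.
```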

What Went Wrong

Plenty.

Character encoding in folder names. Some competition names included accented characters or non-Latin scripts. Cloudinary handles Unicode, but our mapping scripts didn't always preserve encoding correctly. We found orphaned assets in mangled folder paths weeks into the migration.

Duplicate detection was harder than expected. The source system had duplicate assets — same file, different metadata, different locations. We had to decide: deduplicate and merge metadata, or preserve duplicates? We chose to deduplicate by content hash, but merging metadata from multiple source records into one Cloudinary asset introduced edge cases we were still fixing late in the project.

Webhook delivery isn't guaranteed. Under load, events occasionally never reached our Azure Functions, or failed before processing completed. We added a reconciliation job that ran nightly, comparing the source of truth against Cloudinary's actual state and reprocessing anything that fell through the cracks.
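The core of that reconciliation job is a set difference. A sketch, assuming both sides can be enumerated as asset IDs (in practice the Cloudinary side came from paginated Admin API listings):

```javascript
// Nightly reconciliation: compare source-of-truth IDs against what actually
// landed in Cloudinary, and return the IDs that need reprocessing.
function findMissing(sourceIds, cloudinaryIds) {
  const uploaded = new Set(cloudinaryIds);
  return sourceIds.filter((id) => !uploaded.has(id));
}
```

Everything this function returns gets fed back through the same idempotent pipeline, so a dropped webhook costs at most one day of staleness.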

Key Takeaways

If you're planning a migration at this scale, here's what I'd tell you:

Treat it as a product, not a task. A two-million-asset migration has its own backlog, its own monitoring, its own incident response. Scope it accordingly.

Build observability from day one. We had dashboards tracking upload rates, failure rates, quota consumption, and estimated completion time. Without them, we would have been flying blind.

Plan for the long tail. The first 80% goes fast. The last 20% — the edge cases, the encoding issues, the orphaned references — takes as long as everything else combined.

Automate the boring parts ruthlessly. Folder creation, metadata tagging, quota monitoring — anything a human has to do repeatedly will eventually be done wrong. Automate it.

The migration took months. It wasn't glamorous work. But the system we built didn't just move assets — it gave FIBA a categorization engine that kept working long after the migration was done. The webhooks, the strategies, the automated folder scaffolding — all of that became part of the everyday workflow.

Sometimes the most impactful engineering isn't building something new. It's moving something old to a better place, carefully.