Member of Technical Staff - Model Serving / API Backend Engineer

Black Forest Labs · Freiburg (Germany), San Francisco (USA)

Onsite · Full-time · Senior level · USD 180k – USD 300k

About this role

About Black Forest Labs

We're the team behind Latent Diffusion, Stable Diffusion, and FLUX: foundational technologies that changed how the world creates images and video. We build the generative models behind tools used by millions of creators, developers, and businesses worldwide. Our FLUX models are among the most advanced in the world, and we're just getting started.

Headquartered in Freiburg, Germany, with a growing presence in San Francisco, we're scaling fast while staying true to what makes us different: research excellence, open science, and building technology that expands human creativity.

Why This Role

Our research team moves fast. Models improve weekly. New capabilities emerge constantly.

What slows us down is not model quality—it’s productionization.

Without this role:

Research checkpoints sit longer before becoming usable APIs
Inference is slower than it needs to be
APIs struggle under load
Demos don’t reflect the true potential of our models

This role removes the bottleneck between frontier research and production reality. Once it is filled, researchers ship faster, demos launch faster, and customers experience models at their best.

What You’ll Work On

You will own the bridge between research breakthroughs and production systems.

Turn research checkpoints into production-ready inference services
Design and maintain high-performance APIs serving millions of requests
Optimize inference latency and throughput across GPU infrastructure
Build scalable serving architectures that handle unpredictable traffic
Improve reliability, monitoring, and observability across model-serving systems
Prototype and ship demos that showcase new capabilities in days, not weeks
Collaborate closely with researchers to move from idea to live endpoint rapidly

Tools & Context – Model Serving & API Infrastructure

Python, FastAPI, async systems
GPU infrastructure, CUDA, inference optimization
Docker and Kubernetes
Redis, Postgres, distributed task queues
Cloud platforms (AWS, GCP, or Azure)
Observability stacks (metrics, logging, tracing)

This role spans backend systems, GPU performance, and production ML serving.

What We’re Looking For

You’ve built and operated systems at meaningful scale. You understand the difference between a research prototype and a production system. You are comfortable navigating ambiguity, making tradeoffs, and improving systems under real-world constraints.

You demonstrate:

Strong judgment around performance, reliability, and cost tradeoffs
Experience scaling APIs or ML systems under load
Comfort working in fast-moving, research-adjacent environments
Ownership from system design through debugging and deployment

Role-specific experience we value:

Building and operating ML inference services in production
Designing scalable API architectures with async processing
Optimizing GPU workloads (batching, quantization, compilation, CUDA)
Managing distributed systems and task queues under variable load
Implementing monitoring and observability for production ML systems
Debugging performance bottlenecks across model, infrastructure, and network layers

Bonus experience includes:

Real-time or low-latency inference systems
TensorRT, reduced precision, layer fusion, or model compilation techniques
Frontend demo tooling (Streamlit, Gradio, React)
CI/CD and automated testing for ML systems
Security best practices for API and model serving

Base Annual Salary: $180,000–$300,000 USD

We're based in Europe and value depth over noise, collaboration over hero culture, and honest technical conversations over hype. Our models have been downloaded hundreds of millions of times, but we're still a ~50-person team learning what's possible at the edge of generative AI.

About Black Forest Labs

At Black Forest Labs, we’re on a mission to advance the state of the art in generative deep learning for media, building powerful, creative, and open models that push what’s possible. Born from foundational research, we continuously create advanced infrastructure to transform ideas into images and videos. Our team pioneered Latent Diffusion, Stable Diffusion, and FLUX.1 – milestones in the evolution of generative AI. Today, these foundations power millions of creations worldwide, from individual artists to enterprise applications.

