Likeness — ML & Infrastructure Lead Technical Brief

What this is

Context for the ML & Infrastructure Lead role at Likeness. It assumes you've read the founder brief and the cofounder JD already; it doesn't re-explain the business case. What it does is lay out the technical posture as I currently see it, so that we walk into our first conversation with shared framing — and so you can push back on the parts you'd do differently before signing on.

Two things to hold in mind while reading.

First, the constraint that defines the role: the platform's central promise to creators is that their AI likeness never leaves it. Every architecture decision routes through that constraint. If a feature would technically work but would weaken that commitment, we don't ship it.

Second, the role itself: this is cofounder-level work, not vendor-level. You'd own the technical posture end-to-end, from training pipeline to inference service to provenance layer. The non-technical context — compliance, consent, creator trust — isn't something to work around. It's the design constraint.

What we're building, technically

The product surface is image generation gated by per-creator license rules. A subscriber pays for credits, prompts the system within constraints the creator has set, and receives a watermarked output. Some outputs are submitted to the creator for approval; some are private; all are revocable.

Underneath that, the technical stack we're starting with:

  • Base model. Open-weights image generation model, frozen at inference time. Working assumption: Flux.1 Dev with a defensible community fine-tune, or SDXL with an established adult-content community base. Not Stable Diffusion 3.x given the alignment posture.

  • Per-creator adapter. LoRA (~50–200MB) trained per creator on 30–100 curated source images, using a DreamBooth-style methodology. Stored encrypted, isolated per creator, never exported.

  • Face / identity conditioning. IP-Adapter FaceID, InstantID, or PuLID-style face locking layered on top of the LoRA. The LoRA captures aesthetic and embodiment; the face adapter enforces specific facial likeness.

  • Composition control. ControlNet for pose and scene where it doesn't compromise identity.

  • License-gated inference layer. Every prompt is checked against the creator's license object before the model fires. Custom and load-bearing.

  • Watermarking and provenance. Invisible watermark (Tree-Ring, StableSignature, or successor) embedded at generation, plus signed metadata, perceptual hash, and license ID attached to every output.

  • Face-matching abuse prevention. Verifies that generated faces match verified-creator faces within tolerance. Catches accidental or deliberate identity bleed across creators.

That's the starting architecture. It's coherent and probably roughly right. It's also the version a senior ML engineer might walk in and reorganize meaningfully — which is the point of this brief.
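
For concreteness, here is a rough sketch of how that layering could look at inference time, assuming SDXL and the Hugging Face diffusers library. Repo IDs, paths, and scales are illustrative, and the FaceID / InstantID variants differ in setup from the plain IP-Adapter shown here.

```python
# Illustrative sketch only: frozen SDXL base, per-creator LoRA, and an
# IP-Adapter layered at inference time. Paths, repo IDs, and scales are
# placeholders, not settled choices.
import torch
from diffusers import AutoPipelineForText2Image
from PIL import Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # frozen base model
    torch_dtype=torch.float16,
).to("cuda")

# Face conditioning via IP-Adapter (FaceID / InstantID variants differ in setup).
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.6)

# Per-creator LoRA, loaded per request from isolated storage (hypothetical path).
pipe.load_lora_weights("/secure/adapters/creator_123", weight_name="adapter.safetensors")

# Reference image of the verified creator, never a third-party upload.
face_ref = Image.open("/secure/refs/creator_123_verified.png")

image = pipe(
    prompt="prompt text already validated by the license layer",
    ip_adapter_image=face_ref,
    num_inference_steps=30,
).images[0]
```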

The hard problems, in order of how much of your time they'll consume

1. License-gated inference

Every generation request runs through the creator's license object before any model is loaded. The license is a structured rules engine: allowed categories, blocked categories, explicitness ceiling, distribution rules, per-fan permissions. The compliance check has to be:

  • Fast (sub-100ms)
  • Auditable (every decision logged)
  • Conservative on ambiguity (fail closed)
  • Composable (creator rules + platform-floor rules + per-fan overrides combine cleanly)

You'd own the rule engine design and the prompt parser that feeds it. There's a real ML question buried in here: classifying prompts against rule categories is harder than it looks, especially as fans probe edges. The architecture I'd start from is hybrid — deterministic parser + classifier model + human escalation queue — but the design is yours.
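
To make the fail-closed and composable requirements concrete, here is a minimal sketch of the deterministic half of that check. The field names, categories, and verdict shape are placeholders rather than a settled schema, and the classifier model and per-fan overrides aren't shown.

```python
# Minimal fail-closed license check sketch. Fields and categories are
# placeholders; the classifier and per-fan overrides are omitted.
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVIEW = "review"  # ambiguous -> human escalation queue


@dataclass
class LicenseRules:
    allowed_categories: set[str] = field(default_factory=set)
    blocked_categories: set[str] = field(default_factory=set)
    explicitness_ceiling: int = 0  # 0 = SFW only; higher = more permissive


def check(prompt_categories: set[str], explicitness: int,
          creator: LicenseRules, platform_floor: LicenseRules) -> Verdict:
    """Compose platform-floor rules with creator rules; fail closed on ambiguity."""
    # The platform floor always wins: anything it blocks is blocked outright.
    if prompt_categories & platform_floor.blocked_categories:
        return Verdict.BLOCK
    if prompt_categories & creator.blocked_categories:
        return Verdict.BLOCK
    # The effective explicitness ceiling is the minimum of the two layers.
    ceiling = min(creator.explicitness_ceiling, platform_floor.explicitness_ceiling)
    if explicitness > ceiling:
        return Verdict.BLOCK
    # Anything not explicitly allowed by the creator is ambiguous, not allowed.
    if not prompt_categories <= creator.allowed_categories:
        return Verdict.REVIEW
    return Verdict.ALLOW
```

Every verdict, including ALLOW, would be written to the audit log; that part is elided here.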

The end-to-end shape, from prompt to delivered output:

```mermaid
flowchart TD
    A[Fan prompt] --> B[Prompt parser]
    B --> C{License check<br/>creator rules + platform floor}
    C -->|Pass| D[Load creator adapter<br/>per-request, isolated]
    C -->|Ambiguous| E[Human review queue]
    C -->|Block| F[Reject + audit log]
    D --> G[Base model + LoRA + face adapter]
    G --> H[Generated image]
    H --> I[Watermark + signed metadata + perceptual hash]
    I --> J{Output face match}
    J -->|Match| K[Deliver to fan]
    J -->|Mismatch| L[Hold + flag for review]
```

2. Per-creator adapter isolation and security

Per-creator LoRAs live in encrypted storage with strict access controls. You'd design:

  • The model registry (signed, audited, no plaintext weights at rest)
  • Inference workers that load adapters per-request, never co-mingled
  • Access logging at every layer
  • Key management for the encryption layer

The promise that "weights never leave the platform" is an architecture commitment. It means no API access to weights, no signed URLs that could leak, no developer convenience that creates an export path. It also means designing inference so that even a compromised inference worker can't exfiltrate a full adapter — a non-trivial threat model.
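
As one way to honor those constraints on the loading path, here is a sketch assuming envelope encryption with a key service, decryption only in memory, and an access-log entry per request. fetch_encrypted_adapter and kms_unwrap are hypothetical platform functions, not existing APIs.

```python
# Sketch: decrypt a creator adapter into memory per request, never to disk.
# fetch_encrypted_adapter and kms_unwrap are hypothetical platform functions.
import io
import logging

from cryptography.fernet import Fernet

audit_log = logging.getLogger("adapter_access")


def load_adapter_bytes(creator_id: str, request_id: str,
                       fetch_encrypted_adapter, kms_unwrap) -> io.BytesIO:
    """Fetch an encrypted LoRA, unwrap its data key via the key service, decrypt in RAM."""
    blob, wrapped_key = fetch_encrypted_adapter(creator_id)   # ciphertext + wrapped data key
    data_key = kms_unwrap(wrapped_key, context={"creator": creator_id})
    plaintext = Fernet(data_key).decrypt(blob)                # plaintext lives only in memory
    audit_log.info("adapter_loaded creator=%s request=%s bytes=%d",
                   creator_id, request_id, len(plaintext))
    return io.BytesIO(plaintext)                              # handed to the inference worker
```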

3. Face matching for abuse prevention

Two threat models worth distinguishing.

The first: a fan tries to inject a reference image of someone other than the creator they're subscribed to. The hard rule is no third-party reference uploads, but if any face-conditioning input is permitted at any point in the pipeline, there's an attack surface. The face matcher needs to verify that any face entering the pipeline matches a verified person.

The second: a fan extracts or coerces an output containing a face other than the verified creator's. This could be intentional (prompt manipulation) or accidental (model bleed). The face matcher on the output side catches it and either rejects the output or flags it for review.

The ML problem is genuinely interesting: identity-preservation at inference time and identity-blocking against an adversarial population, with legal exposure for false negatives and creator dissatisfaction for false positives. One of the more nuanced systems on the platform.
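
To make the output-side check concrete, here is a toy version assuming face embeddings from whatever model we choose (ArcFace-style) and a cosine-similarity threshold. The threshold value is illustrative, and choosing it is exactly the operating-point question raised later in this brief.

```python
# Toy output-side face check. Embeddings come from whatever face model we
# choose (ArcFace-style); the threshold is illustrative, not a recommendation.
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def output_face_check(generated_embedding: np.ndarray,
                      verified_creator_embedding: np.ndarray,
                      threshold: float = 0.6) -> str:
    """Deliver only when the generated face matches the verified creator.

    Fails closed: anything below threshold is held for review rather than shipped.
    """
    score = cosine_similarity(generated_embedding, verified_creator_embedding)
    return "deliver" if score >= threshold else "hold_for_review"
```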

4. Watermark robustness

Watermarks need to survive, at minimum: JPEG recompression, screenshot-and-recapture, mild cropping, and standard image editing. They don't need to survive a determined adversarial attack — that's not the threat model. The threat model is: a leak shows up on a clip site, we want to prove provenance and tie it to a license and buyer.

Open question: do we use a pre-trained watermarking technique or train our own? The off-the-shelf options are mature enough for v0; longer-term it's worth evaluating whether a custom or layered watermark gives us better takedown evidence.
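
Whatever we pick for the watermark itself, the signed metadata and perceptual hash travel alongside it. A minimal sketch of that provenance record, assuming an Ed25519 platform key and the imagehash library; the record fields are illustrative.

```python
# Sketch of the per-output provenance record: perceptual hash + signed metadata.
# The watermark embedding itself (Tree-Ring / StableSignature / etc.) is separate.
import json

import imagehash
from PIL import Image
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

signing_key = Ed25519PrivateKey.generate()  # in practice, a managed platform key


def provenance_record(image_path: str, license_id: str, buyer_id: str) -> dict:
    phash = str(imagehash.phash(Image.open(image_path)))      # perceptual hash
    metadata = {"license_id": license_id, "buyer_id": buyer_id, "phash": phash}
    payload = json.dumps(metadata, sort_keys=True).encode()
    signature = signing_key.sign(payload)                     # detached signature
    return {"metadata": metadata, "signature": signature.hex()}
```

If a leak surfaces on a clip site, the perceptual hash finds the output in our records and the signature ties it to a specific license and buyer.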

5. Cost, latency, and quality on inference

Total inference cost scales with usage; the per-image unit economics depend on:

  • How aggressively we batch
  • Whether we use distilled / accelerated variants (Flux Schnell, SDXL Turbo) for speed-critical paths
  • How often we trigger the face adapter and ControlNet (every generation, or only when needed)
  • GPU choice and provider mix

You'd own the cost / latency / quality curve as we scale. The closed beta is when we find out whether retail credit pricing clears against real inference cost. That's a conversation you'd lead.
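
As a back-of-envelope frame for that conversation, here is a toy calculation. Every number below is a placeholder, not an estimate.

```python
# Back-of-envelope unit economics. Every number here is a placeholder.
gpu_hourly_rate = 2.50          # $/hr for a rented GPU (placeholder)
seconds_per_image = 8.0         # end-to-end, including face adapter + watermark
images_per_batch = 4            # effective batching on the same GPU

cost_per_image = gpu_hourly_rate / 3600 * seconds_per_image / images_per_batch
credit_price_per_image = 0.25   # what the fan pays per generation (placeholder)

margin = credit_price_per_image - cost_per_image
print(f"cost/image ~ ${cost_per_image:.4f}, margin ~ ${margin:.4f}")
```

The closed beta replaces those placeholders with measured numbers.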

What's settled vs. what's up to you

Settled (non-negotiable)

  • No model export. Weights never leave the platform under any circumstances.
  • No third-party reference uploads. Every face in the system is identity-verified.
  • License-gated inference. Every prompt checks against rules before any model loads.
  • Per-output provenance. Watermark + perceptual hash + signed metadata on every generation.
  • Per-creator adapter isolation. No bleed across creators by design.

These are commitments to creators that the rest of the architecture serves. If you'd structure any of these differently, that's a conversation worth having before you join — not after.

Open (your call, with input)

  • Specific base model and community fine-tune
  • LoRA training methodology (vanilla LoRA vs. DreamBooth + LoRA vs. occasional full fine-tune for flagship creators)
  • Face conditioning technique (IP-Adapter vs. InstantID vs. PuLID vs. roll our own)
  • Watermarking technique (off-the-shelf vs. custom vs. layered)
  • Inference infrastructure stack (managed cloud GPUs vs. dedicated hardware)
  • Quality evaluation methodology
  • Whether to support multiple base models per creator (different aesthetic options) or one
  • How aggressively to use distilled / accelerated model variants

These are decisions where your judgment beats mine. I'd expect opinions, not deference, on most of them.

First six months

Roughly:

  • Months 1–2. Architecture review and ratification. Base model choice, training pipeline design, inference architecture, and the design of the license-gated inference layer. Stand up dev environment, model registry, evaluation harness.

  • Months 2–4. Train the first two or three creator LoRAs as concierge prototypes, with the creators in the room. Validate quality, refine the pipeline. Stand up the inference service with license checking and watermarking. Ship basic face matching and abuse detection.

  • Months 4–6. Onboard the rest of the concierge cohort. Ship a working v0 of the fan generation interface. Begin measuring real cost, latency, and quality with paying users. Tighten the abuse-detection layer based on what fans actually try.

By month 6 we should have honest answers to whether the unit economics hold, whether the moderation load scales, and whether creators feel the workflow respects them. Those are the gating questions for a seed round.

A few hard questions worth pushing on before you join

These are the kinds of questions I'd want a candidate to ask me, because they signal you're taking the role's actual responsibility seriously.

  1. Base model provenance. Most capable adult-content community fine-tunes have unclear training data provenance. What's our defensible story when a payment processor's compliance team asks? Do we accept the risk of a community base, or do we eventually train our own from consented data — which is genuinely expensive (millions, not thousands) and would shift the company's capital plan meaningfully?

  2. Face-matching false positive tolerance. If we're aggressive about blocking outputs whose faces don't match the creator's, we'll occasionally block legitimate generations and frustrate creators. If we're permissive, we let through content that exposes the platform legally. What's the right operating point, and what data would we need to find it?

  3. Watermark adversarial robustness. What happens when, not if, someone publishes a method for removing our watermark? What's the response plan? Is there a layered approach that buys resilience?

  4. Catastrophic-failure planning. What does the architecture look like if a model leak happens despite our controls? What's the response, what's the customer comms posture, what's the technical recovery? "It can't happen" isn't an answer.

  5. The video question. Video is on the roadmap but explicitly excluded from MVP scope. What's the architectural posture today that lets us add video later without rebuilding? Or is that wishful thinking, and video genuinely requires a separate architecture?

If you have strong takes on these — including takes that contradict the assumptions in this brief — that's the conversation I want to have.

Closing

This document represents one founder's working technical model of the platform. The reason we're hiring an ML & Infrastructure Lead at the cofounder level is that this model needs to be challenged, refined, and in some cases discarded by someone who has built systems like this before. If you read this and your reaction is "most of this is roughly right, here's what I'd change," we should talk. If your reaction is "this is mostly wrong, here's how I'd rebuild it," we should especially talk.

Cash compensation is modest until pre-seed close. Equity is meaningful. The work, done right, is consent infrastructure for synthetic creator media — starting in the place where the unauthorized-use problem is loudest, and extending from there.