From bake photos to structured JSON: a multi-file map for sourdough vision

This post zooms in on one slice of mylevain: the path from optional loaf and crumb photos to structured vision output and downstream coaching. For the broader product arc, see "I built an AI agent to help me bake better sourdough."

The post has two layers. (A) What I shipped in code: Claude vision analysis of loaf and crumb photos when a bake is completed. (B) A short, honest note on a future idea, optional photos in the recipe adapter, which my app does not implement today.

A — What I shipped: vision after the bake

The problem

Structured fields from the starter tracker and recipe adapter (grams, hydration, ratios, timelines) are great for planning a bake. They do not, by themselves, capture what the loaf looked like when it came out of the oven. A photo encodes crust, bloom, and crumb in ways numbers do not. My goal was to turn optional photos into persistent, structured feedback the rest of the app can reuse — for example in coaching copy that compares bakes over time.

Where it hooks in (multi-file path)

I did not put vision in the recipe agent’s tool loop. It lives on a dedicated API route I wrote that orchestrates upload, image fetch, one multimodal model call, and persistence.

UI → server: In BakeFinishModal, optional files become data URLs and are POSTed as JSON (loafPhoto / crumbPhoto) to /api/bake/[id]/analyze-loaf.

// components/bake/BakeFinishModal.tsx (excerpt ~L86–98)
async function runBackgroundAnalysis(loaf: File | null, crumb: File | null) {
  if (!loaf && !crumb) return;
  onAnalysisStart?.();
  try {
    const body: { loafPhoto?: string; crumbPhoto?: string } = {};
    if (loaf) body.loafPhoto = await fileToDataUrl(loaf);
    if (crumb) body.crumbPhoto = await fileToDataUrl(crumb);

    const res = await fetch(`/api/bake/${bakeId}/analyze-loaf`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    // ... (response handling and the catch block elided)
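fileToDataUrl isn't shown in the excerpt. If you're reproducing this flow, a minimal version could look like the sketch below. It is my illustration, not the app's code, and it assumes only standard APIs (Blob.arrayBuffer and btoa, which browsers and Node 18+ both provide).

```typescript
/**
 * Hypothetical fileToDataUrl (not taken from mylevain): reads a File/Blob
 * and returns "data:<type>;base64,<payload>" for use in a JSON body.
 */
async function fileToDataUrl(file: Blob): Promise<string> {
  const bytes = new Uint8Array(await file.arrayBuffer());

  // btoa expects a "binary string": one char per byte (code points 0-255).
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }

  // Browsers leave type empty when they can't infer it; use a generic fallback.
  const type = file.type || "application/octet-stream";
  return `data:${type};base64,${btoa(binary)}`;
}
```

FileReader.readAsDataURL is the other common option in the browser; the arrayBuffer route shown here stays promise-based and also runs in Node.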

Route responsibilities: My route validates auth and bake state; optionally uploads new images to blob storage and updates loafPhotoUrl / crumbPhotoUrl on the bake; downloads the public URLs and converts each one to base64 plus a media type for Claude; builds the vision payload; parses JSON from the model's reply; and persists the analysis on the bake.

// app/api/bake/[id]/analyze-loaf/route.ts (excerpt ~L72–131)
try {
  if (hasNewLoaf) {
    loafUrl = await uploadImage(
      loafPhoto!,
      `loaf/${params.id}/${Date.now()}.jpg`
    );
    await updateBake(params.id, userId, { loafPhotoUrl: loafUrl });
  }
  if (hasNewCrumb) {
    crumbUrl = await uploadImage(
      crumbPhoto!,
      `crumb/${params.id}/${Date.now()}.jpg`
    );
    await updateBake(params.id, userId, { crumbPhotoUrl: crumbUrl });
  }
} catch (err) {
  // ...
}
// ...
try {
  if (loafUrl) {
    const { mediaType, base64 } = await fetchImageAsVisionParts(loafUrl);
    visionInput.push({
      mediaType,
      base64,
      label: "exterior",
    });
  }
  if (crumbUrl) {
    const { mediaType, base64 } = await fetchImageAsVisionParts(crumbUrl);
    visionInput.push({
      mediaType,
      base64,
      label: "crumb",
    });
  }
} catch (err) {
  // ...
}
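The excerpt begins after hasNewLoaf / hasNewCrumb have already been computed. One way that step might look is a pure function that validates the JSON body and decides what to upload and which stored URLs to reuse when reanalyze: true is passed. This is a sketch under assumptions: planAnalyzeRequest, AnalyzeBody, StoredPhotos and the rule "reuse stored photos only on reanalyze" are my inventions, not the app's code.

```typescript
// Hypothetical request/row shapes, not the app's real types.
interface AnalyzeBody {
  loafPhoto?: string;  // data URL for a newly attached exterior photo
  crumbPhoto?: string; // data URL for a newly attached crumb photo
  reanalyze?: boolean; // re-run vision on photos already in storage
}

interface StoredPhotos {
  loafPhotoUrl: string | null;
  crumbPhotoUrl: string | null;
}

interface AnalyzePlan {
  hasNewLoaf: boolean;
  hasNewCrumb: boolean;
  loafUrl: string | null;  // stored URL to analyze (new uploads fill this later)
  crumbUrl: string | null;
}

const isImageDataUrl = (v: unknown): v is string =>
  typeof v === "string" && /^data:image\/[a-z0-9.+-]+;base64,/i.test(v);

function planAnalyzeRequest(raw: unknown, stored: StoredPhotos): AnalyzePlan {
  if (raw === null || typeof raw !== "object") {
    throw new Error("Body must be a JSON object");
  }
  const body = raw as AnalyzeBody;
  if (body.loafPhoto !== undefined && !isImageDataUrl(body.loafPhoto)) {
    throw new Error("loafPhoto must be an image data URL");
  }
  if (body.crumbPhoto !== undefined && !isImageDataUrl(body.crumbPhoto)) {
    throw new Error("crumbPhoto must be an image data URL");
  }

  const hasNewLoaf = body.loafPhoto !== undefined;
  const hasNewCrumb = body.crumbPhoto !== undefined;
  const reuse = body.reanalyze === true;

  const plan: AnalyzePlan = {
    hasNewLoaf,
    hasNewCrumb,
    // Assumption: stored photos are only re-analyzed when explicitly requested.
    loafUrl: !hasNewLoaf && reuse ? stored.loafPhotoUrl : null,
    crumbUrl: !hasNewCrumb && reuse ? stored.crumbPhotoUrl : null,
  };

  if (!hasNewLoaf && !hasNewCrumb && !plan.loafUrl && !plan.crumbUrl) {
    throw new Error("Nothing to analyze");
  }
  return plan;
}
```

Keeping this decision pure means the "first upload" and "run vision again on what's stored" paths can be unit-tested without blob storage or the model.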

Vision call (no tools): In runLoafVisionAnalysis I construct multimodal user content — text labels plus image blocks with base64 sources — and ask for strict JSON matching a documented shape.

// lib/agents/loaf-vision-agent.ts (excerpt ~L48–76)
for (const img of images) {
  content.push({
    type: "text",
    text:
      img.label === "exterior"
        ? "Image — EXTERIOR (whole loaf, crust and shape):"
        : "Image — CRUMB (cross-section of interior):",
  });
  content.push({
    type: "image",
    source: {
      type: "base64",
      media_type: img.mediaType,
      data: img.base64,
    },
  });
}

content.push({
  type: "text",
  text: "Respond with the JSON object only.",
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 2048,
  system: SYSTEM,
  messages: [{ role: "user", content }],
});
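The route then has to turn the reply into an object. Even with "Respond with the JSON object only", a model can occasionally wrap its output in a Markdown code fence or add a sentence, so a defensive parser helps. A minimal sketch, assuming the reply's content is an array of blocks with text in { type: "text", text } blocks (as in the Anthropic Messages API). parseLoafAnalysis is a hypothetical name, and it checks only overallRead because that is the one LoafAnalysis field this post shows; the real shape lives in lib/loaf-analysis.ts.

```typescript
// Only the fields this parser reads; the SDK's own block types are richer.
type ContentBlock = { type: string; text?: string };

// Assumption: overallRead is required; other fields pass through untouched.
interface ParsedLoafAnalysis {
  overallRead: string;
  [key: string]: unknown;
}

function parseLoafAnalysis(content: ContentBlock[]): ParsedLoafAnalysis {
  // Join all text blocks; any non-text blocks are ignored.
  const text = content
    .filter((b) => b.type === "text" && typeof b.text === "string")
    .map((b) => b.text as string)
    .join("\n")
    .trim();

  // Tolerate a code fence or stray prose by slicing from the first "{"
  // to the last "}".
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("Model reply contained no JSON object");
  }

  const parsed: unknown = JSON.parse(text.slice(start, end + 1));
  if (
    parsed === null ||
    typeof parsed !== "object" ||
    typeof (parsed as { overallRead?: unknown }).overallRead !== "string"
  ) {
    throw new Error("JSON is missing a string overallRead field");
  }
  return parsed as ParsedLoafAnalysis;
}
```

In the route this would be called on response.content; if it throws, the photos are already stored, so only the analysis needs retrying.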

Schema and types

Bake rows store loafPhotoUrl and crumbPhotoUrl (see lib/db/schema.ts).
Parsed vision output is persisted as loafAnalysis (see lib/loaf-analysis.ts and the analyze-loaf route).
I left RecipeConstraints / AdaptedRecipe alone — vision is orthogonal to the recipe adapter types.

Downstream use

I wired visionOverallRead from the stored loaf analysis into the bake summary agent's prompt, so coaching copy can correlate ratings, notes, and timing with the model's read of fermentation from the photos.

// lib/agents/bake-summary-agent.ts (excerpt ~L23–25)
/** From vision loaf analysis when present */
visionOverallRead: string | null;

// lib/agents/bake-summary-agent.ts (excerpt ~L92–97)
visionOverallRead:
  b.loafAnalysis != null &&
  typeof b.loafAnalysis === "object" &&
  "overallRead" in (b.loafAnalysis as LoafAnalysis)
    ? (b.loafAnalysis as LoafAnalysis).overallRead
    : null,
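The in check above confirms that the key exists but not that its value is a string, and the casts hide that from the compiler. If loafAnalysis comes back from the database as untyped JSON (an assumption; the column type isn't shown here), a small guard narrows it without casts. readOverallRead is a hypothetical helper, and treating an empty string as "no read" is my choice.

```typescript
/**
 * Hypothetical helper: returns overallRead only when the stored analysis
 * is an object whose overallRead is a non-empty string; otherwise null.
 */
function readOverallRead(loafAnalysis: unknown): string | null {
  if (loafAnalysis === null || typeof loafAnalysis !== "object") {
    return null;
  }
  const value = (loafAnalysis as Record<string, unknown>).overallRead;
  return typeof value === "string" && value.trim() !== "" ? value : null;
}

// Possible use inside the summary mapping:
// visionOverallRead: readOverallRead(b.loafAnalysis),
```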

Design choices that keep this maintainable

Separation from the recipe tool loop — The streaming recipe agent stays focused on baker’s math, weather, and schedule tools. Vision is a single-purpose HTTP + multimodal call, easier to test and tune without touching calculate_recipe or respond_to_user.
Storage before (re)analysis — New photos go to blob; re-analysis can use reanalyze: true and refetch URLs, so the route supports both “first upload” and “run vision again on what’s already stored.”
Shared image helpers — fetchImageAsVisionParts centralizes “public URL → base64 + media type,” which keeps the route readable and matches how I’d add more vision entry points later.
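fetchImageAsVisionParts itself isn't shown. A sketch of how a "public URL to base64 plus media type" helper could work, assuming the global fetch (Node 18+) and Node's Buffer, and restricting media types to the four formats the Anthropic Messages API accepts for base64 image sources (JPEG, PNG, GIF, WebP). The bodies are my guesses, not the app's code; the pure toVisionParts function (a hypothetical name) is split out so it can be tested without a network call.

```typescript
type VisionMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

interface VisionParts {
  mediaType: VisionMediaType;
  base64: string;
}

const SUPPORTED: readonly VisionMediaType[] = [
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
];

/** Pure part: raw bytes + Content-Type header -> parts for an image block. */
function toVisionParts(bytes: Uint8Array, contentType: string | null): VisionParts {
  // Drop parameters such as "; charset=binary" and normalize case.
  const type = (contentType ?? "").split(";")[0].trim().toLowerCase();
  const mediaType = SUPPORTED.find((t) => t === type);
  if (!mediaType) {
    throw new Error(`Unsupported image type: ${type || "(none)"}`);
  }
  return { mediaType, base64: Buffer.from(bytes).toString("base64") };
}

/** I/O part: download a public blob URL and convert it. */
async function fetchImageAsVisionParts(url: string): Promise<VisionParts> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Image fetch failed: ${res.status} ${res.statusText}`);
  }
  const bytes = new Uint8Array(await res.arrayBuffer());
  return toVisionParts(bytes, res.headers.get("content-type"));
}
```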

Bridge to the rest of the app

The recipe adapter still runs on text-only constraints: one user message produced by formatConstraintsPrompt, then the tool loop.

// lib/agents/recipe-agent.ts (excerpt ~L252–272)
export async function* runRecipeAgent(
  constraints: RecipeConstraints
): AsyncGenerator<AgentEvent> {
  const client = new Anthropic();
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: formatConstraintsPrompt(constraints) },
  ];

  const MAX_ROUNDS = 6;
  let rounds = 0;

  while (rounds < MAX_ROUNDS) {
    rounds++;

    const response = await client.messages.create({
      model: "claude-sonnet-4-5-20250929",
      max_tokens: 4096,
      system: SYSTEM_PROMPT,
      tools: TOOLS,
      messages,
    });
    // ... (tool_use handling and the rest of the loop elided)

That intentional split is worth naming: planning (adapter) vs outcome (photos + vision).

Where Cursor helps (honest take)

Tracing modal → route → blob → fetch-as-base64 → vision module → DB → summary agent spans many files. In practice, the failure mode is not “the model prompt is wrong” but “the data never arrived where the next step expected it” — e.g. uploaded but not saved on the row, or wrong field on the JSON body. An editor that lets you jump to definitions, find references, and keep TypeScript in view across the whole Next.js app makes that graph easier to hold in your head than a single-file chat in a generic VS Code + sidecar LLM workflow. Cursor does not replace verifying the story against the real codebase; it lowers the cost of getting the graph right the first time and of revisiting it months later.

Tradeoffs and gotchas

Payload size: The client sends data URLs in JSON for new photos, so large images mean larger requests. I mitigated that by persisting photos to blob storage, running analysis only when a bake is completed, and showing user-facing toasts (BakeFinishModal) when analysis fails but the photos are saved.
Two vision domains: The loaf/crumb system prompt is tuned for finished bread. A future “starter jar” or “bulk dough” photo feature needs a new prompt and probably a separate small agent module — not a copy-paste of the loaf JSON schema without revision.
Recipe SSE: Today the adapt endpoint accepts a JSON body and streams its response as SSE. Bolting images onto the same request would force real product decisions: multipart uploads, size limits, timeouts, and whether vision runs before the tool loop or in parallel with it. I kept vision on its own route so those concerns stay decoupled from streaming recipe generation.
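To put numbers on the payload-size point: base64 encodes every 3 bytes as 4 characters, so a data URL is about a third larger than the file, plus the short "data:<type>;base64," prefix. A hypothetical client-side guard can estimate the JSON body before POSTing. The function names and the 4 MiB budget are placeholders, not values from the app; the real limit depends on where the route is hosted.

```typescript
// base64 turns each 3-byte group into 4 ASCII chars (last group padded).
function base64Length(byteCount: number): number {
  return 4 * Math.ceil(byteCount / 3);
}

// Length of a data URL for a file of `byteCount` bytes.
function dataUrlLength(byteCount: number, mimeType: string): number {
  return `data:${mimeType};base64,`.length + base64Length(byteCount);
}

// Placeholder budget (assumption): tune to the platform's request limit.
const MAX_BODY_CHARS = 4 * 1024 * 1024;

// `files` matches the size/type fields of browser File objects.
function fitsInRequest(files: { size: number; type: string }[]): boolean {
  // A few hundred chars of slack for JSON keys, quotes and braces.
  const overhead = 256;
  const total = files.reduce((sum, f) => sum + dataUrlLength(f.size, f.type), overhead);
  return total <= MAX_BODY_CHARS;
}
```

For example, a 3,000,000-byte photo becomes 4,000,000 base64 characters before the prefix and JSON overhead, so a loaf photo plus a crumb photo straight from a phone camera can add up quickly.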

B — Future follow-up: optional photos in the recipe adapter (not shipped)

A natural extension would be to let bakers attach an optional photo of starter or dough when adapting a recipe, so the model can reason about activity or gluten development visually as well as numerically. I’d mirror the bake path: extend RecipeConstraints with optional image fields (or upload first and pass URLs), add file inputs to ConstraintsForm, then either (1) send a multimodal first message in runRecipeAgent, or (2) call a small starter-vision-agent (or similar) that returns structured text to prepend to formatConstraintsPrompt. I’d reuse guessMediaTypeFromBase64Prefix / stripDataUrlPrefix from lib/bake/image-fetch.ts where appropriate, and not reuse the loaf JSON schema verbatim without a new spec.
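The two helpers named above already exist in lib/bake/image-fetch.ts, but their bodies aren't shown in this post. For readers following along, here is one plausible way to write them, as a sketch rather than the app's code: strip a "data:...;base64," prefix if present, then decode the first few bytes and match well-known magic numbers (JPEG FF D8 FF, PNG 89 "PNG", GIF "GIF8", WebP "RIFF" plus "WEBP" at offset 8). Returning null for unknown formats is my assumption about the contract.

```typescript
type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

/** Sketch of stripDataUrlPrefix: "data:image/png;base64,AAAA" -> "AAAA". */
function stripDataUrlPrefix(value: string): string {
  const match = /^data:[^,]*;base64,/i.exec(value);
  return match ? value.slice(match[0].length) : value;
}

/** Sketch of guessMediaTypeFromBase64Prefix: sniff magic bytes. */
function guessMediaTypeFromBase64Prefix(base64: string): ImageMediaType | null {
  // 16 base64 chars decode to 12 bytes, enough for every check below.
  // atob needs a length that is a multiple of 4, so trim to one.
  const n = Math.min(16, base64.length - (base64.length % 4));
  let head: string;
  try {
    head = atob(base64.slice(0, n));
  } catch {
    return null; // not valid base64
  }
  const byte = (i: number) => head.charCodeAt(i);
  const ascii = (from: number, to: number) => head.slice(from, to);

  if (byte(0) === 0xff && byte(1) === 0xd8 && byte(2) === 0xff) return "image/jpeg";
  if (byte(0) === 0x89 && ascii(1, 4) === "PNG") return "image/png";
  if (ascii(0, 4) === "GIF8") return "image/gif";
  if (ascii(0, 4) === "RIFF" && ascii(8, 12) === "WEBP") return "image/webp";
  return null;
}
```

Sniffing bytes instead of trusting the data URL's declared type guards against a file whose extension or MIME type was mislabeled on the way in.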

Proposed code (sketch)

Proposed — not shipped. Illustrative TypeScript only; this is not pasted from my app and would need product decisions (upload UX, size limits, SSE vs separate route) before it could ship.

// PROPOSED SKETCH — not in mylevain. Option (1): optional photo on constraints + multimodal first turn.

import type { RecipeConstraints } from "@/lib/…"; // illustrative path
import {
  guessMediaTypeFromBase64Prefix,
  stripDataUrlPrefix,
} from "@/lib/bake/image-fetch";

// `formatConstraintsPrompt` = existing text-only adapter prompt (section A).

type RecipeConstraintsWithOptionalPhoto = RecipeConstraints & {
  /** Data URL or raw base64 from ConstraintsForm; normalize before the API */
  starterOrDoughPhotoDataUrl?: string;
};

function buildFirstUserMessage(constraints: RecipeConstraintsWithOptionalPhoto): {
  role: "user";
  content: string | Array<
    | { type: "text"; text: string }
    | {
        type: "image";
        source: { type: "base64"; media_type: string; data: string };
      }
  >;
} {
  const promptText = formatConstraintsPrompt(constraints);
  const raw = constraints.starterOrDoughPhotoDataUrl;
  if (!raw) {
    return { role: "user", content: promptText };
  }

  const base64 = stripDataUrlPrefix(raw);
  const mediaType = guessMediaTypeFromBase64Prefix(base64);

  return {
    role: "user",
    content: [
      {
        type: "text",
        text: "Optional context — starter or dough photo (activity / gluten development):",
      },
      {
        type: "image",
        source: { type: "base64", media_type: mediaType, data: base64 },
      },
      { type: "text", text: promptText },
    ],
  };
}

// runRecipeAgent would open with [buildFirstUserMessage(constraints)] instead of
// { role: "user", content: formatConstraintsPrompt(constraints) }.

Option (2), in prose: a small describeStarterContext call returns plain or structured text, which is prepended to the output of formatConstraintsPrompt. It is the same idea, but vision stays isolated in its own function and the tool loop's first turn contains no image block.
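Option (2) could be composed as in the sketch below. It is proposed, not shipped: buildPromptWithStarterContext, the DescribeStarterContext signature and the heading wording are hypothetical. The describeStarterContext vision call is injected as a function so the composition logic is visible (and testable) without an API key; a real version would be a single messages.create call with one image block and no tools, like the loaf path in section A.

```typescript
/** Hypothetical signature for the vision call: image in, short text out. */
type DescribeStarterContext = (image: {
  base64: string;
  mediaType: string;
}) => Promise<string>;

/**
 * Prepends the photo description to the existing text-only prompt, so the
 * tool loop's first user message stays a plain string.
 */
async function buildPromptWithStarterContext(
  promptText: string, // output of formatConstraintsPrompt
  photo: { base64: string; mediaType: string } | undefined,
  describeStarterContext: DescribeStarterContext,
): Promise<string> {
  if (!photo) return promptText;

  let description: string;
  try {
    description = (await describeStarterContext(photo)).trim();
  } catch {
    // The photo is optional context: fall back to the text-only prompt.
    return promptText;
  }
  if (!description) return promptText;

  return [
    "Observations from the baker's starter/dough photo (model-generated, may be imperfect):",
    description,
    "",
    promptText,
  ].join("\n");
}
```

Failing open (text-only prompt) keeps a flaky or slow vision call from blocking recipe adaptation, which matches how the bake path saves photos even when analysis fails.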