Context

ChalkboardAI

A self-hosted AI math tutor that knows what your instructor actually taught. It runs entirely offline on a Mac Studio for FERPA and FOIA compliance. Self-directed, in development.

Role: Product, design, engineering
Timeline: 5-day build, 2026, actively iterating
Stack: React, TypeScript, Ollama, gemma3:27b, Fabric.js, IndexedDB, Whisper.cpp, Kokoro TTS, built with Claude Code
Figure: ChalkboardAI interface showing the infinite canvas with a math problem and AI tutor chat panel.

I've taught statistics and calculus to 500+ students at Washtenaw Community College over the years. AI tutors are already in my students' hands, and the ones they're using don't know what happened in our classroom. They fill gaps I didn't leave, and they miss the ones I did.

I wanted to prototype the version that starts from the lecture itself. I gave myself five days to test the architectural thesis end-to-end and built it solo to understand the full stack of decisions involved when no one else is making them. Work continues.

Problem

Generic AI tutors don't see the lecture, so they can't reinforce the specific notation, approach, example sequence, and vocabulary the instructor used. They can also do real harm: by default, they'll cheerfully give a student the answer instead of helping the student arrive at it, undercutting the practice that learning math actually requires.

The product has to do two things at once. It has to ground every response in the specific lecture the student's instructor taught, with citations back to the source material. And it has to behave as a tutor (asking, prompting, scaffolding) rather than as an answer machine.

Approach

Local-first, by necessity. Every part of the AI stack runs on a single Mac Studio: inference (Ollama with gemma3:27b), transcription (Whisper.cpp), embeddings (nomic-embed-text), and text-to-speech (Kokoro). Student interactions with AI tutors are education records under FERPA, and logged transcripts can fall within the scope of FOIA requests, so where that data lives and who can read it is a compliance question. On-premise is the right architecture for real institutional deployment, and it zeroes out per-student API cost in the process.
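To make "runs on a single Mac Studio" concrete, here is a minimal sketch of a chat call that never leaves the machine. It assumes Ollama is listening on its default local port and uses its `/api/chat` endpoint; the helper names `buildTutorRequest` and `askTutor` are hypothetical and not taken from the project.

```typescript
// Sketch only: request shape for Ollama's local /api/chat endpoint.
// Helper names are hypothetical; the URL is Ollama's default local address.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
}

const OLLAMA_URL = "http://localhost:11434/api/chat";

// Build the request body: a system prompt that sets tutor behavior,
// followed by the student's message.
function buildTutorRequest(systemPrompt: string, studentText: string): ChatRequest {
  return {
    model: "gemma3:27b",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: studentText },
    ],
    stream: false, // ask for one JSON response instead of a token stream
  };
}

// Send the request to the local server and return the assistant's reply text.
async function askTutor(request: ChatRequest): Promise<string> {
  const response = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`Ollama returned HTTP ${response.status}`);
  }
  const data = (await response.json()) as { message: { content: string } };
  return data.message.content;
}
```

Because the only network hop is to `localhost`, student text stays on the machine, and there is no per-request bill.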

What got built. Student-facing v1, end-to-end against a synthetic curriculum. An infinite spatial canvas (pen, eraser, shapes, image paste, math-aware undo, light/dark pen pairing) where students work the problem. A vision-capable local model that reads the board each turn and answers in context, with citation badges that link back to the specific lecture each response is grounded in. Chat sessions scoped per course and lecture, with rolling summarization so long sessions don't lose earlier work. Whisper.cpp for voice input from the browser, Kokoro TTS for spoken responses, and math read aloud through the same accessibility pipeline MathJax uses for screen readers.
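One way lecture-scoped grounding with citation badges could work is sketched below. This is an assumption about the approach, not a description of the actual implementation: it supposes lecture material is split into chunks whose embeddings were computed ahead of time (for example with nomic-embed-text), and that retrieval ranks only the current lecture's chunks by cosine similarity. All type and function names are hypothetical.

```typescript
// Hypothetical data model: one chunk of lecture material plus its embedding.
interface LectureChunk {
  lectureId: string;
  title: string; // text shown on the citation badge
  text: string;
  embedding: number[];
}

interface Hit {
  chunk: LectureChunk;
  score: number;
}

// Cosine similarity between two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank only the chunks that belong to the session's lecture, best match first.
function retrieveForLecture(
  queryEmbedding: number[],
  chunks: LectureChunk[],
  lectureId: string,
  k: number,
): Hit[] {
  return chunks
    .filter((chunk) => chunk.lectureId === lectureId)
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

The returned chunks would go into the prompt as context, and their `title` fields would label the citation badges. Filtering by `lectureId` before ranking is what keeps a response from citing material the instructor has not taught yet.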

The design foundation was built in Claude Design: color tokens, type scale, and component variants across light and dark modes, treated as a documented artifact rather than ad-hoc styles.

Figure: ChalkboardAI design system in light mode and dark mode. Built in Claude Design: tokens, type, and component variants across light and dark modes.

Decisions

There were four moments during the build where I needed to understand the technical tradeoffs before I could decide. Each one started as a back-and-forth with Claude, acting as my software engineer.

01

Routing math computation through a verified solver

The base LLM could not be trusted to do arithmetic. On the same problem (the limit of f(x) = 12 − 5x as x approaches 2, which is 12 − 5(2) = 2), it returned 7, then 3, then −8 in a single session. A math tutor that gets the math wrong is worse than no tutor at all.

Better prompting wouldn't fix it; the model was just computing badly. I connected the model to a math solver for computation, so the LLM handles reasoning and explanation while verified tool calls handle the arithmetic. The model went from three wrong answers on one problem to consistent accuracy on the same class of questions.
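The split between reasoning and computation can be sketched as follows. The source does not name the solver or the tool-call format, so both are assumptions here: the `evaluate_polynomial` tool and its argument shape are hypothetical, and a small Horner's-method evaluator stands in for the real solver.

```typescript
// Hypothetical tool call the model emits instead of doing arithmetic itself.
interface EvaluatePolynomialCall {
  tool: "evaluate_polynomial";
  // Coefficients in ascending order of power: [12, -5] means 12 - 5x.
  coefficients: number[];
  x: number;
}

interface ToolResult {
  tool: string;
  result: number;
}

// Stand-in solver: evaluates a polynomial with Horner's method,
// working from the highest-power coefficient down to the constant term.
function evaluatePolynomial(coefficients: number[], x: number): number {
  return coefficients.reduceRight((acc, c) => acc * x + c, 0);
}

// Dispatch a tool call to the solver. The model never computes the number;
// it only receives the verified result and explains it.
function runToolCall(call: EvaluatePolynomialCall): ToolResult {
  return {
    tool: call.tool,
    result: evaluatePolynomial(call.coefficients, call.x),
  };
}

// Polynomials are continuous, so the limit as x approaches 2 of 12 - 5x
// equals the value at x = 2.
const limitResult = runToolCall({
  tool: "evaluate_polynomial",
  coefficients: [12, -5],
  x: 2,
});
console.log(limitResult.result); // 2
```

The design point is that the number in the tutor's reply comes from deterministic code, so asking the same question twice cannot produce 7 and then 3.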

Figure: Early LLM chat, where the model confidently gives a wrong answer of 7 for a limit, then recalculates to 3.
Figure: First working session, showing the ChalkboardAI canvas with the math solver connected.

What I learned. When the failure is at the model's core capability, prompting harder is the wrong move. Tool use is the right one.

02

Rolling summary over sliding window for context management

Mid-session, the model started slowing down and drifting on format rules. The obvious fix was a sliding context window. The obvious fix was wrong: it would have broken "the AI remembers what we worked on twenty minutes ago," which is a first-class requirement for a tutor.

I implemented rolling summarization instead: older turns are condensed into a running summary rather than dropped. This preserves continuity at the cost of a second model call.
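A minimal sketch of the difference, assuming a fixed budget of recent turns kept verbatim. The names and the `Summarizer` signature are hypothetical. In the real system the summarizer would be the second model call, and here it is injected as a function so the bookkeeping can be shown on its own.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

interface SessionContext {
  summary: string; // running summary of everything older than `recent`
  recent: Turn[]; // most recent turns, kept verbatim
}

// Hypothetical hook for the second model call: merge evicted turns
// into the previous summary and return the new summary.
type Summarizer = (previousSummary: string, evicted: Turn[]) => Promise<string>;

// Split turns into the overflow to summarize and the tail to keep verbatim.
// A plain sliding window would discard `evicted` at this point.
function splitForSummary(
  turns: Turn[],
  maxRecent: number,
): { evicted: Turn[]; kept: Turn[] } {
  const overflow = Math.max(0, turns.length - maxRecent);
  return { evicted: turns.slice(0, overflow), kept: turns.slice(overflow) };
}

// Add a turn; when the verbatim window overflows, fold the oldest turns
// into the summary instead of dropping them.
async function addTurn(
  ctx: SessionContext,
  turn: Turn,
  maxRecent: number,
  summarize: Summarizer,
): Promise<SessionContext> {
  const { evicted, kept } = splitForSummary([...ctx.recent, turn], maxRecent);
  if (evicted.length === 0) return { summary: ctx.summary, recent: kept };
  const summary = await summarize(ctx.summary, evicted);
  return { summary, recent: kept };
}

// Compose what the model sees: the summary first, then the verbatim turns.
function buildPromptContext(ctx: SessionContext): string {
  const lines = ctx.recent.map((t) => `${t.role}: ${t.content}`);
  if (ctx.summary === "") return lines.join("\n");
  return [`Summary of earlier work: ${ctx.summary}`, ...lines].join("\n");
}
```

The prompt stays bounded by `maxRecent` plus one summary, while work from twenty minutes ago survives in condensed form.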

What I learned. Designing for context pressure from the start would have avoided the degradation. Context management is a day-one concern on the next build, not something to address when it breaks.

03

Inline chat artifacts over a duplicated help zone

An early version had a top-of-sidebar "help canvas" where AI-generated math items appeared. Watching it work, I saw that the duplication was the problem: users had to track a generated item in two places, the chat and the help canvas.

I removed the help zone. AI-generated items now render inline under the chat bubble that produced them, still draggable onto the main board.

Figure: Figma wireframe showing the earlier layout with a separate AI help canvas on the left sidebar.

What I learned. When in doubt, fewer surfaces.

04

Accessibility-first as a tiebreaker default

I'd written a bespoke regex pipeline for speaking math notation aloud. It worked, but each notation type required case-by-case authoring. When I checked whether a standard existed, the Speech Rule Engine (the standards-based pipeline MathJax uses for screen readers) covered a much broader set of notation correctly out of the box. The trade was about 300KB of bundle size.

I deleted the bespoke pipeline.

What I learned. When an accessibility-aware library exists and works, it wins by default. The bundle-size cost is almost always smaller than the cost of maintaining a hand-authored equivalent.

Outcome

Student-facing v1 is working end-to-end against a synthetic curriculum. Actively iterating.

Five days ships a working v1. Five days does not ship a validated product. Here's what's still unvalidated:
