
Building profClaw: An AI Agent Engine from Scratch

Why I built profClaw, a local-first AI agent engine that runs on Docker, VPS, or your laptop. Architecture decisions, deployment modes, and what I learned about agentic AI.

9 min read
Gagan Deep Singh
Founder | GLINR Studios


The existing AI agent platforms share a common flaw: they assume you want to run your agents in their cloud, on their pricing, with their limitations baked in. Most of the interesting ones are closed-source. The open-source ones are either abandoned, poorly documented, or so opinionated about infrastructure that adapting them to your setup takes longer than building from scratch.

I built profClaw because I wanted an agent engine I could run on a $6/month VPS, a home server, or my laptop, depending on the day. Local-first, no mandatory accounts, no usage metering by a third party. The AI model bills are real enough without paying a platform tax on top.

Why another agent engine

I looked seriously at alternatives before committing to a build. Most fall into one of two failure modes.

The first failure mode is "too much magic." Platforms that hide all the plumbing behind a GUI are great for demos and useless when something breaks in production at 2am. You have no idea what is actually happening. You cannot add a custom tool, tweak retry logic, or instrument a failing workflow without fighting the abstraction layer.

The second failure mode is "too much framework." Some open-source options are built like enterprise software from 2018: abstract everything, wire it together with XML config, and require three services running before you can send a single message. The operational overhead is real.

profClaw is trying to be neither. A single binary (or Docker container) that gets out of the way and lets you define agents, skills, and channels in code.

The deployment modes concept

One thing I am genuinely proud of is the three-tier deployment model: pico, mini, and pro.

Most self-hosted tools are all-or-nothing. Either you run the full stack (which assumes Redis, a message broker, persistent storage, monitoring) or you get nothing. That makes no sense for a solo developer testing an agent on their laptop.

Pico mode is the smallest possible footprint. No Redis, no external dependencies. The task queue runs in memory. Restart the process and you lose queued work. That is fine when you are building and testing a skill. It is not fine in production.

Mini mode is the default. It adds optional Redis-backed task queuing via BullMQ: if Redis is configured you get persistence and retries; if it is not, the engine falls back gracefully to the in-memory queue. Most deployments run in mini mode.

Pro mode turns on the full stack: concurrent worker pools, distributed queue, webhook ingestion at scale, and the full integrations layer. This is for when profClaw is running 24/7 handling production workloads.

The mode is set with a single environment variable:

PROFCLAW_MODE=mini  # or pico, or pro

Everything downstream reads from that. No separate config files per environment, no feature flags to toggle. The engine detects what is available and adapts.
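A minimal sketch of what that detect-and-adapt step might look like at startup (the function and capability names here are illustrative, not profClaw's actual internals):

```typescript
type Mode = 'pico' | 'mini' | 'pro';

// Parse PROFCLAW_MODE once, defaulting to mini and rejecting typos early.
function resolveMode(env: Record<string, string | undefined>): Mode {
  const raw = (env.PROFCLAW_MODE ?? 'mini').toLowerCase();
  if (raw === 'pico' || raw === 'mini' || raw === 'pro') return raw;
  throw new Error(`Unknown PROFCLAW_MODE: ${raw}`);
}

// Everything downstream reads derived capabilities, not the raw variable.
function capabilities(mode: Mode) {
  return {
    persistentQueue: mode !== 'pico', // Redis-backed queue in mini/pro
    workerPools: mode === 'pro',      // concurrent workers only in pro
  };
}
```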

Architecture: Hono + BullMQ + skill registry

The server is built on Hono, which is fast, lightweight, and has a clean middleware API. I chose it over Express because it is designed for modern runtimes and the type safety is genuinely good out of the box. The entire API surface is typed end-to-end without hacks.

The task queue is BullMQ in pro and mini modes. BullMQ gives you persistent jobs, retry with backoff, rate limiting per queue, and a solid concurrency model. Wrapping it behind an abstraction lets the engine swap in the in-memory queue in pico mode without changing any calling code.

// The queue interface is the same regardless of backend
interface TaskQueue {
  enqueue(task: Task): Promise<string>;
  process(handler: TaskHandler): void;
  getStatus(taskId: string): Promise<TaskStatus>;
}
 
// In-memory for pico, BullMQ for mini/pro
const queue = createQueue(config.mode);
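For pico mode, a backend satisfying that interface can be as small as the sketch below (the Task and TaskStatus shapes are assumptions; the real engine's types may differ):

```typescript
type Task = { type: string; payload: unknown };
type TaskStatus = 'queued' | 'done' | 'failed';
type TaskHandler = (task: Task) => Promise<void>;

// Minimal in-memory TaskQueue: no persistence, lost on restart by design.
class InMemoryQueue {
  private status = new Map<string, TaskStatus>();
  private handler?: TaskHandler;
  private nextId = 0;

  async enqueue(task: Task): Promise<string> {
    const id = String(++this.nextId);
    this.status.set(id, 'queued');
    // Run asynchronously so enqueue returns immediately, like BullMQ would.
    queueMicrotask(async () => {
      try {
        await this.handler?.(task);
        this.status.set(id, 'done');
      } catch {
        this.status.set(id, 'failed');
      }
    });
    return id;
  }

  process(handler: TaskHandler): void {
    this.handler = handler;
  }

  async getStatus(taskId: string): Promise<TaskStatus> {
    // Unknown IDs report failed; in-memory state does not survive restarts.
    return this.status.get(taskId) ?? 'failed';
  }
}
```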

The skill registry is where agents get their capabilities. At startup, the engine scans the skills/ directory for SKILL.md files and builds a registry of available tools. Each skill defines its name, description, parameters (as a Zod schema), and the handler function. The agentic executor uses this registry when deciding which tools to call.

// A skill file's handler export
export const handler: SkillHandler = async (params, context) => {
  const { repo, issue_number } = params;
  const client = context.getIntegration('github');
  const issue = await client.issues.get({ owner: repo.owner, repo: repo.name, issue_number });
  return { success: true, data: issue.data };
};

The agent never hard-codes tool logic. It reads the skill registry, picks the right tool for the job, and calls it. Adding a new capability means adding a new skill file, not modifying the core.
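A rough sketch of the registry the executor consults (the SkillDefinition shape is an assumption; the real engine validates parameters with Zod schemas, simplified here to a plain validator function):

```typescript
// Each skill supplies a name, a description the model reads, a parameter
// validator (a Zod schema in the real engine), and a handler.
type SkillDefinition = {
  name: string;
  description: string;
  validate: (params: unknown) => boolean;
  handler: (params: any, context: unknown) => Promise<unknown>;
};

class SkillRegistry {
  private skills = new Map<string, SkillDefinition>();

  register(skill: SkillDefinition): void {
    this.skills.set(skill.name, skill);
  }

  // The executor lists these to build the model's tool definitions.
  list(): Array<{ name: string; description: string }> {
    return [...this.skills.values()].map(({ name, description }) => ({ name, description }));
  }

  async invoke(name: string, params: unknown, context: unknown): Promise<unknown> {
    const skill = this.skills.get(name);
    if (!skill) throw new Error(`Unknown skill: ${name}`);
    if (!skill.validate(params)) throw new Error(`Invalid params for ${name}`);
    return skill.handler(params, context);
  }
}
```

Adding a capability is then a registry entry built from a new skill file, with no change to the executor.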

Chat provider abstraction

One of the design goals was "one agent, many channels." You should be able to configure your agent once and have it respond on Slack, Discord, Telegram, and a web chat widget without duplicating logic.

The provider abstraction handles this. Each chat provider (Slack, Discord, Telegram, WhatsApp, WebChat, Matrix, and several others) implements a common interface: receive a message, normalize it to the internal format, pass it to the execution engine, then format and send the response back through the same provider.

interface ChatProvider {
  id: ChatProviderId;
  connect(config: ProviderConfig): Promise<void>;
  onMessage(handler: MessageHandler): void;
  sendMessage(channel: string, content: Content): Promise<void>;
  disconnect(): Promise<void>;
}

The execution engine does not know or care which provider sent the message. It receives a normalized IncomingMessage, runs the agent, and returns a Response. The provider handles the formatting and delivery. This means the same agent logic produces a Slack block kit response or a Telegram markdown message depending on which provider received the original message.
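The split can be sketched like this (the IncomingMessage fields, the stub agent, and the exact output shapes are illustrative assumptions, not profClaw's real payloads):

```typescript
// Normalized shape every provider converts into before the engine sees it.
type IncomingMessage = { providerId: string; channel: string; text: string };
type Response = { text: string };

// The engine is provider-agnostic: normalized message in, Response out.
async function runAgent(msg: IncomingMessage): Promise<Response> {
  return { text: `echo: ${msg.text}` }; // stand-in for the real executor
}

// Each provider formats the same Response its own way on the way out.
function formatForProvider(providerId: string, res: Response): unknown {
  switch (providerId) {
    case 'slack':
      // Simplified Block Kit-style payload
      return { blocks: [{ type: 'section', text: { type: 'mrkdwn', text: res.text } }] };
    case 'telegram':
      return { parse_mode: 'Markdown', text: res.text };
    default:
      return { text: res.text };
  }
}
```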

The skill system

Skills are the agent's capabilities defined as files, not code baked into the engine. Each skill is a directory with a SKILL.md describing what it does (for the agent to read as context) and a TypeScript handler file with the actual implementation.

skills/
  github-issue-triage/
    SKILL.md       # natural language description + parameter schema
    handler.ts     # implementation
  linear-sync/
    SKILL.md
    handler.ts
  code-review/
    SKILL.md
    handler.ts

The SKILL.md file is read at startup and injected into the agent's system prompt as a tool definition. The agent sees a list of available skills, understands when to use each one from the description, and calls them via structured tool use. You do not need to prompt-engineer the tool selection manually. You write a clear skill description and the model figures out when to reach for it.
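As a purely illustrative example (profClaw's actual SKILL.md schema is not shown here), such a file might read:

```markdown
# github-issue-triage

Fetches a GitHub issue so the agent can triage it. Use this when the user
asks to triage, categorize, or look up a specific issue.

## Parameters

- repo (object): { owner: string, name: string }
- issue_number (number): the issue to fetch
```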

This architecture means the core engine ships with built-in skills and you extend it by dropping new files into the skills directory. No recompile, no plugin registration, no configuration change beyond pointing at the directory.

Provider-agnostic AI layer

profClaw supports 15+ AI providers via a unified adapter layer built on the Vercel AI SDK. OpenAI, Anthropic, Google Gemini, Mistral, Groq, Cohere, and others all work through the same interface. You configure which model to use, and the engine handles the provider-specific API details.

// Config-driven model selection
const model = createAdapter({
  provider: 'anthropic',
  model: 'claude-sonnet-4-6',
  apiKey: process.env.ANTHROPIC_API_KEY,
});
 
// Or mix models for different tasks
const config: AgentConfig = {
  primaryModel: 'anthropic/claude-sonnet-4-6',
  fastModel: 'openai/gpt-4o-mini',  // for quick lookups
  embeddingModel: 'openai/text-embedding-3-small',
};

Switching providers is a config change, not a code change. This matters more than it sounds. Provider pricing changes, availability changes, and performance for specific tasks varies. Locking your agent engine to a single provider is a liability.
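The "provider/model" strings in the config above imply a small parsing step before the adapter layer can dispatch. A sketch, assuming that naming convention (the function name is hypothetical):

```typescript
// Split "anthropic/claude-sonnet-4-6" into the inputs createAdapter needs.
// Only the first slash delimits, so model names may themselves contain slashes.
function parseModelRef(ref: string): { provider: string; model: string } {
  const slash = ref.indexOf('/');
  if (slash === -1) throw new Error(`Expected "provider/model", got: ${ref}`);
  return { provider: ref.slice(0, slash), model: ref.slice(slash + 1) };
}
```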

Local-first: why it matters

Running on your own hardware is not just about cost. It is about control.

When you run your agents locally, your conversation data and task history never leave your infrastructure unless you explicitly send them somewhere. For anyone building agents that touch internal tools, private repositories, or customer data, that is a hard requirement, not a preference.

Local deployment also means latency that is predictable. A cloud agent platform that is having a bad day affects all its customers at once. Your VPS does not care what is happening at a cloud provider.

The practical tradeoff is that you own the operational burden. If your server goes down, your agents go down. profClaw's deployment modes exist partly to make this manageable: in pico mode there is nothing to keep running beyond the Node process. In pro mode you are operating a real service and you should treat it like one.

What I learned building an agentic executor

The hardest part of building profClaw was not the infrastructure. It was the agentic execution loop.

Agentic AI is not just "call a tool and return the result." It is a loop: the model calls a tool, gets a result, decides if it needs more information, calls another tool, maybe backtracks, eventually produces output. Managing that loop safely required thinking carefully about a few things.

Tool call sandboxing. A tool that runs code or executes shell commands needs to be isolated. A buggy skill that loops infinitely or consumes unbounded memory should not take down the engine. I added configurable timeouts (default 5 minutes, via POOL_TIMEOUT_MS) and output size limits on every tool call.

Max iterations. Agents can loop. Without a hard cap on tool call iterations, a confused agent will happily call tools forever. The default is 25 iterations per task. Configurable, but always set.

Structured error propagation. When a tool fails, the agent needs useful information to decide what to do next. Returning a raw stack trace is useless. Every tool call result is structured: success, data, error.code, error.message. The agent can reason about the error code and try a recovery strategy.

type ToolResult<T> =
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string } };

The explicit union type forces handling both cases everywhere in the codebase. TypeScript will not let you access data without checking success first.
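The three safeguards compose into the execution loop roughly like this (a sketch, not profClaw's actual executor; the 5-minute timeout default and 25-iteration cap come from the description above, everything else is illustrative):

```typescript
type ToolResult<T> =
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string } };

type ToolCall = { name: string; params: unknown };
type StepOutcome = { toolCall?: ToolCall; final?: string };

// Wrap every tool call in a timeout so a stuck skill fails structurally
// instead of hanging the loop. Rejections become structured errors too.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<ToolResult<T>> {
  const timeout = new Promise<ToolResult<T>>((resolve) =>
    setTimeout(
      () => resolve({ success: false, error: { code: 'TIMEOUT', message: `exceeded ${ms}ms` } }),
      ms
    )
  );
  const wrapped = p.then(
    (data): ToolResult<T> => ({ success: true, data }),
    (e): ToolResult<T> => ({ success: false, error: { code: 'TOOL_ERROR', message: String(e) } })
  );
  return Promise.race([wrapped, timeout]);
}

// The loop: ask the model for a step, run the tool, feed the structured
// result back, repeat until the model produces final output or hits the cap.
async function runLoop(
  step: (history: ToolResult<unknown>[]) => Promise<StepOutcome>,
  runTool: (call: ToolCall) => Promise<unknown>,
  maxIterations = 25,       // hard cap: a confused agent cannot loop forever
  toolTimeoutMs = 300_000   // 5-minute default, like POOL_TIMEOUT_MS
): Promise<string> {
  const history: ToolResult<unknown>[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const outcome = await step(history);
    if (outcome.final !== undefined) return outcome.final;
    if (!outcome.toolCall) throw new Error('step produced neither a tool call nor output');
    history.push(await withTimeout(runTool(outcome.toolCall), toolTimeoutMs));
  }
  throw new Error(`Exceeded ${maxIterations} iterations`);
}
```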

What is next

profClaw is heading toward a cloud launch at profclaw.ai. The self-hosted version will stay fully open. The cloud version adds managed Redis, persistent storage, a hosted web chat widget, and a dashboard for monitoring agents in production.

The core engine, the skill system, and the provider abstraction are not changing. The cloud version is the self-hosted version with a control plane in front of it. That is how it should be.

If you are building internal tooling, automating workflows, or want an agent that talks to your users on Slack without sending their data through a third-party platform, profClaw is worth a look. It runs on hardware you already have, it is typed end-to-end, and you own every line of what it does.

