
Building Glin-Profanity: A Multilingual Profanity Filter for 24+ Languages

How I built glin-profanity, a fast profanity detection library supporting 24+ languages with context-aware filtering, severity scoring, and MCP server integration.

10 min read
Gagan Deep Singh

Founder | GLINR Studios


Content moderation is one of those problems that sounds solved until you actually look at what's available. When I was building GLINCKER's community features, I needed profanity filtering that worked across languages. What I found was a graveyard of English-only libraries, outdated word lists, and npm packages with 50 stars and 4 open CVEs. So I built my own.

That was glin-profanity. It now ships as three packages: the core library, an MCP server, and an OpenClaw plugin. Here's how it works and why I made the decisions I made.

Why Not Just Use What Already Existed

The most popular profanity filters on npm are effectively bad-words forks. They work fine for English. They have a word list, they check if any word in your string matches, done. The problems start immediately when you leave English:

Language coverage: most libraries don't support Arabic, Hindi, Turkish, Vietnamese, or the dozen other languages that GLINCKER's users actually speak. You can find language-specific libraries for some of these, but then you're stitching together five different packages with five different APIs and five different maintenance histories.

Normalization: someone writing f4ck or f*ck or fück is still writing a profanity. English-only libraries often handle basic leetspeak, but they rarely handle Unicode normalization, diacritic stripping, or the character substitution patterns that differ by language and region.

False positives: the Scunthorpe problem is real. English libraries that naively match substrings will flag "assassin," "classic," and "Middlesex." Multilingual libraries, when they exist, often make this worse because word boundaries in languages without spaces (like Japanese or Thai) require different treatment entirely.

Performance: some of the more sophisticated filters run everything through a regex chain or, worse, call out to a Python ML model. I needed something that could filter a message in under a millisecond on commodity hardware. No network calls, no model inference.

The conclusion was that nothing solved the problem well enough. The GLINCKER user base spans 30+ countries, and I was not going to ship an English-only filter and call it done.

Core Architecture

The library is built around three layers: word lists, a normalization pipeline, and a matching engine.

Word lists are the foundation. Each language has its own file: en.ts, es.ts, fr.ts, ar.ts, and so on. Each entry is a typed object with the word, its severity score (1-3), and optional metadata about whether it's context-sensitive:

interface ProfanityEntry {
  word: string;
  severity: 1 | 2 | 3;
  contextSensitive?: boolean;
  languages?: string[];
}

Severity 1 is mild (words that might be acceptable in some contexts), 2 is moderate, 3 is severe. This lets callers decide their own threshold rather than having me make that call for them.
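To make the shape concrete, here's what a word-list file might look like together with the kind of caller-side threshold this enables. The entries and the helper below are illustrative, not the library's actual data:

```typescript
// Mirrors the ProfanityEntry interface above
interface ProfanityEntry {
  word: string;
  severity: 1 | 2 | 3;
  contextSensitive?: boolean;
}

// Hypothetical sample entries -- the real lists are longer and curated
const EN_SAMPLE: ProfanityEntry[] = [
  { word: 'damn', severity: 1 },
  { word: 'hell', severity: 1, contextSensitive: true },
  { word: 'somebadword', severity: 3 },
];

// A caller-side threshold: keep only entries at or above a chosen severity
function atOrAbove(entries: ProfanityEntry[], threshold: 1 | 2 | 3): string[] {
  return entries.filter(e => e.severity >= threshold).map(e => e.word);
}
```

A forum that tolerates mild language would call `atOrAbove(entries, 2)` and ignore severity-1 entries entirely.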

The normalization pipeline runs before any matching. It's a chain of transforms applied in order:

function normalize(input: string, locale: string): string {
  let s = input.toLowerCase();
  s = stripDiacritics(s);     // café -> cafe, über -> uber
  s = expandLeetspeak(s);     // 4 -> a, 3 -> e, 0 -> o, etc.
  s = collapseRepeats(s);     // fuuuuck -> fuck
  s = normalizeUnicode(s);    // confusable Unicode codepoints
  return stripPunctuation(s); // f.u.c.k -> fuck
}

The order matters. You strip diacritics before expanding leetspeak because some diacritics are used as leetspeak substitutions in specific languages. You collapse repeats before matching because fuuuuuck should match fuck. Punctuation stripping happens last because a period in the middle of a word is a bypass technique, not sentence structure.
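Two of these stages are simple enough to sketch here. These are plausible implementations under my own assumptions, not the library's actual code:

```typescript
// Strip combining diacritical marks: café -> cafe, über -> uber.
// NFD decomposition splits base characters from combining marks,
// which can then be removed as a Unicode range.
function stripDiacritics(s: string): string {
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Collapse runs of three or more identical characters down to one:
// fuuuuck -> fuck, while legitimate doubles like "book" are left alone.
function collapseRepeats(s: string): string {
  return s.replace(/(.)\1{2,}/g, '$1');
}
```

The two-repeat cutoff in `collapseRepeats` is a judgment call: collapsing every double would mangle ordinary words, so a real implementation may need to try both the collapsed and uncollapsed forms against the word list.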

The matching engine takes the normalized string and checks it against the active word lists. For space-separated languages, it splits on word boundaries and checks each token. For languages without clear word boundaries, it uses a sliding window approach, which is slower but necessary.
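The two strategies can be sketched like this (my reconstruction of the approach, not the actual engine):

```typescript
// Token matching for space-separated languages: one O(1) set lookup per token
function matchTokens(normalized: string, words: Set<string>): string[] {
  return normalized.split(/\s+/).filter(token => words.has(token));
}

// Sliding window for languages without spaces: test every substring up to
// the longest word in the list -- O(n * maxLen), slower but necessary
function slidingMatch(text: string, words: Set<string>, maxLen: number): string[] {
  const hits: string[] = [];
  for (let i = 0; i < text.length; i++) {
    for (let len = 1; len <= maxLen && i + len <= text.length; len++) {
      const candidate = text.slice(i, i + len);
      if (words.has(candidate)) hits.push(candidate);
    }
  }
  return hits;
}
```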

Handling the Scunthorpe Problem

The false positive problem is genuinely hard. "Classic" contains a slur as a substring. "Grape" contains profanity in some European languages. Word boundary matching eliminates most of these, but not all.

My approach has two parts.

First, a whitelist of known false positives per language. This is manually curated and language-specific. If "Middlesex" keeps getting flagged, it goes in the English whitelist. It's not elegant but it's effective and auditable.

const WHITELISTED: Record<string, string[]> = {
  en: ['scunthorpe', 'middlesex', 'assassin', 'classic', 'grape'],
  fr: ['salut', /* ... */],
  // ...
};

Second, for entries marked contextSensitive: true, the matcher checks surrounding context before flagging. If the word appears inside a longer alphanumeric token (like a URL slug or a compound word), it's not flagged. If it's isolated as a standalone word, it is.

This won't catch everything and it will still produce false positives in edge cases. I document this. The library is a heuristic filter, not a definitive classifier. Callers who need higher precision should use the severity scoring to build a review queue rather than auto-blocking everything.
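The standalone-vs-embedded check amounts to a boundary test on the characters around a match. A simplified sketch (assuming ASCII boundaries; the real check presumably handles more scripts):

```typescript
// Flag a context-sensitive match only when it stands alone -- i.e. not
// surrounded by alphanumeric characters. "sex" inside "middlesex" fails
// the test; "damn" in "a damn thing" passes it.
function isStandalone(text: string, index: number, word: string): boolean {
  const isAlnum = (c: string | undefined) => !!c && /[a-z0-9]/i.test(c);
  return !isAlnum(text[index - 1]) && !isAlnum(text[index + word.length]);
}
```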

Severity Scoring

Rather than a binary pass/fail, glin-profanity returns a structured result:

interface FilterResult {
  clean: boolean;
  score: number;          // 0-100 aggregate severity
  matches: MatchDetail[];
  censored: string;       // input with matches replaced
}
 
interface MatchDetail {
  word: string;
  severity: 1 | 2 | 3;
  index: number;
  language: string;
}

The score field is a weighted aggregate. Three severity-3 matches produce a higher score than ten severity-1 matches. This lets applications build graduated responses: severity-1 matches might just get logged, severity-2 matches trigger a user warning, severity-3 matches auto-block.
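One way to implement such a weighted aggregate (the weights here are hypothetical, not the library's actual values):

```typescript
type Severity = 1 | 2 | 3;

// Hypothetical per-severity weights. With these, three severity-3 matches
// (3 * 40 = 120, capped at 100) outscore ten severity-1 matches (50),
// matching the behavior described above.
const WEIGHTS: Record<Severity, number> = { 1: 5, 2: 15, 3: 40 };

function aggregateScore(matches: { severity: Severity }[]): number {
  const raw = matches.reduce((sum, m) => sum + WEIGHTS[m.severity], 0);
  return Math.min(100, raw);
}
```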

The censored field replaces matches with asterisks by default, but callers can pass a custom replacement function:

const result = filter(userInput, {
  languages: ['en', 'es'],
  replacement: (match) => '*'.repeat(match.word.length),
  threshold: 2,           // only flag severity 2+
});

Performance

The design goal was sub-millisecond filtering on a single message. The constraint that achieved this was: no ML, no regex mega-patterns, no async I/O.

The word lists are loaded once at startup and stored in memory as Set<string> objects per language. Lookup is O(1). The normalization pipeline is pure string manipulation, no regex for the hot paths (I benchmarked this - compiled regex for short strings adds overhead that simple character iteration does not).

Benchmarks on a MacBook M3:

  • Single English message (50 words): ~0.08ms
  • Single message, 5 languages active: ~0.3ms
  • Batch of 1000 messages: ~180ms total (~0.18ms each)

This is fast enough to run synchronously on every message in a Hono request handler without worrying about it. No worker threads needed, no offloading to a queue.

The trade-off is memory. Loading all 24 language packs at once costs about 8MB of heap. That's fine for a server process. For edge environments with tight memory limits, there's a lazy-loading mode where language packs are loaded on first use and cached.

The MCP Server

The glin-profanity-mcp package wraps the core library as a Model Context Protocol server. This means AI assistants like Claude can use it as a tool during conversations.

The use case is content review workflows where an AI is helping moderate or classify user content. Instead of the AI trying to determine profanity from its own training data (which varies by language and cultural context), it can call the tool and get a structured, deterministic result.

npx glin-profanity-mcp

The MCP server exposes three tools:

  • check_profanity: check a single string, returns the FilterResult structure
  • batch_check: check an array of strings, returns results in order
  • list_languages: returns supported language codes and their word list sizes

The implementation is a thin stdio transport layer over the core library:

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import { CallToolRequestSchema } from '@modelcontextprotocol/sdk/types.js';
import { filter, listLanguages } from 'glin-profanity';
 
const server = new Server(
  { name: 'glin-profanity-mcp', version: pkg.version },
  { capabilities: { tools: {} } }
);
 
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  switch (req.params.name) {
    case 'check_profanity':
      return { content: [{ type: 'text', text: JSON.stringify(
        filter(req.params.arguments.text, req.params.arguments.options ?? {})
      )}]};
    // ...
  }
});
 
await server.connect(new StdioServerTransport());

The interesting design question was tool granularity. I could expose one tool with all options, or many specific tools. I landed on three because one tool with too many options produces bad AI behavior (the model tries to infer defaults), while ten narrowly scoped tools produce confusion about which to use. Three is enough to cover the real use cases without overwhelming tool selection.

Packaging: Three Packages from One Codebase

The monorepo structure:

packages/
  glin-profanity/          # core library (glin-profanity on npm)
  glin-profanity-mcp/      # MCP server (glin-profanity-mcp on npm)
  glin-profanity-openclaw/ # OpenClaw plugin (@glincker/profanity-openclaw)
  shared/                  # word lists, normalization pipeline (not published)

The OpenClaw plugin integrates the filter into OpenClaw's agent framework as a guardrail. Agents that use it automatically have outgoing message filtering applied before responses are sent to users. This is most useful for agents running in community contexts where you don't want the AI to reproduce profanity from user inputs in its responses.

All three packages share the same shared/ word lists. The normalization pipeline and matching engine live in the core and the other packages depend on it. No code duplication.

The Language List

The 24+ languages as of now: English, Spanish, French, German, Portuguese, Italian, Dutch, Russian, Polish, Turkish, Arabic, Hebrew, Hindi, Bengali, Japanese, Korean, Chinese (Simplified), Chinese (Traditional), Vietnamese, Thai, Indonesian, Malay, Romanian, Czech, Hungarian, and Swedish.

The most requested languages that took the longest to ship were Arabic and Hindi. Both required native speaker review of the word lists because my own validation tooling can't catch culturally context-dependent nuance. I ended up running community review rounds on GitHub before merging those lists.

Japanese and Thai required distinct handling in the matching engine because they don't use spaces as word delimiters. The sliding window approach works but it produces more false positives than space-separated languages. The Japanese and Thai whitelists are consequently longer.

What I'd Change

The normalization pipeline is good but it handles leetspeak with a hardcoded substitution map. Different communities use different substitutions and the map is never complete. A learned substitution model, even a simple one, would do better here.

The context-awareness is binary: is the word standalone or embedded? Real contextual analysis would look at surrounding words to understand intent. "Damn good coffee" and "go to damn hell" have different severity profiles, but the current system scores them identically. This is the most significant limitation.

More languages is the obvious roadmap item. Swahili, Ukrainian, and Persian are the three most requested that aren't shipped yet. The bottleneck is always word list quality review, not the engineering.

The long-term improvement I think about most is a hybrid approach: the current fast string-matching system as a first pass, with an ML classifier layer that only activates for ambiguous cases. The string matcher handles the clear cases in microseconds. The classifier handles the hard cases where context actually matters. You get the performance of pure matching for 95% of inputs and the accuracy of ML for the 5% where it's needed.
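The routing for such a hybrid could be as simple as a score band. In this sketch, `fastFilter` and `classify` are hypothetical stand-ins for the string matcher and the ML layer:

```typescript
type Verdict = 'clean' | 'blocked' | 'review';

// Route by the fast matcher's score; only the ambiguous middle band
// pays the cost of the (hypothetical) classifier
function moderate(
  text: string,
  fastFilter: (t: string) => { score: number },
  classify: (t: string) => Verdict
): Verdict {
  const { score } = fastFilter(text);
  if (score === 0) return 'clean';    // clear pass: string matching only
  if (score >= 80) return 'blocked';  // clear fail: string matching only
  return classify(text);              // ambiguous: invoke the model
}
```

The band thresholds (0 and 80 here) are made up for illustration; tuning them is exactly the question of how much the false positive rate matters.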

Whether that's worth the added complexity depends on how much the false positive rate matters in practice. For most applications, the current system is good enough. For a platform with genuine moderation stakes, the hybrid approach would be worth it.

The library is at github.com/glincker/glin-profanity and on npm as glin-profanity. Contributions to word lists, especially for languages not currently covered, are always open.

