Scaling GLINCKER: Architecture Decisions That Shaped Our Platform
A case study in building a social platform from scratch — the stack choices, real-time trade-offs, moderation pipeline, and what I'd change if I started over.
Gagan Deep Singh
Founder | GLINR Studios
Building GLINCKER wasn't a weekend project that accidentally got popular. It was a deliberate attempt to build a social platform from first principles — and every architectural decision had consequences that I'm still living with today. Here's the honest retrospective.
The Starting Point: What Are We Actually Building?
GLINCKER is a social platform. That sounds simple until you enumerate what "social platform" actually means in 2024: user profiles, content feeds, real-time notifications, content moderation, search, recommendations, and enough surface area to attract bad actors within 48 hours of launch.
Before writing a single line of code I spent two weeks mapping out the data access patterns, not the features. What gets read most? What needs to be consistent vs. eventually consistent? What can tolerate latency and what can't? That exercise shaped every technology choice that followed.
Choosing the Stack
I landed on Spring Boot 3.x for the API layer, PostgreSQL as the primary data store with Redis for caching and pub/sub, and Next.js for the web frontend with a React Native mobile app sharing component logic via a shared design system.
People asked why Java. The honest answer: I know it deeply, Spring Security's OAuth2 support is mature, and JVM startup times stopped being an issue the moment GraalVM native images became viable. Spring Boot's ecosystem — Spring Data JPA, Spring Cache, Spring Security, Spring WebSocket — meant I wasn't stitching together disparate libraries for each concern.
@Service
public class FeedService {

    private final PostRepository postRepository;
    private final FollowRepository followRepository;
    private final RedisTemplate<String, Object> redis;

    public FeedService(PostRepository postRepository,
                       FollowRepository followRepository,
                       RedisTemplate<String, Object> redis) {
        this.postRepository = postRepository;
        this.followRepository = followRepository;
        this.redis = redis;
    }

    // Fan-out-on-read for users with < 1000 followers;
    // fan-out-on-write for high-follower accounts.
    @Transactional(readOnly = true)
    public Page<Post> getFeedForUser(UUID userId, Pageable pageable) {
        String cacheKey = "feed:" + userId; // Redis cache lookup elided here
        List<UUID> followedIds = followRepository
                .findFollowedUserIds(userId);
        return postRepository
                .findByAuthorIdInOrderByCreatedAtDesc(followedIds, pageable);
    }
}

The fan-out decision was one of the first real trade-offs. Fan-out-on-read is simpler but gets expensive as follow counts grow. Fan-out-on-write is more complex but keeps reads fast. I implemented a hybrid: write fan-out for accounts under a follower threshold, read fan-out above it. Twitter calls this the "celebrity problem," and it's a real problem at any scale.
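The routing logic behind that hybrid can be sketched in a few lines. This is a framework-free illustration, not GLINCKER's actual code — the class names and the threshold constant are assumptions for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the hybrid fan-out decision: below the threshold we push the
// post into each follower's materialized feed at write time; at or above
// it, readers merge the author's posts in at read time instead.
public class FanoutRouter {

    // Illustrative threshold; the real cutoff is a tuning decision.
    static final int CELEBRITY_THRESHOLD = 1000;

    public enum Strategy { FAN_OUT_ON_WRITE, FAN_OUT_ON_READ }

    public static Strategy strategyFor(int followerCount) {
        return followerCount < CELEBRITY_THRESHOLD
                ? Strategy.FAN_OUT_ON_WRITE
                : Strategy.FAN_OUT_ON_READ;
    }

    // On publish: fan out to follower feeds only for non-celebrity authors.
    // An empty result means "readers will pull this author at read time".
    public static List<String> deliveriesOnPublish(String postId,
                                                   List<String> followerIds,
                                                   int followerCount) {
        List<String> deliveries = new ArrayList<>();
        if (strategyFor(followerCount) == Strategy.FAN_OUT_ON_WRITE) {
            for (String followerId : followerIds) {
                deliveries.add("feed:" + followerId + " <- " + postId);
            }
        }
        return deliveries;
    }
}
```

The point of the split is cost symmetry: write fan-out does O(followers) work once per post, read fan-out does O(followed accounts) work on every feed load. The threshold is where those curves cross for your workload.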
Real-Time: WebSockets Over SSE (Mostly)
Notifications and live feed updates need real-time delivery. I evaluated three options: WebSockets, Server-Sent Events (SSE), and polling. SSE is great for unidirectional updates but falls apart when you need bidirectional communication (typing indicators, read receipts). Polling is a non-starter. WebSockets it was.
Spring WebSocket with STOMP over SockJS gave me a reliable abstraction:
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry registry) {
        registry.enableSimpleBroker("/topic", "/queue");
        registry.setApplicationDestinationPrefixes("/app");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws")
                .setAllowedOriginPatterns("*") // tighten to known origins in production
                .withSockJS();
    }
}

What I underestimated: connection state management at scale. Each open WebSocket connection holds memory on the server. At a few hundred concurrent users it's fine. At tens of thousands, you need to think carefully about connection pooling, heartbeat intervals, and graceful reconnection on the client. I eventually moved notification delivery to Redis pub/sub with the WebSocket layer acting as a dumb relay — much more horizontally scalable.
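The "dumb relay" shape is worth making concrete. Here is a minimal, framework-free sketch in which an in-process pub/sub bus stands in for Redis and plain consumers stand in for per-node WebSocket sessions — the class and channel names are illustrative assumptions, not GLINCKER's actual code:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// The WebSocket layer holds no business logic: any backend instance can
// publish to a user's channel, and every node with an open session for
// that user just forwards the payload. Nodes stay stateless, so adding
// relay nodes scales connection capacity horizontally.
public class NotificationRelay {

    private final Map<String, List<Consumer<String>>> channels = new ConcurrentHashMap<>();

    // A node registers the user's open session against the user's channel.
    public void subscribe(String userId, Consumer<String> session) {
        channels.computeIfAbsent("notify:" + userId, k -> new CopyOnWriteArrayList<>())
                .add(session);
    }

    // Publish from anywhere; returns how many live sessions were relayed to.
    public int publish(String userId, String payload) {
        List<Consumer<String>> sessions =
                channels.getOrDefault("notify:" + userId, List.of());
        sessions.forEach(s -> s.accept(payload));
        return sessions.size();
    }
}
```

In the real deployment, `subscribe` corresponds to a Redis channel subscription made when a STOMP session connects, and `publish` is a Redis `PUBLISH` from whichever service instance generated the notification.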
Multi-Tenant Architecture
GLINCKER supports organizational accounts — teams, companies, communities. That introduced multi-tenancy requirements early. I had two options: separate databases per tenant (clean isolation, expensive), or shared schema with tenant discrimination (cheaper, more complex).
I went with row-level tenancy using a tenant_id column on every major table, enforced via a Hibernate filter whose tenant parameter is set from a request-scoped context derived from the JWT:
@Aspect
@Component
public class TenantContextAspect {

    @Around("@annotation(TenantScoped)")
    public Object applyTenantContext(ProceedingJoinPoint jp) throws Throwable {
        // Tenant id is carried as a JWT claim, surfaced via authentication details
        String tenantId = SecurityContextHolder.getContext()
                .getAuthentication()
                .getDetails()
                .toString();
        TenantContext.set(tenantId);
        try {
            return jp.proceed();
        } finally {
            TenantContext.clear(); // always clear, or pooled threads leak tenants
        }
    }
}

The risk here is data leakage from a missed filter. I mitigated this with integration tests that explicitly verify tenant isolation for every repository method, and a CI check that flags any new @Repository class without the tenant filter annotation.
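The aspect above leans on a `TenantContext` holder. A minimal sketch of what that class needs to do — the fail-loud behavior in `get()` is my assumption about a sensible default, not necessarily GLINCKER's exact implementation:

```java
// A ThreadLocal holding the current tenant id, set once per request by the
// aspect and cleared in a finally block so pooled threads never carry a
// stale tenant into the next request.
public final class TenantContext {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    public static void set(String tenantId) {
        CURRENT.set(tenantId);
    }

    public static String get() {
        String tenantId = CURRENT.get();
        if (tenantId == null) {
            // Failing loudly is the backstop against a missed filter: a
            // tenant-scoped query with no tenant bound is a bug, not a default.
            throw new IllegalStateException("No tenant bound to current thread");
        }
        return tenantId;
    }

    public static void clear() {
        CURRENT.remove();
    }
}
```

The Hibernate side then reads `TenantContext.get()` when enabling the filter on the session, so every query the filter covers is automatically scoped to the caller's tenant.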
Content Moderation with glin-profanity
User-generated content at any scale means moderation. I built glin-profanity as a standalone npm package specifically because I needed a moderation primitive that worked across both the Node-based tooling layer and the frontend validation. The core idea: a filter that's configurable, doesn't just do naive word matching, and handles Unicode obfuscation (people get creative with @ symbols and Cyrillic look-alikes).
On the backend, moderation runs as an async pipeline:
- Synchronous pre-publish check — glin-profanity runs on submission, blocks obvious violations immediately
- Async deep scan — queued job runs image hashing (perceptual hash against known CSAM databases) and more expensive NLP checks
- Human review queue — borderline cases escalate to a review dashboard
The key decision: don't block the publish action on the expensive checks. Users expect immediate feedback. Show the content optimistically, run the deep scan in the background, and retroactively remove content if it fails. The window of exposure is seconds, and the UX win is significant.
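The publish path described above can be sketched as a small state machine. This is a framework-free illustration with predicates standing in for the real checks (glin-profanity for the cheap gate, image hashing and NLP for the deep scan) and a plain queue standing in for the job system — all names here are assumptions for the example:

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.function.Predicate;

// Cheap check gates publication synchronously; the expensive check is
// queued; a failing deep scan retracts the already-visible post.
public class ModerationPipeline {

    public enum Status { BLOCKED, PUBLISHED, REMOVED }

    private final Predicate<String> cheapCheck; // stand-in for glin-profanity
    private final Predicate<String> deepCheck;  // stand-in for hashing/NLP
    private final Queue<Runnable> scanQueue = new ArrayDeque<>();

    public ModerationPipeline(Predicate<String> cheapCheck,
                              Predicate<String> deepCheck) {
        this.cheapCheck = cheapCheck;
        this.deepCheck = deepCheck;
    }

    // Returns immediately: obvious violations are blocked, everything
    // else goes live optimistically and gets a background deep scan.
    public Status submit(String content, Map<String, Status> store, String postId) {
        if (!cheapCheck.test(content)) {
            return Status.BLOCKED;
        }
        store.put(postId, Status.PUBLISHED);
        scanQueue.add(() -> {
            if (!deepCheck.test(content)) {
                store.put(postId, Status.REMOVED); // retroactive takedown
            }
        });
        return Status.PUBLISHED;
    }

    // Stand-in for the async worker draining the job queue.
    public void runPendingScans() {
        Runnable job;
        while ((job = scanQueue.poll()) != null) {
            job.run();
        }
    }
}
```

The structure makes the trade-off explicit: the only work on the user's critical path is the cheap predicate, and the window of exposure is bounded by how quickly the worker drains the queue.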
Database Schema Decisions I Regret
PostgreSQL was the right call. JSONB columns were where I got lazy.
Early on I stored notification metadata as JSONB because the schema varied by notification type. It worked. Then six months later I needed to query "all notifications where the referenced post has been deleted" and discovered that querying inside JSONB at scale is painful. I refactored to a proper polymorphic association table — it took three days and a zero-downtime migration script using pg_repack.
The lesson: JSONB is fine for truly unstructured data that you'll never query on. The moment you find yourself writing ->>'key' in a WHERE clause, you probably want a real column.
CI/CD Pipeline
The pipeline runs on GitHub Actions with three stages:
- Fast feedback (< 2 min): unit tests, lint, type check
- Integration (< 8 min): Testcontainers-based integration tests, Spring context load
- Deploy (< 5 min): Docker build + push to registry, rolling deploy via Kubernetes
I use semantic-release for automated versioning based on conventional commits. Every merge to main that includes a feat: or fix: commit triggers a release automatically.
The infrastructure lives on a managed Kubernetes cluster. Spring Boot health endpoints (/actuator/health/liveness and /actuator/health/readiness) hook directly into Kubernetes probes — zero custom health check code.
What I'd Do Differently
Event sourcing earlier. The feed, notifications, and activity streams are all inherently event-driven. I built them CRUD-first and retrofitted event emission. Doing it event-first from day one would have simplified both the real-time layer and the audit trail.
GraphQL instead of REST for the mobile API. React Native clients have variable connectivity, and over-fetching is expensive on mobile. REST with field filtering is a hack; GraphQL solves the problem properly.
Invest in observability on day one. I added distributed tracing (OpenTelemetry + Tempo) six months into development. Those first six months of debugging production issues using only logs were genuinely painful. Structured logging with correlation IDs from the start — non-negotiable on the next project.
Building a platform is different from building a product. Every decision you make in week one echoes through every week that follows. Choose boring technology, instrument everything, and plan your data model like it will outlive the application code — because it will.