Yiwei Shi

Sequence Modelling

Sun, 17 May 2026 06:51:14 -0400

Sequence Modeling in Industrial Recommendation & Ads Systems

A synthesis of six engineering blog posts from Uber, Pinterest, and Meta describing how the biggest consumer platforms have moved from hand-crafted aggregate features and DLRMs toward transformer-based sequence modeling for recommendations and ads.

1. Why sequence modeling? The shared motivation

Every blog post starts from the same diagnosis: classical Deep Learning Recommendation Models (DLRMs) and statistics-based features have hit a ceiling.

What is Harness Engineering ?

Sat, 16 May 2026 07:12:48 -0400

TL;DR

Beginning in Q1 2026, OpenAI, Cursor, LangChain, and Anthropic all brought up the concept of harness engineering. What is it? OpenAI’s harness engineering addresses the interaction dimension: how a human can steer large amounts of agent work with minimal intervention, corresponding to span of control and delegation in management theory. Cursor’s self-driving codebases address the spatial dimension: how hundreds of agents running in parallel can avoid stepping on each other, corresponding to cross-team coordination. Anthropic’s harness design for long-running apps addresses the temporal dimension: how a single agent running continuously for hours can avoid drifting off course, corresponding to milestone management in long-cycle projects.

Two-Tower, DCN v2, and Transformers: How Modern Retrieval and Ranking Fit Together

Tue, 12 May 2026 08:21:39 -0400

If you’ve spent any time around modern recommendation, search, or ads systems, you’ve run into three architectures that keep showing up: two-tower models, DCN v2, and Transformers. They’re often discussed as if they’re alternatives, but in production they’re almost always composed. Each one solves a different problem, and the interesting design work is in how you fit them together.

This post walks through what each does, where they slot in, and how a typical large-scale retrieval-and-ranking stack actually uses all three.

Modern Real-Time OLAP Systems: ClickHouse is Winning

Mon, 11 May 2026 22:52:17 -0400

A comparison of ClickHouse, StarRocks, Apache Druid, and Apache Pinot — the four engines defining real-time analytics in 2026.

Why Real-Time OLAP Matters Now

The old split was clean: OLTP for operations, batch warehouses for BI. Data freshness measured in hours was fine, because the workloads were retrospective.

That model has broken. User-facing analytics, fraud detection, ad bidding, and RAG-style AI applications all need sub-second queries over billions of rows, with data that’s seconds old. Real-time OLAP databases close the gap between streaming ingestion (Kafka, Flink) and interactive analytical queries.

Python Review 2026: 21 Essential Examples

Sun, 10 May 2026 10:21:53 -0400

Python Examples Reference

Section 01

For Loops with Index

Python's enumerate() is the idiomatic way to get both the index and the value while iterating. You can also use range(len(...)), but it's less Pythonic.

Building a Multi-Turn LLM Tool-Calling Pipeline

Sat, 09 May 2026 01:20:40 -0400

If you’ve used the OpenAI, Anthropic, or Bedrock APIs to build something more sophisticated than a chatbot, you’ve probably written an agent loop — code that lets the model call tools, receive results, and decide what to do next. I recently built one for a document analysis pipeline at work, and a few things surprised me. This post is a distillation of those lessons, using a generic example.

The Setup: A Four-Tool Pipeline

Imagine you’re processing a document. For each item the model identifies, you want to:

The AI Memory Supercycle: Who Actually Earns the GPU Dollar

Fri, 08 May 2026 11:58:50 -0400

A deep dive into the memory shortage, the Nvidia value chain, and where the profits are flowing

The Setup: Something Strange Is Happening in Memory

If you only watch Nvidia, you are missing the most interesting story in semiconductors right now.

In Q1 2026, SK Hynix posted an operating margin of 72%. That number is not a typo. It exceeded both Nvidia and TSMC in the same quarter. For a company that makes commodity DRAM, this is the kind of margin associated with luxury handbags, not memory chips. Industry veterans say profitability like this has not been seen since Microsoft launched Windows 95 in 1995.

What Is FDE ?

Thu, 07 May 2026 11:36:30 -0400

FDE stands for Forward Deployed Engineer, a role first systematized by Palantir. While a traditional product engineer builds a single feature to serve many customers, an FDE provides the diverse range of capabilities required by a single customer.

The distinction can be illustrated by a specific scenario: a pre-sales engineer’s job ends when the customer says, “This looks like it should work,” whereas an FDE’s job begins when the customer says, “Then let’s get it running.” If your code is live in a customer’s production environment and you’re the one handling on-call issues at midnight, you are an FDE.

Back-of-the-Envelope Numbers Every System Designer Should Know

Tue, 05 May 2026 22:52:48 -0400

When you’re sketching a system architecture on a whiteboard, you don’t need precise benchmarks — you need to know whether your design is within an order of magnitude of feasible. Is one Postgres node enough? Do you need Kafka, or will RabbitMQ do? Should you reach for Cassandra, or is your workload nowhere near needing it?

Here are the numbers I keep in my head, calibrated against published benchmarks from Confluent, Instaclustr, Honeycomb, and others. Treat them as starting points for capacity planning, not SLAs.