<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Yiwei Shi</title><link>/</link><description>Recent content on Yiwei Shi</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 17 May 2026 06:51:14 -0400</lastBuildDate><atom:link href="/index.xml" rel="self" type="application/rss+xml"/><item><title>Sequence Modelling</title><link>/posts/20260517-sequence-modelling/</link><pubDate>Sun, 17 May 2026 06:51:14 -0400</pubDate><guid>/posts/20260517-sequence-modelling/</guid><description>&lt;h1 id="sequence-modeling-in-industrial-recommendation--ads-systems"&gt;Sequence Modeling in Industrial Recommendation &amp;amp; Ads Systems&lt;/h1&gt;
&lt;p&gt;A synthesis of six engineering blog posts from &lt;strong&gt;Uber&lt;/strong&gt;, &lt;strong&gt;Pinterest&lt;/strong&gt;, and &lt;strong&gt;Meta&lt;/strong&gt; describing how the biggest consumer platforms have moved from hand-crafted aggregate features and DLRMs toward transformer-based sequence modeling for recommendations and ads.&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="1-why-sequence-modeling-the-shared-motivation"&gt;1. Why sequence modeling? The shared motivation&lt;/h2&gt;
&lt;p&gt;Every blog post starts from the same diagnosis: classical Deep Learning Recommendation Models (DLRMs) and statistics-based features have hit a ceiling.&lt;/p&gt;</description></item><item><title>What is Harness Engineering?</title><link>/posts/20260516-harness-engineering/</link><pubDate>Sat, 16 May 2026 07:12:48 -0400</pubDate><guid>/posts/20260516-harness-engineering/</guid><description>&lt;h2 id="tldr"&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;Starting in Q1 2026, OpenAI, Cursor, LangChain, and Anthropic all began talking about harness engineering. What is it? OpenAI&amp;rsquo;s harness engineering addresses the interaction dimension: how a human can steer large amounts of agent work with minimal intervention, the analogue of span of control and delegation in management theory. Cursor&amp;rsquo;s self-driving codebases address the spatial dimension: how hundreds of agents running in parallel can avoid stepping on each other, the analogue of cross-team coordination. Anthropic&amp;rsquo;s harness design for long-running apps addresses the temporal dimension: how a single agent running continuously for hours can avoid drifting off course, the analogue of milestone management in long-cycle projects.&lt;/p&gt;</description></item><item><title>Two-Tower, DCN v2, and Transformers: How Modern Retrieval and Ranking Fit Together</title><link>/posts/20260512-modern-sequence-modelling/</link><pubDate>Tue, 12 May 2026 08:21:39 -0400</pubDate><guid>/posts/20260512-modern-sequence-modelling/</guid><description>&lt;p&gt;If you&amp;rsquo;ve spent any time around modern recommendation, search, or ads systems, you&amp;rsquo;ve run into three architectures that keep showing up: &lt;strong&gt;two-tower models&lt;/strong&gt;, &lt;strong&gt;DCN v2&lt;/strong&gt;, and &lt;strong&gt;Transformers&lt;/strong&gt;. They&amp;rsquo;re often discussed as if they&amp;rsquo;re alternatives, but in production they&amp;rsquo;re almost always &lt;em&gt;composed&lt;/em&gt;. Each one solves a different problem, and the interesting design work is in how you fit them together.&lt;/p&gt;
&lt;p&gt;This post walks through what each does, where they slot in, and how a typical large-scale retrieval-and-ranking stack actually uses all three.&lt;/p&gt;</description></item><item><title>Modern Real-Time OLAP Systems: ClickHouse is Winning</title><link>/posts/20260511-readtime-olap/</link><pubDate>Mon, 11 May 2026 22:52:17 -0400</pubDate><guid>/posts/20260511-readtime-olap/</guid><description>&lt;p&gt;A comparison of ClickHouse, StarRocks, Apache Druid, and Apache Pinot — the four engines defining real-time analytics in 2026.&lt;/p&gt;
&lt;h2 id="why-real-time-olap-matters-now"&gt;Why Real-Time OLAP Matters Now&lt;/h2&gt;
&lt;p&gt;The old split was clean: OLTP for operations, batch warehouses for BI. Data freshness measured in hours was fine, because the workloads were retrospective.&lt;/p&gt;
&lt;p&gt;That model has broken. User-facing analytics, fraud detection, ad bidding, and RAG-style AI applications all need sub-second queries over billions of rows, with data that&amp;rsquo;s seconds old. Real-time OLAP databases close the gap between streaming ingestion (Kafka, Flink) and interactive analytical queries.&lt;/p&gt;</description></item><item><title>Python Review 2026: 21 Essential Examples</title><link>/posts/20260510-python/</link><pubDate>Sun, 10 May 2026 10:21:53 -0400</pubDate><guid>/posts/20260510-python/</guid><description>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
&lt;head&gt;
&lt;meta charset="UTF-8"&gt;
&lt;meta name="viewport" content="width=device-width, initial-scale=1.0"&gt;
&lt;title&gt;Python Examples Reference&lt;/title&gt;
&lt;link href="https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;600&amp;family=Source+Serif+4:ital,wght@0,400;0,700;1,400&amp;family=DM+Sans:wght@400;500;700&amp;display=swap" rel="stylesheet"&gt;
&lt;style&gt;
 :root {
 --bg: #0f1117;
 --surface: #181a24;
 --surface-hover: #1e2130;
 --border: #2a2d3a;
 --text: #e0e0e8;
 --text-dim: #8b8ea0;
 --accent: #7c9bf5;
 --accent-dim: #4a6bc5;
 --green: #6ec87a;
 --orange: #e0a050;
 --pink: #d87caa;
 --red: #e06060;
 --yellow: #d4c46a;
 --cyan: #5cc5c5;
 --code-bg: #12141e;
 --code-border: #252838;
 }

 * { margin: 0; padding: 0; box-sizing: border-box; }

 body {
 background: var(--bg);
 color: var(--text);
 font-family: 'DM Sans', sans-serif;
 line-height: 1.7;
 padding: 0;
 }

 .hero {
 padding: 60px 40px 40px;
 max-width: 900px;
 margin: 0 auto;
 }

 .hero h1 {
 font-family: 'Source Serif 4', serif;
 font-size: 2.6rem;
 color: #fff;
 letter-spacing: -0.02em;
 margin-bottom: 8px;
 }

 .hero .subtitle {
 color: var(--text-dim);
 font-size: 1.05rem;
 }

 .toc {
 max-width: 900px;
 margin: 0 auto 40px;
 padding: 0 40px;
 }

 .toc h2 {
 font-family: 'Source Serif 4', serif;
 font-size: 1.2rem;
 color: var(--accent);
 margin-bottom: 12px;
 text-transform: uppercase;
 letter-spacing: 0.08em;
 }

 .toc-grid {
 display: grid;
 grid-template-columns: 1fr 1fr;
 gap: 4px 32px;
 }

 .toc a {
 color: var(--text-dim);
 text-decoration: none;
 font-size: 0.9rem;
 padding: 4px 0;
 display: block;
 transition: color 0.15s;
 }

 .toc a:hover { color: var(--accent); }
 .toc a .num { color: var(--accent-dim); margin-right: 8px; font-family: 'JetBrains Mono', monospace; font-size: 0.78rem; }

 .content {
 max-width: 900px;
 margin: 0 auto;
 padding: 0 40px 80px;
 }

 .section {
 margin-bottom: 56px;
 scroll-margin-top: 24px;
 }

 .section-number {
 font-family: 'JetBrains Mono', monospace;
 font-size: 0.75rem;
 color: var(--accent-dim);
 letter-spacing: 0.1em;
 text-transform: uppercase;
 margin-bottom: 4px;
 }

 .section h2 {
 font-family: 'Source Serif 4', serif;
 font-size: 1.55rem;
 color: #fff;
 margin-bottom: 10px;
 padding-bottom: 10px;
 border-bottom: 1px solid var(--border);
 }

 .section p {
 color: var(--text-dim);
 margin-bottom: 16px;
 font-size: 0.95rem;
 }

 .section p strong { color: var(--text); }

 pre {
 background: var(--code-bg);
 border: 1px solid var(--code-border);
 border-radius: 8px;
 padding: 20px 24px;
 overflow-x: auto;
 margin-bottom: 16px;
 font-family: 'JetBrains Mono', monospace;
 font-size: 0.82rem;
 line-height: 1.65;
 color: var(--text);
 }

 .kw { color: var(--pink); }
 .fn { color: var(--accent); }
 .st { color: var(--green); }
 .cm { color: var(--text-dim); font-style: italic; }
 .nb { color: var(--cyan); }
 .num { color: var(--orange); }
 .op { color: var(--text-dim); }
 .dec { color: var(--yellow); }
 .out {
 display: block;
 margin-top: 8px;
 padding-top: 8px;
 border-top: 1px dashed var(--code-border);
 color: var(--text-dim);
 }

 .note {
 background: var(--surface);
 border-left: 3px solid var(--accent-dim);
 padding: 12px 16px;
 border-radius: 0 6px 6px 0;
 margin-bottom: 16px;
 font-size: 0.88rem;
 color: var(--text-dim);
 }

 .note strong { color: var(--accent); }

 @media (max-width: 640px) {
 .hero, .toc, .content { padding-left: 20px; padding-right: 20px; }
 .hero h1 { font-size: 1.8rem; }
 .toc-grid { grid-template-columns: 1fr; }
 }
&lt;/style&gt;
&lt;/head&gt;
&lt;body&gt;


&lt;nav class="toc"&gt;
 &lt;h2&gt;Contents&lt;/h2&gt;
 &lt;div class="toc-grid"&gt;
 &lt;a href="#s1"&gt;&lt;span class="num"&gt;01&lt;/span&gt; For Loops with Index&lt;/a&gt;
 &lt;a href="#s2"&gt;&lt;span class="num"&gt;02&lt;/span&gt; Arrays in Python&lt;/a&gt;
 &lt;a href="#s3"&gt;&lt;span class="num"&gt;03&lt;/span&gt; Access Modifiers&lt;/a&gt;
 &lt;a href="#s4"&gt;&lt;span class="num"&gt;04&lt;/span&gt; Name Mangling&lt;/a&gt;
 &lt;a href="#s5"&gt;&lt;span class="num"&gt;05&lt;/span&gt; Generators&lt;/a&gt;
 &lt;a href="#s6"&gt;&lt;span class="num"&gt;06&lt;/span&gt; Map API&lt;/a&gt;
 &lt;a href="#s7"&gt;&lt;span class="num"&gt;07&lt;/span&gt; Inheritance&lt;/a&gt;
 &lt;a href="#s8"&gt;&lt;span class="num"&gt;08&lt;/span&gt; MRO &amp;amp; C3 Linearization&lt;/a&gt;
 &lt;a href="#s9"&gt;&lt;span class="num"&gt;09&lt;/span&gt; Type Enforcement&lt;/a&gt;
 &lt;a href="#s10"&gt;&lt;span class="num"&gt;10&lt;/span&gt; Exception Rethrow&lt;/a&gt;
 &lt;a href="#s11"&gt;&lt;span class="num"&gt;11&lt;/span&gt; HashSet&lt;/a&gt;
 &lt;a href="#s12"&gt;&lt;span class="num"&gt;12&lt;/span&gt; Closures&lt;/a&gt;
 &lt;a href="#s13"&gt;&lt;span class="num"&gt;13&lt;/span&gt; Closure Use Cases&lt;/a&gt;
 &lt;a href="#s14"&gt;&lt;span class="num"&gt;14&lt;/span&gt; Decorators&lt;/a&gt;
 &lt;a href="#s15"&gt;&lt;span class="num"&gt;15&lt;/span&gt; Multithreading &amp;amp; Multiprocessing&lt;/a&gt;
 &lt;a href="#s16"&gt;&lt;span class="num"&gt;16&lt;/span&gt; GIL Removal&lt;/a&gt;
 &lt;a href="#s17"&gt;&lt;span class="num"&gt;17&lt;/span&gt; Recent Python Features&lt;/a&gt;
 &lt;a href="#s18"&gt;&lt;span class="num"&gt;18&lt;/span&gt; f-strings&lt;/a&gt;
 &lt;a href="#s19"&gt;&lt;span class="num"&gt;19&lt;/span&gt; New Type-Parameter Syntax (3.12)&lt;/a&gt;
 &lt;a href="#s20"&gt;&lt;span class="num"&gt;20&lt;/span&gt; Recap of 3.8 / 3.9 / 3.10&lt;/a&gt;
 &lt;a href="#s21"&gt;&lt;span class="num"&gt;21&lt;/span&gt; Positional-Only Params &amp;amp; Pattern Matching&lt;/a&gt;
 &lt;/div&gt;
&lt;/nav&gt;

&lt;div class="content"&gt;

&lt;!-- ===== 1. FOR LOOPS WITH INDEX ===== --&gt;
&lt;div class="section" id="s1"&gt;
 &lt;div class="section-number"&gt;Section 01&lt;/div&gt;
 &lt;h2&gt;For Loops with Index&lt;/h2&gt;
 &lt;p&gt;Python's &lt;strong&gt;enumerate()&lt;/strong&gt; is the idiomatic way to get both the index and the value while iterating. You can also use &lt;strong&gt;range(len(...))&lt;/strong&gt;, but it's less Pythonic.&lt;/p&gt;</description></item><item><title>Building a Multi-Turn LLM Tool-Calling Pipeline</title><link>/posts/20260509-multi-turn-llm-workflow/</link><pubDate>Sat, 09 May 2026 01:20:40 -0400</pubDate><guid>/posts/20260509-multi-turn-llm-workflow/</guid><description>&lt;p&gt;If you&amp;rsquo;ve used the OpenAI, Anthropic, or Bedrock APIs to build something more sophisticated than a chatbot, you&amp;rsquo;ve probably written an agent loop — code that lets the model call tools, receive results, and decide what to do next. I recently built one for a document analysis pipeline at work, and a few things surprised me. This post is a distillation of those lessons, using a generic example.&lt;/p&gt;
&lt;h2 id="the-setup-a-four-tool-pipeline"&gt;The Setup: A Four-Tool Pipeline&lt;/h2&gt;
&lt;p&gt;Imagine you&amp;rsquo;re processing a document. For each item the model identifies, you want to:&lt;/p&gt;</description></item><item><title>The AI Memory Supercycle: Who Actually Earns the GPU Dollar</title><link>/posts/20260508-memory/</link><pubDate>Fri, 08 May 2026 11:58:50 -0400</pubDate><guid>/posts/20260508-memory/</guid><description>&lt;p&gt;&lt;em&gt;A deep dive into the memory shortage, the Nvidia value chain, and where the profits are flowing&lt;/em&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h2 id="the-setup-something-strange-is-happening-in-memory"&gt;The Setup: Something Strange Is Happening in Memory&lt;/h2&gt;
&lt;p&gt;If you only watch Nvidia, you are missing the most interesting story in semiconductors right now.&lt;/p&gt;
&lt;p&gt;In Q1 2026, SK Hynix posted an operating margin of 72%. That number is not a typo. It exceeded the operating margins of both Nvidia and TSMC in the same quarter. For a company that makes commodity DRAM, this is the kind of margin associated with luxury handbags, not memory chips. Industry veterans say profitability like this has not been seen since Microsoft launched Windows 95 in 1995.&lt;/p&gt;</description></item><item><title>What Is FDE?</title><link>/posts/20260507-what-is-fde/</link><pubDate>Thu, 07 May 2026 11:36:30 -0400</pubDate><guid>/posts/20260507-what-is-fde/</guid><description>&lt;p&gt;FDE stands for Forward Deployed Engineer, a role first systematized by Palantir. While a traditional product engineer builds a single feature to serve many customers, an FDE delivers the full range of capabilities that a single customer requires.&lt;/p&gt;
&lt;p&gt;The distinction can be illustrated by a specific scenario: a pre-sales engineer’s job ends when the customer says, &amp;ldquo;This looks like it should work,&amp;rdquo; whereas an FDE’s job begins when the customer says, &amp;ldquo;Then let’s get it running.&amp;rdquo; If your code is live in a customer&amp;rsquo;s production environment and you’re the one handling on-call issues at midnight, you are an FDE.&lt;/p&gt;</description></item><item><title>Back-of-the-Envelope Numbers Every System Designer Should Know</title><link>/posts/20260505-basic-machine-numbers/</link><pubDate>Tue, 05 May 2026 22:52:48 -0400</pubDate><guid>/posts/20260505-basic-machine-numbers/</guid><description>&lt;p&gt;When you&amp;rsquo;re sketching a system architecture on a whiteboard, you don&amp;rsquo;t need precise benchmarks — you need to know whether your design is within an order of magnitude of feasible. Is one Postgres node enough? Do you need Kafka, or will RabbitMQ do? Should you reach for Cassandra, or is your workload nowhere near needing it?&lt;/p&gt;
&lt;p&gt;Here are the numbers I keep in my head, calibrated against published benchmarks from Confluent, Instaclustr, Honeycomb, and others. Treat them as starting points for capacity planning, not SLAs.&lt;/p&gt;</description></item></channel></rss>