Type “the Xmas costume for a 5-year-old girl to join the school party” into most e-commerce search bars today, and you’ll get a wall of irrelevant results — some matching “Xmas,” some matching “costume,” none of them understanding that this is a single, specific intent. Traditional keyword search treats a sentence as a bag of tokens. What the shopper actually wants is a short list of age-appropriate, size-correct, in-stock costumes that match their child’s taste — without having to translate their thought into filter clicks.
This is squarely a neural search problem, and it’s a natural extension of the agentic AI theme from earlier this week: the same shift toward language-native interfaces that’s reshaping developer tools is reshaping product discovery. The difference is that here the “guardrails” aren’t tests and CI pipelines. They’re catalog data quality, persona signals, and ranking logic that keep a powerful language model from confidently recommending a size 14 superhero cape for a five-year-old.
Proposed architecture
The system is organized into five cooperating layers: query understanding, persona, embedding, retrieval, and ranking — sitting on top of the existing product catalog.

Query Understanding Service. An LLM-based NLU layer parses the raw search string into structured intent and entities — occasion (“Christmas party”), recipient (“girl, age 5”), category (“costume”), and any explicit attributes the shopper mentioned (color, theme, budget). This turns a sentence into a structured seed for everything downstream.
User Persona Service. A profile store holds longer-lived signals: the shopper’s saved sizes for their children, past purchases, preferred brands, and style affinities (e.g., “tends to buy bright colors, avoids licensed characters”). These traits don’t come from the query — they come from history, and they’re what make two shoppers typing the identical search see different results.
Embedding Service. Combines the parsed query intent with relevant persona signals into a single query vector — effectively asking “products like this description, for a shopper like this.”
Vector Search Engine. Retrieves a broad candidate set of products from a vector store of product embeddings, based on semantic similarity to the combined query vector. This is where “costume,” “fancy dress,” and “dress-up outfit” all converge even though the shopper only typed one of them.
Personalization and Ranking Engine. The final and most important step. Candidates from vector search are re-ranked using a blended score: semantic relevance, persona fit (does the size match this child? does the style match past purchases?), and catalog signals pulled from the Product Catalog Service (current stock, price band, available sizes and colors).
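To make the flow from parsed query to candidate set concrete, here is a minimal sketch of the first four layers. Everything in it is an assumption for illustration: the `ParsedQuery` field names, the linear blend in `combine`, the `persona_weight` value, and the toy 3-dimensional vectors are all invented, and the brute-force cosine search stands in for whatever index a real vector store would use. The calls to an LLM and to an embedding model are omitted entirely.

```python
from dataclasses import dataclass, field
import math


@dataclass
class ParsedQuery:
    """Structured intent the query understanding layer might emit (hypothetical fields)."""
    occasion: str
    recipient: str
    age: int
    category: str
    attributes: dict = field(default_factory=dict)


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def combine(query_vec, persona_vec, persona_weight=0.3):
    """Blend query and persona vectors into one unit-length query vector.

    A weighted average is only one possible way to combine them (an assumption).
    """
    q, p = normalize(query_vec), normalize(persona_vec)
    mixed = [(1 - persona_weight) * a + persona_weight * b for a, b in zip(q, p)]
    return normalize(mixed)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, product_vecs, k=2):
    """Brute-force top-k by cosine similarity, most similar first."""
    scored = sorted(
        ((cosine(query_vec, vec), pid) for pid, vec in product_vecs.items()),
        reverse=True,
    )
    return [pid for _, pid in scored[:k]]


# What the NLU layer might return for the example search (illustrative only).
parsed = ParsedQuery(occasion="Christmas party", recipient="girl", age=5, category="costume")

# Toy embeddings, invented for illustration; real ones come from an embedding model.
query_vec = [0.9, 0.1, 0.0]
persona_vec = [0.2, 0.8, 0.0]
products = {
    "elf-dress": [0.8, 0.3, 0.0],
    "reindeer-onesie": [0.7, 0.0, 0.2],
    "garden-hose": [0.0, 0.1, 0.9],
}

candidates = retrieve(combine(query_vec, persona_vec), products)
```

With these toy vectors the two costume-like products come back as candidates and the unrelated item is dropped. The candidate set is deliberately broad: deciding which of the two is the better result is left to the ranking step.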
Why the ranking layer matters most
It’s tempting to think the hard part is the language model parsing the query — but LLMs are now quite good at that. The harder, more valuable problem is the ranking layer, because it’s where business logic, inventory reality, and personal taste all collide. A technically “relevant” costume that’s out of stock in the child’s size is a worse result than a slightly less on-theme costume that’s available and matches the shopper’s past color preferences. Getting this blend right is mostly a data and weighting problem, not a model problem — which means it’s the layer most worth investing engineering time in.
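The trade-off described above can be sketched as a weighted sum. This is a minimal illustration, not a recommended formula: the `Candidate` fields, the three weights, the size labels, and the product names are all hypothetical, and a production ranker would likely draw on more catalog signals (price band, color availability) than the single stock check used here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    product_id: str
    semantic: float            # similarity from vector search, assumed in 0..1
    persona_fit: float         # match with past style/color preferences, assumed in 0..1
    sizes_in_stock: frozenset  # size labels currently available


# Hypothetical weights; in practice these would be tuned on real traffic.
WEIGHTS = {"semantic": 0.5, "persona": 0.3, "availability": 0.2}


def blended_score(candidate, child_size):
    """Weighted sum of semantic relevance, persona fit, and in-stock availability."""
    availability = 1.0 if child_size in candidate.sizes_in_stock else 0.0
    return (
        WEIGHTS["semantic"] * candidate.semantic
        + WEIGHTS["persona"] * candidate.persona_fit
        + WEIGHTS["availability"] * availability
    )


def rerank(candidates, child_size):
    """Order candidates by blended score, best first."""
    return sorted(candidates, key=lambda c: blended_score(c, child_size), reverse=True)


candidates = [
    # More on-theme, but not available in the child's size.
    Candidate("santa-dress", semantic=0.95, persona_fit=0.4,
              sizes_in_stock=frozenset({"7-8"})),
    # Slightly less on-theme, in stock, and closer to past preferences.
    Candidate("snowflake-dress", semantic=0.80, persona_fit=0.9,
              sizes_in_stock=frozenset({"5-6", "7-8"})),
]

ranked = rerank(candidates, child_size="5-6")
```

Here the less on-theme but available costume ranks first, which is the behavior the paragraph argues for. Changing the numbers in `WEIGHTS` is the "data and weighting problem" in miniature: the code stays the same while the ordering shifts.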
Open questions worth prototyping
A few things this architecture doesn’t answer yet, and that would be worth testing with real traffic: how much persona weight is too much, since over-personalization can make search feel like it’s “boxing in” a shopper who’s looking for something new this year; how to cold-start ranking for shoppers with no purchase history; and how to keep the query understanding layer fast enough that it doesn’t become the new latency bottleneck. Each of these is a good candidate for a follow-up post once there’s data to look at.
