About

A small indie search engine built on library science principles. Run as a hobby out of Palm Springs.

Origin

Built in a weekend in February 2026, maintained quietly since. Cerulean started as an experiment in what someone with no prior development experience could build, working with Claude (Anthropic's Claude Opus 4.7 model, doing its best imitation of the author's voice). The code, the site, and this page were all written that way. That feels apt for a project this transparent about how it works.

Built by Garret Blanchette. Source code on GitHub. The project has no commercial structure: no ads, no profit motive, no accounts, no data collection beyond what the server logs by default. A hobby that runs in public.

What it's for

Source literacy, not gatekeeping. The library science tradition Cerulean draws on is a teaching tradition before it is anything else. The classification framework is meant to be visible and learnable, not a specialist credential. The point of surfacing source-role and source-type tags is to make legible, inside a search box, the structural distinctions students are already taught to look for. A high schooler researching their first paper should be able to read the tags and use them. So should a graduate student. So should a reference librarian working outside their licensed databases.

This means no paywall, no professional edition, no required login. The methodology is documented in full, the bundled domain index is public and challengeable, and the source is on GitHub. If the project ever comes under commercial pressure to enclose part of it, that pressure should be visible in this paragraph.

How results are classified

Every classified result is tagged on two axes: source role (primary, secondary, tertiary) and source type (journalism, academic, indie, commercial, and several others). The framework, the three-tier classifier pipeline, and the honest limits are documented on the methodology page.
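As a rough illustration of the two-axis tagging described above, here is a minimal Python sketch. The names (SourceRole, SourceType, TaggedResult, tag_line) are hypothetical and are not taken from Cerulean's code. Only the four source types named on this page are listed, and a missing tag is modeled as None.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceRole(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    TERTIARY = "tertiary"


class SourceType(Enum):
    # Only the four types named on this page; the full list has several others.
    JOURNALISM = "journalism"
    ACADEMIC = "academic"
    INDIE = "indie"
    COMMERCIAL = "commercial"


@dataclass(frozen=True)
class TaggedResult:
    """Hypothetical shape of one search result with its two tags."""
    url: str
    title: str
    source_type: Optional[SourceType] = None  # None means not classified
    source_role: Optional[SourceRole] = None  # None means no role tag yet

    def tag_line(self) -> str:
        """Render whichever tags are present, role first, then type."""
        tags = [t.value for t in (self.source_role, self.source_type) if t is not None]
        return " / ".join(tags) if tags else "unclassified"
```

Keeping the two axes as separate optional fields means a result can carry a type tag without a role tag, or neither, without inventing a placeholder value.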

Current production state. Source-type tags are live. Source-role tags are rolling out. The methodology page describes the target system in full and is updated as production catches up.

Search providers

Cerulean is a metasearch wrapper. It does not crawl the web itself; it queries three external search providers and classifies what comes back. The wrapper architecture is honest about the dependency: if a provider changes its terms, that provider goes dark, and the site says so.

Brave. Independent index. Best privacy posture of the three: no profiling, no tracking. Smaller coverage on long-tail queries, capped at 20 results per search. Best default for most use cases.

Serper. Thin wrapper around Google's own SERP. Best coverage and ranking quality, especially for obscure queries. The query is logged by a commercial intermediary before reaching Google, which makes Serper the weakest privacy option of the three. Use when Brave misses what you are looking for.

DuckDuckGo. Free fallback, mostly Bing-derived results. Decent for general queries, no API cost. Rate-limited and occasionally returns thin or stale results. Use when the others fail or when you want a zero-cost option.
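One way to picture the "try the next provider" pattern described above is the sketch below. It is an assumption about structure, not Cerulean's implementation: each provider is modeled as a plain function from a query to a list of result URLs, no real provider API is called, and the function name search_with_fallback is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A provider is modeled as a plain function: query in, result URLs out.
Provider = Callable[[str], List[str]]


def search_with_fallback(
    query: str,
    providers: List[Tuple[str, Provider]],
) -> Tuple[str, List[str], Dict[str, str]]:
    """Try providers in the given order.

    Returns (name of the provider that answered, its results,
    reasons for every provider that was skipped along the way).
    """
    skipped: Dict[str, str] = {}
    for name, fetch in providers:
        try:
            results = fetch(query)
        except Exception as exc:  # e.g. rate limit, revoked key, network error
            skipped[name] = str(exc)
            continue
        if results:
            return name, results, skipped
        skipped[name] = "no results"
    # Nothing answered: an empty name signals that every provider was skipped.
    return "", [], skipped
```

Returning the skipped providers along with the results is what would let a page report which provider is dark instead of failing silently.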

Top sentences

Extractive, not generative. Top sentences pulls representative sentences directly from result snippets and shows them as bullets above the source list. No language model is involved. The point is to help triage which sources collectively cover the topic, not to replace reading them. Intentionally different from Google's AI Overview, which generates a synthesized answer and pushes sources below it. Off by default.
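To make "extractive" concrete, here is a minimal Python sketch of one common approach: score each snippet sentence by how often its words recur across the other snippets, then return the top few verbatim. The scoring rule, the function name top_sentences, and the parameters k and min_words are assumptions for illustration. This page does not say how Cerulean actually ranks sentences.

```python
import re
from collections import Counter
from typing import List

_WORD = re.compile(r"[a-z0-9']+")
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def _words(text: str) -> List[str]:
    return _WORD.findall(text.lower())


def top_sentences(snippets: List[str], k: int = 3, min_words: int = 4) -> List[str]:
    """Pick up to k sentences whose words recur across the most snippets."""
    # Document frequency: in how many snippets does each word appear?
    df = Counter()
    for snippet in snippets:
        df.update(set(_words(snippet)))

    scored = []
    seen = set()
    for snippet in snippets:
        for sentence in _SENTENCE_END.split(snippet.strip()):
            words = _words(sentence)
            key = " ".join(words)
            # Skip fragments and exact repeats of a sentence already scored.
            if len(words) < min_words or key in seen:
                continue
            seen.add(key)
            # Average frequency, so long sentences are not favored by length alone.
            score = sum(df[w] for w in words) / len(words)
            scored.append((score, sentence))

    # Highest score first; the sort is stable, so ties keep snippet order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:k]]
```

Every returned string is a substring of an input snippet. Nothing is paraphrased or generated, which is the property the paragraph above describes.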

The AI-content filter

A heuristic estimate, not a verdict. The filter looks at signal patterns in titles and snippets that correlate with machine-generated text. It has no access to the full page, no model trained on the specific generator that produced any given result, and no way to confirm authorship. The tag is a flag, not a fact.
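A heuristic of this kind can be sketched in a few lines of Python. Everything specific here is hypothetical: the phrase list, the threshold, and the function names are invented for illustration and are not the signals Cerulean uses. The sketch only shows the shape of the idea, which is that the check sees the title and snippet and nothing else.

```python
from typing import List, Tuple

# Hypothetical signal phrases, chosen for illustration only.
SIGNAL_PHRASES: Tuple[str, ...] = (
    "in today's fast-paced world",
    "in this comprehensive guide",
    "let's dive in",
    "unlock the power of",
)

# Hypothetical threshold: flag when at least two signals co-occur.
FLAG_THRESHOLD = 2


def ai_content_signals(title: str, snippet: str) -> List[str]:
    """Return the signal phrases found in the title and snippet only."""
    text = f"{title} {snippet}".lower()
    return [phrase for phrase in SIGNAL_PHRASES if phrase in text]


def is_flagged(title: str, snippet: str) -> bool:
    """A flag, not a verdict: True when enough signals co-occur."""
    return len(ai_content_signals(title, snippet)) >= FLAG_THRESHOLD
```

Returning the matched phrases, not just a yes or no, keeps the flag inspectable: a reader or maintainer can see why a result was tagged and challenge it.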

How it fits the two-axis taxonomy. The filter is one structural signal feeding the SEO farm and aggregator source-type classifications. False positives on legitimate technical or formulaic writing cost the reader very little. False negatives cost more, especially against well-resourced bot-farm content, commercial astroturfing, and state-sponsored operations that do not announce themselves. The higher-order defense is the one search has always relied on: independent human scrutiny of whatever source the reader lands on.