diff --git a/README.md b/README.md
index b834a52..7ddcaab 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # duriin_api
 
-Node.js Fastify server that ingests news articles from RSS, Google News RSS, SEC EDGAR 8-K filings, Alpha Vantage News Sentiment, Finnhub company news, and GDELT into a local SQLite archive.
+Node.js Fastify server that ingests news articles from RSS, SEC EDGAR 8-K filings, Alpha Vantage News Sentiment, Finnhub company news, and GDELT into a local SQLite archive.
 
 ## Setup
 
@@ -8,7 +8,7 @@ Node.js Fastify server that ingests news articles from RSS, Google News RSS, SEC
    ```bash
    npm install
    ```
-2. Edit `config.json` with your API keys, tickers, RSS feeds, Google News settings, and schedules.
+2. Edit `config.json` with your API keys, tickers, and schedules. The source catalog (labels, websites, and feed URLs) lives in `sources.json`.
 3. Start the server:
    ```bash
    npm start
@@ -20,172 +20,70 @@ The server listens on the host and port defined in `config.json`.
 
 On startup the server:
 
-1. Opens the SQLite database.
-2. Registers the article and status routes.
+1. Opens the SQLite database and runs any pending migrations.
+2. Registers routes.
 3. Starts the HTTP server.
-4. Immediately runs all ingestion sources once.
-5. Starts the cron scheduler for recurring ingestions, content backfill, and embedding backfill.
+4. Launches continuous background loops for each source, content backfill, and embedding backfill.
 
 When a new article is inserted:
 
 - the record is written immediately with `title`, `description`, `url`, `source`, and timestamps
-- `content` and `image` start as `null`
-- full article extraction runs asynchronously after insert
-- vector embeddings are generated later, after title, description, and content are all available
+- `content` starts as `null`
+- content backfill workers pick it up asynchronously — plain HTTP first, Playwright fallback for JS-heavy sites
+- vector embeddings are generated after title, description, and content are all available
+- only articles with content + embedding are exposed via the API
+
+Content backfill prioritises recent articles (`pub_date_effective DESC`) so newest content surfaces first regardless of ingestion order.
+
+Per-domain fetch policies are tracked automatically: domains that repeatedly fail plain fetches are upgraded to browser-only, and domains that fail both plain and browser fetches are temporarily blocked.
 
 ## API overview
 
-All exposed endpoints are `GET` endpoints.
+All endpoints are `GET`.
 
 ### `GET /`
 
-Simple health check.
-
-**Response**
-```json
-{ "ok": true }
-```
-
-Use this to confirm the server is running, not to inspect ingestion state.
+Health check. Returns `{ "ok": true }`.
 
 ### `GET /articles`
 
-Returns articles from the `articles` table. Only articles that are considered **usable** are exposed — meaning they have non-empty `content`, a stored embedding, and are not index/category pages. Behavior changes based on the query params you send.
+Returns usable articles — non-empty `content`, stored embedding, not an index/category page.
 
 #### Query params
 
-##### `keyword`
+| Param | Description |
+|---|---|
+| `keyword` | Substring match (SQL `LIKE`, not semantic) against `title`, `description`, and `content`. Repeat the param for multiple keywords, e.g. `keyword=bitcoin&keyword=ethereum` |
+| `keyword_mode` | How multiple keywords are combined — `and` (default) or `or` |
+| `source` | Exact match on the stored `source` field (e.g. `rss:BBC`, `gdelt:Al Jazeera`) |
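+| `from` | Only rows with `pub_date >= from`. Compared as text against the stored timestamp, so pass ISO-8601; rows with a `null` `pub_date` are excluded |
+| `to` | Only rows with `pub_date <= to`. Same comparison rules as `from` |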
+| `limit` | Rows to return. Default `20`, max `100` |
+| `offset` | Pagination offset. Default `0` |
+| `order` | Sort order; see the `order` values table. Unknown values fall back to `newest`. Not applied to `semantic` or `similar_to_article` results, which are sorted by distance |
+| `semantic` | Semantic search by meaning via embedding similarity |
+| `similar_to_article` | Vector similarity search using another article's embedding |
 
-Plain keyword search.
+#### `order` values
 
-- matches `title`, `description`, and `content`
-- uses SQL `LIKE`
-- works like substring matching, not semantic search
-- best when you want literal words or phrases to appear in the article text
+| Value | Sort |
+|---|---|
+| `newest` | `pub_date_effective DESC` (default) |
+| `oldest` | `pub_date_effective ASC` |
+| `ingested_newest` | `ingested_at DESC` |
+| `ingested_oldest` | `ingested_at ASC` |
 
-Example:
-```http
-GET /articles?keyword=earnings
-```
+#### Search modes
 
-##### `source`
+- If `semantic` is present — semantic nearest-neighbor search. Query is embedded via OpenRouter and matched against the article index. Results include a `distance` field (lower = closer).
+- Else if `similar_to_article` is present — finds articles similar to the given article ID. Returns `404` if that article has no embedding.
+- Otherwise — normal filtered list mode. All params apply.
 
-Exact match on the stored `source` field.
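+`keyword` (with `keyword_mode`), `source`, `from`, and `to` are also applied as post-filters to `semantic` and `similar_to_article` results. Filtering runs after a bounded set of nearest-neighbor candidates is fetched, so restrictive filters can return fewer than `limit` rows.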
 
-Example:
-```http
-GET /articles?source=rss
-```
+`include_embedding` is explicitly rejected on this endpoint.
 
-##### `from`
-
-Only returns rows where `pub_date >= from`.
-
-Example:
-```http
-GET /articles?from=2025-01-01T00:00:00.000Z
-```
-
-##### `to`
-
-Only returns rows where `pub_date <= to`.
-
-Example:
-```http
-GET /articles?to=2025-01-31T23:59:59.999Z
-```
-
-##### `limit`
-
-Number of rows to return.
-
-- default: `20`
-- max: `100`
-
-Example:
-```http
-GET /articles?limit=10
-```
-
-##### `offset`
-
-Pagination offset.
-
-- default: `0`
-
-Example:
-```http
-GET /articles?limit=10&offset=20
-```
-
-##### `similar_to_article`
-
-Runs vector similarity search instead of normal list mode.
-
-- value must be an existing article ID
-- the server looks up that article's embedding
-- nearest-neighbor search runs in `sqlite-vec`
-- the source article is excluded from the result set
-- each result includes a `distance` field
-- lower `distance` means more similar
-- returns `404` if the article has no stored embedding
-
-Example:
-```http
-GET /articles?similar_to_article=123&limit=5
-```
-
-Not found response:
-```json
-{ "error": "Embedding not found for article" }
-```
-
-##### `semantic`
-
-Semantic search by meaning, not exact wording.
-
-- use this when you want conceptually related results
-- unlike `keyword`, the words do not need to appear literally in the article text
-- the query text is normalized before embedding
-- query embeddings are cached in SQLite
-- on cache miss, the server requests an embedding from OpenRouter
-- nearest article matches are returned from the embedding index
-- each result includes a `distance` field
-- lower `distance` means a closer semantic match
-- returns `400` if `semantic` is empty
-
-Example:
-```http
-GET /articles?semantic=ai chip demand&limit=10
-```
-
-Bad request response:
-```json
-{ "error": "Semantic query must not be empty" }
-```
-
-##### `include_embedding`
-
-Explicitly rejected on `/articles`.
-
-Response:
-```json
-{ "error": "Embeddings are not returned directly. Use similar_to_article for vector search." }
-```
-
-#### General behavior
-
-- If `semantic` is present, semantic search is used.
-- Else if `similar_to_article` is present, similarity search is used.
-- Otherwise normal list/search mode is used.
-- `keyword` is literal keyword matching.
-- `semantic` is semantic matching by meaning.
-- Normal list/search results are ordered by `COALESCE(pub_date, ingested_at) DESC, id DESC`.
-- `from` and `to` are compared against stored publication timestamps, so ISO-8601 values are the safest input.
-- `source` must match the stored source name exactly.
-- `keyword` is substring matching, not full-text search.
-
-#### Normal list/search response shape
+#### Response shape
 
 ```json
 [
@@ -194,113 +92,60 @@ Response:
     "title": "...",
     "description": "...",
     "content": "...",
-    "image": "...",
     "url": "...",
     "normalized_title": "...",
-    "source": "rss",
+    "source": "rss:BBC",
     "pub_date": "2025-01-01T12:34:56.000Z",
     "ingested_at": "2025-01-01T12:35:10.000Z"
   }
 ]
 ```
 
-#### Similarity/topic search response shape
-
-```json
-[
-  {
-    "id": 456,
-    "title": "...",
-    "description": "...",
-    "content": "...",
-    "image": "...",
-    "url": "...",
-    "normalized_title": "...",
-    "source": "rss",
-    "pub_date": "2025-01-02T09:00:00.000Z",
-    "ingested_at": "2025-01-02T09:00:10.000Z",
-    "distance": 0.1234
-  }
-]
-```
-
-#### Combined example
-
-```http
-GET /articles?keyword=earnings&source=rss&from=2025-01-01T00:00:00.000Z&limit=10&offset=0
-```
+Semantic and similarity results also include a numeric `distance` field, e.g. `"distance": 0.1234` (lower = closer).
 
 ### `GET /articles/:id`
 
-Returns one article by numeric ID.
-
-**Behavior**
-
-- Looks up the article directly in SQLite.
-- Same usability filter as the list endpoint — returns `404` if the article exists but is not usable.
-- Returns the same article fields as normal `/articles` list mode.
-- Does not return embedding data.
-- Returns `404` if the ID does not exist.
-
-**Example**
-```http
-GET /articles/123
-```
-
-**Not found response**
-```json
-{ "error": "Article not found" }
-```
+Returns one article by numeric ID. Applies the same usability filter as the list endpoint, so it returns `404` if the ID does not exist or if the article exists but is not usable (for example, it has no content or embedding).
 
 ### `GET /status`
 
-Returns ingestion and archive summary information.
+Returns an archive summary. Responses are cached for 30 seconds.
 
 **Response fields**
 
-- `total`: total number of rows in `articles` across all sources
-- `usable`: articles that have content, an embedding, and are not index pages
-- `lastIngestionBySource`: in-memory timestamps of the last successful batch run per source
-- `bySource`: per-source breakdown, each with `total` and `usable` counts
+- `total` — total rows across all sources
+- `usable` — articles with content + embedding, not index pages
+- `lastIngestionBySource` — in-memory timestamps of the last successful batch per source (resets on restart)
+- `bySource` — per-source `{ total, usable }`
+- `embeddingModels` — active embedding models with article count and detected dimensions
 
-**Important detail**
+### `GET /sources`
 
-`lastIngestionBySource` is kept in memory, so it resets when the process restarts.
+Returns the full source catalog from `sources.json` enriched with live DB stats.
 
-**Example response**
-```json
-{
-  "total": 10234,
-  "usable": 8700,
-  "lastIngestionBySource": {
-    "rss": "2025-01-02T10:00:00.000Z",
-    "gdelt": "2025-01-02T10:05:00.000Z"
-  },
-  "bySource": {
-    "alphavantage": { "total": 120, "usable": 98 },
-    "edgar": { "total": 88, "usable": 70 },
-    "finnhub": { "total": 400, "usable": 360 },
-    "gdelt": { "total": 2100, "usable": 1800 },
-    "rss": { "total": 7526, "usable": 6372 }
-  }
-}
-```
+**Per-source fields**
+
+- `id`, `label`, `websites`, `backfill`, `feeds` — from `sources.json` (feed URLs preserve the `[FAILED]` prefix if the feed has been marked dead)
+- `counts` — aggregated `{ total, ready, skipped, failed, pending, untried, usable }` across all feed types for this source
+- `byFeed` — same breakdown split by feed prefix (`rss`, `gdelt`, etc.)
+- `domains` — current domain fetch policy per website: `policy` (auto / browser_only / blocked), failure/success counts, `expiresAt`
+
+Use `domains[].policy` to diagnose why a source has high `skipped` or `failed` counts — `blocked` means backfill has given up on that domain temporarily.
 
 ## Article field notes
 
-- `image` stores the extracted main image as ultra-compressed base64 WebP.
-- `normalized_title` is stored for matching and indexing.
-- `source` may be a shared source like `rss`, `googlenews`, `gdelt`, `edgar`, `alphavantage`, or `finnhub`.
-- `pub_date` is normalized to ISO-8601 when it can be parsed.
-- `ingested_at` is the insert timestamp set by the server.
+- `pub_date` is normalized to ISO-8601 when parseable; `null` otherwise.
+- `pub_date_effective` is `COALESCE(pub_date, ingested_at)` — used for sorting.
+- `ingested_at` is the server-side insert timestamp.
+- `normalized_title` is stored for deduplication and indexing.
+- `source` format is `<feed_type>:<label>` for GDELT and RSS (e.g. `gdelt:Bloomberg Markets`, `rss:TechCrunch`), or just the source name for other feeds (`alphavantage`, `edgar`, `finnhub`).
 
 ## Notes
 
-- SQLite archive file defaults to `./archive.sqlite`.
-- Deduplication is enforced on `url`; normalized titles are stored and indexed for matching but are not unique.
-- `googleNews` accepts `queries`, `topics`, `language`, and `country`, and resolves Google redirect URLs to publisher URLs before ingestion.
-- Article body extraction runs asynchronously after insertion, with scheduled retries for rows still missing content.
-- Embeddings are generated asynchronously with OpenRouter `perplexity/pplx-embed-v1-0.6b` and indexed in `sqlite-vec` for similarity search.
-- Topic search caches normalized query embeddings in SQLite and falls back to OpenRouter on cache miss.
-- SEC requests use the configured `User-Agent`.
-- Duplicate URLs are skipped rather than inserted again.
+- SQLite archive defaults to `./archive.sqlite`.
+- Deduplication is enforced on `url`.
+- GDELT ingestion processes one time window at a time, so the full 6-year backlog is never held in memory at once.
+- Content backfill uses separate concurrency pools for plain HTTP and Playwright (browser) fetches.
+- Embeddings are generated via OpenRouter and indexed in `sqlite-vec` for nearest-neighbor (KNN) search.
+- Query embeddings are cached in SQLite to avoid redundant API calls.
+- SEC requests use the `User-Agent` from `config.json`.
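For exercising the new `/articles` query params, here is a minimal client-side sketch. It builds a request URL with repeated `keyword` params using the standard `URL`/`URLSearchParams` globals (Node 10+ or a browser). `buildArticlesUrl` is a hypothetical helper. The `http://localhost:3000` base URL is also an assumption, because the real host and port come from `config.json`.

```javascript
// Hypothetical helper: builds a GET /articles URL from an options object.
// Array values become repeated params (keyword=a&keyword=b), which is how
// the endpoint accepts multiple keywords; null/undefined values are skipped.
function buildArticlesUrl(baseUrl, options = {}) {
  const target = new URL('/articles', baseUrl);
  for (const [name, value] of Object.entries(options)) {
    if (value === undefined || value === null) continue;
    // Append once per array element; scalars are stringified.
    for (const item of [].concat(value)) {
      target.searchParams.append(name, String(item));
    }
  }
  return target.toString();
}

// Assumed local base URL; adjust to the host/port in config.json.
const url = buildArticlesUrl('http://localhost:3000', {
  keyword: ['bitcoin', 'ethereum'],
  keyword_mode: 'or',
  source: 'rss:BBC',
  order: 'ingested_newest',
  limit: 10,
});
console.log(url);
```

The resulting URL can be passed to `fetch`. `URLSearchParams` percent-encodes the colon in `rss:BBC`, and the server sees the decoded value.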
diff --git a/src/routes/articles.js b/src/routes/articles.js
index 8fc1543..c119f5b 100644
--- a/src/routes/articles.js
+++ b/src/routes/articles.js
@@ -12,9 +12,15 @@ function buildArticlesQuery(query) {
   const includeEmbedding = String(query.include_embedding || '').toLowerCase() === 'true';
 
   if (query.keyword) {
-    conditions.push('(title LIKE ? OR description LIKE ? OR content LIKE ?)');
-    const keyword = `%${query.keyword}%`;
-    params.push(keyword, keyword, keyword);
+    const keywords = [].concat(query.keyword).map((k) => String(k).trim()).filter(Boolean);
+    const mode = String(query.keyword_mode || '').toLowerCase() === 'or' ? 'OR' : 'AND';
+    const clauses = keywords.map(() => '(title LIKE ? OR description LIKE ? OR content LIKE ?)');
+
+    if (clauses.length > 0) conditions.push(`(${clauses.join(` ${mode} `)})`);
+    for (const kw of keywords) {
+      const like = `%${kw}%`;
+      params.push(like, like, like);
+    }
   }
 
   if (query.source) {
@@ -36,6 +42,14 @@ function buildArticlesQuery(query) {
   conditions.push('is_index_page = 0');
   conditions.push('has_embedding = 1');
 
+  const ORDERS = {
+    newest: 'pub_date_effective DESC, id DESC',
+    oldest: 'pub_date_effective ASC, id ASC',
+    ingested_newest: 'ingested_at DESC, id DESC',
+    ingested_oldest: 'ingested_at ASC, id ASC',
+  };
+  const orderBy = Object.prototype.hasOwnProperty.call(ORDERS, query.order) ? ORDERS[query.order] : ORDERS.newest;
+
   const whereClause = `WHERE ${conditions.join(' AND ')}`;
   const limit = Number.parseInt(query.limit, 10);
   const offset = Number.parseInt(query.offset, 10);
@@ -48,7 +62,7 @@ function buildArticlesQuery(query) {
       SELECT id, title, description, content, ${includeEmbedding ? 'embedding,' : ''} url, normalized_title, source, pub_date, ingested_at
       FROM articles
       ${whereClause}
-      ORDER BY pub_date_effective DESC, id DESC
+      ORDER BY ${orderBy}
       LIMIT ? OFFSET ?
     `,
     params,
@@ -64,23 +78,58 @@ function shouldExcludeIndexPages(query) {
   return String(query.exclude_index_pages || '').toLowerCase() !== 'false';
 }
 
-function mapNeighborsToArticles(neighbors, excludeIndexPages, limit) {
+function mapNeighborsToArticles(neighbors, excludeIndexPages, limit, query = {}) {
   const ids = neighbors.map((row) => row.articleId);
   if (ids.length === 0) {
     return [];
   }
 
   const placeholders = ids.map(() => '?').join(', ');
+  const conditions = [];
+  const params = [...ids];
+
+  conditions.push(`id IN (${placeholders})`);
+  conditions.push("content IS NOT NULL AND content != ''");
+  conditions.push('has_embedding = 1');
+
+  if (excludeIndexPages) conditions.push('is_index_page = 0');
+
+  if (query.source) {
+    conditions.push('source = ?');
+    params.push(query.source);
+  }
+
+  if (query.from) {
+    conditions.push('pub_date >= ?');
+    params.push(query.from);
+  }
+
+  if (query.to) {
+    conditions.push('pub_date <= ?');
+    params.push(query.to);
+  }
+
+  if (query.keyword) {
+    const keywords = [].concat(query.keyword).map((k) => String(k).trim()).filter(Boolean);
+    const mode = String(query.keyword_mode || '').toLowerCase() === 'or' ? 'OR' : 'AND';
+    const clauses = keywords.map(() => '(title LIKE ? OR description LIKE ? OR content LIKE ?)');
+
+    if (clauses.length > 0) conditions.push(`(${clauses.join(` ${mode} `)})`);
+    for (const kw of keywords) {
+      const like = `%${kw}%`;
+      params.push(like, like, like);
+    }
+  }
+
   const articles = db.prepare(`
     SELECT id, title, description, content, url, normalized_title, source, pub_date, ingested_at
     FROM articles
-    WHERE id IN (${placeholders})
-      AND content IS NOT NULL AND content != ''
-      AND has_embedding = 1
-      ${excludeIndexPages ? 'AND is_index_page = 0' : ''}
-  `).all(...ids);
+    WHERE ${conditions.join(' AND ')}
+  `).all(...params);
+
   const byId = new Map(articles.map((article) => [article.id, article]));
 
+  // preserve distance ordering from the vector search
   return neighbors
     .map((row) => {
       const article = byId.get(row.articleId);
@@ -113,7 +162,7 @@ async function articleRoutes(fastify) {
         Math.min(limit * 5, 500)
       );
 
-      return mapNeighborsToArticles(neighbors, excludeIndexPages, limit);
+      return mapNeighborsToArticles(neighbors, excludeIndexPages, limit, query);
     }
 
     if (query.similar_to_article) {
@@ -130,7 +179,7 @@ async function articleRoutes(fastify) {
         return { error: 'Embedding not found for article' };
       }
 
-      return mapNeighborsToArticles(neighbors, excludeIndexPages, limit);
+      return mapNeighborsToArticles(neighbors, excludeIndexPages, limit, query);
     }
 
     const { sql, params } = buildArticlesQuery(query);
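The keyword-filter block now appears in both `buildArticlesQuery` and `mapNeighborsToArticles`. The code below is a possible follow-up, assuming both functions can share module-level helpers. It centralizes that logic, including the guard against an all-blank `keyword` list, which would otherwise produce an empty `()` SQL condition. It also restricts the `order` lookup to the map's own keys, so a value like `order=constructor` cannot resolve to an inherited property. `buildKeywordCondition`, `resolveOrder`, and `LIKE_FIELDS` are hypothetical names, not existing exports.

```javascript
// Hypothetical shared helpers for src/routes/articles.js; the names are
// illustrative and mirror the inline logic in the two query builders.
const LIKE_FIELDS = '(title LIKE ? OR description LIKE ? OR content LIKE ?)';

// Returns { clause, params } for the keyword filter, or null when no
// non-blank keyword was supplied, so callers never push an empty "()".
function buildKeywordCondition(query) {
  if (query.keyword === undefined || query.keyword === null) return null;
  const keywords = [].concat(query.keyword)
    .map((k) => String(k).trim())
    .filter(Boolean);
  if (keywords.length === 0) return null;

  const mode = String(query.keyword_mode || '').toLowerCase() === 'or' ? 'OR' : 'AND';
  const clause = `(${keywords.map(() => LIKE_FIELDS).join(` ${mode} `)})`;
  // Each keyword binds three placeholders: title, description, content.
  const params = keywords.flatMap((kw) => {
    const like = `%${kw}%`;
    return [like, like, like];
  });
  return { clause, params };
}

const ORDERS = {
  newest: 'pub_date_effective DESC, id DESC',
  oldest: 'pub_date_effective ASC, id ASC',
  ingested_newest: 'ingested_at DESC, id DESC',
  ingested_oldest: 'ingested_at ASC, id ASC',
};

// Whitelist lookup: only own keys are accepted; anything else (including
// inherited names such as "toString") falls back to the default order.
function resolveOrder(order) {
  return Object.prototype.hasOwnProperty.call(ORDERS, order)
    ? ORDERS[order]
    : ORDERS.newest;
}

// Example: how a caller would consume the helper.
const conditions = [];
const params = [];
const kw = buildKeywordCondition({ keyword: ['bitcoin', '  ', 'ethereum'], keyword_mode: 'or' });
if (kw) {
  conditions.push(kw.clause);
  params.push(...kw.params);
}
console.log(conditions, params, resolveOrder('oldest'));
```

Keeping the `LIKE` template and the parameter expansion in one place means the placeholder count and the bound values cannot drift apart between the list and vector-search paths.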