src update scheduler configuration for news crawler and other sources 2026-04-17 00:11:50 +01:00
.dockerignore add Docker configuration and news crawler implementation 2026-04-16 22:54:27 +01:00
.gitignore add Docker configuration and news crawler implementation 2026-04-16 22:54:27 +01:00
config.json update scheduler configuration for news crawler and other sources 2026-04-17 00:11:50 +01:00
docker-compose.yml add Docker configuration and news crawler implementation 2026-04-16 22:54:27 +01:00
Dockerfile add Docker configuration and news crawler implementation 2026-04-16 22:54:27 +01:00
package-lock.json initialize project structure with core modules and configuration 2026-04-16 22:47:34 +01:00
package.json initialize project structure with core modules and configuration 2026-04-16 22:47:34 +01:00
README.md enhance news crawler configuration with new sources and improved request headers 2026-04-16 23:32:56 +01:00
server.js initialize project structure with core modules and configuration 2026-04-16 22:47:34 +01:00

duriin_api

A Node.js Fastify server that ingests news articles from RSS feeds, SEC EDGAR 8-K filings, Alpha Vantage News Sentiment, Finnhub company news, and GDELT into a local SQLite archive.

Setup

  1. Install dependencies:
    npm install
    
  2. Edit config.json to set your API keys (including openRouter.apiKey), tickers, RSS feeds, and schedules.
  3. Start the server:
    npm start
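
Before starting the server it can help to confirm that config.json carries the basics from step 2. The sketch below is only an illustration: openRouter.apiKey, rssFeeds, and newsCrawler.disabledLabels are named in this README, while the `tickers` key, the shape of each feed entry, and the `checkConfig` helper are assumptions, not the project's actual schema.

```javascript
// Hypothetical config shape. Only openRouter.apiKey, rssFeeds, and
// newsCrawler.disabledLabels are named in this README; the `tickers` key,
// the feed entry fields, and all values are placeholders.
const exampleConfig = {
  openRouter: { apiKey: "YOUR_OPENROUTER_KEY" },
  tickers: ["AAPL", "MSFT"],
  rssFeeds: [{ label: "example", url: "https://example.com/feed.xml" }],
  newsCrawler: { disabledLabels: [] },
};

// Return a list of human-readable problems; an empty list means the
// basics checked here are present.
function checkConfig(config) {
  const problems = [];
  if (!config.openRouter || !config.openRouter.apiKey) {
    problems.push("openRouter.apiKey is missing");
  }
  if (!Array.isArray(config.tickers) || config.tickers.length === 0) {
    problems.push("tickers is empty");
  }
  if (!Array.isArray(config.rssFeeds) || config.rssFeeds.length === 0) {
    problems.push("rssFeeds is empty");
  }
  return problems;
}

console.log(checkConfig(exampleConfig));
```

To check a real file, the same helper could be given the result of `JSON.parse(fs.readFileSync("config.json", "utf8"))`.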
    

API

  • GET /articles?q=&source=&from=&to=&limit=&offset=
  • GET /articles?similar_to={id}&limit=
  • GET /articles?topic={query}&limit=
  • GET /articles/:id
  • GET /status
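
The query-string endpoints above can be exercised from any HTTP client. As a minimal sketch, the helper below builds /articles URLs with the standard URL API. The base address http://localhost:3000 and the parameter values are assumptions (this README does not state the host, port, or accepted values), and `articlesUrl` is a hypothetical name.

```javascript
// Assumption: the server listens on http://localhost:3000.
const BASE_URL = "http://localhost:3000";

// Build a GET /articles URL, skipping parameters that are undefined,
// null, or an empty string so they do not appear as bare `key=` pairs.
function articlesUrl(params = {}, base = BASE_URL) {
  const url = new URL("/articles", base);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// Keyword search with paging (placeholder values).
console.log(articlesUrl({ q: "earnings", limit: 20, offset: 0 }));
// Articles similar to a stored article id.
console.log(articlesUrl({ similar_to: 42, limit: 5 }));
// Free-text topic search.
console.log(articlesUrl({ topic: "chip export controls", limit: 10 }));
```

Each resulting string can be passed to `fetch`, which is available globally in Node.js 18 and later.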

Notes

  • SQLite archive file defaults to ./archive.sqlite.
  • Deduplication is enforced on url; normalized titles are stored and indexed for matching but are not unique.
  • newsCrawler reuses rssFeeds as the publisher catalog, derives one crawler source per feed label, and supports disabledLabels plus per-label overrides for seeds and allowed hosts.
  • Article body extraction runs asynchronously after insertion, with hourly retries for rows still missing content.
  • Main article images are stored as ultra-compressed base64 WebP.
  • Embeddings are generated asynchronously with OpenRouter perplexity/pplx-embed-v1-0.6b and indexed in sqlite-vec for similarity search.
  • Topic search caches normalized query embeddings in SQLite and falls back to OpenRouter on cache miss.
  • SEC requests use the configured User-Agent, since SEC EDGAR expects automated clients to identify themselves in that header.
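
The deduplication note above can be pictured with a small in-memory stand-in for the archive. This is a sketch under stated assumptions: the normalization rule (lowercase, strip punctuation, collapse whitespace) and the `ArchiveSketch` class are hypothetical, and the real server stores rows in SQLite, not in Maps.

```javascript
// Assumed normalization: lowercase, replace anything that is not a
// letter, digit, or whitespace with a space, then collapse whitespace.
function normalizeTitle(title) {
  return title
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// In-memory stand-in: url is the unique key, while normalized titles
// are indexed for lookup but may repeat across articles.
class ArchiveSketch {
  constructor() {
    this.byUrl = new Map();
    this.byTitle = new Map(); // normalized title -> array of urls
  }

  // Returns false when the url is already archived, true otherwise.
  insert(article) {
    if (this.byUrl.has(article.url)) return false;
    const key = normalizeTitle(article.title);
    this.byUrl.set(article.url, { ...article, normalizedTitle: key });
    if (!this.byTitle.has(key)) this.byTitle.set(key, []);
    this.byTitle.get(key).push(article.url);
    return true;
  }

  // Returns every archived url whose normalized title matches.
  findByTitle(title) {
    return this.byTitle.get(normalizeTitle(title)) ?? [];
  }
}

const archive = new ArchiveSketch();
archive.insert({ url: "https://a.example/1", title: "Fed Raises Rates: Again!" });
archive.insert({ url: "https://b.example/2", title: "fed raises rates again" });
console.log(archive.findByTitle("Fed raises rates, again"));
```

Both articles are kept because their URLs differ, yet a title lookup returns the two of them together, which is the behaviour the note describes.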
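
The topic-search note describes a cache-then-fallback pattern, which could be sketched as follows. Assumptions: "normalized" is taken to refer to the query text, a Map stands in for the SQLite cache table, and the `embed` argument stands in for the OpenRouter request; `normalizeQuery` and `cachedEmbedder` are hypothetical names.

```javascript
// Assumed rule: trim, lowercase, and collapse whitespace so trivially
// different spellings of a query share one cache entry.
function normalizeQuery(query) {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// Wrap an async embedding function with a cache keyed by the normalized
// query. `cache` only needs has/get/set, so a Map works for the sketch.
function cachedEmbedder(embed, cache = new Map()) {
  return async function getEmbedding(query) {
    const key = normalizeQuery(query);
    if (cache.has(key)) return cache.get(key); // cache hit: no remote call
    const vector = await embed(key); // cache miss: fall back to the provider
    cache.set(key, vector);
    return vector;
  };
}

// Usage with a fake embedder that counts how often it is called.
let calls = 0;
const fakeEmbed = async (text) => {
  calls += 1;
  return [text.length, 0, 0];
};
const getEmbedding = cachedEmbedder(fakeEmbed);

(async () => {
  await getEmbedding("Chip  export controls");
  await getEmbedding("chip export controls ");
  console.log(calls); // 1, because the second query normalizes to the same key
})();
```

Keying the cache on the normalized text means repeated or near-identical topic queries avoid a second remote embedding request.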