# duriin_api
Node.js Fastify server that ingests news articles from RSS, SEC EDGAR 8-K filings, Alpha Vantage News Sentiment, Finnhub company news, and GDELT into a local SQLite archive.
## Setup
1. Install dependencies:
```bash
npm install
```
2. Edit `config.json` to set your API keys (including `openRouter.apiKey`), tickers, RSS feeds, and schedules.
3. Start the server:
```bash
npm start
```
## API
- `GET /articles?q=&source=&from=&to=&limit=&offset=`
- `GET /articles?similar_to={id}&limit=`
- `GET /articles?topic={query}&limit=`
- `GET /articles/:id`
- `GET /status`
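
A minimal sketch of building a request against the listing endpoint. The base URL `http://localhost:3000` is an assumption, so substitute the host and port your instance actually uses. `buildArticlesUrl` is a hypothetical helper, not part of this project, and the parameter values are illustrative only; the accepted formats for `from` and `to` are not specified in this README.

```javascript
// Hypothetical helper: builds a GET /articles URL from the query
// parameters listed above. BASE_URL is an assumption, not a documented default.
const BASE_URL = "http://localhost:3000";

function buildArticlesUrl(params = {}, baseUrl = BASE_URL) {
  const url = new URL("/articles", baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // Skip unset filters so the query string stays minimal.
    if (value === undefined || value === null || value === "") continue;
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Example usage (needs a running server and Node.js 18+ for global fetch):
// const res = await fetch(buildArticlesUrl({ q: "earnings", limit: 10 }));
// const body = await res.json(); // response shape is not documented here
```

The same helper covers the other listing modes, for example `buildArticlesUrl({ similar_to: 42, limit: 5 })` or `buildArticlesUrl({ topic: "chip exports", limit: 5 })`.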
## Notes
- SQLite archive file defaults to `./archive.sqlite`.
- Deduplication is enforced on `url`; normalized titles are stored and indexed for matching but are not unique.
- `newsCrawler` reuses `rssFeeds` as the publisher catalog, derives one crawler source per feed label, and supports `disabledLabels` plus per-label `overrides` for seeds and allowed hosts.
- Article body extraction runs asynchronously after insertion, with hourly retries for rows still missing content.
- Main article images are stored as heavily compressed base64 WebP.
- Embeddings are generated asynchronously with OpenRouter `perplexity/pplx-embed-v1-0.6b` and indexed in `sqlite-vec` for similarity search.
- Topic search caches normalized query embeddings in SQLite and falls back to OpenRouter on cache miss.
- SEC requests use the configured `User-Agent`.
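
The `newsCrawler` note above can be illustrated with a rough sketch. The config shapes here are assumptions made for illustration: `rssFeeds` as an array of `{ label, url }` objects, and `overrides` keyed by label with `seeds` and `allowedHosts` fields. `deriveCrawlerSources` is a hypothetical function, and defaulting seeds to the feed's origin is also an assumption; the project's real field names and defaults may differ.

```javascript
// Hypothetical sketch: derive one crawler source per feed label.
// Assumed shapes (not confirmed by the project):
//   rssFeeds:    [{ label: string, url: string }]
//   newsCrawler: { disabledLabels?: string[],
//                  overrides?: { [label]: { seeds?: string[], allowedHosts?: string[] } } }
function deriveCrawlerSources(rssFeeds, newsCrawler = {}) {
  const disabled = new Set(newsCrawler.disabledLabels ?? []);
  const overrides = newsCrawler.overrides ?? {};
  const byLabel = new Map();

  for (const feed of rssFeeds) {
    if (disabled.has(feed.label)) continue; // label is switched off

    const { origin, hostname } = new URL(feed.url);
    // Several feeds may share a label; merge them into a single source.
    const source =
      byLabel.get(feed.label) ?? { label: feed.label, seeds: [], allowedHosts: [] };
    // Assumed defaults: crawl from the feed's origin, stay on its host.
    if (!source.seeds.includes(origin)) source.seeds.push(origin);
    if (!source.allowedHosts.includes(hostname)) source.allowedHosts.push(hostname);
    byLabel.set(feed.label, source);
  }

  // Per-label overrides replace the derived defaults.
  for (const source of byLabel.values()) {
    const override = overrides[source.label];
    if (override?.seeds) source.seeds = [...override.seeds];
    if (override?.allowedHosts) source.allowedHosts = [...override.allowedHosts];
  }
  return [...byLabel.values()];
}
```

Keying the result by label rather than by feed URL is what yields one source per publisher even when a publisher exposes several feeds.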