# duriin_api

Node.js Fastify server that ingests news articles from RSS, SEC EDGAR 8-K filings, Alpha Vantage News Sentiment, Finnhub company news, and GDELT into a local SQLite archive.
## Setup

1. Install dependencies:
```bash
npm install
```
2. Edit `config.json` to set your API keys (including `openRouter.apiKey`), tickers, RSS feeds, and schedules.
3. Start the server:
```bash
npm start
```
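Before starting the server, it can help to confirm that `config.json` actually carries the keys named in step 2. The following is a minimal sketch, not part of the project: `missingConfigPaths` and `getPath` are hypothetical helpers, and the list of required paths is an assumption built only from the two names this README mentions (`openRouter.apiKey` and `rssFeeds`). The shape of the example feed entry is likewise assumed.

```javascript
// Hypothetical helper, not shipped with duriin_api.
// Only `openRouter.apiKey` and `rssFeeds` are named in this README;
// extend the list to match your actual config.json.
const REQUIRED_PATHS = ["openRouter.apiKey", "rssFeeds"];

// Walk a dotted path such as "a.b.c", stopping early on null/undefined.
function getPath(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Return the required paths that are absent, blank strings, or empty arrays.
function missingConfigPaths(config, required = REQUIRED_PATHS) {
  return required.filter((path) => {
    const value = getPath(config, path);
    if (value == null) return true;
    if (typeof value === "string") return value.trim() === "";
    if (Array.isArray(value)) return value.length === 0;
    return false;
  });
}

// Example object; the feed entry shape here is an assumption.
const exampleConfig = {
  openRouter: { apiKey: "" },
  rssFeeds: [{ label: "Example", url: "https://example.com/feed.xml" }],
};
console.log(missingConfigPaths(exampleConfig)); // logs [ 'openRouter.apiKey' ]
```

In practice you would pass the parsed contents of `config.json` instead of the example object.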
## API

- `GET /articles?q=&source=&from=&to=&limit=&offset=`
- `GET /articles?similar_to={id}&limit=`
- `GET /articles?topic={query}&limit=`
- `GET /articles/:id`
- `GET /status`
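The three `/articles` query modes listed above can be exercised from Node.js roughly as follows. This is a sketch under stated assumptions: the base URL `http://localhost:3000` is a guess at where the server listens, `articlesUrl` and `getJson` are hypothetical helpers, the sample filter values are made up, and the response is assumed to be JSON. Only the parameter names come from the endpoint list.

```javascript
// Assumption: the server listens here; adjust to your host and port.
const BASE_URL = "http://localhost:3000";

// Build an /articles URL, skipping filters that were left unset.
function articlesUrl(params = {}, baseUrl = BASE_URL) {
  const url = new URL("/articles", baseUrl);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined && value !== null && value !== "") {
      url.searchParams.set(key, String(value));
    }
  }
  return url.toString();
}

// One URL per query mode; the values are illustrative only.
const searchUrl = articlesUrl({ q: "earnings", limit: 20, offset: 0 });
const similarUrl = articlesUrl({ similar_to: 42, limit: 5 });
const topicUrl = articlesUrl({ topic: "chip export controls", limit: 10 });

// Fetch a URL and parse the body, assuming the server responds with JSON.
// Uses the global fetch available in Node.js 18 and later.
async function getJson(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}

// Example call (requires a running server):
// getJson(similarUrl).then((body) => console.log(body));
```

`URLSearchParams` handles the encoding, so a multi-word `topic` value does not need to be escaped by hand.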
## Notes

- The SQLite archive file defaults to `./archive.sqlite`.
- Deduplication is enforced on `url`; normalized titles are stored and indexed for matching but are not unique.
- `newsCrawler` reuses `rssFeeds` as the publisher catalog, derives one crawler source per feed label, and supports `disabledLabels` plus per-label `overrides` for seeds and allowed hosts.
- Article body extraction runs asynchronously after insertion, with hourly retries for rows still missing content.
- Main article images are stored as heavily compressed, base64-encoded WebP.
- Embeddings are generated asynchronously via OpenRouter with the `perplexity/pplx-embed-v1-0.6b` model and indexed in `sqlite-vec` for similarity search.
- Topic search caches normalized query embeddings in SQLite and falls back to OpenRouter on cache miss.
- SEC requests use the configured `User-Agent`.
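The cache-then-fallback behavior described for topic search can be illustrated with a small sketch. It is not the project's implementation: `normalizeQuery` and `createTopicEmbedder` are hypothetical names, the normalization rule (trim, lowercase, collapse whitespace) is an assumption about what "normalized" means here, an in-memory `Map` stands in for the SQLite cache table, and the upstream call is stubbed rather than sent to OpenRouter.

```javascript
// Assumed normalization: trim, lowercase, and collapse runs of whitespace.
function normalizeQuery(query) {
  return query.trim().toLowerCase().replace(/\s+/g, " ");
}

// Wrap an async embedding fetcher with a cache keyed by the normalized query.
// The Map is a stand-in for a SQLite table; `fetchEmbedding` is supplied by the caller.
function createTopicEmbedder(fetchEmbedding, cache = new Map()) {
  return async function embedTopic(query) {
    const key = normalizeQuery(query);
    if (cache.has(key)) return cache.get(key); // cache hit: no upstream call
    const embedding = await fetchEmbedding(key); // cache miss: fall back upstream
    cache.set(key, embedding);
    return embedding;
  };
}

// Usage with a stubbed fetcher (no network access).
let upstreamCalls = 0;
const embedTopic = createTopicEmbedder(async (text) => {
  upstreamCalls += 1;
  return [text.length]; // placeholder vector, not a real embedding
});

// Both spellings normalize to "ai chips", so only the first reaches the fetcher.
embedTopic("  AI   Chips ")
  .then(() => embedTopic("ai chips"))
  .then(() => console.log(upstreamCalls)); // logs 1
```

Keying the cache on the normalized string means queries that differ only in case or spacing share one stored embedding.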