RSS Watch provides feed browsing, category grouping, optional AI summaries, and a full REST API for scraper workers and client applications.
RSS Watch is both a public feed reader and a backend ingestion pipeline. Public pages (/feed, category views, and entry pages) are for readers. The scraper workers and trigger jobs fetch configured sources and post normalized payloads into Tools for processing.
RSS Watch workers identify themselves as bot traffic and include a documentation URL in the User-Agent string.
Mozilla/5.0 (compatible; RSSWatchBot; +https://tools.tornevall.net/docs/rsswatch; runner=scrape; host=<agent-hostname>)
Mozilla/5.0 (compatible; RSSWatchBot; +https://tools.tornevall.net/docs/rsswatch; runner=trigger; host=<agent-hostname>)
Operational note: scheduler/agent reporting is still keyed by agent_id (hostname) in API calls; changing the User-Agent text does not change agent attribution logic.
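The two User-Agent variants above differ only in the runner field. A minimal sketch of composing them (the helper name and the socket.gethostname() default are illustrative, not part of the product):

```python
import socket
from typing import Optional

def rss_watch_user_agent(runner: str, host: Optional[str] = None) -> str:
    """Compose an RSSWatchBot User-Agent string for a given runner type.

    runner is expected to be "scrape" or "trigger"; host defaults to the
    local hostname, matching the <agent-hostname> placeholder in the docs.
    """
    if runner not in ("scrape", "trigger"):
        raise ValueError(f"unknown runner: {runner}")
    host = host or socket.gethostname()
    return (
        "Mozilla/5.0 (compatible; RSSWatchBot; "
        f"+https://tools.tornevall.net/docs/rsswatch; runner={runner}; host={host})"
    )
```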
Scrape workers now also write timestamped run logs. By default the runner tries SCRAPE_LOG_FILE, then /var/log/tools-scraper/scrape.log, then local fallback paths inside the scraper project when those locations are writable.
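The fallback order can be sketched as follows. The SCRAPE_LOG_FILE variable name and the /var/log path come from this section; the project-local candidate names and the writability check are assumptions for illustration:

```python
import os

def pick_scrape_log_path(project_dir: str = ".") -> str:
    """Return the first writable log destination in the documented order:
    SCRAPE_LOG_FILE env var, the system path, then project-local fallbacks."""
    candidates = []
    env_path = os.environ.get("SCRAPE_LOG_FILE")
    if env_path:
        candidates.append(env_path)
    candidates.append("/var/log/tools-scraper/scrape.log")
    # Project-local fallback names are illustrative, not the real ones.
    candidates.append(os.path.join(project_dir, "logs", "scrape.log"))
    candidates.append(os.path.join(project_dir, "scrape.log"))
    for path in candidates:
        parent = os.path.dirname(path) or "."
        # A location counts as usable when its directory exists and is writable.
        if os.path.isdir(parent) and os.access(parent, os.W_OK):
            return path
    raise RuntimeError("no writable log location found")
```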
Public feed pages:
- /feed includes inline search across titles, descriptions, content, site names, and categories without leaving the page
- Rows-per-page is configurable in /feed-admin (a query per_page still overrides per request)
- /feed/c/{categorySlug} shows a category view (includes the same history/diff view style as regular feed pages)
- /feed/cards/{categorySlug} provides a card-based analysis view
- /feed and /feed/cards/{categorySlug} now also merge analytics by category slug before the visible edition is chosen, so the same category still shows its newest current-bucket edition even if older cached rows used a slightly different display-name spelling or casing
- Edit history can be requested via query parameters (history=1, history_limit)
- /feed, /feed/c/{categorySlug}, and /feed/cards/{categorySlug} now rewrite any accidentally stored localhost/loopback URLs to the active public host before display
- Relative article links (for example /articles/foo) are resolved against the owning feed's real_url/url, so /out/{contentId} and visible source-link labels stay fully qualified
- Author display names are mapped when a feed only provides a raw publishby value (for example a numeric ID) and an operator has added a manual mapping for that feed
- Hidden feeds are excluded from /feed and use a hash-based direct URL instead, for example /feed/key/{public-hash} and /api/rss/feed/{public-hash}

Analytics selectors on /api/rss/feed/{selector}:
- analytics-daily (alias daily-analytics)
- analytics-weekly (alias weekly-analytics)
- analytics-monthly (alias monthly-analytics)
- analytics-yearly (alias yearly-analytics)
- analytics-bulk (alias bulk-analytics, combines daily/weekly/monthly/yearly)
- /api/rss/feed/{category-slug} for category analytics and /feed/{site-urlid} for site analytics

Feed Q&A on the public pages:
- /feed now includes a mini "Ask about all open feeds" card where visitors can submit free-form questions against the current open RSS dataset
- /feed now also shows a dedicated support/contact block next to the public feed disclaimers, linking both to the ordinary contact page and directly to support@tornevall.net for questions, complaints, corrections, or feed-related issues
- Questions are submitted via POST /feed/user-questions and history is read via GET /feed/user-questions
- Distinctive name tokens (for example sabah) are less likely to disappear from retrieval
- Each question row stores the user (user_id or guest), IP, user-agent, status, answer/error, and response metadata for audit/history
- /feed question submit is AJAX-first and renders answer/error inline; a standard form post/redirect remains as fallback if JavaScript is unavailable
- Question submission on /feed and on site-specific feed pages now uses a two-step background flow for better responsiveness:
- POST /feed/user-questions with start_only=1 reserves the question row and returns a short-lived client token plus progress URLs
- POST /feed/user-questions/{question}/process runs the actual analysis in the background request
- GET /feed/user-questions/{question}/status lets the browser poll current phase/progress until the answer is ready
- The browser is defensive about /status checks: stuck polls are aborted after a short timeout, the browser backs off while the page is hidden, and progress refresh resumes cleanly when the tab becomes active again
- If the process request itself already finishes with the final answer, the browser now accepts that direct response immediately instead of continuing to send extra /status polls until the next timer tick

Retrieval and answer behavior:
- Author questions are matched against content.publishby before broader keyword fallback
- For feeds that deliver raw publishby values instead of real names, Feed Q&A now also uses the per-feed manual publishby mappings from the RSS editor so those questions can still match the intended author
- This keeps author questions answerable even when publishby values are missing/inconsistent
- /feed now shows the extra phrases the analyzer searched for in a readonly helper box under the latest answer so readers can understand what the backend also looked for before answering
- /feed and the site-specific feed question panel now also show a separate readonly "search keywords used" box, so readers can see the literal queryable terms/phrases the backend searched for before any broader expansion terms were applied
- /feed now supports follow-up question chaining: users can pick a previous completed answer as follow-up context, and the backend reuses prior retrieval terms plus prior answer context when going deeper
- Web search can be enabled per question (use_web_search) so RSS answers can run with web-search-backed verification when enabled
- /feed and site-specific feed pages now show whether web search was actually used for that answer and, when available, list the returned citation links
- Author names are rendered in canonical form, for example Tara Sabah / Tara Saleh instead of awkward helper-text variants like "tara sabah artiklar" or "tara saleh källor"
- Keyword aggressiveness is selectable (strict, balanced, expansive) both on /feed and on site-specific feed question panels

Question history (/feed/user-questions):
- /feed/user-questions now supports pagination with configurable rows-per-page to keep history manageable when question volume grows
- Rows can be deleted on /feed/user-questions via AJAX inline actions (server redirect fallback still works without JavaScript)
- /feed/user-questions and /feed-admin now also include answer model and answer tone controls for OpenAI question responses
- History can be filtered by status, user_id (or guest), ip, and deleted scope (active/deleted/all)
- Hard delete removes rows permanently; soft delete keeps rows with deletion metadata (deleted_at, deleted_by_user_id, deleted_reason)
- Soft-deleted rows can be restored (POST /feed/user-questions/{question}/restore) for an undo workflow

Context building:
- Questions can target daily, weekly, monthly, or yearly windows
- In site-focused JSON context, version_history.articles contains several revised articles
- Retrieval first narrows content rows by keyword matches in title/description/content, and then only relevant rows are expanded into context/version-history blocks
- Answers keep a matched_entries evidence layer, so concrete title, description, content excerpts, publishby, feed_title, and article links are still available when the final answer is written
- Full-name phrases are matched (for example anders sydborg) before looser supporting tokens, which reduces false positives from broad first-name-only matches
- Stop-word noise such as har, du, något, spännande, with, about, something, and similar words is now filtered out even when OpenAI or deterministic fallback extraction sees them in the original question
- Short-term periods (daily, weekly, monthly) allow a broader keyword result cap than yearly and all_time, so specific short-term questions can include richer evidence without exploding prompt size
- Iterative retrieval defaults to 5 passes and is admin-configurable; if the cap is reached, the system proceeds to final answer generation with the best context collected so far
- /feed/user-questions now includes a compact "How this answer was analyzed" box on completed rows, showing method, scope/retrieval details, the literal search keywords used, and any extra search terms used before the final answer

Entry pages (/feed/entry/{contentId}):
- /feed/entry/{contentId} can now switch between the existing highlighted diff view and
a side-by-side before/after comparison using the page-level diff toggle (?diff_view=inline or ?diff_view=side_by_side).
- Category links on /feed open /feed/c/{categorySlug} in a new tab
- The /feed overview now refreshes those "Read more" buttons when a collapsed category card is opened, so long analyses are expandable there again instead of only on the cards page
- /feed now shows exactly one selected public edition per period (daily, weekly, monthly, yearly) for each category/site block
- Older editions are hidden on /feed; use /feed/cards/{categorySlug}?show_all=1 when you intentionally want to review every stored edition for a category

RSS editor (/rss):
- The RSS editor lives at /rss and requires the permission rss
- Per-feed publishby mappings are managed at /rss/{urlid}/publishby-mappings
- Mappings translate raw publishby values (for example 4, 27, or other unfinished identifiers) to a real display name
- Mappings are scoped per urlid, so the same raw publishby value can be reused by a different feed without collisions
- Mapped names are applied on /feed pages, category cards, edited-post views, Feed Q&A, and for /api/rss/feed/{site} readers
- Site Type now supports explicit xpath and json flows in addition to rss and wp
- For sitetype=xpath, elements JSON rules are required and validated server-side before save
- elements accepts both supported extraction shapes:
- the 5-element XPath pipeline array used by sitetype=xpath
- a (begin + table) pair for JSON payload traversal

- In the /rss list UI, elements editing is now shown only for xpath rows and uses a collapsible panel with AJAX save on blur
- /rss/xpath-lab lets editors paste HTML snippets, inspect a DOM outline, run/test XPath queries, and review a SimpleXML-style XML preview before saving rules
- Discovered feed URLs are stored in URL while the originally entered page URL is preserved in Real URL
- The /rss add form now performs a server-side duplicate guard before insert. If the submitted URL or Real URL already matches an existing feed row (including cross-matches between those fields), the insert is blocked and the matching row is shown in the warning message
- Feed discovery recognizes forum-style feed URLs such as external?type=rss2... and external.php?type=RSS2..., including relative <link rel="alternate"> feed URLs found on forum pages
- Author information is stored into content.publishby when feeds provide it (for example dc:creator-style fields)
- The parser reads namespaced RSS fields (content:encoded, dc:creator, guid, pubDate), prioritizes content:encoded as full body when present, and falls back to stable guid/link/hash identifiers when links are missing

Admin area (/feed-admin):
- Operators with the rss.posting.handle permission now also get a dedicated Posting Queue UI at both /feed-admin/posting-queue and /rss/posting-queue
- Each row offers Handle group, Handle page, Handled group, and Handled page, together with the article title, a short excerpt, and the source link
- The queue can be filtered into review, group, page, and done modes so one bot or operator can focus only on pending group posts, only pending page posts, or the handled archive
- Date resolution order: segment anchor -> global anchor -> current date
- Analytics period buckets: daily = that day, weekly = ISO week with Monday start, monthly = calendar month, yearly = calendar year
- /feed-admin now also includes a Feed script snippets / embeddable widgets panel where operators can store reusable JavaScript boxes, autosave them over AJAX, ask AI to generate the inline script body, and copy one merged bundle URL for external site embeds
- Scheduled analytics are configured in /feed-admin with per-period enable/disable and time fields for daily, weekly, monthly, and yearly
- Scheduled runs execute via Laravel schedule:run or directly via php artisan rss:run-scheduled-analytics
- Automatic runs are labeled ([AUTO] Scheduled analytics) and always run with overwrite-current semantics so automatic variants update in place
- Use php artisan rss:generate-analytics --period=yearly --force (or --overwrite-current) to explicitly re-run and overwrite the current bucket row even when the underlying snapshot is unchanged
- Maintenance tools live at /rss-maintenance-admin

API overview:
Base URLs: https://tools.tornevall.com/api and https://tools.tornevall.net/api
RSS API endpoints use provider-scoped API keys:
- scraper=1 is required on /api/rss/urls
- Public read endpoints (/api/rss, /api/rss/feed/{site}) and the ingest endpoint (POST /api/rss/data) accept the request without additional auth (rate-limited)
- The analytics trigger (POST /api/rss/analytics/run) requires Authorization: Bearer <ANALYTICS_CRON_SECRET>
- Posting-queue endpoints (/api/rss/posting-queue/items*) require an authenticated web session plus the dedicated rss.posting.handle permission
- Feed script snippets are served from /api/managed-scripts/feed and /api/managed-scripts/feed/bundle.js; see docs/frontend-script-boxes for the shared script-box contract
- The feed script endpoints also accept additive filter query params such as feed_ids, feeds, categories, groups, category_slugs, and limit, and they echo the normalized result back as query_context.filters

GET /api/rss/posting-queue/items
Returns the operator posting queue as JSON for the current authenticated user.
Query params:
- queue=review|group|page|done (default review)
- category=<category-slug>
- feed_id=<urlid>
- per_page=<10..250>

Response includes:
- filters
- pagination
- items[] with contentid, urlid, title, description, excerpt, link, feed_title, feed_category, entry_url, and queue_status

PATCH /api/rss/posting-queue/items/{contentId}
Updates one queue row's checkbox state.
Supported boolean fields:
- handle_group
- handle_page
- handled_group
- handled_page

Optional helper field:
- queue=review|group|page|done so the response can also tell the caller whether the row should be removed from the current filtered list immediately

Success response includes:
- contentid
- queue_status
- should_remove

GET /api/rss
Returns overview with URL list, categories, and available parameter info.
Response (200)
{
"urls": [
{
"urlid": 12,
"title": "Example Feed",
"url": "https://example.com/rss",
"real_url": "https://example.com/rss",
"category": "News",
"readinterval": 60,
"lastscrape": "2026-04-01 08:00:00",
"publicSelector": "12",
"hidden": false,
"feedUrl": "/api/rss/feed/12",
"categoryFeedUrl": "/api/rss/feed/news"
}
],
"categories": [
{ "name": "News", "slug": "news", "feedCount": 3 }
],
"availParams": { "always": [0, 1], "scraper": [0, 1] }
}
GET /api/rss/urls
Returns scrapeable feed URLs. Requires scraper=1.
Query parameters
| Parameter | Required | Description |
|---|---|---|
| scraper | yes | Must be 1 or request is rejected with 403 |
| agent_id | recommended | Stable agent/worker ID. Used for per-agent due-window tracking |
| agent_name | fallback | Fallback identifier when agent_id is absent |
| always | no | 0 (default) = due-set for agent; 1 = all scrapeable rows |
| limit | no | Max rows when always=0 |
always=0 behavior:
- URLs are selected as due based on readinterval and the agent's last-claim timestamp
- When nothing is due, the response contains an idle report with wait hints (wait_seconds, next_poll_at, and next_url) so scraper clients can back off intelligently instead of guessing

always=1 behavior:
- Returns all rows with deleted=0 and noscrape=0 — no interval filtering

Response (200)
{
"urls": [
{
"urlid": 12,
"url": "https://example.com/rss",
"real_url": "https://example.com/rss",
"readinterval": 60,
"sitetype": "rss",
"method": "GET",
"elements": null
}
],
"agent": { "agent_id": "scraper-node-1", "resolved_name": "scraper-node-1" },
"idle": {
"is_idle": false,
"reason": "urls_available",
"schedule_mode": "per_agent_seen",
"wait_seconds": null,
"wait_minutes": null,
"next_poll_at": null,
"next_url": null,
"scrapeable_total": 125
}
}
Idle response example (no URLs currently due):
{
"urls": [],
"agent": { "agent_id": "scraper-node-1", "resolved_name": "scraper-node-1" },
"idle": {
"is_idle": true,
"reason": "no_urls_due_for_agent",
"schedule_mode": "per_agent_seen",
"wait_seconds": 173,
"wait_minutes": 3,
"next_poll_at": "2026-04-05 14:20:00",
"next_url": {
"urlid": 12,
"title": "Example Feed",
"url": "https://example.com/rss"
},
"scrapeable_total": 125
}
}
The bundled scrape runner uses these idle fields to log how long it should wait before the next due URL is expected.
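A client-side sketch of acting on the idle block (field names match the responses above; the clamping defaults and helper name are assumptions, not part of the documented runner):

```python
def idle_backoff_seconds(idle: dict, min_wait: int = 30, max_wait: int = 900) -> int:
    """Decide how long a scraper should sleep based on the idle block.

    Uses wait_seconds when the server reports an idle state, clamped to a
    sane range; returns 0 when URLs are available so the worker can process
    them and poll again right away.
    """
    if not idle.get("is_idle"):
        return 0  # URLs available: process them, then poll again
    wait = idle.get("wait_seconds")
    if wait is None:
        return min_wait  # idle but no hint: short, conservative backoff
    return max(min_wait, min(int(wait), max_wait))
```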
POST /api/rss/data
Ingest raw feed payload from a scraper worker.
Request body
{
"agent_id": "scraper-node-1",
"content": {
"12": "<rss version=\"2.0\">...</rss>",
"34": "<html>...</html>"
}
}
| Field | Type | Description |
|---|---|---|
| agent_id | string | Optional but recommended; identifies the scraper |
| content | object | Keys are numeric urlid strings; values are raw feed or HTML payload |
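A sketch of assembling the ingest body on the worker side (the field names come from this section; the helper itself is illustrative and only builds the JSON, it does not send the request):

```python
import json

def build_ingest_payload(agent_id: str, scraped: dict) -> str:
    """Serialize scraped payloads into the /api/rss/data request body.

    scraped maps integer urlid -> raw feed/HTML string; keys are emitted
    as numeric strings, matching the documented request shape.
    """
    body = {
        "agent_id": agent_id,
        "content": {str(urlid): raw for urlid, raw in scraped.items()},
    }
    return json.dumps(body)
```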
Response (200)
{
"received": {
"12": { "dataLength": 14200, "url": "https://example.com/rss", "title": "Example Feed" },
"34": { "dataLength": 82000, "url": "https://example.com/nyheter", "title": "Nyheter" }
},
"exceptions": {},
"agent": { "agent_id": "scraper-node-1" }
}
Each received[{urlid}] item includes:
- dataLength – byte length of accepted payload
- url – source URL from urls.url
- title – feed title from urls.title (useful for logs and Google Alerts traceability)

GET /api/rss/update
Processes queued inbound payloads into normalized content and triggers subscription delivery.
Query parameters
| Parameter | Description |
|---|---|
| urlid | Optional integer. When present, only inbound rows for this feed are processed in this call |
Queue maintenance behavior:
- Fully handled rows (handled=1, processlock=0) are purged on each run

Response (200)
{
"converted": 12,
"skipped": 2,
"urlid_filter": 27,
"subscriptionNotifications": {
"processed": 3,
"delivered": 2,
"skipped": 1
}
}
urlid_filter is included only when the urlid query parameter was used.
GET /api/rss/feed/{site}
Returns an Atom/RSS feed for a specific site or category.
Selector behavior for {site}:
- urlid — specific feed
- public_hash — direct access for hidden feeds (bypasses hidden filter)
- analytics-daily / daily-analytics
- analytics-weekly / weekly-analytics
- analytics-monthly / monthly-analytics
- analytics-yearly / yearly-analytics
- analytics-bulk / bulk-analytics (all periods combined)

# Public feed for urlid 12
curl "https://tools.tornevall.com/api/rss/feed/12" -H "Accept: application/atom+xml"
# Category feed
curl "https://tools.tornevall.com/api/rss/feed/news" -H "Accept: application/atom+xml"
# Weekly analytics feed
curl "https://tools.tornevall.com/api/rss/feed/analytics-weekly" -H "Accept: application/atom+xml"
POST /api/rss/analytics/run
Trigger AI analytics generation programmatically (for cron or external triggers).
Authentication: Authorization: Bearer <ANALYTICS_CRON_SECRET>
Request body (all fields optional)
{
"period": "weekly",
"categories": "all",
"sites": false,
"model": "gpt-4o-mini",
"user_id": 1,
"max_tokens": 1400,
"dry_run": false
}
| Field | Type | Default | Description |
|---|---|---|---|
| period | string | weekly | daily, weekly, monthly, yearly, all |
| categories | string | all | Category names comma-separated, or all |
| sites | boolean | false | Also generate site-level analytics |
| model | string | gpt-4o-mini | OpenAI model name |
| user_id | integer | 1 | User ID for attribution |
| max_tokens | integer | 1400 | Max output tokens per analysis item |
| dry_run | boolean | false | Plan run without calling AI |
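A sketch of constructing the authenticated trigger request with the standard library (endpoint, header, and fields come from this section; the secret value is a placeholder, and the request is only built here, not sent):

```python
import json
import urllib.request

def build_analytics_run_request(base_url: str, secret: str, **overrides) -> urllib.request.Request:
    """Build an authenticated POST for /api/rss/analytics/run.

    Any documented field (period, categories, sites, model, user_id,
    max_tokens, dry_run) can be passed as a keyword override.
    """
    body = {"period": "weekly", "dry_run": False, **overrides}
    return urllib.request.Request(
        url=f"{base_url}/rss/analytics/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {secret}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually execute it: urllib.request.urlopen(req) with the real secret.
```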
Response (200)
{
"ok": true,
"dry_run": false,
"periods": ["weekly"],
"languages": ["en", "sv"],
"stats": {
"categories_ok": 5,
"categories_err": 0,
"sites_ok": 0,
"sites_err": 0
},
"log": [
{ "type": "category", "category": "News", "period": "weekly", "status": "ok", "languages": ["en","sv"] }
]
}
Feed Q&A (/feed/user-questions)
Submit free-form questions against the RSS dataset and retrieve answer history.
POST /feed/user-questions
Submit a question. Guests require Cloudflare Turnstile. Authenticated users skip Turnstile.
Request (form or JSON)
| Field | Type | Description |
|---|---|---|
| question | string | The question text |
| question_period | string | daily, weekly, monthly (default), yearly, all_time |
| period | string | Legacy alias for question_period |
| keyword_aggressiveness | string | strict, balanced (default), or expansive |
| focus_categories[] | array | Limit context to specific category slugs |
| focus_site_ids[] | array | Limit context to up to 10 specific urlid values |
| cf-turnstile-response | string | Cloudflare Turnstile token (guests only) |
JSON response (when Accept: application/json)
{
"ok": true,
"answer": "Based on the available context...",
"question_id": 42,
"model": "gpt-5.4",
"latency_ms": 1830,
"context_stats": {
"feed_count": 17,
"entry_count": 120,
"period_type": "monthly",
"period_start": "2026-03-01 00:00:00",
"period_end": "2026-03-31 23:59:59"
}
}
Focus selector behavior:
- focus_site_ids[] with ≤10 IDs activates JSON context mode: the AI receives a version_history.articles section with all stored edit history across ALL time (not limited by the selected period)
- focus_categories[] filters context to matching feeds only
- When both are provided, focus_site_ids[] applies

GET /feed/user-questions
Returns paginated question history. Authenticated users see their own questions; admins see all.
Query parameters: page, per_page (10/25/50/100/200), status, user_id, ip, scope (active/deleted/all)
When asking questions via focus_site_ids[] (≤10 sites), the AI context includes structured JSON with a version_history.articles block:
{
"context_type": "site_focused",
"version_history": {
"included": true,
"article_count": 8,
"note": "Compact all-time version-history sample for edited articles in scope.",
"articles": [
{
"title": "Article title at latest version",
"version_count": 12,
"returned_version_count": 12,
"versions_truncated": false,
"title_change_count": 1,
"first_published": "2026-02-18 13:05",
"last_published": "2026-03-29 10:22",
"versions": [
{
"v": 1,
"hash": "547e39",
"published": "2026-02-18 13:05",
"title": "Original title",
"description": "First 220 chars of description...",
"content": "First 320 chars of content if present..."
}
]
}
]
}
}
Each version sample includes:
- v — version sequence number (1 = oldest, N = newest)
- hash — 6-char prefix of content_hash for traceability
- published — timestamp of that stored version
- title — title at that version
- description — first 220 characters of description (when not empty)
- content — first 320 characters of content (when not empty)

When version_count ≤ VERSION_SAMPLE_LIMIT all versions are returned. For articles with more versions than the limit, representative samples are selected across the full history span (first, last, and evenly spaced intermediates).
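The representative-sampling rule (first, last, evenly spaced intermediates) can be sketched as an index selection. VERSION_SAMPLE_LIMIT is a server-side constant whose actual value is not documented here; the exact spacing below is illustrative:

```python
def sample_version_indices(version_count: int, limit: int) -> list:
    """Pick up to `limit` version indices spanning the full history:
    always the first and last, with evenly spaced intermediates."""
    if version_count <= limit:
        return list(range(version_count))
    # Spread `limit` picks across [0, version_count - 1] inclusive.
    step = (version_count - 1) / (limit - 1)
    indices = {round(i * step) for i in range(limit)}
    return sorted(indices)
```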
XPath Lab (/rss/xpath-lab)
The XPath Lab is available at /rss/xpath-lab and requires permission:rss.
- Each saved rule set is an elements pipeline rule (same 5-element JSON format as urls.elements)
- Extractors other than value (such as href or src) will be skipped by the legacy renderer

elements / Pipeline JSON format
XPath feeds (sitetype=xpath) store extraction rules in urls.elements as a 5-element JSON array:
[
element 0: Row selectors (array of absolute XPaths)
element 1: Field XPaths (object: fieldName → array of relative XPaths)
element 2: Value types used (prefer just "value" when possible)
element 3: Source map (object: fieldName → "mainNode")
element 4: Output mapping (object: RSS field → [fieldName, valueType])
]
Required output fields in element 4: title, description, link. Optional: pubdate.
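A minimal structural check of that 5-element array can be sketched as follows. The shape checks follow the description above; the helper itself is an illustration, not the server's actual validation:

```python
def validate_pipeline(rules) -> list:
    """Return a list of problems with an elements pipeline; empty means OK."""
    if not (isinstance(rules, list) and len(rules) == 5):
        return ["pipeline must be a 5-element JSON array"]
    rows, fields, value_types, source_map, output = rules
    errors = []
    if not (isinstance(rows, list) and rows):
        errors.append("element 0: row selectors must be a non-empty array")
    if not isinstance(fields, dict):
        errors.append("element 1: field XPaths must be an object")
    if not isinstance(value_types, list):
        errors.append("element 2: value types must be an array")
    if not isinstance(source_map, dict):
        errors.append("element 3: source map must be an object")
    if not isinstance(output, dict):
        errors.append("element 4: output mapping must be an object")
    else:
        # title, description, and link are required; pubdate is optional.
        for required in ("title", "description", "link"):
            if required not in output:
                errors.append(f"element 4: missing required output field {required}")
    return errors
```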
Recommended example — attribute-node fields mapped with value:
[
[
"//*[@role='main']//ol[contains(concat(' ',normalize-space(@class),' '),' sap-search__result ')]/li[contains(concat(' ',normalize-space(@class),' '),' sap-search__result-item ')]",
"//*[@role='main']//li[contains(concat(' ',normalize-space(@class),' '),' sap-search__result-item ')]",
"//li[contains(concat(' ',normalize-space(@class),' '),' sap-search__result-item ')]"
],
{
"titleNode": [
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sv-notopmargin ')]/a",
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[@tabindex='0']/a"
],
"titleHrefNode": [
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sv-notopmargin ')]/a/@href",
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[@tabindex='0']/a/@href"
],
"summary": [
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/p[contains(concat(' ',normalize-space(@class),' '),' normal ')]",
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/p"
],
"date": [
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__meta ')]/span[2]",
"/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__meta ')]/*[2]"
]
},
["value"],
{
"titleNode": "mainNode",
"titleHrefNode": "mainNode",
"summary": "mainNode",
"date": "mainNode"
},
{
"title": ["titleNode", "value"],
"description": ["summary", "value"],
"link": ["titleHrefNode", "value"],
"pubdate": ["date", "value"]
}
]
XPath conventions:
- Row selectors are absolute; field XPaths start with / (single slash = child axis relative to matched row node). The scraper prepends . at runtime to make them context-relative
- To capture an attribute, write a dedicated attribute-node XPath (/@href, /@src, /@datetime) and map it with "value"
- contains(@class, ...) patterns should use the concat(' ',normalize-space(@class),' ') idiom for exact class-name matching

Important legacy-parser note:
- The legacy parser only understands value extractors unless pipeline[3] also declares that extractor key (for example "href": "mainNode")
- If a field relies on an extractor type like link, src, or datetime, rewrite the field as a dedicated attribute-node XPath and map it with "value"

Using the XPath Lab to debug a pipeline:
- Paste the feed's elements JSON into the Pipeline rules field
- If the lab flags href / src compatibility, rewrite those values as dedicated attribute-node fields (/@href, /@src, /@datetime) mapped with "value"

Subscriptions (/feed/subscriptions)
Authenticated users can subscribe per:
Available channels depend on your account settings.
On /feed/subscriptions, channel settings are now AJAX-first with autosave:
If JavaScript is unavailable, the same forms still fall back to normal server post/redirect behavior.
Discord setup notes on /feed/subscriptions:
- The Discord OAuth flow requests scope webhook.incoming and returns to the app callback (/oauth/discord/callback)

Delivery behavior:
- Immediate delivery is attempted when /api/rss/update converts inbound rows
- Notification links point to the local entry page (/feed/entry/{contentId}), with the original source URL included as secondary fallback
- The rss:notify-subscribers task still runs every 15 minutes as fallback/retry protection if an immediate delivery is missed or a channel fails temporarily