
RSS Watch

RSS Watch provides feed browsing, category grouping, optional AI summaries, and a full REST API for scraper workers and client applications.

RSS Watch is both a public feed reader and a backend ingestion pipeline. Public pages (/feed, category views, and entry pages) are for readers. The scraper workers and trigger jobs fetch configured sources and post normalized payloads into Tools for processing.

Crawler identity and User-Agent strings

RSS Watch workers identify themselves as bot traffic and include a documentation URL in the User-Agent string.

  • Scrape runner UA pattern:
    • Mozilla/5.0 (compatible; RSSWatchBot; +https://tools.tornevall.net/docs/rsswatch; runner=scrape; host=<agent-hostname>)
  • Trigger runner UA pattern:
    • Mozilla/5.0 (compatible; RSSWatchBot; +https://tools.tornevall.net/docs/rsswatch; runner=trigger; host=<agent-hostname>)

Operational note: scheduler/agent reporting is still keyed by agent_id (hostname) in API calls; changing User-Agent text does not change agent attribution logic.

Scrape workers now also write timestamped run logs. By default the runner tries SCRAPE_LOG_FILE, then /var/log/tools-scraper/scrape.log, and finally local fallback paths inside the scraper project, using the first of these locations that is writable.

Public Feed Page

  • URL: /feed
  • Browse feeds grouped by category
  • Use the new free-text post search at the top of /feed to search titles, descriptions, content, site names, and categories without leaving the page
  • Search results are loaded with AJAX, grouped by category, and temporarily replace the normal feed/category/site browser until Clear is pressed
  • The default Feeds per page setting is now configurable from /feed-admin (the per_page query parameter still overrides it per request)
  • Open feed source or category RSS
  • Open the new aggregated category reader page at /feed/c/{categorySlug} (includes the same history/diff view style as regular feed pages)
  • Open the new compact category cards page at /feed/cards/{categorySlug} for a card-based analysis view
  • The cards page now keeps AI analyses at the top (exactly one selected variant per period: daily/weekly/monthly/yearly), and then shows the normal category article list below.
  • The cards page now also sorts those analysis editions by the actual period bucket instead of by generation time alone, so a regenerated older week/month no longer appears ahead of a newer bucket just because it was rebuilt later.
  • Public /feed and /feed/cards/{categorySlug} now also merge analytics by category slug before the visible edition is chosen, so the same category still shows its newest current-bucket edition even if older cached rows used a slightly different display-name spelling or casing.
  • Category articles on cards view include the same history/diff behavior as the classic category view (history=1, history_limit).
  • The cards page now uses a more editorial/news-magazine presentation: masthead-style category header, analysis edition cards, and a lead-story-first article layout.
  • Analysis links rendered on /feed, /feed/c/{categorySlug}, and /feed/cards/{categorySlug} now rewrite any accidentally stored localhost/loopback URLs to the active public host before display.
  • Public article/source links now also resolve relative source URIs (for example /articles/foo) against the owning feed's real_url/url, so /out/{contentId} and visible source-link labels stay fully qualified.
  • Feed/article views now show a resolved author name when the source feed only stored an unfinished publishby value (for example a numeric ID) and an operator has added a manual mapping for that feed.
  • See first and last discovery timestamps
  • View cached AI summaries per category
  • Feed Admin now also follows the selected anchor date when it picks the preferred category/site analysis for a period, so reviewing an older bucket no longer silently falls back to whatever is current today.
  • Feeds marked as hidden are intentionally omitted from /feed and use a hash-based direct URL instead, for example /feed/key/{public-hash} and /api/rss/feed/{public-hash}
  • Analytics feeds are now available as dedicated selectors under /api/rss/feed/{selector}:
    • analytics-daily (alias daily-analytics)
    • analytics-weekly (alias weekly-analytics)
    • analytics-monthly (alias monthly-analytics)
    • analytics-yearly (alias yearly-analytics)
    • analytics-bulk (alias bulk-analytics, combines daily/weekly/monthly/yearly)
  • Analytics feed items now link back to related category/site targets when available (/api/rss/feed/{category-slug} for category analytics and /feed/{site-urlid} for site analytics)
  • /feed now includes a mini Ask about all open feeds card where visitors can submit free-form questions against the current open RSS dataset
  • /feed now also shows a dedicated support/contact block next to the public feed disclaimers, linking both to the ordinary contact page and directly to support@tornevall.net for questions, complaints, corrections, or feed-related issues

Feed Q&A and history

  • Submit endpoint: POST /feed/user-questions
  • History page: GET /feed/user-questions
  • Guests must pass Cloudflare Turnstile before submit and are limited by configurable daily/weekly quotas
  • Logged-in users are also quota-limited (higher defaults), with separate configurable daily/weekly limits
  • Admin users are unlimited and bypass Turnstile for this feature
  • Ask-a-question now keeps literal fallback search tokens alongside the AI keyword plan, so short concrete terms that are searchable in SQL/LIKE form (for example sabah) are less likely to disappear from retrieval.
  • When a match only exists in article description/body text, the model now also receives short keyword-aware excerpts from those fields instead of title-only entry summaries.
  • Guest intake also has a configurable concurrent protection (distinct active guest IPs in a short processing window)
  • Every question stores timestamp, actor (user_id or guest), IP, user-agent, status, answer/error, and response metadata for audit/history
  • /feed question submit is AJAX-first and renders answer/error inline; standard form post/redirect remains as fallback if JavaScript is unavailable
  • AJAX question submit on /feed and on site-specific feed pages now uses a two-step background flow for better responsiveness (a curl sketch follows this list):
    • POST /feed/user-questions with start_only=1 reserves the question row and returns a short-lived client token plus progress URLs
    • POST /feed/user-questions/{question}/process runs the actual analysis in the background request
    • GET /feed/user-questions/{question}/status lets the browser poll current phase/progress until the answer is ready
  • Those progress polls now run one request at a time instead of stacking overlapping /status checks. Stuck polls are aborted after a short timeout, the browser backs off while the page is hidden, and progress refresh resumes cleanly when the tab becomes active again.
  • If the background process request itself already finishes with the final answer, the browser now accepts that direct response immediately instead of continuing to spam extra /status polling until the next timer tick.
  • The submit button now switches to a visible spinner state while the browser polls the backend.
  • A progress card now shows the current analysis phase, elapsed time, and when available the current pre-analysis pass number and context-block number.
  • For person-targeted questions (for example "articles Andreas Magnusson wrote"), Feed Q&A now does an author-first retrieval pass against content.publishby before broader keyword fallback.
  • If a feed stores ID-like or legacy publishby values instead of real names, Feed Q&A now also uses the per-feed manual publishby mappings from the RSS editor so those questions can still match the intended author.
  • This improves author attribution precision when bylines are present as names, while still allowing a fallback search if publishby values are missing/inconsistent.
  • When the system broadened retrieval with extra related search phrases, /feed now shows those phrases in a readonly helper box under the latest answer so readers can understand what the backend also looked for before answering.
  • /feed and the site-specific feed question panel now also show a separate readonly box listing the search keywords used, so readers can see the literal queryable terms/phrases the backend searched for before any broader expansion terms were applied.
  • /feed now supports follow-up question chaining: users can pick a previous completed answer as follow-up context, and the backend reuses prior retrieval terms plus prior answer context when going deeper.
  • Follow-up questions now also reuse a compact list of the earlier answer's stored reference hits, so deeper questions can continue from the same concrete article/source evidence instead of only from the earlier wording.
  • Feed Q&A settings now include an explicit web search toggle (use_web_search) so RSS answers can run with web-search-backed verification when enabled.
  • The latest answer boxes on /feed and on site-specific feed pages now show whether web search was actually used for that answer and, when available, list the returned citation links.
  • Those same latest-answer boxes now also show the stored reference hits that were kept from the retrieval stage, making it easier to inspect which articles/sources grounded the answer before asking a deeper follow-up question.
  • The newest Feed Q&A answer surfaces now render safe markdown consistently, so links and paragraph formatting in latest/recent answers are shown as formatted content instead of raw markdown/plain text.
  • Those extra search phrases are now cleaned into literal/queryable terms before display and retrieval reuse, so person/entity phrases stay closer to Tara Sabah / Tara Saleh instead of awkward helper-text variants like tara sabah artiklar or tara saleh källor.
  • Users can now choose keyword search breadth per question (strict, balanced, expansive) both on /feed and on site-specific feed question panels.
  • /feed/user-questions now supports pagination with configurable rows-per-page to keep history manageable when question volume grows
  • Admins can now delete question rows from /feed/user-questions via AJAX inline actions (server redirect fallback still works without JavaScript)
  • Admin settings on /feed/user-questions and /feed-admin now also include answer model and answer tone controls for OpenAI question responses
  • Admin history view now includes filters for status, user_id (or guest), ip, and deleted scope (active/deleted/all)
  • Admin moderation now supports both single-row and bulk deletion with explicit delete mode:
    • hard delete removes rows permanently
    • soft delete keeps rows with deletion metadata (deleted_at, deleted_by_user_id, deleted_reason)
  • Soft-deleted rows can now be restored via admin action (POST /feed/user-questions/{question}/restore) for undo workflow.
  • Feed Q&A now also supports an all-time / whole-database period so questions can search across the full stored RSS history instead of only daily, weekly, monthly, or yearly windows.
  • Large question contexts now use sequential block analysis: the backend splits the JSON context into smaller blocks, sends them one by one, and carries a compact rolling summary forward between OpenAI calls.
  • This reduces large one-shot token spikes while preserving broader evidence, especially when version_history.articles contains several revised articles.
  • Feed Q&A now runs a keyword-first retrieval pass before full context assembly: OpenAI derives compact search terms from the question, the backend narrows candidate content rows by keyword matches in title/description/content, and then only relevant rows are expanded into context/version-history blocks.
  • Those SQL-matched rows now also travel forward as a dedicated matched_entries evidence layer, so concrete title, description, content excerpts, publishby, feed_title, and article links are still available when the final answer is written.
  • This especially improves questions like "who is writing about X right now?", because the final model can inspect the actual matched rows instead of only seeing aggregate counts such as how many feeds/posts matched.
  • The keyword extractor now expects a strict JSON keyword list from OpenAI and orders terms from most specific phrase to broader supporting terms.
  • Multi-word person/entity names are now prioritized as full phrases (for example anders sydborg) before looser supporting tokens, which reduces false positives from broad first-name-only matches.
  • Generic question filler such as har, du, något, spännande, with, about, something, and similar words is now filtered out even when OpenAI or deterministic fallback extraction sees them in the original question.
  • When OpenAI keyword extraction is unavailable, the deterministic fallback now keeps stronger phrase-style candidates (quoted phrases and clear proper-name sequences) and only broadens with more specific trailing tokens such as surnames.
  • Admins can set a default keyword-breadth mode, while the current user can still override it per question.
  • Keyword planning is now more scope-anchored: extra search terms must stay tied to the user question and, when selected, the active category/site scope. Overly generic planner terms are filtered out before SQL retrieval runs.
  • Short-window periods (daily, weekly, monthly) allow a broader keyword result cap than yearly and all_time, so specific short-term questions can include richer evidence without exploding prompt size.
  • Feed Q&A can now run a bounded multi-pass pre-analysis loop before the final answer: each pass may ask OpenAI what to look for next, run another SQL-backed narrowing step, and stop either when enough context is found or when the configured pass cap is reached.
  • The default pre-analysis cap is 5 passes and is admin-configurable; if the cap is reached, the system proceeds to final answer generation with the best context collected so far.
  • Large-context questions that trigger sequential block analysis now expose that backend step in the progress UI, so users can see when the system is still reviewing context blocks instead of appearing frozen.
  • Per-question metadata now stores richer analysis details for broad-scope runs (for example when no focus is selected or many sites are included): scope profile, context size, block metadata, and retrieval strategy.
  • /feed/user-questions now includes a compact How this answer was analyzed box on completed rows, showing method, scope/retrieval details, the literal search keywords used, and any extra search terms used before the final answer.
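
A minimal curl sketch of the two-step background flow described above, assuming an authenticated session cookie; the question id (42) and the token handling are illustrative, since the real ids, client token, and progress URLs are returned by the first call:

# 1. Reserve the question row; the JSON response returns the question id,
#    a short-lived client token, and the progress URLs to use below.
curl -X POST "https://tools.tornevall.com/feed/user-questions" \
  -H "Accept: application/json" \
  -b "<authenticated-session-cookie>" \
  --data "question=Who wrote about X this week?" \
  --data "question_period=weekly" \
  --data "start_only=1"

# 2. Run the actual analysis in a background request (pass the client token
#    from step 1 in the form the API returned it).
curl -X POST "https://tools.tornevall.com/feed/user-questions/42/process" \
  -H "Accept: application/json" \
  -b "<authenticated-session-cookie>"

# 3. Poll the current phase/progress until the answer is ready.
curl "https://tools.tornevall.com/feed/user-questions/42/status" \
  -H "Accept: application/json"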

Entry-level noisy post controls (/feed/entry/{contentId})

  • Entry permalink pages now limit inline revision history to the newest 200 stored versions for that feed/link pair, while still showing the total revision count when a link has extreme duplicate churn.
  • The Changes block on /feed/entry/{contentId} can now switch between the existing highlighted diff view and a side-by-side before/after comparison using the page-level diff toggle (?diff_view=inline or ?diff_view=side_by_side).
  • The raw Versions list still stays one revision at a time underneath the diff block, so the new side-by-side mode improves pairwise comparison without removing the chronological single-version audit trail.
  • For admin users, each entry page now includes per-post controls for persistent noisy/cycling links:
    • Mark/unmark cautious mode (auto-purge noisy duplicates on future imports)
    • Ignore this post at import (feed-scoped or global), so incoming scraper reports for the link are dropped
    • Purge noisy duplicates (keeps newest representative per distinct meaningful hash)
    • Purge all except one latest row (hard reset mode)

Analytics Cards on /feed

  • Weekly/monthly/yearly cards are shown per category only when cached analytics exist
  • Cards are color-coded by period
  • Each category analytics variant now includes a Share button that opens /feed/c/{categorySlug} in a new tab
  • Click Read more to expand long analyses
  • The main /feed overview now refreshes those Read more buttons when a collapsed category card is opened, so long analyses are expandable there again instead of only on the cards page.
  • Markdown is rendered as HTML for readability
  • /feed now shows exactly one selected public edition per period (daily, weekly, monthly, yearly) for each category/site block.
  • Extra cached editions are no longer rendered inline on /feed; use /feed/cards/{categorySlug}?show_all=1 when you intentionally want to review every stored edition for a category.
  • That selected edition is now resolved from the category slug first, so older name/case variants for the same category do not hide a newer edition on /feed.

Editor Page

  • URL: /rss
  • Requires permission:rss
  • Manage feed URLs and metadata
  • Each feed row now has an Authors action that opens /rss/{urlid}/publishby-mappings
  • On that page, operators can manually map raw publishby values (for example 4, 27, or other unfinished identifiers) to a real display name
  • Mappings are scoped per urlid, so the same raw publishby value can be reused by a different feed without collisions
  • Resolved names from this editor workflow are reused by public /feed pages, category cards, edited-post views, Feed Q&A, and /api/rss/feed/{site} readers
  • View cached category analytics
  • Site Type now supports explicit xpath and json flows in addition to rss and wp
  • When sitetype=xpath, elements JSON rules are required and validated server-side before save
  • elements accepts both supported extraction shapes:
    • object format (begin + table) for JSON payload traversal
    • legacy pipeline-array format for HTML/XPath extraction pipelines
  • In the /rss list UI, elements editing is now shown only for xpath rows and uses a collapsible panel with AJAX save on blur
  • New visual helper: /rss/xpath-lab lets editors paste HTML snippets, inspect a DOM outline, run/test XPath queries, and review a SimpleXML-style XML preview before saving rules
  • Add-form URL mapping: when using 🤖 Auto, the detected feed endpoint is written to URL while the originally entered page URL is preserved in Real URL
  • The /rss add form now performs a server-side duplicate guard before insert. If the submitted URL or Real URL already matches an existing feed row (including cross-matches between those fields), the insert is blocked and the matching row is shown in the warning message.
  • Add-feed auto-detection now also understands vBulletin-style external RSS endpoints such as external?type=rss2... and external.php?type=RSS2..., including relative <link rel="alternate"> feed URLs found on forum pages.
  • Add-form category flow: use dropdown for existing categories; use free-text to create/override with a new category (free-text takes precedence)
  • RSS/Atom import now stores creator/author byline metadata in content.publishby when feeds provide it (for example dc:creator-style fields).
  • RSS2 import now handles namespaced vBulletin-style fields more robustly (content:encoded, dc:creator, guid, pubDate), prioritizes content:encoded as full body when present, and falls back to stable guid/link/hash identifiers when links are missing.
  • vBulletin-style RSS2 imports now also keep item-level category metadata when present and prefer GUID-aware duplicate checks before falling back to the older link-based duplicate logic, which helps forum-style feeds where the same item can be republished with slightly different visible URLs.

Feed Admin

  • URL: /feed-admin
  • Operators with the separate rss.posting.handle permission now also get a dedicated Posting Queue UI at both /feed-admin/posting-queue and /rss/posting-queue
  • That queue is meant for simple Playwright-friendly publishing workflows: each latest article row exposes AJAX checkboxes for Handle group, Handle page, Handled group, and Handled page, together with the article title, a short excerpt, and the source link
  • Queue filters support review, group, page, and done modes so one bot or operator can focus only on pending group posts, only pending page posts, or the handled archive
  • Rows disappear from the active queue automatically once the selected target(s) are marked handled
  • Generate daily, weekly, monthly, or yearly category analysis
  • Generate daily, weekly, monthly, or yearly site-level analysis per feed/news source
  • Use the Anchor date calendar to generate for historical periods (for example last week/month/year/day when a cron run was missed)
  • Category cards and site cards now also have optional per-segment Anchor date inputs; precedence is segment anchor -> global anchor -> current date
  • Current period semantics are calendar-aligned to the selected anchor date (daily = that day, weekly = ISO week with Monday start, monthly = calendar month, yearly = calendar year)
  • Choose model per run
  • Optionally provide "watch for" analyst guidance before generation
  • Existing cached variants now include per-variant actions for Retranslate and Regenerate (replace) (for both category and site analytics)
  • When Regenerate (replace) is used, the currently selected variant/period is overwritten instead of creating an extra duplicate variant
  • Site-level generation now has a dedicated per-site optional title field (separate from the global title input)
  • Feed-admin cards now also include two cleanup actions per period:
    • Purge old buckets: remove cached editions from older period buckets while keeping the current bucket
    • Keep only selected edition: delete all other cached editions for the selected period and keep only the currently selected one
  • /feed-admin now also includes a Feed script snippets / embeddable widgets panel where operators can store reusable JavaScript boxes, autosave them over AJAX, ask AI to generate the inline script body, and copy one merged bundle URL for external site embeds.

Scheduler note

  • Analytics scheduling is now admin-configurable in /feed-admin with per-period enable/disable and time fields for daily, weekly, monthly, and yearly.
  • Cron should run the scheduler check every minute, either via Laravel schedule:run or directly via php artisan rss:run-scheduled-analytics (a crontab sketch follows this list).
  • The scheduler executes each period once per relevant cycle after its configured server time; missed runs are caught up as soon as possible.
  • If a period has no previous scheduler timestamp, it is treated as never-run and is executed immediately once.
  • Automatic scheduler runs use a dedicated auto variant title ([AUTO] Scheduled analytics) and always run with overwrite-current semantics so automatic variants update in place.
  • Feed Admin and Scheduled Jobs now also show the last scheduled slot, anchor date, parsed success/error counts, languages, and a short output excerpt for each period so cron results are easier to verify.
  • Manual CLI/API generations are still available when needed.
  • Manual CLI runs now skip unchanged bucket+variant signatures by default to avoid unnecessary repeat AI calls.
  • Use php artisan rss:generate-analytics --period=yearly --force (or --overwrite-current) to explicitly re-run and overwrite the current bucket row even when the underlying snapshot is unchanged.
  • For larger datasets, category and site analytics generation now uses segmented block passes before final synthesis (deeper output pattern aligned with Feed Q&A block analysis style).
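
A minimal crontab sketch for the every-minute scheduler check mentioned in this list; the installation path is an assumption:

# Run the Laravel scheduler every minute (adjust the path to your install)
* * * * * cd /var/www/tools && php artisan schedule:run >> /dev/null 2>&1

# Alternatively, call the analytics scheduler check directly
* * * * * cd /var/www/tools && php artisan rss:run-scheduled-analytics >> /dev/null 2>&1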

RSS Maintenance Admin

  • URL: /rss-maintenance-admin
  • Lists suspicious links with unusually many revisions over short windows
  • Compares raw hash churn vs meaningful hash churn to identify noisy pseudo-updates
  • Supports dry-run and live purge per link, keeping the latest representative per meaningful change

API Reference

Base URL

  • Dev: https://tools.tornevall.com/api
  • Prod: https://tools.tornevall.net/api

Authentication

RSS API endpoints use provider-scoped API keys:

  • scraper=1 required on /api/rss/urls
  • No auth required for read-only public endpoints (/api/rss, /api/rss/feed/{site})
  • Scraper write endpoints (/api/rss/data) accept the request without additional auth (rate-limited)
  • Analytics trigger (/api/rss/analytics/run) requires Authorization: Bearer <ANALYTICS_CRON_SECRET>
  • The posting-queue operator endpoints (/api/rss/posting-queue/items*) require an authenticated web session plus the dedicated rss.posting.handle permission
  • Reusable frontend bundle endpoints for feed widgets now also exist under /api/managed-scripts/feed and /api/managed-scripts/feed/bundle.js; see docs/frontend-script-boxes for the shared script-box contract.
  • The managed feed script endpoints also accept additive filter query params such as feed_ids, feeds, categories, groups, category_slugs, and limit, and they echo the normalized result back as query_context.filters.
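
For example, a filtered widget bundle request might look like the following; the category value and limit are illustrative, and the filter could equally be expressed via category_slugs or feed_ids depending on your setup:

# Fetch the merged feed-widget bundle, filtered and capped at five items
curl "https://tools.tornevall.com/api/managed-scripts/feed/bundle.js?categories=news&limit=5"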

GET /api/rss/posting-queue/items

Returns the operator posting queue as JSON for the current authenticated user.

Query params:

  • queue=review|group|page|done (default review)
  • category=<category-slug>
  • feed_id=<urlid>
  • per_page=<10..250>

Response includes:

  • filters
  • pagination
  • items[] with contentid, urlid, title, description, excerpt, link, feed_title, feed_category, entry_url, and queue_status

PATCH /api/rss/posting-queue/items/{contentId}

Updates one queue row's checkbox state.

Supported boolean fields:

  • handle_group
  • handle_page
  • handled_group
  • handled_page

Optional helper field:

  • queue=review|group|page|done so the response can also tell the caller whether the row should be removed from the current filtered list immediately.

Success response includes:

  • contentid
  • queue_status
  • should_remove
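
A sketch of a queue update call; the contentId (123) is illustrative, and the request assumes an authenticated web session with the rss.posting.handle permission:

# Mark the group target handled and let the response say whether the row
# should disappear from the currently filtered "group" queue
curl -X PATCH "https://tools.tornevall.com/api/rss/posting-queue/items/123" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -b "<authenticated-session-cookie>" \
  --data '{"handled_group": true, "queue": "group"}'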

GET /api/rss

Returns overview with URL list, categories, and available parameter info.

Response (200)

{
  "urls": [
    {
      "urlid": 12,
      "title": "Example Feed",
      "url": "https://example.com/rss",
      "real_url": "https://example.com/rss",
      "category": "News",
      "readinterval": 60,
      "lastscrape": "2026-04-01 08:00:00",
      "publicSelector": "12",
      "hidden": false,
      "feedUrl": "/api/rss/feed/12",
      "categoryFeedUrl": "/api/rss/feed/news"
    }
  ],
  "categories": [
    { "name": "News", "slug": "news", "feedCount": 3 }
  ],
  "availParams": { "always": [0, 1], "scraper": [0, 1] }
}

GET /api/rss/urls

Returns scrapeable feed URLs. Requires scraper=1.

Query parameters

  • scraper (required): must be 1 or the request is rejected with 403
  • agent_id (recommended): stable agent/worker ID, used for per-agent due-window tracking
  • agent_name (fallback): identifier used when agent_id is absent
  • always (optional): 0 (default) returns the due-set for the calling agent; 1 returns all scrapeable rows
  • limit (optional): max rows returned when always=0

always=0 behavior:

  • Returns only URLs due for the calling agent based on readinterval and the agent's last-claim timestamp
  • One agent's claims do not block other agents
  • When no URLs are currently due, the response now also includes an additive idle report with wait hints (wait_seconds, next_poll_at, and next_url) so scraper clients can back off intelligently instead of guessing.

always=1 behavior:

  • Returns all rows where deleted=0 and noscrape=0 — no interval filtering

Response (200)

{
  "urls": [
    {
      "urlid": 12,
      "url": "https://example.com/rss",
      "real_url": "https://example.com/rss",
      "readinterval": 60,
      "sitetype": "rss",
      "method": "GET",
      "elements": null
    }
  ],
  "agent": { "agent_id": "scraper-node-1", "resolved_name": "scraper-node-1" },
  "idle": {
    "is_idle": false,
    "reason": "urls_available",
    "schedule_mode": "per_agent_seen",
    "wait_seconds": null,
    "wait_minutes": null,
    "next_poll_at": null,
    "next_url": null,
    "scrapeable_total": 125
  }
}

Idle response example (no URLs currently due):

{
  "urls": [],
  "agent": { "agent_id": "scraper-node-1", "resolved_name": "scraper-node-1" },
  "idle": {
    "is_idle": true,
    "reason": "no_urls_due_for_agent",
    "schedule_mode": "per_agent_seen",
    "wait_seconds": 173,
    "wait_minutes": 3,
    "next_poll_at": "2026-04-05 14:20:00",
    "next_url": {
      "urlid": 12,
      "title": "Example Feed",
      "url": "https://example.com/rss"
    },
    "scrapeable_total": 125
  }
}

The bundled scrape runner uses these idle fields to log how long it should wait before the next due URL is expected.
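
A minimal polling sketch that honors those idle hints, assuming jq is available; the agent_id is illustrative:

# Poll for due URLs and back off using idle.wait_seconds when nothing is due
while true; do
  resp=$(curl -s "https://tools.tornevall.com/api/rss/urls?scraper=1&agent_id=scraper-node-1")
  if [ "$(echo "$resp" | jq -r '.idle.is_idle')" = "true" ]; then
    sleep "$(echo "$resp" | jq -r '.idle.wait_seconds // 60')"
  else
    echo "$resp" | jq -r '.urls[].url'   # hand the due URLs to the fetch stage
    sleep 5                              # short pause between busy cycles
  fi
done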


POST /api/rss/data

Ingest raw feed payload from a scraper worker.

Request body

{
  "agent_id": "scraper-node-1",
  "content": {
    "12": "<rss version=\"2.0\">...</rss>",
    "34": "<html>...</html>"
  }
}

Field reference:

  • agent_id (string): optional but recommended; identifies the scraper worker
  • content (object): keys are numeric urlid strings; values are the raw feed or HTML payload

Response (200)

{
  "received": {
    "12": { "dataLength": 14200, "url": "https://example.com/rss", "title": "Example Feed" },
    "34": { "dataLength": 82000, "url": "https://example.com/nyheter", "title": "Nyheter" }
  },
  "exceptions": {},
  "agent": { "agent_id": "scraper-node-1" }
}

Each received[{urlid}] item includes:

  • dataLength – byte length of accepted payload
  • url – source URL from urls.url
  • title – feed title from urls.title (useful for logs and Google Alerts traceability)
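
An ingest call mirroring the documented request body; the urlid keys and payloads are illustrative:

# Post raw payloads for urlids 12 and 34 in one request
curl -X POST "https://tools.tornevall.com/api/rss/data" \
  -H "Content-Type: application/json" \
  --data '{
    "agent_id": "scraper-node-1",
    "content": {
      "12": "<rss version=\"2.0\">...</rss>",
      "34": "<html>...</html>"
    }
  }'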

GET /api/rss/update

Processes queued inbound payloads into normalized content and triggers subscription delivery.

Query parameters

  • urlid (optional integer): when present, only inbound rows for this feed are processed in this call

Queue maintenance behavior:

  • Handled rows (handled=1, processlock=0) are purged on each run
  • Stale lock release uses lock-age semantics — locks older than the configured window are released regardless of row age

Response (200)

{
  "converted": 12,
  "skipped": 2,
  "urlid_filter": 27,
  "subscriptionNotifications": {
    "processed": 3,
    "delivered": 2,
    "skipped": 1
  }
}

urlid_filter is included only when the urlid query parameter was used.
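
Example processing calls; the urlid value is illustrative:

# Process all queued inbound payloads
curl "https://tools.tornevall.com/api/rss/update"

# Process only inbound rows for feed 27
curl "https://tools.tornevall.com/api/rss/update?urlid=27"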


GET /api/rss/feed/{site}

Returns an Atom/RSS feed for a specific site or category.

Selector behavior for {site}:

  • Numeric urlid — specific feed
  • Category slug — all entries in category
  • public_hash — direct access for hidden feeds (bypasses hidden filter)
  • Analytics selectors:
    • analytics-daily / daily-analytics
    • analytics-weekly / weekly-analytics
    • analytics-monthly / monthly-analytics
    • analytics-yearly / yearly-analytics
    • analytics-bulk / bulk-analytics (all periods combined)
# Public feed for urlid 12
curl "https://tools.tornevall.com/api/rss/feed/12" -H "Accept: application/atom+xml"

# Category feed
curl "https://tools.tornevall.com/api/rss/feed/news" -H "Accept: application/atom+xml"

# Weekly analytics feed
curl "https://tools.tornevall.com/api/rss/feed/analytics-weekly" -H "Accept: application/atom+xml"

POST /api/rss/analytics/run

Trigger AI analytics generation programmatically (for cron or external triggers).

Authentication: Authorization: Bearer <ANALYTICS_CRON_SECRET>

Request body (all fields optional)

{
  "period": "weekly",
  "categories": "all",
  "sites": false,
  "model": "gpt-4o-mini",
  "user_id": 1,
  "max_tokens": 1400,
  "dry_run": false
}

Field reference:

  • period (string, default weekly): one of daily, weekly, monthly, yearly, or all
  • categories (string, default all): comma-separated category names, or all
  • sites (boolean, default false): also generate site-level analytics
  • model (string, default gpt-4o-mini): OpenAI model name
  • user_id (integer, default 1): user ID used for attribution
  • max_tokens (integer, default 1400): max output tokens per analysis item
  • dry_run (boolean, default false): plan the run without calling AI
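
Example trigger call; with dry_run=true the run is planned without calling AI:

# Plan a weekly run for all categories without spending AI tokens
curl -X POST "https://tools.tornevall.com/api/rss/analytics/run" \
  -H "Authorization: Bearer $ANALYTICS_CRON_SECRET" \
  -H "Content-Type: application/json" \
  --data '{"period": "weekly", "categories": "all", "dry_run": true}'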

Response (200)

{
  "ok": true,
  "dry_run": false,
  "periods": ["weekly"],
  "languages": ["en", "sv"],
  "stats": {
    "categories_ok": 5,
    "categories_err": 0,
    "sites_ok": 0,
    "sites_err": 0
  },
  "log": [
    { "type": "category", "category": "News", "period": "weekly", "status": "ok", "languages": ["en","sv"] }
  ]
}

Feed Q&A API (/feed/user-questions)

Submit free-form questions against the RSS dataset and retrieve answer history.

POST /feed/user-questions

Submit a question. Guests require Cloudflare Turnstile. Authenticated users skip Turnstile.

Request (form or JSON)

  • question (string): the question text
  • question_period (string): daily, weekly, monthly (default), yearly, or all_time
  • period (string): legacy alias for question_period
  • keyword_aggressiveness (string): strict, balanced (default), or expansive
  • focus_categories[] (array): limit context to specific category slugs
  • focus_site_ids[] (array): limit context to up to 10 specific urlid values
  • cf-turnstile-response (string): Cloudflare Turnstile token (guests only)

JSON response (when Accept: application/json)

{
  "ok": true,
  "answer": "Based on the available context...",
  "question_id": 42,
  "model": "gpt-5.4",
  "latency_ms": 1830,
  "context_stats": {
    "feed_count": 17,
    "entry_count": 120,
    "period_type": "monthly",
    "period_start": "2026-03-01 00:00:00",
    "period_end": "2026-03-31 23:59:59"
  }
}

Focus selector behavior:

  • focus_site_ids[] with ≤10 IDs activates JSON context mode: AI receives a version_history.articles section with all stored edit history across ALL time (not limited by the selected period)
  • focus_categories[] filters context to matching feeds only
  • Site IDs take priority: if both are provided, only focus_site_ids[] applies
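
A sketch of a direct single-shot submit (without the two-step background flow); the focus category slug is illustrative, and guests would also need to include a cf-turnstile-response token:

# Submit a monthly-scope question limited to one category
curl -X POST "https://tools.tornevall.com/feed/user-questions" \
  -H "Accept: application/json" \
  -b "<authenticated-session-cookie>" \
  --data "question=What were the biggest stories this month?" \
  --data "question_period=monthly" \
  --data "keyword_aggressiveness=balanced" \
  --data "focus_categories[]=news"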

GET /feed/user-questions

Returns paginated question history. Authenticated users see their own questions; admins see all.

Query parameters: page, per_page (10/25/50/100/200), status, user_id, ip, scope (active/deleted/all)


Version history in context

When asking questions via focus_site_ids[] (≤10 sites), the AI context includes structured JSON with a version_history.articles block:

{
  "context_type": "site_focused",
  "version_history": {
    "included": true,
    "article_count": 8,
    "note": "Compact all-time version-history sample for edited articles in scope.",
    "articles": [
      {
        "title": "Article title at latest version",
        "version_count": 12,
        "returned_version_count": 12,
        "versions_truncated": false,
        "title_change_count": 1,
        "first_published": "2026-02-18 13:05",
        "last_published": "2026-03-29 10:22",
        "versions": [
          {
            "v": 1,
            "hash": "547e39",
            "published": "2026-02-18 13:05",
            "title": "Original title",
            "description": "First 220 chars of description...",
            "content": "First 320 chars of content if present..."
          }
        ]
      }
    ]
  }
}

Each version sample includes:

  • v — version sequence number (1 = oldest, N = newest)
  • hash — 6-char prefix of content_hash for traceability
  • published — timestamp of that stored version
  • title — title at that version
  • description — first 220 characters of description (when not empty)
  • content — first 320 characters of content (when not empty)

When version_count ≤ VERSION_SAMPLE_LIMIT all versions are returned. For articles with more versions than the limit, representative samples are selected across the full history span (first, last, and evenly spaced intermediates).


XPath Lab (/rss/xpath-lab)

The XPath Lab is available at /rss/xpath-lab and requires permission:rss.

Features

  • Paste raw HTML into the snippet field
  • Inspect visual DOM outline with depth, XPath, attributes, and text preview
  • Click suggested XPath hints (auto-detected from common patterns)
  • Test a single XPath expression against the HTML
  • Test a full elements pipeline rule (same 5-element JSON format as urls.elements)
  • Compare a fast lab preview with a legacy scraper compatibility preview that uses the same parser path as live imports
  • See compatibility warnings when non-value extractors such as href or src will be skipped by the legacy renderer
  • View SimpleXML-style XML representation
  • Ask OpenAI to generate XPath pipeline rules from the pasted HTML

elements / Pipeline JSON format

XPath feeds (sitetype=xpath) store extraction rules in urls.elements as a 5-element JSON array:

[
  element 0: Row selectors (array of absolute XPaths)
  element 1: Field XPaths (object: fieldName → array of relative XPaths)
  element 2: Value types used (prefer just "value" when possible)
  element 3: Source map (object: fieldName → "mainNode")
  element 4: Output mapping (object: RSS field → [fieldName, valueType])
]

Required output fields in element 4: title, description, link. Optional: pubdate.

Recommended example — attribute-node fields mapped with value:

[
  [
    "//*[@role='main']//ol[contains(concat(' ',normalize-space(@class),' '),' sap-search__result ')]/li[contains(concat(' ',normalize-space(@class),' '),' sap-search__result-item ')]",
    "//*[@role='main']//li[contains(concat(' ',normalize-space(@class),' '),' sap-search__result-item ')]",
    "//li[contains(concat(' ',normalize-space(@class),' '),' sap-search__result-item ')]"
  ],
  {
    "titleNode": [
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sv-notopmargin ')]/a",
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[@tabindex='0']/a"
    ],
    "titleHrefNode": [
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sv-notopmargin ')]/a/@href",
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[@tabindex='0']/a/@href"
    ],
    "summary": [
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/p[contains(concat(' ',normalize-space(@class),' '),' normal ')]",
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/p"
    ],
    "date": [
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__meta ')]/span[2]",
      "/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__text ')]/div[contains(concat(' ',normalize-space(@class),' '),' sap-search__meta ')]/*[2]"
    ]
  },
  ["value"],
  {
    "titleNode": "mainNode",
    "titleHrefNode": "mainNode",
    "summary": "mainNode",
    "date": "mainNode"
  },
  {
    "title": ["titleNode", "value"],
    "description": ["summary", "value"],
    "link": ["titleHrefNode", "value"],
    "pubdate": ["date", "value"]
  }
]

XPath conventions:

  • Element 0 (row selectors): absolute XPaths starting with //
  • Element 1 (field XPaths): relative paths starting with / (single slash = child axis relative to matched row node). The scraper prepends . at runtime to make them context-relative
  • For attribute-based output such as links, image URLs, or machine-readable dates, it is usually safer to extract the attribute node directly (/@href, /@src, /@datetime) and map it with "value"
  • All contains(@class, ...) patterns should use the concat(' ',normalize-space(@class),' ') idiom for exact class-name matching

Important legacy-parser note:

  • Older live XPath rendering can skip non-value extractors unless pipeline[3] also declares that extractor key (for example "href": "mainNode").
  • The lab now shows a dedicated legacy scraper compatibility preview so you can spot this mismatch before saving rules.
  • If the lab preview looks correct but the legacy preview drops link, src, or datetime, rewrite the field as a dedicated attribute-node XPath and map it with "value".

Using the XPath Lab to debug a pipeline:

  1. Paste the AJAX HTML output or page source into the snippet field
  2. Paste the full elements JSON into the Pipeline rules field
  3. Click Analyze snippet — the lab will show which row selector matched and extract up to 25 rows in the lab preview
  4. Check the Production / legacy scraper compatibility preview below it. This is the important panel when you want to know what the live import path will actually receive.
  5. If warnings mention missing href / src compatibility, rewrite those values as dedicated attribute-node fields (/@href, /@src, /@datetime) mapped with "value"
  6. If 0 rows match, the lab auto-triggers AI suggestions (or check the Ask AI checkbox explicitly)
  7. Copy the AI-generated JSON into the elements field in the RSS editor

Subscriptions

Authenticated users can subscribe per:

  • Feed
  • Category

Available channels depend on your account settings.

On /feed/subscriptions, channel settings are now AJAX-first with autosave:

  • Channel checkbox on/off saves immediately via AJAX
  • Text/webhook fields save on blur (when leaving the field)
  • Pause/Resume and Remove also use AJAX inline updates
  • Explicitly unchecking all channels is now supported (all delivery channels disabled)

If JavaScript is unavailable, the same forms still fall back to normal server post/redirect behavior.

Discord setup notes on /feed/subscriptions:

  • You can either paste a Discord webhook URL manually or use Connect Discord app (get webhook).
  • OAuth flow uses webhook.incoming and returns to the app callback (/oauth/discord/callback).
  • Callback URL is shown on the page so admins know what to register in Discord developer settings.
  • After callback, the latest Discord webhook payload is kept in session and can be reused per subscription with one click.

Delivery behavior:

  • New RSS imports now attempt an immediate subscription notification pass right after /api/rss/update converts inbound rows
  • Subscription digests now include a direct Tools entry permalink first (/feed/entry/{contentId}), with the original source URL included as secondary fallback
  • The scheduled rss:notify-subscribers task still runs every 15 minutes as fallback/retry protection if an immediate delivery is missed or a channel fails temporarily