Quantify the Narrative

Global News & Sentiment Intelligence

Turn the global news cycle into a tradable time-series. Access 15+ years of institutional-grade news, mapped to 50,000+ tickers, and scored for sentiment with context-aware NLP. Don't just read the news—measure the market's reaction before it happens.

Get API Key

Read Documentation


50,000+

Global Tickers

15+ Years

Historical Data

2Mn+

Articles/Month

<50ms

Latency

Why Use Nextmark's News & Sentiment Data?

Ready for Your LLM. Stop spending 80% of your time cleaning PDFs. We provide the clean text, mapped metadata, and slide content you need to feed your RAG pipelines immediately.

Get the Dataset

Get Started

From Noise to Signal

Most news feeds are a firehose of unstructured noise. Nextmark converts millions of articles, press releases, and filings each month into structured, quantitative data. We use proprietary Entity Resolution to ensure that a story about "Apple" is mapped to $AAPL, not the fruit, and a story about "Paris" is mapped to the market, not the city.

Beyond "Positive/Negative"

Legacy sentiment tools use "bag-of-words" counting. Nextmark uses Transformer-based LLMs to understand context. We differentiate between a "Revenue Miss" (Negative) and a "Strategic Divestiture" (Neutral/Positive), giving you a sentiment score that correlates with price action, not just keyword density.

Point-in-Time Precision

Built for backtesting. Our historical archives are stamped with the exact millisecond the news hit the wire, preventing look-ahead bias in your models. Train your algos on the reality of the past, not the revised history.

Data Delivery Formats

Built for Your Stack

REST, WebSocket, Vector, or Bulk. Choose the format that fits your research workflow.


REST API (JSON)

Real-time, low-latency lookups for live trading.

WebSocket

Push-based feed for immediate event processing.

Vector DB (RAG-Ready)

Vector-ready JSONL feed you can load directly into your models.

Bulk S3 / Snowflake

Historical dump for backtesting and heavy quantitative research.

Data Output Sample

Structured for Precise NLP Analysis


JSON

{
  "timestamp_utc": "2023-10-12T14:30:00.00Z",
  "article_id": "nm_8829102",
  "source": "Global_Wire_Service",
  "headline": "TechCorp announces strategic delay in Q4 chipset rollout due to supply constraints.",
  "entities": [
    {
      "ticker": "TCP",
      "figi": "BBG000BLNNH6",
      "name": "TechCorp Inc.",
      "sentiment_score": -0.78,
      "sentiment_label": "Negative",
      "relevance_score": 100,
      "event_type": "Supply_Chain_Disruption"
    },
    {
      "ticker": "NVDA",
      "figi": "BBG000BBJQV0",
      "name": "NVIDIA Corp",
      "sentiment_score": -0.15,
      "sentiment_label": "Neutral_Negative",
      "relevance_score": 30,
      "event_type": "Sector_Read_Across"
    }
  ]
}

Key Fields

sentiment_score : A continuous float from -1.0 to +1.0 quantifying the polarity of the news event, suitable for regression modeling.
relevance_score : A 0-100 confidence metric indicating if the ticker is the story's subject or just a footnote mention.
event_type : Auto-classification of the article into 50+ financial categories like "M&A," "Guidance Update," or "Executive Change."
timestamp_utc : Millisecond-precision timestamps for when the news hit the wire, preventing look-ahead bias in backtests.
ticker / figi : Robust entity mapping that handles share class distinctions and historical ticker changes automatically.
source_rank : A 1-5 credibility rating allowing you to weight major wires over unverified blogs or social noise.
novelty_score : A uniqueness metric that filters out reposts and echoes to ensure you only trade on breaking information.
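As a rough illustration of how these fields might be consumed, the sketch below flattens one article payload into one row per entity, which is usually the shape a factor model wants. The `EntityRow` class and `flatten_article` function are hypothetical names for this example, not part of the Nextmark API, and the payload is a trimmed copy of the sample above.

```python
import json
from dataclasses import dataclass
from datetime import datetime

# Trimmed copy of the sample payload; only the fields used below are kept.
SAMPLE = """
{
  "timestamp_utc": "2023-10-12T14:30:00.00Z",
  "article_id": "nm_8829102",
  "entities": [
    {"ticker": "TCP", "figi": "BBG000BLNNH6", "sentiment_score": -0.78,
     "event_type": "Supply_Chain_Disruption"},
    {"ticker": "NVDA", "figi": "BBG000BBJQV0", "sentiment_score": -0.15,
     "event_type": "Sector_Read_Across"}
  ]
}
"""


@dataclass(frozen=True)
class EntityRow:
    """One (article, entity) observation. Hypothetical helper type."""
    timestamp_utc: datetime
    article_id: str
    ticker: str
    figi: str
    sentiment_score: float
    event_type: str


def flatten_article(raw: str) -> list:
    """Turn one article payload into one row per tagged entity."""
    article = json.loads(raw)
    # %z accepts the trailing "Z" and yields a timezone-aware datetime.
    stamp = datetime.strptime(article["timestamp_utc"], "%Y-%m-%dT%H:%M:%S.%f%z")
    return [
        EntityRow(
            timestamp_utc=stamp,
            article_id=article["article_id"],
            ticker=entity["ticker"],
            figi=entity["figi"],
            sentiment_score=float(entity["sentiment_score"]),
            event_type=entity["event_type"],
        )
        for entity in article["entities"]
    ]


for row in flatten_article(SAMPLE):
    print(row.ticker, row.sentiment_score, row.event_type)
```

Keeping the timestamp timezone-aware makes later comparisons against bar or decision times unambiguous.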

Who Is This For?

Quant Researchers & Data Scientists

Build factors, not just features. Ingest 15 years of point-in-time narrative data to build uncorrelated sentiment factors. Perfect for backtesting event-driven strategies, training volatility models, and measuring signal decay without the risk of look-ahead bias.

Learn More

Discretionary Portfolio Managers

Filter the noise, trade the signal. Stop reacting to every headline. Use sentiment scoring to automate your news consumption—filtering out routine noise to focus only on material narrative shifts, "divergence" events, and unpriced risks across your watchlist.

Learn More

Risk Managers & Platform Developers

Automate your defense. Integrate real-time sentiment alerts directly into your internal dashboards or risk engines. Programmatically monitor your entire portfolio for "Negative Sentiment Spikes" or "Reputational Shocks" instantly, 24/7.

Learn More

Read Between the Lines.

Turn unstructured news into structured alpha.

Get Free API Key


Download Sample JSON


FAQs

Our team of experienced financial advisors is here to provide personalized guidance and support.

Contact us

Is your sentiment scoring based on "Bag-of-Words" or Contextual LLMs?

We do not use legacy "positive/negative" word counting (Bag-of-Words), which fails to capture financial nuance. Our engine utilizes domain-specific Transformer models (LLMs) trained on 15 years of financial text. This allows the model to understand that a phrase like "narrowing losses" is Positive, whereas a keyword search for "loss" would flag it as Negative. We score strictly on financial implication, not just linguistic polarity.
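To make the failure mode concrete, here is a toy bag-of-words scorer of the kind described above. The word lists and the `bag_of_words_score` function are invented for this illustration. It shows why keyword counting mislabels "narrowing losses" and does not reproduce Nextmark's model in any way.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
NEGATIVE = {"loss", "losses", "miss", "decline"}
POSITIVE = {"gain", "gains", "beat", "growth"}


def bag_of_words_score(text: str) -> float:
    """Score text in [-1, 1] by counting lexicon hits, ignoring all context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    total = positive_hits + negative_hits
    if total == 0:
        return 0.0
    return (positive_hits - negative_hits) / total


# "losses" is the only lexicon hit, so the headline scores fully negative
# even though narrowing losses is good news for the company.
print(bag_of_words_score("TechCorp reports narrowing losses"))
```

The counter has no way to see that "narrowing" reverses the meaning of "losses", which is the gap a context-aware model is meant to close.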

How do you handle Point-in-Time (PIT) accuracy to prevent look-ahead bias?

All historical data is stamped with the exact millisecond the article was ingested by our system (ingest_timestamp_utc), not the time the event occurred. Furthermore, our entity mapping respects the historical symbology of that specific date. If you query data from 2014 for Facebook, the API returns data mapped to $FB, ensuring your backtest reflects the exact reality of the market at that moment, with zero forward-looking leakage.
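In a backtest, the practical consequence is that a signal computed at time T should only see articles whose ingest timestamp is at or before T. A minimal sketch of that cutoff, assuming each article is reduced to an (ingest timestamp, sentiment score) pair; the `scores_known_at` function is a hypothetical helper, not an API call.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def scores_known_at(articles, as_of):
    """Return sentiment scores of articles ingested at or before `as_of`.

    `articles` is an iterable of (ingest_timestamp, sentiment_score) pairs.
    Anything stamped after `as_of` is excluded, which is what keeps
    future information out of the signal.
    """
    ordered = sorted(articles, key=lambda pair: pair[0])
    stamps = [stamp for stamp, _ in ordered]
    cutoff = bisect_right(stamps, as_of)  # index of first article after as_of
    return [score for _, score in ordered[:cutoff]]


def utc(hour, minute, second, millisecond=0):
    """Build a UTC timestamp on the sample date with millisecond precision."""
    return datetime(2023, 10, 12, hour, minute, second,
                    millisecond * 1000, tzinfo=timezone.utc)


feed = [
    (utc(14, 30, 0, 0), -0.78),
    (utc(14, 30, 0, 500), -0.15),
    (utc(14, 31, 0, 0), 0.40),
]

# A decision made at 14:30:00.500 sees the first two articles only.
print(scores_known_at(feed, utc(14, 30, 0, 500)))
```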

How does the API handle duplicate stories or "re-prints" from multiple sources?

We calculate a novelty_score for every incoming article by vectorizing the text and comparing it against a rolling 24-hour window of news for that specific ticker. High Novelty (>0.8): Breaking news or a materially new update. Low Novelty (<0.2): Syndicated re-prints or minor updates to an existing story. This allows you to filter out the "echo chamber" and trade only on the initial signal.
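The idea can be sketched in a few lines. The version below is a simplified stand-in, not the production method: it assumes plain token-count vectors and cosine similarity, where a real system would more likely use learned embeddings, and it defines novelty as one minus the highest similarity to any story for the same ticker in the prior 24 hours. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from datetime import datetime, timedelta


def _vectorize(text):
    """Token-count vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def novelty_score(text, published, history, window=timedelta(hours=24)):
    """Return 1 - max similarity to same-ticker stories in the prior window.

    `history` is a list of (timestamp, text) pairs for one ticker.
    """
    vector = _vectorize(text)
    recent = [
        _vectorize(old_text)
        for stamp, old_text in history
        if timedelta(0) <= published - stamp <= window
    ]
    if not recent:
        return 1.0  # nothing to compare against, so treat as fully novel
    return 1.0 - max(_cosine(vector, old) for old in recent)


now = datetime(2023, 10, 12, 14, 30)
history = [(now - timedelta(hours=2),
            "TechCorp delays Q4 chipset rollout on supply constraints")]

reprint = "TechCorp delays Q4 chipset rollout on supply constraints"
fresh = "Regulator opens antitrust probe into rival semiconductor maker"
print(novelty_score(reprint, now, history))
print(novelty_score(fresh, now, history))
```

With this definition a verbatim re-print lands near 0 and an unrelated story lands near 1, matching the low and high bands described above.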

How do you distinguish between a company being the "Subject" vs. just a "Mention"?

We provide a relevance_score (0-100) for every entity detected. If a company is in the headline or the lead paragraph, it receives a score of 100. If a company is merely listed as a constituent in an ETF wrapper or mentioned in a peer comparison, it receives a score below 20. We recommend filtering your feed for relevance_score > 70 to remove noise from index rebalancing or sector roundups.
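Applied client-side, the recommended filter is a one-liner. The sketch below assumes entities arrive as dictionaries with a `relevance_score` on the 0-100 scale described above; the `primary_entities` function and the example values are hypothetical.

```python
def primary_entities(entities, min_relevance=70.0):
    """Keep entities that are the subject of the story, not passing mentions."""
    return [e for e in entities if e["relevance_score"] > min_relevance]


# Hypothetical entities from one article: the subject, a peer comparison,
# and an ETF constituent listing.
entities = [
    {"ticker": "TCP", "relevance_score": 100, "sentiment_score": -0.78},
    {"ticker": "NVDA", "relevance_score": 30, "sentiment_score": -0.15},
    {"ticker": "XYZ", "relevance_score": 12, "sentiment_score": 0.05},
]

print([e["ticker"] for e in primary_entities(entities)])
```

The threshold is a parameter so that it can be tuned per strategy. A sector read-across model might deliberately lower it.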

How do you handle Ticker Changes and Corporate Actions (e.g., FB to META)?

We map all data to the Bloomberg FIGI (Financial Instrument Global Identifier), which remains persistent regardless of ticker changes. Our API allows you to query by the current ticker (e.g., META) and automatically retrieve the full history of the previous ticker (FB), seamlessly stitched together. You do not need to maintain your own mapping table.
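The reason a persistent identifier makes stitching trivial can be shown with a small grouping sketch. The records, the `stitch_by_figi` function, and the placeholder FIGI string below are all hypothetical; the point is only that grouping on `figi` rather than `ticker` yields one continuous series across a rename.

```python
from collections import defaultdict

# Placeholder identifier for illustration; not a real FIGI.
EXAMPLE_FIGI = "FIGI_PLACEHOLDER_1"


def stitch_by_figi(records):
    """Group records by FIGI and order each group by date.

    Records carrying different tickers but the same FIGI end up in a
    single chronological series.
    """
    series = defaultdict(list)
    for record in records:
        series[record["figi"]].append(record)
    for rows in series.values():
        rows.sort(key=lambda r: r["date"])  # ISO dates sort lexicographically
    return dict(series)


records = [
    {"date": "2022-07-01", "ticker": "META", "figi": EXAMPLE_FIGI, "sentiment_score": 0.2},
    {"date": "2014-03-10", "ticker": "FB", "figi": EXAMPLE_FIGI, "sentiment_score": 0.5},
    {"date": "2021-10-28", "ticker": "FB", "figi": EXAMPLE_FIGI, "sentiment_score": -0.1},
]

stitched = stitch_by_figi(records)
print([(r["date"], r["ticker"]) for r in stitched[EXAMPLE_FIGI]])
```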

Does the sentiment score account for "Gap Risk" or volatility?

Yes. In addition to the raw sentiment_score (-1 to +1), we provide a confidence_interval. If a news event is ambiguous or polarizing (e.g., a complex merger announcement where the model detects both high positive and high negative signals), the confidence_interval widens. Sophisticated models use this variance as a proxy for potential volatility, allowing you to size positions down when the model is less certain of the directional impact.
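One simple way to act on this is to shrink position size as the interval widens. The sketch below is an assumption-heavy illustration: it assumes `confidence_interval` is delivered as a (low, high) pair on the same -1 to +1 scale as the score, which the text above does not specify, and the `sized_position` function and linear scaling rule are hypothetical choices rather than a recommended sizing model.

```python
def sized_position(base_size, sentiment_score, confidence_interval, max_width=2.0):
    """Scale a signed position by sentiment and by model certainty.

    `confidence_interval` is assumed to be a (low, high) pair on the
    -1..+1 score scale, so the widest possible interval is 2.0.
    A wider interval means lower certainty and a smaller position.
    """
    low, high = confidence_interval
    width = min(max(high - low, 0.0), max_width)
    certainty = 1.0 - width / max_width
    return base_size * sentiment_score * certainty


# Same score, different certainty: the ambiguous event gets a smaller position.
narrow = sized_position(100.0, -0.78, (-0.88, -0.68))
wide = sized_position(100.0, -0.78, (-1.0, 0.5))
print(round(narrow, 2), round(wide, 2))
```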