Alpha isn't just in the numbers; it's in the tone, the hesitation, and the Q&A. We convert unstructured earnings calls and investor decks into machine-readable datasets. Access 15+ years of perfectly parsed transcripts, synchronized slide decks, and speaker-level sentiment scoring.
Ready for Your LLM. Stop spending 80% of your time cleaning PDFs. We provide the clean text, mapped metadata, and slide content you need to feed your RAG pipelines immediately.
Generic transcripts are messy. We explicitly separate "Management Remarks" from the "Q&A Session." We identify every speaker by Role (CEO, CFO) and Name, allowing you to track who said what. Run sentiment analysis specifically on the CFO's answers during the Q&A to spot hesitation.
Don't ignore the slides. We scrape and OCR the accompanying Earnings Presentation (PDF), extracting the text and tables from every slide. We link specific slide content to the timestamp in the transcript where it was discussed, giving you the full multimedia context.
Building a RAG bot? We offer a "Chunked" feed. Instead of one massive text blob, retrieve transcripts pre-split into semantic paragraphs with embedded metadata (Ticker, Quarter, Speaker). This drastically improves vector search accuracy for queries like "Show me all guidance updates."
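To illustrate why per-chunk metadata helps retrieval, here is a minimal sketch that narrows the candidate set by metadata before any vector search runs. The record layout (`text` plus a `metadata` dict), the `filter_chunks` helper, the ticker "XYZ", and the sample sentences are all made up for illustration; they are assumptions, not the documented schema of the chunked feed.

```python
# Hypothetical chunk records. The layout is an assumption, not the feed's documented schema,
# and the text is invented sample data for a fictional ticker.
chunks = [
    {"text": "We are raising full-year guidance for gross bookings.",
     "metadata": {"ticker": "XYZ", "quarter": "2024-Q3", "speaker": "CFO"}},
    {"text": "Demand remained strong across all regions.",
     "metadata": {"ticker": "XYZ", "quarter": "2024-Q3", "speaker": "CEO"}},
    {"text": "We are maintaining our prior guidance for the year.",
     "metadata": {"ticker": "XYZ", "quarter": "2024-Q2", "speaker": "CFO"}},
]

def filter_chunks(chunks, **criteria):
    """Keep chunks whose metadata matches every key=value pair in criteria."""
    return [
        c for c in chunks
        if all(c["metadata"].get(key) == value for key, value in criteria.items())
    ]

# Narrow by metadata first, then hand only these candidates to the vector search.
cfo_chunks = filter_chunks(chunks, ticker="XYZ", speaker="CFO")
latest_cfo_chunks = filter_chunks(chunks, quarter="2024-Q3", speaker="CFO")
print(len(cfo_chunks), len(latest_cfo_chunks))
```

Filtering on exact metadata first keeps the similarity search from having to infer ticker, quarter, or speaker from the text itself.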
Get me the Q&A session text from Microsoft's Q3 call.
Download the entire history of S&P 500 transcripts for training a custom financial BERT model.
Our pre-embedded feed allows you to query concepts ("Supply Chain Headwinds") without managing your own embedding model.
Plug our database directly into your internal warehouse (Snowflake or BigQuery).
{
  "ticker": "UBER",
  "quarter": "2024-Q3",
  "date": "2024-11-05",
  "presentation_url": "https://nextmark.data/decks/uber_q3_24.pdf",
  "segments": [
    {
      "segment_type": "Management_Remarks",
      "speaker_name": "Dara Khosrowshahi",
      "speaker_role": "CEO",
      "text": "We are seeing unprecedented demand in the mobility segment...",
      "sentiment_score": 0.85,
      "linked_slide": 4
    },
    {
      "segment_type": "Q&A",
      "speaker_name": "Analyst (Goldman Sachs)",
      "text": "Can you elaborate on the margin compression in freight?",
      "sentiment_score": -0.12
    },
    {
      "segment_type": "Q&A_Response",
      "speaker_name": "Prashanth Mahendra-Rajah",
      "speaker_role": "CFO",
      "text": "Freight remains a cyclical headwind, but we expect...",
      "sentiment_score": 0.05
    }
  ]
}
segment_type: Crucial for filtering. Many algo-traders ignore the scripted Management_Remarks segments (which are PR-polished) and focus entirely on the Q&A_Response segments, where management is more likely to slip up or reveal true sentiment.
sentiment_score: A pre-calculated NLP score (-1.0 to +1.0) for that specific paragraph. This allows you to plot the "Emotional Arc" of the call: did the CFO sound confident at the start but defensive during the Q&A?
linked_slide: Direct context. We map the spoken text to the specific slide number being presented, allowing your analysts to view the chart the CEO is describing as they read the transcript.
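The filtering described for segment_type and sentiment_score can be sketched in a few lines of Python. This is an illustration under assumptions: `call` is an abbreviated copy of the sample payload (text fields omitted), and `select_segments` and `mean_sentiment` are hypothetical helper names, not part of any client library.

```python
from statistics import mean

# Abbreviated copy of the sample payload; only the fields used below are kept.
call = {
    "ticker": "UBER",
    "quarter": "2024-Q3",
    "segments": [
        {"segment_type": "Management_Remarks", "speaker_role": "CEO",
         "sentiment_score": 0.85, "linked_slide": 4},
        {"segment_type": "Q&A", "sentiment_score": -0.12},
        {"segment_type": "Q&A_Response", "speaker_role": "CFO",
         "sentiment_score": 0.05},
    ],
}

def select_segments(call, segment_type, speaker_role=None):
    """Return segments of one type, optionally restricted to a speaker role."""
    return [
        seg for seg in call["segments"]
        if seg["segment_type"] == segment_type
        # speaker_role is absent on analyst questions, so read it with .get()
        and (speaker_role is None or seg.get("speaker_role") == speaker_role)
    ]

def mean_sentiment(segments):
    """Average sentiment_score, or None when no segment matched."""
    scores = [seg["sentiment_score"] for seg in segments]
    return mean(scores) if scores else None

cfo_answers = select_segments(call, "Q&A_Response", speaker_role="CFO")
scripted = select_segments(call, "Management_Remarks")

cfo_qa_score = mean_sentiment(cfo_answers)
scripted_score = mean_sentiment(scripted)
print(scripted_score, cfo_qa_score)
```

Comparing the scripted average with the Q&A_Response average gives a crude two-point version of the "Emotional Arc"; plotting sentiment_score per segment in call order would give the full curve.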
We have full coverage of US Equities (Russell 3000) going back to 2008. Global coverage (Europe/APAC) typically starts around 2014.
We use a hybrid "AI + Human-in-the-Loop" process. A specialized financial speech-to-text model generates the first draft, and human editors then verify proper nouns, specialized financial jargon (e.g., "EBITDA"), and speaker attribution, bringing accuracy above 99%.
By default, we flag and separate the standard "Safe Harbor" and "Forward-Looking Statements" legal disclaimers at the start of the call, so your NLP models don't waste tokens processing boilerplate legalese.
Yes. The API provides a direct link to the parsed PDF of the presentation deck. We also provide an OCR endpoint that returns the raw text content of each slide as a JSON object.
For global companies (e.g., Toyota, Samsung) that hold earnings calls in their native language, we provide Dual-Channel Transcripts. You get the original native text and an English translation side-by-side. Our metadata flags these as translated: true, allowing you to decide whether to process the raw source or the translated version.
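A minimal sketch of how a pipeline might choose between the two channels. Only the `translated` flag comes from the description above; the per-segment field names `text_original` and `text_en`, the `pick_text` helper, and the sample record are assumptions made for illustration.

```python
# Hypothetical dual-channel record. Only the `translated` flag is taken from the
# product description; `text_original` and `text_en` are assumed field names.
transcript = {
    "translated": True,
    "segments": [
        {"text_original": "売上高は前年比で増加しました。",
         "text_en": "Revenue increased year over year."},
    ],
}

def pick_text(transcript, segment, prefer_english=True):
    """Return the channel to process for one segment."""
    if not transcript.get("translated", False):
        # Single-channel transcript: assume the plain `text` field is all there is.
        return segment["text"]
    return segment["text_en"] if prefer_english else segment["text_original"]

english = [pick_text(transcript, seg) for seg in transcript["segments"]]
native = [pick_text(transcript, seg, prefer_english=False)
          for seg in transcript["segments"]]
print(english[0])
```

Routing on the flag lets an English-only sentiment model take the translation while a multilingual model reads the source text.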
Yes. Our "Concept Tagging" engine automatically tags transcripts with themes like "Guidance Raise," "Supply Chain Disruption," or "Share Buyback Announcement." Instead of writing complex keyword regex (e.g., "buyback" OR "repurchase"), you can simply query concept:share_buyback to instantly retrieve every relevant management discussion from the S&P 500 history.
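For contrast, here is roughly what the keyword-regex approach looks like in Python, and where it falls short. The pattern and the three sample sentences are made up for illustration; how the concept tagger itself decides on a match is not described here, so this sketch shows only the regex side.

```python
import re

# Keyword approach: matches "buyback", "buy-back", "repurchase", "repurchased", ...
BUYBACK_RE = re.compile(r"\b(?:buy-?backs?|repurchas\w*)\b", re.IGNORECASE)

# Invented sample sentences, not taken from any real transcript.
sentences = [
    "The board approved a new buyback authorization.",
    "We expanded our share repurchase program this quarter.",
    "We plan to return more capital to shareholders this year.",
]

keyword_hits = [s for s in sentences if BUYBACK_RE.search(s)]
missed = [s for s in sentences if not BUYBACK_RE.search(s)]
print(len(keyword_hits), len(missed))
```

The third sentence discusses capital return without using either keyword, so the regex skips it. Phrasing like this is the gap a concept-level query such as concept:share_buyback is meant to cover.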