How JurisTech’s Deep Research Agentic AI Outperformed Gemini and Grok in Financial Analysis

    AI Agents Are Competing. We Put Them to the Test.

    The AI arms race has reached the research desk, and agentic AI is leading the charge.

    Financial institutions are under pressure to do more with less. They need faster insights, better forecasts, and explainable analysis that stands up to scrutiny. Agentic AI, which combines generative language capabilities with goal-driven planning and execution, is increasingly seen as the next frontier. From Google’s Gemini to xAI’s Grok, the promise is clear: automate complex business intelligence and unlock faster decisions.

    But how well do these systems actually perform when tested in the field?

    JurisTech, a lending and recovery software solutions company with over two decades of experience in building enterprise software for banks, credit agencies, and lenders, wanted to find out. At the heart of this exploration was one crucial question: can agentic AI systems truly handle the complexity, precision, and structure that financial research demands?

    Financial analysis isn’t just about summarising numbers. It involves interpreting them in context, modelling outcomes, and accounting for regulatory, macroeconomic, and industry-specific factors. Most generative AI systems are built for language fluency, not financial fluency. While they can generate content quickly, they often miss nuances in capital structure, valuation logic, and risk sensitivity. This makes them unreliable for high-stakes financial environments where every assumption must be traceable, auditable, and correct.

    Agentic AI is designed to bridge that gap. By combining generation with planning, execution, and memory, it aims to produce not just text, but structured insight. The theory is compelling, but in practice it remains largely untested. JurisTech built its own system, Juris Deep Research, and put it through a rigorous benchmark to find out what actually works.

    The test? A full-spectrum business investigation of a publicly listed company, using only data that any institutional analyst could access: financial reports, investor presentations, and public disclosures.

    Each AI agent received the same prompt. Each had to generate a comprehensive research report on Capital A Berhad, a major ASEAN conglomerate undergoing strategic transformation. The results were then evaluated using OpenAI’s O3 model, an advanced agentic reasoning system.

    The outcome wasn’t close.

    Juris Deep Research was the only solution rated investment-grade, outperforming Gemini and Grok across almost every category, from analytical depth and financial modelling to accuracy, structure, and scenario planning.

    What follows is a breakdown of how the benchmark was done, what it revealed, and what it means for the future of AI in finance.

    The Test Case: Capital A, A Real-World Corporate Transformation

    To fairly evaluate the capabilities of each agentic AI system, JurisTech needed a subject that would challenge not just their language generation, but their strategic reasoning and financial intelligence.

    That subject was Capital A Berhad.

    Formerly known as AirAsia Group, Capital A is one of Southeast Asia’s most prominent transformation stories. The company has pivoted from a capital-heavy airline operator to an asset-light digital conglomerate, spanning aviation, logistics, fintech, and lifestyle services. It’s a complex corporate structure with multiple business units, regulatory challenges, and a high-debt balance sheet. In other words, the perfect stress test for AI-driven research agents.

    Each AI agent (Juris Deep Research, Gemini, and Grok) received the same instruction: generate a comprehensive research report on Capital A Berhad using only publicly available data as of April 2025.

    This included:

    • Disclosures from Bursa Malaysia
    • Annual reports and financial statements
    • Investor presentations
    • Market commentary and news sources

    Each agentic AI was expected to assess Capital A’s financial health, business unit performance, risk profile, forward strategy, and investment potential.

    To ensure the benchmark reflected real-world standards, the outputs were evaluated by OpenAI’s O3, an advanced reasoning model capable of structured comparative analysis. The evaluation was conducted independently, based on a pre-defined rubric assessing eleven critical report dimensions.

    Transparency and Evaluation Ethics

    This benchmark was designed with strict adherence to ethical standards and transparency. Only publicly available data was used throughout the test. No proprietary information, insider data, or manual prompt engineering was applied to influence outcomes. All AI agents operated independently under identical conditions, and evaluation was carried out without interference or human rewriting of results. This ensures that the performance gaps observed are attributable solely to each system’s intrinsic design, execution, and intelligence.

    Disclaimer:
    This benchmark was conducted using publicly available data on Capital A Berhad. The insights reflect the output of each AI system and do not represent investment advice or the official view of JurisTech.

    A Note on Scope

    The goal was not to critique Capital A itself. Rather, the company served as a dynamic case study to evaluate how well each AI system could analyse a fast-evolving business environment — and deliver insights that would be usable in a boardroom, credit committee, or investment strategy session.

    The Results: Juris Deep Research Led on Nearly Every Front

    Each AI agent (Juris Deep Research, Gemini, and Grok) produced a full research report based on the same data set and prompt.

    The O3 model scored each report across eleven key dimensions, covering everything from executive summaries and financial analysis to report formatting and future scenario modelling.

    The outcome was decisive.

    AI Agent Benchmark Scorecard

    Evaluation dimension | Juris Deep Research | Gemini | Grok
    Executive Summary | 5 | 4 | 3
    Corporate Info & Governance | 4 | 4 | 3
    Business Unit Coverage | 5 | 5 | 3
    Company Reputation & ESG | 4 | 3 | 3
    Market Position vs Competitors | 4 | 4 | 3
    Financial Health Analysis | 5 | 3 | 4
    Future Outlook Modelling | 5 | 4 | 3
    Loan & Risk Assessment | 5 | 4 | 4
    Style & Structure | 5 | 2 | 3
    Data Accuracy | 4.5 | 3.5 | 4
    Analytical Depth | 5 | 4.5 | 3
    Average Score | 4.7 | 3.9 | 3.3
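    As a quick arithmetic check, the sketch below recomputes each average as an unweighted mean of the eleven dimension scores. It assumes every dimension carries equal weight; the article does not say how the averages were derived. On that basis, Juris Deep Research (about 4.68) and Grok (about 3.27) round to the published figures. Gemini, however, comes out at about 3.73 rather than 3.9, so its published average may reflect a different weighting or calculation.

```python
# Scores transcribed from the scorecard above, in row order
# (Executive Summary through Analytical Depth).
SCORES = {
    "Juris Deep Research": [5, 4, 5, 4, 4, 5, 5, 5, 5, 4.5, 5],
    "Gemini": [4, 4, 5, 3, 4, 3, 4, 4, 2, 3.5, 4.5],
    "Grok": [3, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3],
}


def unweighted_mean(scores: list[float]) -> float:
    """Arithmetic mean, treating every evaluation dimension as equally important."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for agent, scores in SCORES.items():
        print(f"{agent}: {unweighted_mean(scores):.2f}")
```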

    The O3 evaluation highlighted that Juris Deep Research delivered a clean, structured, and numerically precise report, combining clarity, granularity, and accuracy in ways the others did not.

    Gemini demonstrated solid content generation but suffered from poor formatting and structural inconsistency. Grok’s analysis was readable but lacked depth and missed key financial logic. In contrast, Juris Deep Research delivered a report that was not only insightful but also usable, the kind of output analysts would confidently present to a risk committee or board.

    Juris Deep Research was the only system rated investment-grade.

    The rest of this article breaks down exactly how it achieved that.

    Want to explore the full report?

    Download the full benchmark output, including three-case forecasts, capital allocation models, and financial risk analysis.

    Why Juris Deep Research Stood Out

    Gemini and Grok are among the most advanced generative AI systems available today. However, when applied to deep financial analysis, both revealed critical gaps in structure, depth, or domain accuracy.

    Juris Deep Research, built by JurisTech, didn’t just outperform. It demonstrated what specialised, agentic AI can achieve when designed with financial institutions in mind. Its edge came from more than just model strength. It came from architecture, execution, and financial fluency.

    This wasn’t just about raw intelligence. It was about how that intelligence was applied.

    Structured Agentic Execution

    Unlike single-threaded AI models that respond linearly, Juris Deep Research used a multi-agent workflow. A planner agent first broke down the prompt into task segments. Specialist agents then executed each part – financial modelling, market analysis, and risk scenarios – in parallel. A final synthesiser agent stitched everything into a seamless report.
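    The article does not publish Juris Deep Research's orchestration code. As a rough illustration of the planner, parallel specialists and synthesiser pattern described above, the Python sketch below uses stub agents in place of real model calls. Every function name in it (`plan`, `run_specialist`, `synthesise`, `research`) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Task:
    section: str       # report section this task produces
    instructions: str  # what the specialist agent is asked to do


def plan(prompt: str) -> list[Task]:
    """Hypothetical planner: split the research prompt into task segments.
    A real planner would call a language model; here the split is fixed."""
    return [
        Task("Financial modelling", f"Model leverage and cash flow for: {prompt}"),
        Task("Market analysis", f"Assess competitive position for: {prompt}"),
        Task("Risk scenarios", f"Build stress scenarios for: {prompt}"),
    ]


def run_specialist(task: Task) -> tuple[str, str]:
    """Hypothetical specialist agent: returns (section title, draft text)."""
    return task.section, f"[draft for '{task.section}' based on: {task.instructions}]"


def synthesise(drafts: list[tuple[str, str]]) -> str:
    """Hypothetical synthesiser: stitch section drafts into one ordered report."""
    return "\n\n".join(f"{title}\n{body}" for title, body in drafts)


def research(prompt: str) -> str:
    tasks = plan(prompt)
    # Specialists run concurrently; map() still yields results in task order,
    # so the synthesiser receives sections in the sequence the planner chose.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        drafts = list(pool.map(run_specialist, tasks))
    return synthesise(drafts)


if __name__ == "__main__":
    print(research("Capital A Berhad, public data as of April 2025"))
```

    Using `ThreadPoolExecutor.map` lets the specialist tasks run concurrently while keeping their results in the planner's order. That is one simple way to give the synthesiser a predictable section sequence.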

    This architecture produced clarity, consistency, and coherence across more than 30 pages of content. It prevented repetition, ensured every critical dimension was addressed, and allowed insights to build on one another, just like in a professional equity research document.

    Financial Fluency and Domain Specialisation

    Where other agents generated content, Juris Deep Research demonstrated understanding.

    It accurately calculated lease-adjusted ratios such as Net Debt/EBITDAR, which bring lease obligations into the debt figure and add rental costs back to earnings. It also built pro forma models reflecting how Capital A’s planned divestment of its aviation business would affect group leverage and equity.
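    The article does not spell out the exact formula used, so the sketch below assumes one common convention:

    • Lease-adjusted net debt = borrowings + lease liabilities − cash
    • EBITDAR = EBITDA + rental expense

    All figures are hypothetical, not Capital A’s. The `pro_forma_disposal` helper is an illustrative simplification that ignores tax, fees and deal structure.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    # All figures in RM million; example values below are hypothetical.
    borrowings: float         # bank loans, bonds and other interest-bearing debt
    lease_liabilities: float  # capitalised aircraft and other lease obligations
    cash: float               # cash and cash equivalents
    ebitda: float
    rent_expense: float       # lease/rental cost added back to reach EBITDAR


def lease_adjusted_net_debt(f: Financials) -> float:
    return f.borrowings + f.lease_liabilities - f.cash


def ebitdar(f: Financials) -> float:
    return f.ebitda + f.rent_expense


def net_debt_to_ebitdar(f: Financials) -> float:
    denom = ebitdar(f)
    if denom <= 0:
        raise ValueError("EBITDAR must be positive for the ratio to be meaningful")
    return lease_adjusted_net_debt(f) / denom


def pro_forma_disposal(group: Financials, unit: Financials, proceeds: float) -> Financials:
    """Remove a divested unit's debt, leases, cash and earnings from the group
    and add the cash proceeds. Ignores taxes, fees and transaction timing."""
    return Financials(
        borrowings=group.borrowings - unit.borrowings,
        lease_liabilities=group.lease_liabilities - unit.lease_liabilities,
        cash=group.cash - unit.cash + proceeds,
        ebitda=group.ebitda - unit.ebitda,
        rent_expense=group.rent_expense - unit.rent_expense,
    )


# Hypothetical illustration only, not Capital A's reported figures.
example = Financials(borrowings=4_000, lease_liabilities=8_000, cash=1_000,
                     ebitda=2_000, rent_expense=200)
print(f"Group: {net_debt_to_ebitdar(example):.2f}x")  # 11000 / 2200 = 5.00x

divested_unit = Financials(3_000, 7_000, 500, 1_500, 150)
after = pro_forma_disposal(example, divested_unit, proceeds=1_000)
print(f"Pro forma: {net_debt_to_ebitdar(after):.2f}x")
```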

    The system constructed three distinct five-year scenarios – optimistic, base, and conservative – each with its own set of assumptions and capital forecasts. It quantified FX and interest rate sensitivity, showing, for example, how a 1% shift in the MYR/USD rate could swing net profit by RM90 million.
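    The RM90 million figure is a first-order sensitivity. Assume the relationship is linear; this is an illustrative simplification that ignores hedging, pricing pass-through and second-order effects. Under that assumption, the figure implies roughly RM9 billion of net foreign-currency exposure, and the same factor can size larger moves in each scenario. The function names below are hypothetical.

```python
def implied_exposure(profit_swing_rm_m: float, rate_move_pct: float) -> float:
    """Net foreign-currency exposure (RM million) implied by a linear sensitivity,
    i.e. the exposure E such that swing = E * move / 100."""
    return profit_swing_rm_m * 100 / rate_move_pct


def profit_swing(exposure_rm_m: float, rate_move_pct: float) -> float:
    """First-order net profit swing (RM million) for a given % move in MYR/USD."""
    return exposure_rm_m * rate_move_pct / 100


# Start from the article's figure: RM90 million per 1% move.
exposure = implied_exposure(90, 1.0)  # 9,000 under the linearity assumption

for move in (0.5, 1.0, 2.0, 5.0):
    print(f"{move:.1f}% MYR/USD move -> ~RM{profit_swing(exposure, move):,.0f}m net profit swing")
```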

    It even integrated niche aviation finance concepts such as Power-by-the-Hour lease structures, and constructed a capital allocation waterfall, modelling how disposal proceeds could be used to reduce leverage or reinvest into high-growth segments.
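    A capital allocation waterfall applies proceeds to uses in strict priority order, filling each tier before the next receives anything. The minimal sketch below assumes hypothetical tiers, caps and amounts, none of them taken from Capital A or the benchmark report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tier:
    name: str
    cap: Optional[float]  # most this use can absorb (RM million); None = no cap


def allocate(proceeds: float, tiers: list[Tier]) -> dict[str, float]:
    """Fill each tier in priority order; later tiers get only what is left."""
    allocation: dict[str, float] = {}
    remaining = proceeds
    for tier in tiers:
        amount = remaining if tier.cap is None else min(remaining, tier.cap)
        allocation[tier.name] = amount
        remaining -= amount
    return allocation


# Hypothetical priorities and caps, for illustration only.
TIERS = [
    Tier("Repay near-term maturities", 600),
    Tier("Further deleveraging toward target", 400),
    Tier("Reinvest in high-growth segments", None),
]

for proceeds in (700, 1_500):
    print(proceeds, allocate(proceeds, TIERS))
```

    Reordering `TIERS` is enough to compare a deleverage-first policy with a growth-first one.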

    “Drills into leverage math, digital breakeven, and sensitivity tables.”
    — O3 Model Evaluation

    These weren’t surface-level touches. They reflected real financial literacy, the kind analysts, CFOs, and risk committees rely on.

    Investment-Grade Reporting Structure

    Juris Deep Research didn’t just provide insights. It delivered them in a format that matched how financial professionals read, process, and present research.

    The report included:

    • A title page and table of contents
    • Nine logically sequenced sections, flowing from executive summary → company background → segment analysis → financial modelling → risk assessment → strategic outlook
    • Bullet-pointed insights, tabular KPIs, and clearly labelled assumptions

    By comparison, Gemini’s output was densely packed and difficult to scan. Grok’s version, while cleaner, lacked the logical structure and narrative flow of an investment-grade report.

    Juris Deep Research was the only system to combine substance with structure.

    Accuracy and Validation

    In environments where financial decisions carry real-world consequences, accuracy isn’t optional; it’s fundamental.

    Juris Deep Research had the highest alignment with Capital A’s audited FY2023 financials, with minimal discrepancies across balance sheet ratios, income breakdowns, and segment results.

    Crucially, it distinguished between reported data and forward-looking estimates, a detail missed by both Gemini and Grok. It walked through Capital A’s maturity wall, evaluated covenant headroom in its debt instruments, and included stress-tested scenarios tied to macroeconomic assumptions.
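    A maturity wall groups outstanding debt by the year it falls due. Covenant headroom measures how far a financial ratio sits from its contractual limit. The sketch below shows both, plus a simple EBITDA shock as a stand-in for a macro-driven stress test. The facilities, limits and shock size are hypothetical, and real covenant definitions come from each facility agreement.

```python
from collections import defaultdict


def maturity_wall(facilities: list[tuple[str, int, float]]) -> dict[int, float]:
    """Total principal falling due in each year, sorted by year."""
    wall: dict[int, float] = defaultdict(float)
    for _name, year, principal in facilities:
        wall[year] += principal
    return dict(sorted(wall.items()))


def covenant_headroom(actual: float, limit: float, kind: str = "max") -> float:
    """Headroom as a fraction of the covenant limit; negative means a breach.
    kind="max": ratio must stay at or below the limit (e.g. leverage).
    kind="min": ratio must stay at or above the limit (e.g. interest cover)."""
    if kind == "max":
        return (limit - actual) / limit
    if kind == "min":
        return (actual - limit) / limit
    raise ValueError(f"unknown covenant kind: {kind}")


def stressed_leverage(net_debt: float, ebitda: float, ebitda_shock: float) -> float:
    """Net debt / EBITDA after cutting EBITDA by a fractional shock (0.2 = -20%)."""
    return net_debt / (ebitda * (1 - ebitda_shock))


# Hypothetical facilities: (name, maturity year, principal in RM million).
FACILITIES = [("Term loan A", 2025, 300.0), ("Bond", 2026, 500.0), ("Revolver", 2025, 200.0)]
print(maturity_wall(FACILITIES))

# Hypothetical 4.0x leverage covenant, tested at base case and under a 20% EBITDA shock.
for shock in (0.0, 0.2):
    lev = stressed_leverage(6_000, 2_000, shock)
    print(f"EBITDA shock {shock:.0%}: {lev:.2f}x, headroom {covenant_headroom(lev, 4.0):.1%}")
```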

    “Clearly distinguished audited facts from projected estimates — rare in generative systems.”
    — O3 Model Evaluation

    This level of precision gives stakeholders confidence that the AI’s outputs can be trusted, audited, and actioned.

    Juris Deep Research didn’t succeed because it had the largest model. It succeeded because it was built with intention, by a team that understands both finance and how intelligence must be structured to support real decisions.

    The Bigger Picture: Why Domain Agentic AI Outperforms Generalists

    The results of this benchmark reveal something more important than just a scorecard.

    They show that when it comes to business-critical research, specialisation matters.

    Gemini and Grok are engineered as general-purpose language models. They’re incredibly capable across a wide range of tasks. But their greatest strength, their breadth, can also be a limitation. When faced with complex, domain-specific challenges like financial analysis, they often struggle with structure, nuance, and depth.

    Juris Deep Research was purpose-built to address those gaps. Its agentic design wasn’t built to chat. It was built to think like an analyst, not an assistant.

    From lease-adjusted debt ratios to capital allocation modelling, Juris Deep Research didn’t just understand the vocabulary. It understood the logic. It reasoned like a banker, analysed like an equity research team, and presented like a risk committee submission. Every ratio, scenario, and insight was contextualised, calculated, and formatted with purpose.

    “JurisTech didn’t build the biggest model. It built the one that understands your numbers.”
    — O3 Model Evaluation

    This is where AI is heading. As adoption accelerates, organisations will move beyond generic tools and toward systems that integrate domain knowledge, task logic, and explainability.

    General AI will always have its place. But in regulated, high-trust sectors like finance, domain AI will define the next wave.

    JurisTech’s investment in agentic architecture and finance-specific intelligence isn’t just ahead of the curve. It’s reshaping what that curve looks like.

    Why It Matters for Banks, Investors, and Financial Analysts

    For financial institutions, the implications of this benchmark go beyond comparison. They’re operational.

    Juris Deep Research isn’t just a proof of concept. It’s a ready-to-deploy solution that can strengthen key processes across banking, investment, and strategy teams.

    For banks and lenders, it supports due diligence, credit evaluations, and credit committee documentation. Instead of analysts manually compiling borrower data and ratio trends, Juris Deep Research generates reports that include debt sustainability analysis, risk stress-testing, and covenant checks — complete with maturity walls and financial headroom analysis.

    For investment teams, it accelerates deal screening and portfolio reviews. Rather than spending days compiling memos, analysts can validate the AI’s findings, adjust assumptions, and move quickly on real opportunities.

    For financial strategists and analysts, it acts as a second brain: one that delivers clear, consistent output, structures analysis across multiple business units, and provides justifications that can be traced back to financial disclosures.

    And importantly, Juris Deep Research is enterprise-ready. It runs on-premise or on sovereign cloud infrastructure, with no reliance on US-based GPU access or restricted APIs. That makes it a viable choice for institutions with compliance, security, or data sovereignty requirements.

    In a world where speed, trust, and clarity define competitive advantage, general-purpose AI systems often fall short. Juris Deep Research was built to close that gap — and it’s doing so with the precision, structure, and explainability that high-stakes finance demands.

    Looking Ahead: What This Benchmark Tells Us About the Future of Agentic AI in Finance

    This benchmark revealed more than which system performed best. It showed where the market is heading.

    In the early wave of generative AI adoption, the spotlight was on model size and linguistic capability. But that phase is already fading. For financial institutions, the next wave is about execution design. Not just intelligence, but how that intelligence is structured and applied.

    Juris Deep Research proved that specialisation delivers more than speed. It enables analysis that mirrors the way finance professionals think, model, and present decisions.

    What worked here won’t just apply to equity research. It’s relevant for internal audit, stress testing, risk reporting, and strategy development. As institutions integrate AI deeper into their workflows, they’ll need tools that go beyond summarising data. They’ll need agents that reason, model, and explain — in formats that are familiar, verifiable, and ready to use.

    This is where domain-specific agentic systems will shine. They will not replace analysts. But they will extend their capabilities, enhance consistency, and shorten the distance between raw data and board-level insight.

    General-purpose AI models will continue to improve, but when trust, transparency, and task-specific outputs are non-negotiable, purpose-built platforms like Juris Deep Research will lead the way.

    Conclusion: Setting a New Standard for Financial AI

    This benchmark wasn’t a theoretical exercise. It was a real-world test that asked three of the most advanced AI agents available to analyse one of Southeast Asia’s most complex corporate transformations.

    The results spoke for themselves.

    Juris Deep Research, developed by JurisTech, stood out for its clarity, depth, structure, and financial precision. It didn’t just keep up with the global leaders. It set a higher bar.

    In a space where financial decisions can’t be left to general approximations, Juris Deep Research showed what’s possible when AI is designed for the domain. It structured intelligence, modelled with precision, and delivered output that institutions can trust.

    As AI becomes a permanent layer in how we analyse and decide, the systems that win won’t be the biggest — they’ll be the most aligned with how people actually work.

    In a world of generalists, the specialist wins.

    Want to see how Juris Deep Research works in practice?

    Book a free demo with JurisTech and discover how agentic AI is transforming the way financial insights are generated, trusted, and delivered.

    29th May, 2025 | Artificial Intelligence, Featured, Insights

    About the Author:

    The Marketing & Communications team at JurisTech comprises skilled digital marketing strategists and content creators who deliver invaluable insights drawn from our experts in lending and recovery software solutions. For media queries, please contact us at mac@juristech.net.