The basics
What is The Startup Mentor?
The Startup Mentor is an AI mentoring system that delivers structured, expert-level value growth assessment and coaching. It assesses startups across sixteen value growth pillars, validates every claim on a five-level scale, identifies what is blocking value growth, and gives founders a prioritised set of actions to close those gaps.
It produces structured data outputs — assessments, dashboards, cohort reports — that serve multiple stakeholders simultaneously: the founder gets expert mentoring, the programme manager gets real-time visibility, the investor gets evidence-graded due diligence.
What are value growth constraints?
A value growth constraint is whatever is preventing hidden value from being validated. Constraints are not gaps — a gap is the absence of something; a constraint is the reason the absence persists. Identifying the gap tells you what's missing. Identifying the constraint tells you why it's still missing and what kind of intervention would remove it.
The system classifies five constraint types, each requiring a different response. Evidence constraints — the founder simply hasn't gathered the data yet; the fix is a specific discovery task. Capability constraints — the founder can't execute the validation activity because they lack a skill, a social capability, or the emotional capacity to absorb the potential answer. Structural constraints — something external blocks progress: gate sequencing, team gaps, capital, regulation, ecosystem limitations. Time constraints — the system hasn't observed enough sessions to assess accurately; the only intervention is continued engagement. Willingness constraints — the founder could validate but chooses not to, whether from fear, disagreement, identity protection, or strategic divergence.
Matching the wrong intervention to the constraint wastes time and erodes trust. Assigning customer interviews to a founder whose real constraint is social capability produces failure and misdiagnoses the cause as lack of commitment. The system diagnoses the constraint type first, then recommends the intervention that fits.
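A minimal sketch of how this diagnosis-to-intervention matching could be represented, assuming a simple lookup from constraint type to response; the enum, mapping text, and function are illustrative, not the system's actual implementation.

```python
from enum import Enum

class Constraint(Enum):
    EVIDENCE = "evidence"        # data not yet gathered
    CAPABILITY = "capability"    # missing skill, social capability, or emotional capacity
    STRUCTURAL = "structural"    # external blocker: gates, team, capital, regulation
    TIME = "time"                # not enough sessions observed yet
    WILLINGNESS = "willingness"  # founder could validate but chooses not to

# Hypothetical mapping: each constraint type calls for a different response,
# as described in the text above.
INTERVENTION = {
    Constraint.EVIDENCE: "assign a specific discovery task",
    Constraint.CAPABILITY: "build the missing capability before assigning the task",
    Constraint.STRUCTURAL: "address the external blocker (sequencing, team, capital, regulation)",
    Constraint.TIME: "continue engagement and reassess after more sessions",
    Constraint.WILLINGNESS: "surface and address the underlying resistance first",
}

def recommend(constraint: Constraint) -> str:
    """Return the intervention that fits the diagnosed constraint type."""
    return INTERVENTION[constraint]

print(recommend(Constraint.CAPABILITY))
```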
How do the constraint line and the validation line work?
The Value Growth Map shows two lines, both produced by the system from the same analytical process. The constraint line shows the system's assessment of where value growth is being constrained — where it reaches outward, that dimension isn't holding the startup back; where it pinches inward, it is. The validation line shows how confident the system is in its own assessment — how much of its judgment has been confirmed by evidence the founder has gathered.
The gap between the two lines is hidden value. The system sees potential in that dimension, but can't yet be sure. Two processes close that gap: removing constraints pushes the constraint line outward (value growth). Building evidence pushes the validation line outward (valuation growth). The founder who does both is converting hidden value into visible, investable value.
Is this just a chatbot? How is it different from ChatGPT?
It is not a chatbot in the conversational-AI sense. A general-purpose language model gives generic advice based on whatever it has learned from the internet. The Startup Mentor operates on a structured methodology distilled from years of hands-on mentoring experience across hundreds of founders — formalised into a sixteen-pillar assessment framework, twenty-two founder archetypes, six readiness gates, a five-level validation scale, adaptive coaching dynamics, and a complete document generation pipeline.
The difference is in the specificity and rigour. It detects deflection patterns. It tracks coachability through observed behaviour, not self-report. It adjusts its coaching approach in real time based on how the founder responds. It classifies every claim by evidence quality and prevents high-confidence assumptions from passing readiness gates. A general AI assistant does none of this.
What stages of startup does it work for?
From ideation through to growth stage. The system adapts its assessment to the startup's stage. At ideation, it focuses on the foundational pillars — who is the customer, is the pain urgent enough, will they pay. At validation and traction, it activates business model and product-market fit assessment. At scaling and growth, it adds durability, risk, and optionality pillars.
The assessment framework has six readiness gates that correspond to stage progression. A startup cannot advance to later gates without passing earlier ones — foundation before strategy.
What industries does it cover?
The framework is industry-agnostic at its core — the sixteen value growth pillars and validation scale apply to any startup. On top of that, the system includes ten industry-specific validation overlays that adjust benchmarks, evidence expectations, timelines, and coaching emphasis for sectors where standard validation advice doesn't apply. You can't "talk to 10 customers this week" if you're building in biotech or regulated finance — the overlay modifies the approach without compromising the rigour.
The ten overlays: FinTech, AgriTech, HealthTech & Digital Health, Biotech & Life Sciences, Enterprise B2B & SaaS, Marketplace & Platform, Hardware & Deep Tech, Climate Tech, EdTech, and Regulated Industries (LegalTech, InsurTech, GovTech). Between them, these cover over 95% of all startups the system will encounter.
The system also distinguishes between B2B and B2C validation approaches, and includes a tarpit detection engine across sixteen categories of structurally difficult ideas — including AI wrappers, marketplace crowding, behaviour-change-required models, and enterprise mirages. Startups outside the named overlays still receive the full sixteen-pillar assessment with standard evidence expectations.
What languages does it support?
Over 40 languages. The founder works in whatever language they think in — Hindi, Mandarin, Portuguese, Arabic, Bahasa, Dutch, German, and many others. The system assesses and coaches in that language. The output data is structured: pillar scores, evidence levels, and gate results are language-independent. An E3 evidence rating means the same thing regardless of which language the session happened in.
This removes English proficiency as a proxy for entrepreneurial capability. A founder in Jakarta with validated pricing conversations and strong behavioural evidence now stands exactly where they should — ahead of a founder in London who hasn't talked to a customer.
Does it replace human mentors?
It solves a different problem. Most institutions cannot provide expert mentoring to every startup, every week — there simply aren't enough expert mentors. The Startup Mentor makes expert-level mentoring available at scale, so that thirty startups in a cohort get thirty expert sessions running in parallel.
Where institutions already have strong human mentors, the system complements them: it gives your mentors structured data they can't produce on their own — evidence-graded assessments, cross-session continuity, coachability metrics, and homework tracking. When one mentor hands a founder to another, the complete history transfers — no context is lost. When a mentor has thirty minutes before a meeting, they read the two-page session summary instead of reconstructing the conversation from memory. Your mentors still do what they do best. The system adds structure, memory, and scale to the mentoring infrastructure you already have.
Why use mentoring for startup assessment?
Because no other method produces the depth required. A pitch deck is a performance — it tells you what the founder wants you to believe. An application form is curated — it tells you what the founder chose to share. A scoring rubric is shallow — it produces numbers without testing the evidence behind them. A mentoring session is a stress test. The mentor asks hard questions and watches whether the founder answers them or deflects. It probes specific claims and discovers whether the evidence is real or assumed. It assigns tasks and observes whether the founder executes.
Critically, the mentoring and the assessment are the same activity. The conversation that challenges a founder's pricing assumption is simultaneously the conversation that grades the evidence behind it. Assessment is not a separate bureaucratic step that happens after the mentoring — it happens through the mentoring. The assessment produces the diagnosis, the mentoring produces the treatment, and both happen in the same conversation. No other method does both.
There is also a temporal dimension that no one-shot method captures. A single meeting produces a snapshot. Multiple sessions reveal trajectory — whether founders close evidence gaps or circle them, whether they do the work between sessions, whether they respond to challenge with curiosity or defensiveness. Evidence velocity, the rate at which a founder converts assumptions into validated data, is the leading indicator that predicts outcomes months before financial metrics do. You cannot observe it without sustained engagement.
How do I get in touch?
Use the contact form on the main page, or email us directly. Tell us about your organisation — what type (investor, accelerator, university), how many startups you work with, and what you'd like to see. We'll respond within 24 hours with next steps, not a sales pitch.
Getting started
Can I use it now?
Yes. Our Tier 1 service is open. Founders and investors can request a full value growth assessment free of charge during the testing period. Send us whatever the startup has — pitch deck, business case, brochure, whitepaper, website, articles — and we'll run a complete sixteen-pillar assessment from the documents and send you the output. In return, we ask for honest feedback on what the assessment got right, what it got wrong, and what would make it more useful.
Tier 2 (expert-mentor-based founder guidance with all 14 document outputs) and Tier 3 (institution-level portfolio management) are being built with launching partners now. But the core system is working and producing real assessments today. If you want to see what it does, the fastest route is to send us a startup. See the full product status →
What does it cost?
Right now, nothing. During the testing period, detailed assessments are free of charge for founders and investors. We are using this phase to validate the output quality and gather feedback — your honest assessment of the assessment is the price.
Longer term, pricing will depend on institution type, number of startups, and session cadence. We are defining this as we work with our first partners, shaped by what we learn about the value delivered and the deployment model that works best.
What are the three service tiers?
The three tiers map to three levels of engagement with a startup.
Tier 1 is document-based value growth assessment. It takes whatever the startup has — pitch decks, brochures, business cases, whitepapers, articles — and produces a full evidence-graded assessment. Sixteen value growth pillar scores, evidence levels, gate results, red and green flags, valuation bandwidth. No session with the founder required. This is what investors need to evaluate a startup from the material it has already produced. Tier 1 is available now, free during the testing period.
Tier 2 is expert-mentor-based founder guidance. The founder gets an account and works with an AI expert mentor through structured sessions. The mentor challenges, coaches, assigns evidence discovery tasks, and tracks progress across sessions. Tier 2 produces all 14 document types — assessments, session summaries, founder takeaways, transcripts, value growth guides, and more. This is where value growth happens: the founder closes evidence gaps, the scores move, and the trajectory becomes visible.
Tier 3 is the institution-level management layer. It sits on top of Tier 2 and gives investors, accelerators, and programme managers portfolio-level visibility — cohort dashboards, cross-company comparison, systemic pattern detection, and the structured data that makes startup selection and progress monitoring possible at scale.
The insight: investors care about value discovery (Tier 1) before they select and value growth (Tier 2) after they select. Institutions need the management layer (Tier 3) to see across their entire portfolio. The data flows upward — a Tier 1 assessment becomes the baseline for Tier 2 tracking, and Tier 2 session data feeds the Tier 3 dashboard. Nothing is lost between tiers.
What does a pilot look like?
Simple. You choose one of your startups. We run a full mentoring session with the founder. You receive the complete output: a detailed assessment, a session summary, a founder takeaway, and a transcript. You read the output and decide whether the quality of insight justifies a broader conversation. No obligation, no contract, no sales pressure. The output speaks for itself — or it doesn't.
What's a "launching partner" and what do they get?
Launching partners are the first institutions to deploy the system. They get two things. First, direct input into how the product adapts to their context — which reports matter, how the dashboard surfaces what they need, which integrations to prioritise. The product gets built around their workflow, not the other way around. Second, early access to every new capability as it ships.
The trade-off is honest: we are an early-stage company building the product and the business simultaneously. The system works — the methodology and output quality are there. The packaging, platform infrastructure, and institutional deployment tooling are being built with you. You get to shape something from the ground up, and we get to build it around real institutional needs instead of assumptions.
How long does setup take?
A pilot session can happen within days of first contact. There is no technical setup required for the initial demonstration — we run the session and deliver the output. For a broader deployment, we are building the infrastructure now, so setup timelines depend on where we are in the product build and on scope: how many startups, which output documents you need, dashboard configuration, data governance agreements, and any institution-specific adaptations. We'll define this together during the pilot evaluation.
What if the pilot doesn't convince us?
Then we part ways with no obligation. The founder who participated keeps their takeaway and assessment — they still benefit from the session. We appreciate the honest evaluation. And if the system doesn't demonstrate clear value on a real startup in your portfolio, that's something we need to know too.
When will there be a self-service product?
We are building it now. The methodology is complete and has been validated through real sessions. The productisation work — packaging the system into something a founder can log into, run a session, and receive documents without human involvement — is underway. We don't publish timelines we can't guarantee.
In the meantime, the output is identical. Sessions run today produce the same assessments, the same validation rigour, the same documents. The difference is operational — we're involved in running the session — not qualitative. Launching partners who work with us now get the full output quality and direct input into how the self-service product is shaped.
Outputs & data
What documents does each session produce?
Each session generates documents designed for different audiences. The Executive Assessment is the full assessment in plain language — company, founder, business model, valuation, risks, recommendations — accessible to any reader. The Detailed Assessment contains the Executive Assessment plus dimension-level scoring with evidence grades, readiness gates, the full valuation calculation, coachability analysis, and a glossary — primarily for investors who want to see the analytical framework. The Session Summary is a two-page briefing with key findings and recommended actions — for programme and cohort managers. The Founder Takeaway is a one-page working document with the founder's top value gaps and prioritised actions. The Session Transcript is the complete record of the conversation — for mentors and coaches.
What are the sixteen value growth pillars?
The framework decomposes startup value into sixteen value growth pillars across two tiers. Eleven core pillars are assessed at every stage: customer definition, pain urgency, willingness to pay, competitive advantage, timing, founder-market fit, monetisation, acquisition, vision, strategy, and team. Five advanced pillars activate at later stages: durability/moat, risk, capital strategy, flywheel, and optionality.
Each pillar is scored independently, and each score is paired with an evidence grade — because a high score based on assumptions is fundamentally different from a high score based on validated customer behaviour. The combination of pillar score and evidence grade is what drives the assessment. See the full framework →
What is the validation scale?
The founder brings evidence. The system validates. Every claim is assessed on a five-level validation scale that measures how much the system can trust its own assessment. The levels reflect what kind of evidence exists behind the claim: E1 means no external data — the claim is an unvalidated assumption. E2 means anecdotal support — some conversations or observations, but confirmation bias is likely. E3 means independent validation — structured evidence from people with no reason to be polite. E4 means behavioural confirmation — target customers have taken concrete action (sign-ups, commitments, deposits). E5 means market validation — actual revenue, where customers have exchanged money.
The critical rule: a high pillar score at E1 cannot pass a readiness gate. Confident assumptions are still assumptions. This single rule eliminates the most common failure in pitch-based evaluation — mistaking conviction for validation.
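As an illustration of the scale and the gate rule, a small sketch follows; the level descriptions come from the text above, while the score threshold, minimum evidence level, and function name are assumptions made for the example.

```python
from enum import IntEnum

class Evidence(IntEnum):
    E1 = 1  # unvalidated assumption, no external data
    E2 = 2  # anecdotal support, confirmation bias likely
    E3 = 3  # independent validation from structured discovery
    E4 = 4  # behavioural confirmation: sign-ups, commitments, deposits
    E5 = 5  # market validation: actual revenue

def can_pass_gate(pillar_score: float, evidence: Evidence,
                  min_score: float = 0.7,                 # hypothetical threshold
                  min_evidence: Evidence = Evidence.E3) -> bool:
    """A high pillar score alone is not enough: the claim must also carry
    sufficient evidence. Confident assumptions (E1) never pass a gate."""
    return pillar_score >= min_score and evidence >= min_evidence

print(can_pass_gate(0.9, Evidence.E1))   # False: high score, still an assumption
print(can_pass_gate(0.75, Evidence.E3))  # True: moderate score, validated evidence
```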
Does the system produce valuations?
It produces a valuation bandwidth — a range, not a point estimate — using a three-layer model. The first layer calculates evidence-weighted pillar valuation across all sixteen value growth pillars. The second layer triangulates this against traditional early-stage valuation methods (Berkus, Scorecard, Risk Factor Summation, First Chicago, Comparable Company Analysis). The third layer applies dynamic factors including a coachability premium, evidence velocity adjustment, and self-awareness factor.
The valuation is one input into a broader assessment, not a standalone product. It quantifies the economics of each value gap so the founder can prioritise: closing pillar X is worth €Y in enterprise value.
What are the four multipliers? Why does the founder matter more than the idea?
The sixteen value growth pillars produce a base business valuation. But four characteristics don't add to that base — they multiply it. Each follows a distinct mathematical curve:
Coachability (0.05× to 2.00×, sigmoid curve): how the founder responds to challenge and evidence. Below a threshold, an investor walks away regardless of the business. Above it, trust compounds.
Unfair advantage (0.35× to 2.23×, exponential with ceiling): assets competitors cannot replicate — proprietary data, unique distribution, regulatory position. Each advantage amplifies the others.
Blue ocean (0.50× to 2.00×, exponential ramp): how contested the market space is. Uncontested space creates pricing power.
Implementability (0.05× to 1.40×, logarithmic): whether the solution can be built and delivered. Risk removal, not value creation — the ceiling is below 1.5×.
The combined multiplier ranges from 0.0004× to 12.5×. Run the same base business through five founder profiles and the valuation spread is 345×. A €297K base business is worth €8.6K with the weakest founder profile and €2.99M with the strongest. The idea is the same. The founder changes everything. This is what investors already sense but have never been able to quantify.
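The quoted combined range can be checked by multiplying the four bounds directly, assuming the factors combine by simple multiplication; the curves themselves are not reproduced here.

```python
# Bounds quoted above for each of the four multipliers.
coachability     = (0.05, 2.00)   # sigmoid
unfair_advantage = (0.35, 2.23)   # exponential with ceiling
blue_ocean       = (0.50, 2.00)   # exponential ramp
implementability = (0.05, 1.40)   # logarithmic

low  = coachability[0] * unfair_advantage[0] * blue_ocean[0] * implementability[0]
high = coachability[1] * unfair_advantage[1] * blue_ocean[1] * implementability[1]

print(f"combined multiplier: {low:.4f}x to {high:.1f}x")
# -> combined multiplier: 0.0004x to 12.5x
```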
What is value growth trajectory and why does it matter more than a valuation?
A valuation is a snapshot. A trajectory is a story. A startup valued at €500K is interesting. A startup valued at €500K that is growing at €80K per evidence cycle is investable. A startup valued at €500K that has been flat for three sessions is a warning signal. The static number tells you where they are. The trajectory tells you where they're going. Investors back trajectories.
Each session produces a valuation bandwidth — a floor (using only well-evidenced pillars) and a ceiling (including claims at face value). Across sessions, both numbers change and the bandwidth narrows. The narrowing itself is the clearest signal of progress: it means uncertainty is being replaced by evidence. A founder whose floor is rising and whose bandwidth is tightening is converting assumptions into validated data — and that's the definition of de-risking.
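A minimal sketch of the floor and ceiling idea, assuming each pillar carries a value estimate and an evidence level; the cut-off for "well-evidenced" and the numbers are illustrative.

```python
# Each pillar contributes a value estimate and an evidence level (1 = E1 ... 5 = E5).
# Illustrative numbers only.
pillars = {
    "customer_definition": {"value": 120_000, "evidence": 4},
    "willingness_to_pay":  {"value":  90_000, "evidence": 1},
    "timing":              {"value":  60_000, "evidence": 3},
}

WELL_EVIDENCED = 3  # assumed cut-off: E3 or better counts toward the floor

floor   = sum(p["value"] for p in pillars.values() if p["evidence"] >= WELL_EVIDENCED)
ceiling = sum(p["value"] for p in pillars.values())  # claims taken at face value

print(f"bandwidth: EUR {floor:,} to EUR {ceiling:,}")
# As evidence upgrades (E1 -> E3+), the floor rises and the bandwidth narrows.
```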
This is the core of what the system measures. Not value — value growth. Not where you are — where you're going. Every institution that deploys the system gets this temporal dimension for every startup in their portfolio, updated every session.
What is evidence velocity and why do investors care about it?
Evidence velocity is the rate at which a founder converts assumptions into validated data. It is a leading indicator. Valuation is a lagging indicator. A founder with high evidence velocity will eventually show a rising valuation floor — but the velocity signal arrives sessions earlier than the valuation signal. For investors, this is the difference between seeing a trend forming and seeing one that already formed.
The startup industry currently runs on lagging indicators: revenue, headcount, funding raised, survival rate. The Startup Mentor produces leading indicators: evidence quality trends, gate progression speed, value deltas, coachability trajectories. These reveal problems and opportunities before they show up in financial metrics. An accelerator that can see evidence velocity stalling in week three can intervene before demo day. An investor who can see velocity accelerating can move on a deal before competitors see the same signal in the financials.
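One way to picture evidence velocity as a number, assuming it is counted as evidence-level upgrades per session; the system's actual metric may weight upgrades differently.

```python
def evidence_velocity(history):
    """Average number of evidence-level upgrades per session.

    `history` is a list of per-session snapshots mapping pillar -> evidence level
    (1 = E1 ... 5 = E5). Illustrative definition only.
    """
    upgrades = 0
    for prev, curr in zip(history, history[1:]):
        upgrades += sum(max(0, curr[p] - prev[p]) for p in curr)
    return upgrades / max(1, len(history) - 1)

sessions = [
    {"willingness_to_pay": 1, "customer_definition": 2},
    {"willingness_to_pay": 2, "customer_definition": 3},
    {"willingness_to_pay": 4, "customer_definition": 3},
]
print(evidence_velocity(sessions))  # 2.0 upgrades per session in this example
```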
Can I compare startups against each other?
Yes — this is one of the system's core design principles. Because every startup is assessed on the same sixteen value growth pillars, the same five-level validation scale, and the same six readiness gates, the data is directly comparable. A fintech startup in Lagos and a healthtech startup in Seoul produce the same structured output format.
At the institutional level, this enables three aggregate views: cohort dashboards for time-bounded programmes, portfolio dashboards for open-ended collections, and ecosystem dashboards spanning multiple institutions across a geography. Each surfaces different patterns — peer comparison, stall detection, systemic gaps. Explore the dashboard demo →
What about data privacy and GDPR?
Monroe B.V. is incorporated in the EU and operates under GDPR. Session data belongs to the institution and the founder. Data governance — including who can see what, retention periods, and data processing agreements — is defined during the partnership setup. The system architecture is designed for institutional deployment with appropriate access controls.
No session data is used to train models. The methodology is built into the system's architecture, not learned from user data.
Can it integrate with our existing tools?
Not yet. Integration capabilities are part of the product roadmap we are building now. If you're evaluating the system as a launching partner, we'll discuss which integrations matter most for your workflow — CRM, portfolio management, LMS, reporting systems — and prioritise accordingly. This is one of the reasons launching partners matter: their needs determine what we build first.
Can the founder see what the institution sees?
The founder receives their own outputs: the Founder Takeaway (one-page action plan), the Executive Assessment, and the Detailed Assessment. The institution receives the same assessments, the Session Summary, and access to the dashboard. The core assessment is shared — the founder can see how they've been evaluated.
This is deliberate. Transparency builds trust and drives better behaviour. A founder who knows their evidence grades, coachability scores, and red flags are visible to the institution has a stronger incentive to close the gaps than one who thinks the assessment is hidden. It also prevents the uncomfortable dynamic where an institution acts on information the founder doesn't know exists. The founder and the institution are looking at the same data — they're just using it for different purposes.
What happens to our data if Monroe B.V. doesn't make it?
This is a fair question for any early-stage company, and we don't dodge it. Your session data, assessments, and dashboard outputs belong to you. In any partnership agreement, we will include data portability and exit provisions — if we cease operations, you receive a full export of all your data in standard formats. No lock-in, no hostage data.
The assessment documents themselves are generated as standalone files (Word documents and PDFs). They don't require the platform to exist in order to be read. An assessment you received today is yours permanently, regardless of what happens to us.
Sessions
How does a session work?
A session is a structured text-based conversation, typically 45–60 minutes. The mentor asks one question at a time, waits for the full answer, explains why it's asking, and adapts its approach based on how the founder responds. It challenges when the founder is confident, supports when they're working through something difficult, reframes when they're stuck.
It's not a questionnaire. The system detects deflection, tracks sentiment, shifts between eight coaching styles, and follows up on what is not being said as much as what is. At the end of the session, it delivers a summary of strengths and gaps, assigns specific homework with founder-set deadlines, and generates the full output documents.
Is it text-only or is there a voice option?
Currently text-based. This is a design choice, not a limitation. Text creates a record of exactly what was said, avoids the ambiguity of verbal communication, gives founders time to think before answering, and produces structured data that flows directly into assessment documents. The mentor writes in full paragraphs; founders type shorter responses — a two-sentence reply can contain the same signal density as a two-minute verbal answer.
How many sessions does a startup need?
One session produces a complete assessment, but the real value compounds over multiple sessions. The first session establishes the baseline — the founder's profile, pillar scores, evidence levels, and the priority gaps. Subsequent sessions track evidence velocity: are the gaps closing? Is the founder executing on homework? Is coachability improving or declining?
For institutional use, a typical cadence is one session every two to three weeks over the duration of a programme. Some institutions use a single session for screening or due diligence before selection. There is no fixed requirement — it depends on what you need the system to do.
What happens between sessions?
The founder works on their homework — specific tasks assigned at the end of each session, targeting the evidence gaps the assessment identified. These are not generic to-do items; they are evidence discovery tasks: run five pricing conversations, test a specific acquisition channel, validate a customer segment hypothesis. Each task has success criteria and a deadline the founder set themselves.
When the founder returns, the session opens with homework review — not "did you complete it?" but "what did you learn?" The quality of execution, evidence quality achieved, and learning extracted are all tracked and feed into the next assessment.
Does the system remember previous sessions?
Yes. Every return session loads the complete history: pillar scores, evidence levels, open homework, coaching approach that was most effective last time, trust capital accumulated, communication style, and all red and green flags. No founder has to re-explain their business. The mentor picks up exactly where it left off — and starts with the coaching style that broke through in the previous session.
This multi-session continuity is one of the system's core advantages. It tracks evidence velocity (how fast gaps are closing), homework completion patterns (including avoidance patterns), coachability trends over time, and self-awareness trajectories. The data compounds.
Can founders game the system?
Not effectively. The system detects deflection patterns — when a founder answers a different question than the one asked, or consistently avoids a topic. It tracks whether claims are backed by evidence or assumption. It measures coachability through observed behaviour across sessions, not through self-report. And it verifies homework outcomes against evidence quality criteria, not just completion.
A founder can say the right things in a single session. They cannot fake consistent evidence production, improving coachability scores, and authentic engagement across multiple sessions. Importantly, the attempt to game the system is itself a diagnostic signal.
Can you assess a startup without running a live session?
Yes — from public materials. A website, pitch deck, or application form contains enough information for a preliminary assessment: tarpit screening, competitive positioning, team composition, timing analysis, and an initial read on the value proposition. The output uses the same framework and the same format as a full session assessment.
The limitation is explicit: without a live conversation, there is no coachability data, no deflection detection, no evidence quality beyond E1–E2, and no founder interaction signal. The pre-session assessment tells you what the public story looks like. It cannot tell you how the founder thinks under pressure. That distinction is clearly marked on the output.
The use cases are specific. For investors: screen a pipeline company before deciding whether a full session is worth the founder's time. For programmes: triage applications before the interview stage. For sales demonstrations: assess a prospect's portfolio company from public data and show them what the output looks like — on their startup, not a sample.
For founders
What does the founder actually get?
After a single 45–60 minute session, the founder receives three things that most startups never get at any price.
First, a diagnosis. Not generic advice — a structured assessment of exactly where the startup stands across sixteen value growth dimensions, with every claim graded by evidence quality. The founder sees which parts of their business are built on validated evidence and which are still assumptions they believe but haven’t tested. Most founders have never had anyone separate those two categories for them.
Second, a value map. Each evidence gap is quantified: closing this specific gap is worth approximately this much in enterprise value. The founder walks away knowing not just what to work on, but what each piece of work is worth. That converts a vague to-do list into an investment decision: these three tasks are worth €185K. Those two can wait.
Third, a prioritised action plan with specific evidence discovery tasks, success criteria, and founder-set deadlines. Not “do more customer research.” Rather: “Run five pricing conversations with decision-makers at companies matching this profile, test willingness to pay at €X, and report back what you learned — not just what you did.”
Across multiple sessions, the founder gets something rarer: a trajectory. Each session updates the picture. The valuation bandwidth narrows as assumptions are replaced by evidence. The founder can see, in concrete terms, that they are building value — or that they are circling the same gaps. Either insight is worth the session.
Why would founders be willing to do this? It costs them a lot of time.
Because they get something valuable back. A session takes 45–60 minutes. In return, the founder receives a detailed assessment of where their startup actually stands — not encouragement, not generic advice, but a structured diagnosis of which specific gaps are constraining their value and what closing each gap is worth economically. They get a prioritised action plan. They get evidence grades that tell them which of their beliefs are validated and which are still assumptions. Most founders have never received feedback at this level of specificity.
The real answer is simpler: founders who are serious about building something want to know the truth about where they stand. The ones who resist structured assessment are usually the ones who need it most — and that resistance is itself diagnostic information for the institution.
In practice, the time investment compares favourably to what founders already do. They spend hours filling in accelerator application forms, preparing pitch decks, and sitting through mentor meetings that produce no structured output. A session that produces an evidence-graded assessment, a personal action plan, and a valuation bandwidth is a better use of an hour than most things on a founder's calendar.
Why would founders trust an AI mentor?
Trust is earned in the conversation, not assumed before it. Founders don't need to trust the system going in — they need to recognise, within the first few exchanges, that the questions are sharp, the follow-ups are specific, and the system is seeing things that generic advisors miss. When a mentor asks "you said customers are excited — how many have you asked, and what exactly did you ask them?" the founder knows they're dealing with something that isn't going to let them coast on enthusiasm.
There's also a dynamic that works in the system's favour: some founders are more honest with an AI than with a human mentor. There's no ego management, no social pressure, no worry about what the mentor thinks of them personally. The system doesn't judge — it diagnoses. For founders who have been nodding through mentor meetings while privately knowing their customer validation is thin, the absence of social performance can be a relief.
Why would founders use this instead of a human mentor?
In most cases, it's not "instead of" — it's "because there isn't one." The majority of founders in accelerators, university programmes, and investor pipelines don't have access to an expert mentor who can spend an hour doing structured diagnostic assessment of their startup every two weeks. The mentoring they do get is typically unstructured, inconsistent, and produces no lasting data. The system fills the gap that the institution cannot fill with human mentors alone.
Where founders do have access to strong human mentors, the system complements rather than replaces. It handles the structured assessment and evidence tracking, freeing the human mentor to focus on what they do best — relationship, judgment, and the kind of nuanced pattern recognition that comes from decades of experience. The founder gets both: structured rigour from the system, human wisdom from the mentor, and a shared data layer that keeps them aligned.
Why would founders be honest with the system?
Because dishonesty doesn't help them and the system is designed to surface it. Every claim is evidence-graded — saying "customers love us" when the evidence is E1 (assumption) doesn't produce a better assessment, it produces a worse one. The founder who says "I haven't validated pricing yet" gets a clear action plan for how to do it. The founder who claims validated pricing but can't describe the conversations gets flagged for the gap between confidence and evidence. Honesty produces better output.
The system also detects deflection patterns — consistently avoiding topics, answering different questions than the ones asked, providing vague responses to specific questions. Over multiple sessions, these patterns become visible regardless of what the founder says. The most useful thing a founder can do is be direct about what they know and what they don't — because the system is going to find out either way, and the founder who volunteers their weaknesses gets credit for self-awareness.
Is what I say in a session confidential?
The institution that deployed the system will see the assessment outputs — the Executive Assessment, the Detailed Assessment, Session Summary, and dashboard data. They will not see a live feed of the conversation, but the assessment is generated from what you say, so your answers shape what the institution reads.
This is worth understanding clearly before you start. The system is not a private journal. It is a structured assessment, and the results serve both you and the institution. If you tell the system you haven't validated pricing, that will appear in the assessment as an evidence gap. If you tell the system your co-founder relationship is strained, that will appear as a team risk flag.
This is actually in your interest. The alternative — a mentor meeting where you perform confidence and hide the real problems — produces a better impression and worse outcomes. The assessment works best when you're direct about what's going well and what isn't. The institution isn't looking for perfection. They're looking for founders who know where they stand and are working to close the gaps. Honesty in the session demonstrates exactly that quality.
How does the system decide what the founder should work on next?
Not generically. Every evidence gap has a quantified value impact. The system runs the valuation engine with hypothetical evidence upgrades and calculates the difference — a gap-by-gap value increment analysis. The result is a prioritised list: closing this specific gap would increase enterprise value by approximately this much. The tallest bars go first.
The tasks themselves are specific, not generic. "Do more customer research" is not a task. "Conduct three pricing conversations with university programme directors, ask about willingness to pay at specific price points, and record exact words" is a task. Every task specifies what to do, what evidence it will produce, and what success looks like. The founder then sets their own deadline — the system doesn't prescribe timelines, it challenges optimistic ones by listing the sub-tasks involved.
This converts evidence discovery into a value creation plan. A founder can see: if I complete these three tasks, my valuation floor rises by approximately €185K. That changes the motivation from "I should probably do some customer research" to "these three conversations are worth €185K."
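A sketch of the gap-by-gap value increment analysis described above: re-run a valuation with a hypothetical evidence upgrade for each pillar and rank the deltas. The valuation function here is a crude stand-in, not the system's three-layer engine.

```python
def valuation(pillars):
    """Stand-in valuation: a pillar's value counts fully only at E3 or better,
    half at E2, nothing at E1. The real engine is far richer than this."""
    weight = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0, 5: 1.0}
    return sum(p["value"] * weight[p["evidence"]] for p in pillars.values())

def value_increments(pillars, upgraded_level=3):
    """For each under-evidenced pillar, compute the valuation gain from a
    hypothetical upgrade to `upgraded_level`, sorted largest first."""
    base = valuation(pillars)
    deltas = {}
    for name, p in pillars.items():
        if p["evidence"] < upgraded_level:
            scenario = {**pillars, name: {**p, "evidence": upgraded_level}}
            deltas[name] = valuation(scenario) - base
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

pillars = {
    "willingness_to_pay": {"value": 120_000, "evidence": 1},
    "acquisition":        {"value":  80_000, "evidence": 2},
    "timing":             {"value":  60_000, "evidence": 4},
}
for name, delta in value_increments(pillars):
    print(f"close {name}: ~EUR {delta:,.0f}")
# close willingness_to_pay: ~EUR 120,000
# close acquisition: ~EUR 40,000
```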
What happens when evidence discovery shows the idea doesn't work?
The system treats this as one of the most valuable possible outcomes. A founder who discovers that their pricing assumption is wrong and adjusts their model has created more value than one who avoids the pricing conversation entirely. The first founder now knows something. The second is still guessing.
When evidence comes back negative, the system applies a structured framework: Persevere (negative evidence is localised, core thesis holds — adjust the discovery plan but hold direction), Refine (core thesis is partially right but approach needs adjustment — some pillars carry forward, others reset), or Pivot (evidence consistently contradicts the core thesis — most pillars reset, new trajectory from a different foundation).
A Refine is not a full restart. The system applies four dispositions to each pillar: Carry (evidence still valid), Reset (evidence invalidated, back to E1), Re-evaluate (partially valid, context changed), or Strengthen (the refine actually improves this pillar). Validated evidence is preserved where applicable. Negative evidence is recorded as an asset — it demonstrates the founder's relationship with reality.
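Roughly, the disposition step could look like the sketch below, assuming per-pillar evidence levels; the enum names mirror the text, everything else is illustrative.

```python
from enum import Enum

class Disposition(Enum):
    CARRY = "carry"              # evidence still valid
    RESET = "reset"              # evidence invalidated, back to E1
    RE_EVALUATE = "re-evaluate"  # partially valid, context changed
    STRENGTHEN = "strengthen"    # the refine improves this pillar

def apply_refine(pillars, dispositions):
    """Carry validated evidence forward; reset invalidated pillars to E1.
    Re-evaluated and strengthened pillars are revisited in the next session
    rather than adjusted automatically here. Illustrative logic only."""
    refined = {}
    for name, pillar in pillars.items():
        disposition = dispositions.get(name, Disposition.RE_EVALUATE)
        if disposition is Disposition.RESET:
            refined[name] = {**pillar, "evidence": 1}
        else:
            refined[name] = dict(pillar)
    return refined

pillars = {
    "customer_definition": {"score": 0.8, "evidence": 3},
    "willingness_to_pay":  {"score": 0.7, "evidence": 2},
}
print(apply_refine(pillars, {"willingness_to_pay": Disposition.RESET}))
```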
For investors, this is a critical signal. A founder with E3 negative evidence on a previous approach has done more rigorous work than a founder with E2 positive evidence on an untested approach. A discovery downgrade is recorded as a green flag for intellectual honesty.
What can a founder actually do with the assessment?
A founder can use it to prioritise ruthlessly. The assessment quantifies the value of closing each evidence gap. Instead of a vague list of things to improve, the founder sees that closing the willingness-to-pay gap is worth €120K in enterprise value, while redesigning the logo is worth approximately nothing. That arithmetic changes behaviour. Founders stop spending time on comfortable tasks and start spending it on valuable ones.
A founder can use it to fundraise more effectively. The assessment separates validated claims from assumptions at every level of evidence — E1 through E5. A founder who walks into a pitch meeting knowing exactly which parts of their story are proven and which are still hypotheses is a fundamentally more credible presenter than one who treats everything as equally certain. Some founders share the assessment directly with prospective investors. The transparency itself is a signal.
A founder can use it to have honest co-founder conversations. The assessment provides a shared, evidence-graded reference point. Instead of arguing about whose intuition is right, co-founders can look at the same data: this pillar is at E1 — neither of us has tested this. That shifts the conversation from opinion to evidence.
A founder can use it to track their own progress. Across sessions, the valuation bandwidth moves. Pillar scores change. Evidence levels upgrade. The trajectory tells the founder whether they are building value or cycling through activity that feels productive but doesn’t move the numbers. That feedback loop — honest, quantified, session over session — is something most founders never get.
Can a founder use this without being part of a programme?
Yes. The system is designed to work for individual founders independently. A founder runs a mentoring session, receives their documents — takeaway, assessment, pitch deck, business case, investment memo — and decides who to share them with. No institutional affiliation required. No programme manager in the loop.
This matters because the most common path to institutional involvement starts with the founder. A founder who has a structured assessment can share it with a prospective investor, include it in an accelerator application, or use it to prepare for a board conversation. The institution receives a document that was produced by the same methodology they would use if they were running the programme themselves — because it is the same methodology.
I've already raised funding. Is this still useful?
More useful, not less. Before funding, the assessment helps you raise. After funding, it helps you build. The same session that produced your fundraise materials now produces value growth tracking: which pillars are strengthening, which evidence gaps are closing, where your attention should go next. Your investor gets structured data on your progress without attending a single meeting — and you get expert mentoring that doesn't depend on whether your assigned mentor has time this month.
For post-investment founders, the evidence velocity metric matters most. It measures how fast you are closing the gaps between what you believe and what you can prove. Investors care about this because it predicts who will hit milestones and who will stall. You should care about it because it tells you whether your activity is translating into value — or just into busyness.
Can I actually build my business using The Startup Mentor?
Yes — if you're building a software application, particularly an AI-based system. The Startup Mentor is a methodology layer on top of the Claude LLM, which means it has seamless access to Claude's code generation, document creation, and technical architecture capabilities. The boundary between mentoring and building becomes fluid.
In practice, this means the session identifies a value gap — say, "your go-to-market collateral doesn't exist" or "your assessment output needs a new section for this audience" — and immediately asks whether to build it. Or Claude asks. The founder doesn't always know when the roles switch, because the transition is seamless. The value gap discovery and build cycle compresses: what would have been three to four sprints in agile methodology can happen in five minutes. That's not a metaphor. The acceleration factor is 1,000× or more for certain categories of work — document generation, data architecture, report pipelines, UI components, analytical models.
This isn't a theoretical capability. The founder of Monroe built The Startup Mentor using exactly this method. The system mentored his business, identified the value gaps, and then Claude built the features that closed them — in the same conversation. The website you're reading, the assessment pipeline, the dashboard, the document generators, the valuation engine — all built inside mentoring sessions where the gap was identified and the solution was implemented in a single cycle. The product is its own proof of concept.
The limitation is real: this works for software, digital products, and knowledge-intensive services. If your startup requires physical prototyping, wet-lab work, or hardware manufacturing, the mentoring still applies but the build cycle stays in the physical world.
For institutions
How is this different from traditional due diligence?
Traditional due diligence observes the pitch. This system observes the founder — under real pressure, over a sustained conversation. It tests how they respond to hard questions, whether they execute when given a task, whether they revise when the evidence says they're wrong. A pitch deck is a performance. A mentoring session is a stress test.
The output is structured, evidence-graded, and comparable across every company in your pipeline. You stop selecting on the quality of the pitch and start selecting on the quality of the founder.
Can I rely on this assessment for investment decisions?
The honest answer: not blindly, and we would never suggest that. But you can rely on it for something no other input gives you — a structured separation of what the founder has validated from what the founder believes.
Here is the problem with current inputs. A pitch deck is written by the founder to sell you. A mentor’s impression is unstructured and unreproducible. A reference check tells you what someone is willing to say out loud. Financial projections at the early stage are fiction dressed as arithmetic. None of these tell you which specific claims are backed by evidence and which are assumptions the founder states with confidence. The gap between conviction and validation is where most early-stage investment mistakes live — and current methods cannot make that gap visible.
The assessment makes it visible. Every claim is graded E1 through E5. E1 is an assumption — the founder believes it but has no external validation. E3 is validated through structured customer discovery. E5 is proven through repeated transactions. When you read that a startup scores 85% on customer demand but the evidence is E1, you know exactly what you’re looking at: a confident founder who hasn’t tested the hypothesis. When another startup scores 60% on the same pillar but at E3, you know that founder has done the work and the result is moderate but real. The second startup is a better bet. No pitch deck in the world will tell you that.
The system also measures things that traditional due diligence cannot: coachability (how the founder responds to challenge — observed across a sustained conversation, not inferred from a polished presentation), homework completion patterns (do they execute when given a task, and what quality of evidence do they produce), and self-awareness (the gap between how the founder rates themselves and how the evidence rates them). These are leading indicators of founder quality that predict outcomes months before financial metrics do.
Should you make investment decisions based solely on this assessment? No — just as you wouldn’t base them solely on a pitch deck, a financial model, or a single reference call. But if you are making decisions about early-stage startups and the founder is not part of your evaluation in a structured, evidence-graded way, you are ignoring the single largest source of variance in outcomes. The assessment adds a dimension your current process is missing.
Does the system make investment recommendations?
Not explicitly — but let's be honest about what the output contains. The system produces investment readiness assessments, valuation bandwidths based on multiple early-stage models, gate classifications that indicate whether a startup has passed specific readiness thresholds, evidence-graded pillar scores, and red flag severity ratings. It will tell you that a startup's customer validation is E4 (behavioural evidence), that the founder's coachability is high, that pricing is unvalidated, and that Gate 2 is not yet passed. An experienced investor will absolutely interpret that as directional.
The distinction we maintain is that the system assesses readiness — it does not say "invest" or "don't invest." Investment decisions involve judgment, risk appetite, portfolio construction, thesis alignment, and strategic considerations that an assessment system should not pretend to evaluate. A startup that scores strongly across all sixteen value growth pillars with high evidence grades might still be wrong for your fund. A startup with critical gaps might be exactly right if you believe you can help close them.
What the system changes is the quality of the information behind that decision. Instead of pattern-matching on a pitch deck, you're looking at structured, evidence-graded data on founder quality, business validation, and value gaps. Whether that constitutes a "recommendation" depends on how you use it — but we'd rather be transparent about how much signal the output contains than pretend it's purely neutral.
Can it be used for screening before selection?
This is one of the most powerful use cases. Instead of selecting from application forms and pitch decks, every applicant gets a mentoring session. The selection committee then reads evidence-graded assessments, not self-reported applications. Even rejected candidates leave with a value growth roadmap — specific actions tied to specific value outcomes. The quality of your selection improves, and the applicant experience improves too.
Our existing mentors will feel threatened by this.
This is a real deployment concern, and it's worth addressing directly rather than dismissing. The short answer: the system does not compete with your mentors for the same work. It handles structured assessment, evidence tracking, and between-session continuity — the parts of mentoring that are essential but that human mentors find tedious or cannot do consistently at scale. It frees your mentors to focus on what they're best at: relationship, judgment, nuanced pattern recognition, and the kind of support that requires a human.
In practice, most experienced mentors are relieved, not threatened. They get structured data on each founder before a session — pillar scores, evidence grades, homework completion, coaching approach that worked last time. They spend less time catching up and more time adding value. The system becomes their preparation tool, not their replacement.
Where resistance does occur, it's usually from mentors whose primary contribution is generic advice — the kind the system delivers better and more consistently. That's worth knowing, because it reveals something about your mentoring quality that was previously invisible.
What if we have 50+ startups?
Scale is the point. The system runs sessions in parallel — thirty, fifty, or more startups can be assessed simultaneously. Each produces the same structured output. The cohort or portfolio dashboard aggregates everything into one view, surfacing which teams need attention, which are self-driving, and what systemic patterns exist across the cohort.
The economics improve with scale: one workshop addressing a systemic gap identified across twenty startups is more efficient than twenty separate mentoring conversations about the same issue.
How does the dashboard work for programme managers?
The dashboard provides real-time visibility across your entire cohort or portfolio. It includes six views: portfolio overview, team roster, pillar heatmap, evidence and velocity tracking, alerts and actions, and programme patterns. You can drill down to any individual startup or zoom out to see systemic patterns.
When fifteen of thirty teams are struggling with go-to-market, the pillar heatmap shows you the vertical red column — that's a curriculum gap, not fifteen individual mentoring problems. You schedule one workshop instead of thirty conversations. Explore the dashboard demo →
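A small sketch of how a systemic gap might be surfaced from cohort data, assuming pillar scores on a 0 to 1 scale per team; the threshold and the share of teams required are illustrative, not the dashboard's actual logic.

```python
def systemic_gaps(cohort, threshold=0.4, share=0.5):
    """Flag pillars where at least `share` of teams score below `threshold`.

    `cohort` maps team -> {pillar: score}. Illustrative logic only.
    """
    teams = list(cohort.values())
    pillars = teams[0].keys()
    flagged = []
    for pillar in pillars:
        weak = sum(1 for scores in teams if scores[pillar] < threshold)
        if weak / len(teams) >= share:
            flagged.append((pillar, weak))
    return flagged

cohort = {
    "team_a": {"go_to_market": 0.3, "team": 0.8},
    "team_b": {"go_to_market": 0.2, "team": 0.7},
    "team_c": {"go_to_market": 0.6, "team": 0.9},
}
print(systemic_gaps(cohort))  # [('go_to_market', 2)] -> a curriculum gap, not three separate problems
```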
Can we use the assessments for board reporting or accreditation?
Yes. The cohort assessment report is designed for exactly this. It provides aggregate value growth metrics, gate distribution, evidence quality trends, systemic pillar gaps, coachability distribution, and projected readiness — all evidence-based, not anecdotal. "Current portfolio value is €X, up Y% since programme start. Z teams on track. Key systemic finding: [pattern]. Action taken: [intervention]." The data writes the report.
We use Venture Design (or Lean Startup, or another methodology). Won't this conflict?
No — because they do different things. Venture Design, Lean Startup, Business Model Canvas, Design Thinking, and similar frameworks are building methodologies. They guide how a founder constructs and iterates on their startup. The Startup Mentor is an assessment methodology. It evaluates where the startup stands, how strong the evidence is, and what gaps are constraining value.
These are complementary, not competing. A university that teaches Venture Design is giving students a framework for how to build. The Startup Mentor tells each student how well they've built — which assumptions they've validated, which they haven't, and what to work on next. The assessment doesn't prescribe a building methodology. It measures the output of whatever methodology the founder is using.
In fact, the pairing is powerful. Building methodologies often lack a structured assessment layer — they teach the process but don't measure the outcome at the level of individual evidence quality across defined value growth pillars. The Startup Mentor fills that gap. A student following Venture Design who hasn't validated pricing will be told exactly that, with a specific evidence grade and a task to close the gap. The building methodology stays the same. The assessment adds rigour to it.
We are not aware of any widely used startup education methodology that does what this system does: structured, evidence-graded assessment across a defined value framework with institutional-level output. Financial models exist, but they evaluate financial projections — not founder quality, evidence strength, or coachability. The early-stage valuation methods we incorporate (Berkus, Scorecard, Risk Factor Summation, First Chicago, Comparable Company Analysis) are already part of the system's valuation layer. The assessment doesn't replace what universities teach. It adds a measurement layer that doesn't currently exist.
There's a further dimension for educational settings. Students don't have to build a real business to use the system. They can ask the system to create use cases using the methodology — hypothetical startups with defined characteristics — and run simulations through it, seeing what value is being created, where gaps emerge, and how different decisions affect the assessment. It becomes a powerful diagnostic teaching tool: students learn to see what strong evidence looks like, what a tarpit pattern is, how coachability affects valuation, and what happens when assumptions meet reality — all without the overhead of actually launching a venture. The methodology becomes the curriculum.
Can we customise the assessment for our programme's focus?
The sixteen-pillar framework is consistent across all deployments — that's what makes cross-programme and cross-portfolio comparison possible. But within that framework, the system adapts significantly to context.
Industry-specific validation overlays adjust benchmarks and evidence expectations for different sectors — what counts as validated demand in B2B SaaS is different from deep tech or consumer health. The ecosystem context layer adjusts for geographic differences in funding availability, talent markets, and regulatory environments. And the coaching dynamics adapt to founder profiles, not programme assumptions.
If your programme has a specific thesis — climate tech, fintech, health — the industry overlay ensures the assessment asks the right sector-specific questions and applies appropriate benchmarks. If you need specific output emphasis — more weight on team assessment, deeper competitive analysis, different reporting format — that's the kind of adaptation we build with launching partners. The framework stays consistent. The application adapts.
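To make the idea of an overlay concrete, here is a minimal sketch of how a sector-specific overlay might adjust benchmarks and evidence expectations on top of a fixed base framework. The field names, pillar codes other than P8_ACQ, and all values are illustrative assumptions, not the system's actual schema.

```python
# Hypothetical sketch: a sector overlay adjusts parameters, the framework's shape stays fixed.
BASE_PILLARS = {
    "P8_ACQ": {"benchmark": 0.50, "min_evidence_level": "E2"},
    "P_PRICING": {"benchmark": 0.50, "min_evidence_level": "E2"},
}

INDUSTRY_OVERLAYS = {
    "b2b_saas": {
        # B2B SaaS: validated demand means more than interview interest
        "P_PRICING": {"benchmark": 0.60, "min_evidence_level": "E3"},
    },
    "deep_tech": {
        # Deep tech: early acquisition expectations are calibrated differently
        "P8_ACQ": {"benchmark": 0.40},
    },
}

def apply_overlay(base: dict, sector: str) -> dict:
    """Merge a sector overlay onto the base framework without changing its structure."""
    merged = {pillar: dict(cfg) for pillar, cfg in base.items()}
    for pillar, overrides in INDUSTRY_OVERLAYS.get(sector, {}).items():
        merged.setdefault(pillar, {}).update(overrides)
    return merged

print(apply_overlay(BASE_PILLARS, "b2b_saas")["P_PRICING"])
# -> {'benchmark': 0.6, 'min_evidence_level': 'E3'}
```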
What if a founder gets a harsh assessment and leaves the programme?
The system is direct, not harsh. There's a difference. It tells a founder "your pricing is unvalidated and your customer definition is too broad — here are the specific actions to fix both" rather than "your startup is weak." Every gap comes with a path forward. Every red flag comes with a severity level and a recommended action. The assessment is diagnostic, not judgmental.
That said, some founders do react badly to honest assessment — particularly those whose previous mentoring experience has been encouraging rather than diagnostic. The system monitors for this: it tracks sentiment, adjusts coaching intensity when a founder is overwhelmed, and includes a wellbeing protocol for founders in distress. It also tracks coachability — how the founder responds to challenge — which is itself one of the most valuable signals for the institution.
A founder who leaves because the assessment was honest was going to struggle anyway. A founder who stays, absorbs the feedback, and closes the gaps is demonstrating exactly the quality you selected them for. The assessment accelerates both outcomes — which is better than discovering the same thing at demo day.
Is this appropriate for student teams?
Yes. The system adapts to the founder's experience level — it recognises when it's working with first-time or student founders and adjusts its coaching approach accordingly. For student teams, the emphasis shifts toward education and realistic expectation-setting while maintaining the same structured assessment rigour. The evidence standards don't change — they're just applied with appropriate calibration for early-stage teams.
For universities, this solves a specific constraint: expert mentoring is what builds real venture value, but most programmes can't attract or afford enough expert mentors for every team. The system gives every team access to structured expert assessment, regardless of cohort size.
Can students use this independently or does a professor need to be involved?
Both models work. The system can run sessions directly with student teams without a professor moderating — the mentoring methodology is built into the system, not dependent on external facilitation. A student team can have a complete session and receive their assessment and action plan without anyone else in the room.
For the institution, the value comes from the dashboard and aggregate reporting — the professor or programme director sees pillar scores, evidence velocity, homework completion, and systemic patterns across all teams without needing to sit in on every session. They intervene where the data says intervention is needed, not where they happen to have time.
In practice, the best deployment model is a blend: the system handles the structured assessment sessions independently, and the professor uses the resulting data to focus their own time on the teams and topics where human judgment adds the most value — team dynamics, pivotal decisions, specific domain questions that require expertise the system doesn't have.
What about academic integrity? Could students use this to write their business plans?
Yes — most definitely. Students can use the system to generate their business plans, pitch decks, business cases, and investment memos. And if part of your course is teaching students how to write polished versions of these documents, that part of the course is already outdated.
This is not a new story. We don't teach scientists to write beautiful handwritten letters, as was the norm from the sixteenth to nineteenth centuries. Handwritten letters still have a purpose, but their context changed. The same thing happened to horse riding when the automobile appeared: once the primary goal was getting from A to B, driving skills mattered more than riding skills. The calculator didn't eliminate the need to understand multiplication — it eliminated the need to do it by hand.
The same applies here. Being able to write a polished business plan as part of a real entrepreneurial journey is no longer an essential skill. What doesn't change is the content of those documents. A beautifully handwritten letter with incoherent logic and faulty reasoning was still worthless. A gorgeous pitch deck built on unvalidated assumptions is still worthless. The entrepreneur must provide sound data, rigorous thinking, and validated evidence. The document polishing can be done by the system.
And this is where the system actually makes academic integrity easier to evaluate, not harder. Every claim a student makes is evidence-graded. If a team says they validated demand, the assessment shows whether that claim is E1 (assumption), E3 (validated through structured interviews), or E4 (demonstrated through customer behaviour). A professor can see immediately whether the team did the real work or just wrote confident assertions in a well-formatted document. The validation scale is a built-in authenticity check that no amount of polished writing can fake.
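For illustration, a claim carrying its evidence grade might be represented something like the sketch below. The E1, E3, and E4 descriptions come from the text above; the E2 and E5 wording and all field names are illustrative assumptions rather than the system's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum

class EvidenceLevel(Enum):
    # E1, E3 and E4 descriptions are taken from the text;
    # the E2 and E5 labels are illustrative placeholders.
    E1 = "unvalidated assumption"
    E2 = "anecdotal or secondary evidence"
    E3 = "validated through structured interviews"
    E4 = "demonstrated through customer behaviour"
    E5 = "sustained, repeatable evidence"

@dataclass
class Claim:
    pillar: str                     # e.g. "P8_ACQ"
    statement: str                  # what the team asserts
    evidence_level: EvidenceLevel   # the grade a reviewer checks first
    sources: list = field(default_factory=list)  # interview notes, usage data, contracts

claim = Claim(
    pillar="P8_ACQ",
    statement="We have validated demand with early adopters",
    evidence_level=EvidenceLevel.E1,   # confident wording, no evidence behind it yet
)
print(claim.evidence_level.value)      # -> "unvalidated assumption"
```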
The question isn't whether students should use AI to produce documents. They will — just as they use calculators, spreadsheets, and search engines. The question is whether they can feed those documents with evidence that stands up to structured scrutiny. That's what the system measures. That's what matters.
What systemic patterns can cohort or portfolio aggregation reveal?
Patterns that are invisible at the individual level and can only emerge when you look at a population. Because every startup is assessed on the same framework, the aggregation isn't just summary statistics — it's a new category of intelligence.
At the cohort level: if 60% of startups score below 50% on the same pillar, the problem is programmatic, not individual. If most teams are weak on P8_ACQ (customer acquisition), the programme needs a distribution module, not more one-on-one coaching. If 57% of startups have tarpit flags, the selection process is admitting too many teams into structurally difficult spaces. Evidence distribution across all teams reveals whether the programme is generating validated learning or just cycling through sessions.
At the portfolio level: evidence trajectory combined with gate progression and coachability trend produces follow-on signals — timing indicators for additional investment that are visible only through structured longitudinal tracking. Pipeline companies assessed on the same framework can be compared directly against portfolio companies at the same stage. Concentration risk by sector, business model, founder profile, and geography becomes visible.
At the ecosystem level: sector-level competitive advantage, capital landscape gaps (many startups passing Gate 3 but no Series A investors in the region), and tier trajectory (is the ecosystem advancing or stagnating?). An accelerator that can see a programme design flaw in week three, an investor who can see concentration risk across 40 companies, an ecosystem manager who can see a funding gap across 200 startups — none of this was possible before because the structured data didn't exist.
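As a rough illustration of the cohort-level arithmetic described above, the sketch below flags pillars where most of a cohort scores below a threshold. The record shape and function names are assumptions, not the system's actual interface.

```python
from collections import defaultdict

def systemic_pillar_gaps(assessments, score_threshold=0.50, share_threshold=0.60):
    """Flag pillars where a large share of the cohort scores below the threshold.

    Each record is assumed to look like:
    {"startup": "team-a", "pillar_scores": {"P8_ACQ": 0.35, ...}}
    """
    below = defaultdict(int)
    total = len(assessments)
    for record in assessments:
        for pillar, score in record["pillar_scores"].items():
            if score < score_threshold:
                below[pillar] += 1
    # A pillar is a programme-level gap when most teams are weak on it,
    # e.g. 60% of startups below 50% on the same pillar.
    return [p for p, n in below.items() if total and n / total >= share_threshold]

cohort = [
    {"startup": "team-a", "pillar_scores": {"P8_ACQ": 0.35}},
    {"startup": "team-b", "pillar_scores": {"P8_ACQ": 0.42}},
    {"startup": "team-c", "pillar_scores": {"P8_ACQ": 0.70}},
]
print(systemic_pillar_gaps(cohort))   # -> ['P8_ACQ']  (2 of 3 teams below 0.50)
```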
How does language independence expand an investor's reach?
Today, most venture capital requires English. Not formally — but practically. The pitch is in English. The deck is in English. The due diligence conversations are in English. A founder in Jakarta with five validated pricing conversations and strong evidence is invisible to a London investor if she can't pitch fluently in English. The investor never sees her. The startup never gets funded.
The Startup Mentor operates in over 40 languages. The founder works in whatever language they think in. The system assesses in that language. The output is structured data — pillar scores, evidence levels, gate results — that means the same thing regardless of input language. A founder assessed in Portuguese produces the same format as one assessed in Korean. The investor reads the same report, on the same framework, with the same validation scale.
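A language-independent assessment record might look roughly like the sketch below; the field names and values are illustrative assumptions rather than the real output schema.

```python
# Hypothetical shape of a language-independent assessment record.
# Only the session language differs; the structure and scales are identical.
assessment = {
    "startup_id": "jkt-0042",
    "session_language": "id",            # Indonesian input...
    "pillar_scores": {"P8_ACQ": 0.62},   # ...same numeric scale as any other language
    "evidence_levels": {"pricing": "E3"},
    "gate_results": {"gate_3": "PASS"},
    "coachability_trend": "improving",
}
```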
The practical implication: an investor's deal flow pipeline expands from "founders who pitch in my language" to "founders who build value." A fund that previously screened only English-speaking markets can now evaluate founders in forty languages on one framework. A portfolio with companies in five countries shows up on one dashboard — same metrics, same scale, direct comparison.
Trust & methodology
Who built this?
The Startup Mentor is being built by Monroe B.V., an early-stage company based in the Netherlands. The methodology was developed through a sustained collaboration between a domain expert with deep startup mentoring experience across hundreds of founders and an AI system — a process we call symbiosis. We are in the process of turning a working system into a scalable product and company.
Why did you build this?
Honestly? The founder got tired of asking the same questions. After years of mentoring startups, he'd had the same foundational conversation hundreds of times. Who's your customer? Have you talked to them? Will they pay? What's your differentiation? Every session started at the same place. What he wanted was to start at the strategic layer — the judgment calls, the pattern recognition, the hard coaching that actually moves the needle. Instead, he spent the first session establishing basics that a structured process could have surfaced before he ever sat down.
So he looked for a way to clone himself. The clone could do the grunt work — the structured assessment, the validation, the foundational questioning — and he could mentor at a much higher value-add level. That was the emotional driver. It's not a grand vision statement. It's a practitioner who wanted to be more useful.
But cloning turned out to be harder than expected. The textbook knowledge — frameworks, question banks, scoring rubrics — transferred easily. The real methodology didn't. The judgment calls, the coaching instincts, the invisible adaptation that happens mid-conversation — this is tacit knowledge: expertise that practitioners possess but cannot easily articulate. You cannot write down what you do not know you know.
The extraction happened through sustained co-evolution between the domain expert and the AI. When the system misapplied a principle, the expert's correction revealed an implicit rule that was never written down. The errors were not failures — they were extraction events. Each one converted a piece of tacit knowledge into explicit, reproducible methodology.
Then things took an unexpected turn. As a testing exercise, the founder asked the system to start mentoring him — his own startup, his own assumptions, his own blind spots. It was supposed to be a stress test. But as the system's expertise grew, the founder discovered he could use the Startup Mentor as a sparring partner to explore concepts he couldn't explore on his own. The mentoring relationship became real. Not simulated, not performed — genuinely useful. The system challenged his go-to-market thinking, pushed for evidence behind his claims, caught him deflecting from his own hard questions, and held him accountable for his own evidence discovery tasks.
That created a unique structural property. The founder occupied two roles simultaneously: as a domain expert, he mentored the AI's mentoring capability. As a startup founder, he was being mentored by the system he was building. Two experts mentoring each other, each from their specific domain. Neither could have improved without the other. That's the symbiosis — and it's still running.
What is your vision?
Early-stage startup value growth is invisible. No structured methodology exists to decompose, measure, evidence-grade, and track it over time. This is a €3 trillion industry running on gut feel, pitch decks, and mentor intuition — none of which can be audited, compared across a portfolio, or tracked.
The vision is simple: make value growth visible. Then accelerate it.
What investors actually need is the trajectory: is this startup's value growing, plateauing, or declining? No tool exists to answer this with structured, evidence-graded data. Institutions manage portfolios they cannot see — a cohort manager with 30 teams cannot tell you which five need intervention this week, not because they don't care, but because the data doesn't exist. Founders build in the dark — the gap between where they think they are and where the evidence says they are is often enormous, and invisible to both sides.
We are building the infrastructure that replaces opinion with evidence. Not for one institution — for the ecosystem. Every startup assessed on the same framework. Every claim graded by evidence quality. Every session producing structured data that flows into cohort dashboards, portfolio analytics, and ecosystem intelligence. A founder in Nairobi assessed with the same rigour as a founder in Amsterdam.
The near-term goal is to be the assessment and mentoring layer that institutions deploy across their portfolios. The longer-term goal is to build the dataset that makes the entire startup ecosystem smarter — systemic patterns across thousands of startups, revealing which programme designs actually work, which founder profiles produce the highest evidence velocity, and where capital is being allocated based on conviction rather than validation.
Why should I trust this methodology?
Because it was not designed in a classroom. It was extracted from hundreds of real mentoring sessions — from a practitioner who sat across from founders, asked the hard questions, watched what happened when he pushed on weak answers, and observed which interventions actually changed behaviour and which produced polite nodding.
Most startup frameworks are prescriptive: here’s a canvas, fill it in. This methodology is diagnostic. It doesn’t tell founders what to build — it assesses, with structured validation, where the startup actually stands and what specific gaps are constraining value. The difference matters. A canvas gives you a framework for thinking. An evidence-graded assessment gives you a framework for knowing.
The extraction process itself is unusual. The methodology was built through sustained co-evolution between a domain expert and an AI system. When the system misapplied a principle, the expert’s correction revealed an implicit rule that had never been written down. Errors were not failures — they were extraction events. Each one converted a piece of tacit knowledge into explicit, reproducible methodology. The result is a system that captures not just the principles of good startup mentoring, but the judgment calls and pattern recognition that practitioners possess but cannot easily articulate.
The credibility test is simple: read an assessment. If you’ve worked with startups, you’ll know within three pages whether this system sees what experienced evaluators see. If it reads like generic AI output, walk away. If it reads like the notes of a mentor who’s been in the room, that’s the methodology working.
What patterns has the system found across startup types?
The methodology identifies 22 founder archetypes that map to 6 composite profiles. These are not personality types — they are predictive models for where a specific founder is most likely to have blind spots, what coaching approach will break through, and which pillars will need the hardest push.
A Builder-First founder (typically technical, deep-tech, or academic background) almost always over-invests in product and under-invests in customer validation. They can describe the architecture in extraordinary detail but have not had five pricing conversations. A Market-First founder (often MBA, sales, or visionary background) can articulate a compelling TAM but struggles when asked for the unit economics behind the first hundred customers. An Insider founder (corporate defector, industry veteran) pattern-matches from their previous career and treats their professional network as customer validation — it isn’t.
The system also tracks 16 tarpit categories — idea spaces that are structurally difficult regardless of execution quality. AI wrappers, marketplace crowding, subscription fatigue, behaviour-change-required products, enterprise mirages. These are not bad ideas. They are ideas where the structural barriers are so high that the founder needs to demonstrate a specific, evidence-backed reason why their approach is fundamentally different. Most cannot.
These patterns are not labels — they are diagnostic accelerators. A profile doesn’t tell you the answer. It tells you where to look hardest. When the system identifies a Builder-First founder, it knows to push on customer evidence early and hard — not because builders can’t sell, but because the pattern says that’s where this founder is statistically most likely to be under-invested.
What is "symbiosis" and how was the methodology developed?
Symbiosis is a method for transferring tacit knowledge — the expertise that experienced practitioners possess but cannot easily articulate. Rather than documenting the methodology upfront and then implementing it, the domain expert worked directly with the AI across dozens of structured sessions. Each session surfaced new layers of tacit knowledge that were formalised, tested, and refined through the collaboration itself.
The technique works because the AI asks the right kind of wrong questions. When the system misapplies a principle, the expert's correction reveals an implicit rule that was never written down. The errors are not failures — they are extraction events. Each one makes previously invisible knowledge explicit. Read the full story →
Has this been tested with real founders?
Yes. The system has been tested through real mentoring sessions and refined through the results. It was also bootstrapped through detailed simulations — full-fidelity exercises with constructed founder personas, applying the complete methodology under pressure conditions. The simulation phase revealed structural gaps; the real-session phase validated and calibrated the system against actual founder behaviour.
The output quality — assessments, cohort analyses, valuation reports — is at a level we are comfortable putting in front of investment committees. These are not mockups; they are generated from real session data. We offer to demonstrate this with your own startups before any commitment.
How do I know the assessments are accurate?
The most direct answer: read one. We'll run a session with one of your startups and give you the full output. If you're an experienced investor or programme manager, you'll recognise whether the assessment sees what you see — and whether it catches things you missed.
Structurally, accuracy is protected by the validation scale. The assessment doesn't say "this startup has strong customer demand" — it says "the founder claims strong demand, but the validation level is E1 (unvalidated assumption) because no pricing conversations have been conducted." The validation level prevents confident assertions from hiding weak foundations.
What AI model does the system use?
The system is built on Anthropic's Claude models. However, the AI model is the engine, not the product. The product is the methodology — the sixteen-pillar framework, the validation scale, the founder archetype system, the coaching dynamics, the document generation pipeline. These are implemented as a structured knowledge architecture that sits on top of the language model. If the underlying model improves, the system improves. But the methodology — the intellectual property — is independent of which model runs it.
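Conceptually, the separation looks something like the sketch below: the methodology layer owns the framework, and the model is a swappable engine behind a narrow interface. All names here are illustrative assumptions, not the system's actual implementation.

```python
from typing import Protocol

class LanguageModel(Protocol):
    """The engine: anything that can complete a prompt."""
    def complete(self, prompt: str) -> str: ...

class MethodologyLayer:
    """The product: the structured knowledge architecture, independent of the model."""

    def __init__(self, model: LanguageModel, framework: dict):
        self.model = model            # swappable engine
        self.framework = framework    # pillars, gates, evidence rules, archetypes

    def assess(self, transcript: str) -> str:
        # The framework shapes the instruction; the model only executes it.
        pillars = ", ".join(self.framework["pillars"])
        prompt = f"Assess this session against the pillars [{pillars}]:\n{transcript}"
        return self.model.complete(prompt)

class StubModel:
    """Stand-in engine; swapping it for a better model leaves the methodology untouched."""
    def complete(self, prompt: str) -> str:
        return "stub assessment"

mentor = MethodologyLayer(StubModel(), {"pillars": ["P8_ACQ"]})
print(mentor.assess("example transcript"))
```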
Is this really innovative?
The innovation is not in any single component — it's in the combination and the depth. AI chatbots that give startup advice exist. Assessment frameworks exist. Coaching methodologies exist. What didn't exist before is a system that does all of these things together, at the level of a genuinely expert mentor, with structured data output that serves five different stakeholders simultaneously from a single session.
Specifically, three things are new. First, the symbiosis method — a way to extract tacit knowledge from a domain expert that captures what no interview, documentation effort, or fine-tuning approach can: the implicit rules that only surface when the system gets them wrong. Second, the validation architecture — every claim classified on a five-level scale, with the hard rule that confident assumptions cannot pass readiness gates. No existing tool does this. Third, the multi-perspective output pipeline — the same session producing an investor-grade due diligence report, a programme manager briefing, a founder action plan, and a coaching transcript, all from one conversation. That combination doesn't exist anywhere else.
Whether that constitutes innovation is for you to judge. We'd rather show you the output than argue about the label.
Who is the competition?
There is no direct competitor doing what this system does — structured AI mentoring with evidence-graded assessment, multi-session continuity, adaptive coaching dynamics, and institutional-level output. But there are adjacent players, and it's worth being honest about the landscape.
AI startup advisors and chatbots exist — tools that answer founder questions using general-purpose language models. These are generic: they don't assess, don't track evidence quality, don't detect deflection, don't adapt coaching style, and don't produce structured output for institutions. The gap between "AI that gives startup advice" and "AI that mentors like an expert" is enormous — it's the difference between a search engine and a diagnostic system.
Accelerator management platforms exist — tools like Visible, Foundersuite, and various CRM-based solutions that help institutions track their portfolios. These manage data that humans generate. They don't generate the assessment data themselves. They're the dashboard without the engine.
Due diligence tools exist — platforms that aggregate market data, financial analysis, and reference checks. These assess the business on paper. They don't assess the founder under pressure, which is where the real variance in early-stage outcomes lives.
Human mentors exist — and the best ones are excellent. They are also scarce, expensive, inconsistent, and their insights stay locked in their heads. The system is not competing with expert mentors. It's solving the problem that there aren't enough of them.
Why wasn't this built before?
Three things had to converge, and they only recently did.
First, the AI capability. Building a system that can conduct a nuanced, adaptive, hour-long mentoring conversation — detecting deflection, shifting coaching styles, classifying evidence quality in real time — required a level of language model sophistication that simply didn't exist before 2024. Earlier models could answer questions. They couldn't mentor.
Second, the methodology. A language model without a structured methodology produces generic advice. The sixteen-pillar framework, the validation scale, the founder archetype taxonomy, the coaching dynamics, the gate logic — this represents years of accumulated mentoring expertise that had to be formalised before any AI could execute it. Most people building AI startup tools started with the technology and asked "what can we do with this?" We started with the methodology and asked "can we now transfer this?"
Third, the extraction method. Even with capable AI and deep expertise, the transfer problem remained: how do you get tacit knowledge — the kind the expert can't articulate — out of the expert's head and into a system? Traditional knowledge engineering doesn't work for judgment-heavy domains. The symbiosis method we developed — sustained iterative co-evolution where the AI's errors drive extraction of implicit rules — is itself a novel approach. It didn't exist as a technique because the AI systems capable of participating in it didn't exist until recently.
All three had to be present at the same time. The technology became capable. The methodology existed. The extraction method was invented. That's why now.
What if we disagree with an assessment?
Good — that's a productive conversation. The assessment is evidence-graded, which means every score comes with a basis: here's what the founder said, here's the evidence level, here's why this pillar is rated where it is. If you disagree, you can point to the specific claim and say "we know something the system doesn't." Maybe you do — you may have context from board meetings, reference calls, or direct observation that the session didn't capture.
The system is designed to be transparent enough to argue with. It's not a black box that produces a score. It's a structured assessment where every conclusion is traceable to specific evidence. That makes disagreement productive rather than frustrating — you're not arguing with an opinion, you're examining whether the evidence supports the conclusion. Sometimes the assessment will be wrong. When it is, knowing why it was wrong is itself useful.
What about bias in the assessment?
This matters, especially for a system designed for global deployment across cultures, industries, and founder backgrounds. The short answer: the validation architecture is the primary defence against bias.
Here's why. The system doesn't assess founders on subjective qualities like "seems impressive" or "strong communicator" — the qualities where human bias is most dangerous. It assesses on evidence: have you validated pricing? What did customers say? How many conversations? What changed as a result? E3 evidence is E3 evidence regardless of the founder's gender, accent, cultural background, or how confidently they present. A founder in Lagos with five validated pricing conversations scores higher than a founder in London with zero, regardless of anything else.
The sixteen-pillar framework is explicitly designed to measure what matters for startup value — not what correlates with pattern-matching bias in traditional evaluation. It doesn't reward fluent English, polished decks, or warm introductions. It rewards evidence of customer validation, execution on tasks, and honest engagement with hard questions.
That said, no system is perfectly free of bias — including this one. The methodology was developed primarily from Western startup ecosystems, and the ecosystem context layer adjusts for geographic differences but may not capture every cultural nuance. We take this seriously, and working with partners across different geographies and populations is part of how we continue to identify and correct for bias we haven't yet seen.
We've tried AI tools before and they overpromised. Why is this different?
Because we don't ask you to believe a claim. We ask you to read an output. The pilot is free. We run a session on one of your startups. You receive the detailed assessment — thirty-plus pages of evidence-graded analysis covering sixteen value growth pillars, readiness gates, coachability measurement, valuation bandwidth, and specific recommendations. You read it and decide whether it sees what you see, and whether it catches things you missed.
Most AI tools in this space are wrappers around a general-purpose model with a startup-themed prompt. They produce plausible-sounding advice that falls apart under scrutiny. This system is different because the methodology is different — a structured knowledge architecture developed through sustained expert collaboration, not a prompt layer on top of a chatbot. The validation scale alone is a structural difference: it doesn't just tell the founder what to do, it classifies every claim by validation confidence and prevents assumptions from being treated as validation.
The gap between this system and a typical AI startup tool is the same gap between a diagnostic medical system and a health chatbot. One has a structured methodology, diagnostic rigour, and accountability in its outputs. The other gives you a plausible answer and hopes for the best. The output makes the difference obvious — which is why we lead with the output, not with claims about it.
What if the AI gives bad advice and a founder follows it?
The system is designed to minimise this risk structurally, not just aspirationally. First, the system assesses and diagnoses — it identifies evidence gaps and assigns tasks to close them. It does not tell founders what to build, which market to enter, or which strategy to pursue. "You haven't validated pricing — here's how to run five pricing conversations" is a diagnostic task, not strategic advice. The decisions remain the founder's.
Second, the validation system is self-correcting. If a founder pursues an action and the evidence comes back negative, the system treats that as valuable data — a discovery downgrade is actually recorded as a green flag for intellectual honesty. The methodology is built around iterative evidence discovery, not around being right the first time.
Third, the system operates within a structured framework that has been tested and refined through real sessions. It doesn't improvise. It applies a sixteen-pillar assessment methodology with defined coaching dynamics, evidence standards, and escalation protocols. When it encounters edge cases it cannot handle, it flags them rather than guessing.
Is the system perfect? No. Can a founder misinterpret an assessment or over-index on a specific recommendation? Yes — just as they can with a human mentor. The difference is that every recommendation is documented, evidence-graded, and traceable. If something goes wrong, you can see exactly what was said and why.
What's defensible about the IP? What stops someone copying this?
Several things, and they compound. The methodology itself — the sixteen-pillar framework, the validation architecture, the founder archetype taxonomy, the coaching dynamics, the gate logic, the tarpit detection system — is the product of sustained expert collaboration across hundreds of hours of structured development. It is not a prompt layer that someone could recreate by writing better instructions. It is a deep, interconnected knowledge architecture where the components depend on each other in non-obvious ways. The archetype system informs the coaching dynamics. The coaching dynamics affect evidence classification. The evidence classification drives the gate logic. Copying any single piece without the full architecture produces something that looks similar and works badly.
The symbiosis method — the extraction technique itself — is a second layer of defence. Even if someone had equivalent domain expertise, the process of transferring tacit knowledge into a structured system is not documented in a way that can be replicated by reading about it. It was developed through the collaboration itself. The method is embedded in the outcome.
The data layer compounds over time. Every session generates structured assessment data. Across institutions, cohorts, and geographies, this builds a dataset of founder behaviour, evidence patterns, and outcome correlations that no new entrant can bootstrap. The system gets better with use in ways that are not replicable by starting from scratch.
And practically: building this required a specific combination of deep mentoring expertise, AI system design capability, and the patience to do it properly over an extended period. The competitive moat is the same one we assess in the startups we evaluate — it's the combination of proprietary methodology, accumulating data advantage, and the difficulty of replicating a sustained expert collaboration from the outside.
How does the system separate assessment from advice?
Deliberately and structurally. The system is built on two distinct models: the Assessment Model (backward-looking — where are you?) and the Guidance Model (forward-looking — where should you go?). The boundary between them is crystallised by the readiness gates. Everything before the gate evaluation is evidence — assessment territory. Everything after is guidance territory.
This separation matters because clean measurement requires independence from prescription. An assessment that simultaneously recommends actions contaminates its own objectivity. By separating them, the Assessment Model can be ruthlessly honest about current state. The Guidance Model can then operate on that honest assessment without softening bad news. A gate result of WEAK is not advice to pivot — it is a measurement of insufficient evidence at a critical threshold. What to do about it is a different question, answered by a different model.
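A minimal sketch of that separation, with hypothetical names: the WEAK outcome and the five-conversation pricing task come from the text, while PASS and the gate_3 label are assumed for illustration.

```python
from dataclasses import dataclass

@dataclass
class GateResult:
    gate: str
    outcome: str          # "WEAK" comes from the text; "PASS" is an assumed label
    evidence_basis: str

def assess(evidence: dict) -> GateResult:
    """Backward-looking: measures where the startup stands. No recommendations here."""
    conversations = evidence.get("pricing_conversations", 0)
    return GateResult(
        gate="gate_3",
        outcome="PASS" if conversations >= 5 else "WEAK",
        evidence_basis=f"{conversations} structured pricing conversations",
    )

def guide(result: GateResult) -> str:
    """Forward-looking: operates only on the assessment output, never softens it."""
    if result.outcome == "WEAK":
        return "Close the evidence gap first, e.g. run structured pricing conversations."
    return "Gate cleared; focus moves to the next readiness threshold."

print(guide(assess({"pricing_conversations": 0})))
```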
For institutions, this is important because it means the assessment can be trusted independently of whether you agree with the guidance. You can use the diagnostic data and apply your own judgment about what to do next. The assessment stands on its own.
Could this architecture work in domains beyond startups?
Yes — and this is not a speculative claim. The architecture applies to any professional domain where five conditions hold: a structured conversation is the primary diagnostic tool; the practitioner adapts based on who they're talking to; evidence quality matters; guidance flows from assessment; and the expert's tacit knowledge drives the quality of everything.
Healthcare, legal services, financial advisory, education, executive coaching, architecture, consulting, therapy — every profession built on structured diagnostic conversations meets all five conditions. The sixteen value growth pillars become the observable dimensions of value in that domain. The five evidence levels become the diagnostic confidence scale. The composite profiles become the adaptation framework. The coaching styles become the clinical communication styles. The gates become progression thresholds. Every structural element translates.
The strongest test of this claim is mapping the framework to a domain it was never designed for. Orthopaedic rehabilitation, for example: value growth pillars become functional dimensions (flexion range, weight-bearing, proprioception). Evidence levels become diagnostic confidence (patient self-report → clinical observation → structured exam → imaging → sustained performance under load). The most dangerous evidence discovery classification — ED-3, where the founder completes the task poorly — maps exactly to a patient who performs exercises with bad form, generating misleading recovery data. The structural parallel is exact because the architecture captures something universal about how experts diagnose, assess, and guide.
For investors in Monroe, this means the addressable market is not "AI tools for startup assessment." It is every professional domain where tacit knowledge drives diagnostic quality, where evidence quality varies, and where the gap between a good practitioner and a great one is invisible, undocumented, and lost when they retire. That market is measured in hundreds of billions.
Are you looking for investors?
Yes. We are raising to productise and scale the system — building the platform infrastructure, institutional onboarding, and the team to support deployment at scale.
But it matters to us who we raise from. We are looking for investors who will also be launching customers — funds with their own portfolios who will use the system on their pipeline companies, stress-test the methodology against real investment decisions, and feel the pain the system solves in their own operations. An investor who has sat through a hundred pitch decks and wondered what they were actually learning about the founder understands the problem viscerally. That's who we want at the table.
Practically, we would expect investing partners to have a portfolio of 50 startups or more — enough scale that the portfolio-level analytics, cross-company comparison, and systemic pattern detection become genuinely valuable, not theoretical. If you're evaluating 100 pipeline companies a quarter and want structured, evidence-graded due diligence on every one of them before committing capital, this is built for you.
Still have questions?
The best way to evaluate the system is to see what it produces. Let us run a session on one of your startups.
Get in touch →