Investigations · January 28, 2026 · 7 min read

Modern sleuthing: how PI shops are actually using AI for OSINT

OSINT is one of those professions where the tools changed faster than the methodology. Here is what is working in 2026, what isn't, and the chain-of-custody trap that keeps catching teams.

Pavan K
Founder, Mudish Technologies
OSINT · Investigation · AI Tools

OSINT is one of those professions where the tools changed faster than the methodology. A skilled investigator in 2010 had a Rolodex, a Lexis subscription, and patience. The same investigator in 2026 has dozens of automated tools and the same temptation every other knowledge worker has — to confuse coverage with insight.

We have done a fair bit of work with PI shops, fraud-investigation teams, and a couple of corporate intel groups in the last year. The pattern is consistent. AI is genuinely changing parts of the workflow. Other parts it is making worse. The investigators who are pulling ahead are the ones who are unromantic about which is which.

What AI is actually good at in OSINT

  • Cross-source disambiguation. Same name, different person — the tedious work of merging records across registries, court databases, social profiles, and corporate filings. Models trained on entity resolution outperform a junior investigator and free up senior time.
  • Multilingual triage. Country-condition reports, foreign-language press, and overseas registries used to need a translator and a week. A model can summarize the relevant 3% in an afternoon. Verification still needs a human.
  • Pattern surfacing in transaction data. When a case lands with five hundred bank statements and a deadline, a model can flag suspicious clusters faster than any analyst, and it does not get bored on page 200.
  • Image and document de-duplication. The same photo across forty platforms, the same scanned PDF with different filenames. A vector index closes out a class of work that used to be a junior's full week.
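The exact-duplicate half of that de-duplication work does not even need a model. A minimal sketch using content hashes, so renamed copies of the same scanned PDF collapse to one file (a vector index handles the near-duplicate cases; the directory layout here is hypothetical):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash file contents so renamed copies share one key."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under root by content hash; any group
    with more than one path is a set of exact duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            groups[sha256_of(p)].append(p)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Running this over a case folder before embedding anything keeps the vector index smaller and the review queue honest.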

What AI is bad at — and why it matters in a deposition

  • Inventing source citations. Models will confidently cite a court case or a news article that does not exist. In a deposition, that is not a hallucination; it is a career risk.
  • Distinguishing primary from derivative sources. A model will summarize Wikipedia as if it were the original, then surface the original as a confirming citation. The investigator has to do that work.
  • Time-bounded fact-checking. The model does not know when a fact stopped being true. Most OSINT errors we see now are stale facts presented confidently.
  • Anything adversarial. A model is reading the same internet your subject is publishing to. Subjects who know they are being investigated have learned to seed it.

The stack we see working

Practically every working PI stack we have audited in the last year is built around three layers:

  • A search and aggregation layer — the modern descendants of Maltego, Spiderfoot, and a half-dozen vertical tools.
  • A document and entity layer — typically a private retrieval index over the firm's casework, deduplicated and embedded.
  • A model layer — increasingly two models, with the more capable model used only for synthesis and the cheaper model used for triage.

The layer most teams skip is the audit layer: a record of which sources contributed to which conclusion. That is the layer that makes the work defensible later.

The chain-of-custody trap

The single biggest unforced error we see is that AI tools mangle chain of custody. A junior investigator pastes a screenshot from a public profile into a chat with a model, asks for context, and saves the model's output as part of the case file. The model's output is not a primary source. The screenshot is, but it now has no defensible capture timestamp because it lived in a chat window before it lived in the case folder. By the time anyone notices, the original profile may have changed or been deleted.

The fix is operational, not technical. Capture sources to evidence storage first, with timestamps and integrity hashes. Only then run them through any AI tooling, and tag the AI-derived analysis as analysis, not evidence. This is the discipline forensic investigators have always practiced; what is new is that the AI workflow makes it very easy to skip.
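One way to make that discipline concrete is to make evidence intake a single function that writes the capture to evidence storage with a UTC timestamp and an integrity hash before anything else can touch it. A sketch, not a forensic tool — the paths, field names, and `role` convention are assumptions for illustration:

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

EVIDENCE_DIR = Path("evidence")  # hypothetical case-file root

def capture_evidence(source_file: str, case_id: str, origin_url: str) -> dict:
    """Copy a capture into evidence storage, then record its hash and
    capture time. Only after this runs should the file be shown to any
    AI tooling; model output gets role "analysis", never "evidence"."""
    src = Path(source_file)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    dest = EVIDENCE_DIR / case_id / f"{digest[:16]}_{src.name}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    record = {
        "case_id": case_id,
        "origin": origin_url,
        "stored_as": str(dest),
        "sha256": digest,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "role": "evidence",
    }
    # Append-only manifest: one JSON line per captured item.
    with (dest.parent / "manifest.jsonl").open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

The point of the sketch is the ordering, not the code: hash and timestamp first, chat window second.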

A practical adoption path

  • Pick one repetitive workflow — entity resolution, multilingual triage, or document de-dup. Replace the manual version with a model-assisted version on internal cases first.
  • Build the audit layer before you scale. Every AI-touched output should be traceable to a source, a timestamp, and a model version.
  • Train staff on the failure modes. Hallucinated citations, stale facts, and adversarial seeding are the three you will encounter the most.
  • Decide which use cases will never touch a third-party model. Anything subject to attorney work product, anything involving a witness identity, anything regulated. Your default model and your sensitive-case model should not be the same.
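The "traceable to a source, a timestamp, and a model version" requirement above fits in one small record type. A minimal sketch — the field names are illustrative, not a standard:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditRecord:
    """Ties one AI-derived conclusion back to its inputs."""
    conclusion: str
    source_ids: list[str]  # evidence-store identifiers consulted
    model: str             # model name plus version string
    prompt_ref: str        # where the exact prompt is archived
    produced_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    role: str = "analysis"  # never "evidence"

    def to_jsonl(self) -> str:
        """One JSON line, ready to append to the case audit log."""
        return json.dumps(asdict(self))
```

Writing one of these for every model-assisted conclusion is cheap at the time and very expensive to reconstruct later.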

AI is not going to make a bad investigator into a good one. It will make a good investigator faster, and it will make a sloppy one dangerous. The shops that are pulling ahead in 2026 are not the ones with the most tools. They are the ones with the discipline to treat AI as a research assistant and not as a witness.
