I’m an agentic AI engineer. The 19 years before that are what make me good at it.
Full-time in AI agents for about ten months: MCP, multi-model orchestration, classification and eval pipelines. Before that, the better part of two decades shipping production software where mistakes cost real money: trading systems, fintech, CNN. Here’s where I think it lines up with what ESET is building.
The arc
Most of my career wasn’t AI. For about 19 years I worked as a self-taught engineer and designer, building things where reliability wasn’t optional: real-time trading UIs and data backends at Aquis Exchange, client onboarding for a fintech, a new storytelling format with CNN’s AMP team. Latency, correctness, low error rates. That was the job.
About ten months ago I went all-in on agentic AI. Not a side experiment. The actual work. And in those ten months I’ve shipped a live RAG product with real users, built a multi-stage classification and hypothesis pipeline, wired up multi-model workflows, and put together a reusable eval harness.
It came together fast, but it wasn’t magic. The thing is, those 19 years are what make the ten months work. I already knew how to ship something reliable, watch it in production, handle failure, and not trust a system just because it answered confidently. AI didn’t replace any of that. It gave me faster tools to do it.
So here’s where I’m careful: I don’t train detection models, and I’ve never worked in security. But the engineering underneath is the same: classify, ground, evaluate, keep a human in the loop, watch how it breaks. That part transfers. And I think it maps onto a few things ESET is building right now.
Where I think I could help
I went through what ESET’s been putting out: the €40M AI push, the white paper, the Secure AI Relay and the Skills Checker. Three places stood out where what I do actually connects. Not “I could learn this.” Things I already work with every day.
The agent layer, where you’re building the Secure AI Relay and the Skills Checker
This is the one I’m closest to. I work in the application and agent layer every day: MCP servers, tool use, sub-agent orchestration, prompt design. Which means I’ve spent a lot of time watching how all of it breaks. Prompt injection. An agent calling a tool it shouldn’t. A skill whose instructions quietly do the wrong thing. Retrieval pulling in something it shouldn’t trust.
Your Secure AI Relay and Skills Checker point right at that surface. I get it from the builder’s side: not as a threat researcher, but as the person who built these systems, watched them misbehave, and then had to fix them. Where tool permissions blur, where a skill’s instructions can hide what it really does, where retrieval and model output have to be treated as untrusted: that’s daily ground for me.
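One concrete piece of that, from the builder’s side, is deliberately boring: an explicit allowlist of which tools each agent may call and which arguments each tool accepts, plus a clear marker on anything retrieved or generated so it’s handled as data rather than instructions. Below is a minimal Python sketch. Every name in it (`ToolSpec`, `AgentPolicy`, `check_tool_call`, `wrap_untrusted`) is hypothetical and not tied to MCP or to any ESET product. Delimiting untrusted text lowers the risk of prompt injection but doesn’t remove it; the allowlist check is the part that actually enforces something.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool description: its name and the argument names it accepts."""
    name: str
    allowed_args: frozenset[str]


@dataclass
class AgentPolicy:
    """Hypothetical per-agent allowlist: the agent may call only the tools listed here."""
    agent: str
    tools: dict[str, ToolSpec] = field(default_factory=dict)


class ToolCallDenied(Exception):
    """Raised when an agent tries a tool or an argument outside its policy."""


def check_tool_call(policy: AgentPolicy, tool_name: str, args: dict) -> None:
    """Refuse the call unless the tool is allowlisted and every argument is expected."""
    spec = policy.tools.get(tool_name)
    if spec is None:
        raise ToolCallDenied(f"{policy.agent} may not call {tool_name!r}")
    unexpected = set(args) - spec.allowed_args
    if unexpected:
        raise ToolCallDenied(f"unexpected arguments for {tool_name!r}: {sorted(unexpected)}")


def wrap_untrusted(text: str, source: str) -> str:
    """Label retrieved or model-generated text as data before it goes into a prompt.

    A convention the system prompt must also explain; it reduces injection risk, it doesn't remove it.
    """
    return f"<untrusted source={source!r}>\n{text}\n</untrusted>"


if __name__ == "__main__":
    policy = AgentPolicy("researcher", {"search": ToolSpec("search", frozenset({"query", "limit"}))})
    check_tool_call(policy, "search", {"query": "endpoint telemetry"})  # allowed
    try:
        check_tool_call(policy, "delete_file", {"path": "/tmp/x"})
    except ToolCallDenied as err:
        print("blocked:", err)
```

The check runs before the tool does, so a confused or manipulated agent fails loudly instead of quietly doing the wrong thing.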
Your AI SOC work, where alert triage looks a lot like something I’ve built
I’ll be careful here, because it’s the easiest place to overclaim. I’ve built a multi-stage pipeline that takes a messy stream of input, runs cheap classifiers over every piece to pull out structure, builds and updates hypotheses as more comes in, routes to the right kind of response, and scores the result against what we expected.
Structurally, that’s the same shape as part of alert triage: cheap signal extraction feeding a more expensive decision, with a human in the loop. ESET already tiers analysis across endpoint and cloud, so the pattern will be familiar to your teams, even if the security domain isn’t mine.
What doesn’t carry over is the hard part of a real SOC. Adversarial inputs actively trying to fool you. Telemetry that’s noisy and uneven in ways a chat transcript never is. The cost of a false positive at your scale. The analyst’s real workflow. How much damage a wrong automated action can do. And some of it needs to be deterministic, the same input giving the same answer every time. A probabilistic model won’t do that by itself. I haven’t lived that part. Where I’d actually be useful is the engineering around it, the classifying, routing, and evaluating, working next to the people who already know the threat side.
How I actually build: measure, cross-check, don’t trust one model
This one’s less a feature and more a habit. I don’t trust a single model’s answer. I’ll run a task through two or three of them (Opus, GPT, a cheap one for the easy parts), cross-check them against each other, and pick models for what they’re genuinely good at instead of the benchmark of the week. Every time I find something useful (this one’s four times faster and a tenth the cost for classification; that one’s stronger on long-document reasoning), I write it down as a short research note.
From what I’ve read, that’s close to how ESET already works: layered checks, careful integration, a human accountable for the call. And the question under all of it, who’s responsible when the model gets it wrong, is exactly what I build around.
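The cross-checking habit, reduced to a sketch: ask several models the same question, take the majority answer, and route low agreement to a human instead of trusting whichever model sounded surest. The models here are plain Python callables standing in for real API clients; `cross_check` and the two-thirds agreement threshold are illustrative assumptions, not a fixed recipe.

```python
from collections import Counter
from typing import Callable

# Hypothetical: each "model" is any callable mapping a prompt to a label.
# In practice these would wrap real API clients (a strong model, a mid one, a cheap one).
Model = Callable[[str], str]


def cross_check(prompt: str, models: dict[str, Model], min_agreement: float = 2 / 3) -> dict:
    """Ask every model, keep the majority label, and flag low agreement for human review."""
    answers = {name: model(prompt) for name, model in models.items()}
    label, votes = Counter(answers.values()).most_common(1)[0]
    agreement = votes / len(answers)
    return {
        "label": label,
        "agreement": agreement,
        "needs_review": agreement < min_agreement,  # disagreement goes to a person
        "answers": answers,  # keep every model's answer for the research notes
    }


if __name__ == "__main__":
    stubs = {
        "strong": lambda p: "phishing",
        "medium": lambda p: "phishing",
        "cheap": lambda p: "benign",
    }
    print(cross_check("Classify this email: ...", stubs))
```

Disagreement is the useful signal here: it’s cheap to compute, and it points a reviewer at exactly the cases worth a second look.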
A bit of how the work actually looks
Talk is cheap, especially now that AI writes such confident paragraphs. Here’s some of the real thing, kept abstract because the actual systems are under NDA.
The pipeline. The idea’s simple. Cheap, fast classifiers do the grunt work on every turn, turning messy input into structured signal. That builds up context and a set of running hypotheses, so the expensive model at the end isn’t guessing, and isn’t burning compute on work the cheap ones already did. A human stays in the loop, running the evals and tuning the prompts and context.
Flow diagram. Messy input goes to a set of cheap classifiers that run on every turn and produce structured signal. The signal builds running hypotheses and context. A routing step decides what happens next and hands the hard work to one expensive model call. The output is scored against what was expected, a human stays in the loop at the state and review points, and failures feed back in as the next tests.
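The same shape as a minimal Python sketch. Everything in it is hypothetical and deliberately tiny: the classifiers are stand-in functions returning label scores, each hypothesis keeps the strongest score seen so far, and the 0.7 escalation threshold is arbitrary. The point is the structure: cheap work on every turn, and one expensive call only when the accumulated state justifies it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stand-ins: a cheap classifier maps one turn of raw text to {label: score};
# the expensive model takes the structured context and returns a final answer.
Classifier = Callable[[str], dict[str, float]]
ExpensiveModel = Callable[[dict], str]


@dataclass
class State:
    """Running context the cheap stage builds up, turn by turn."""
    signals: list[dict[str, float]] = field(default_factory=list)
    hypotheses: dict[str, float] = field(default_factory=dict)


def observe(state: State, turn: str, classifiers: list[Classifier]) -> None:
    """Run every cheap classifier on the turn and fold the signal into the hypotheses."""
    for classify in classifiers:
        signal = classify(turn)
        state.signals.append(signal)
        for label, score in signal.items():
            # Keep the strongest evidence seen so far for each hypothesis.
            state.hypotheses[label] = max(score, state.hypotheses.get(label, 0.0))


def route(state: State, threshold: float = 0.7) -> Optional[str]:
    """Return the hypothesis worth escalating, or None to keep gathering signal."""
    if not state.hypotheses:
        return None
    best = max(state.hypotheses, key=state.hypotheses.get)
    return best if state.hypotheses[best] >= threshold else None


def run_turn(state: State, turn: str, classifiers: list[Classifier],
             expensive_model: ExpensiveModel) -> str:
    """Cheap stage on every turn; the expensive model runs only when routing says so."""
    observe(state, turn, classifiers)
    hypothesis = route(state)
    if hypothesis is None:
        return "continue"
    # One expensive call, handed structured context instead of the raw stream.
    return expensive_model({"hypothesis": hypothesis, "signals": state.signals})
```

In this sketch the human sits outside the code: reviewing escalated outputs, reading the accumulated `State`, and tuning the classifiers and threshold between runs.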
How I check it. I run the pipeline against labeled examples and score the output, sometimes with another model as judge, sometimes by hand, and I track cost, latency, and quality every time. When something fails, that failure becomes the next test. It’s smaller than an enterprise eval setup, but the discipline is the same, and that’s the part I’d bring.
A small live example.
| model | accuracy | p50 latency | cost / 1k |
|---|---|---|---|
| claude-haiku-4-5 | 91.7% | 1.17 s | $0.25 |
| claude-sonnet-4-6 | 95.8% | 1.58 s | $0.67 |
| claude-opus-4-8 | 98.3% | 1.52 s | $1.52 |
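A minimal sketch of the kind of scoring loop behind a table like the one above: run a prediction function over labeled examples, record accuracy, median (p50) latency, and cost, and keep every miss as a candidate regression test. `Example`, `evaluate`, and the `(label, cost)` return shape are assumptions, and here `cost_per_1k` means dollars per thousand calls, also an assumption made for the sketch. Exact-match scoring stands in for a model-as-judge or a by-hand review.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    """One labeled case: the input and the answer we expect."""
    text: str
    expected: str


def evaluate(predict: Callable[[str], tuple[str, float]], examples: list[Example]) -> dict:
    """Score predict() on labeled examples, tracking quality, latency, and cost.

    predict is a hypothetical wrapper returning (label, cost_in_dollars) for one call.
    """
    if not examples:
        raise ValueError("need at least one labeled example")
    correct, latencies, total_cost, failures = 0, [], 0.0, []
    for ex in examples:
        start = time.perf_counter()
        label, cost = predict(ex.text)
        latencies.append(time.perf_counter() - start)
        total_cost += cost
        if label == ex.expected:  # swap in a judge model or human review here
            correct += 1
        else:
            failures.append(ex)  # each miss becomes a candidate regression test
    return {
        "accuracy": correct / len(examples),
        "p50_latency_s": statistics.median(latencies),
        "cost_per_1k": 1000 * total_cost / len(examples),
        "failures": failures,
    }
```

Running the same labeled set through each candidate model and comparing the returned dicts gives a side-by-side view of quality, speed, and cost.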
Where it breaks. The failure modes are the interesting part. Hallucination when retrieval is thin or the prompt leaves room for it. Tools called with the wrong arguments. An agent stuck in a loop. I’ve hit all of these and built around them: grounding everything in sources, structured prompts, a “go fetch more” step when confidence is low, monitoring and failover once it’s in production. Same reflex as in the years before AI. Assume it’ll break, and watch for it.
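The “go fetch more” step and a loop guard fit in a few lines. A minimal sketch, assuming hypothetical `retrieve(question, k)` and `answer(question, sources)` functions, where the second returns an answer plus a confidence score: answer only from retrieved sources, double the retrieval depth when confidence is low, and stop after a fixed number of rounds so the agent can’t spin forever. If the cap is hit, the case goes to a human rather than out the door.

```python
from typing import Callable, Optional

# Hypothetical signatures for the two steps the sketch relies on.
Retriever = Callable[[str, int], list[str]]
Answerer = Callable[[str, list[str]], tuple[str, float]]


def answer_with_grounding(question: str, retrieve: Retriever, answer: Answerer,
                          min_confidence: float = 0.75, max_rounds: int = 3) -> dict:
    """Answer from retrieved sources; widen retrieval on low confidence, with a hard round cap."""
    k = 3
    sources: list[str] = []
    for round_no in range(1, max_rounds + 1):
        sources = retrieve(question, k)
        text, confidence = answer(question, sources)
        # Only accept an answer that is grounded in at least one source and confident enough.
        if sources and confidence >= min_confidence:
            return {"answer": text, "sources": sources, "rounds": round_no, "escalate": False}
        k *= 2  # "go fetch more": widen retrieval before trying again
    # Cap hit: stop instead of looping, and hand the case to a human.
    result: dict[str, Optional[object]] = {
        "answer": None, "sources": sources, "rounds": max_rounds, "escalate": True,
    }
    return result
```

The cap matters as much as the retry: a bounded loop turns “agent stuck in a loop” into an explicit, reviewable escalation.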
Where I fit, and where I don’t
I’d rather say this plainly than let it surface later.
Where I fit: the application and agent layer. Building agentic systems, MCP and tool behavior, classification and hypothesis pipelines, multi-model orchestration, evaluation, and the production side: shipping, monitoring, failover, and keeping things actually reliable.
Where I don’t: I don’t train detection models. That’s a real discipline and you’ve got people far better at it than me. I’ve never worked in security, so no threat intelligence, no malware analysis, no red-teaming. And my AI work has been at startup and small-company scale, not hundreds of millions of users.
That’s the honest map. The useful spot is the application and agent layer, right next to the systems your teams are already securing.
The short version
I’m self-taught. I started in systems and network administration, moved into graphic design and photography, then taught myself web development and, later, full-stack engineering across 11 years of freelancing out of London. Along the way: CNN’s AMP team, trading UIs and real-time backends at Aquis Exchange, fintech onboarding at Direct Fidoo. Now I’m an Agentic AI Engineer at Dev4Evolution, and I still run the whole stack myself, design through to infrastructure.
This page is a small example of the delivery side. Using my agentic workflows, I researched ESET, then wrote, designed, and shipped this page in a couple of days. The real technical proof is up above; this is just the craft side of how I work.
If there’s something here
If this looks useful, the way to test it is with real work. A scoped, paid piece of R&D in the agent layer: something around the Secure AI Relay, the Skills Checker, or an internal agentic system on that same surface. Ideally with a deliverable you’d want anyway (an eval, a prototype, a failure-mode pass on an agentic workflow your team is already building).
If it works, there’s a clear reason to keep going. If it doesn’t, you found out fast, and it cost one engagement.
I’d lean contractor for a first step, but I’m open to whatever shape fits. A short call’s enough to figure out the right scope.