I made my LLM stop bullshitting. Nothing leaves your machine.

SuspciousCarrot78@lemmy.world · edit-2 18 hours ago


utopiah@lemmy.ml · 8 hours ago

Can’t it source other LLM outputs as “verified source” and thus still say whatever sounds good, like any LLM? Providing “technical” verification, e.g. SHA, gives no insurance about the content itself being from a reputable source. I don’t think adding confidence and sourcing changes anything, the user STILL has to verify that whatever is provided is coherent and a third party is actually a good source. Thanks for making the process public though, doing better than OpenAI does.

SuspciousCarrot78@lemmy.world · edit-2 7 hours ago

Can’t it source other LLM outputs as “verified source” and thus still say whatever sounds good, like any LLM?

No. The footer tells you what the source is. Anything the model generates on its own is labelled “confidence: unverified | source: model” and explicitly flagged by default. To get to “source: docs” or “source: scratchpad”, an answer needs direct, traceable, human-originated provenance. You control what goes in. The FAQ outlines the sources and their strength rankings; it’s not vibes.
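A minimal sketch of that labelling rule in Python. Everything here is hypothetical (the `Provenance` record, `footer_for`, the set of human-originated sources, and the “grounded” confidence value); it is not llama-conductor’s code, and the real sources and strength rankings are in the project’s FAQ. The point is the default: without traceable, human-originated provenance, the footer falls back to source: model.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of human-originated sources, for illustration only.
HUMAN_ORIGINATED = {"docs", "scratchpad"}

@dataclass
class Provenance:
    source: str                 # e.g. "docs" or "scratchpad"
    origin_path: Optional[str]  # where the human-supplied text came from

def footer_for(prov: Optional[Provenance]) -> str:
    """Build the footer; anything without traceable provenance is flagged as model output."""
    if prov is None or prov.source not in HUMAN_ORIGINATED or not prov.origin_path:
        # Default lane: the model answered from its own priors.
        return "confidence: unverified | source: model"
    # "grounded" is a hypothetical confidence value used only in this sketch.
    return f"confidence: grounded | source: {prov.source}"
```

A claimed source with no origin path, such as `Provenance("docs", None)`, still comes out as source: model, so a label can’t be earned just by naming it.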

Providing “technical” verification, e.g. SHA, gives no insurance about the content itself being from a reputable source.

SHA verifies the document hasn’t been altered since it entered your stack. Source quality is your call. Garbage in, garbage out (GIGO) is always an issue, but if you scope the source correctly the answers won’t drift. And if they do, you’ll know, because the footer tells you exactly where the answer came from.
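What the SHA check buys you can be sketched with Python’s standard `hashlib` module. The `DocStore` class is a hypothetical illustration, not the project’s implementation: it records a SHA-256 digest when a document is ingested and later confirms the bytes are unchanged. Note what it can’t do: a matching hash says the file is the one you put in, not that it was a good source.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of raw document bytes."""
    return hashlib.sha256(data).hexdigest()

class DocStore:
    """Hypothetical store that fingerprints documents as they enter the stack."""

    def __init__(self) -> None:
        self._digests: dict[str, str] = {}  # document name -> digest recorded at ingest

    def ingest(self, name: str, data: bytes) -> str:
        digest = sha256_hex(data)
        self._digests[name] = digest
        return digest

    def is_unaltered(self, name: str, current: bytes) -> bool:
        """True only if `current` is byte-for-byte what was ingested under `name`."""
        return self._digests.get(name) == sha256_hex(current)
```

Any edit to the file, even a single byte, changes the digest, so `is_unaltered` returns False; a document that was never ingested also returns False.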

The cheatsheet system is the clearest example of how this works in practice: you define terms once in a JSONL file, and the model pegs its reasoning to your definition from then on. It can’t revert to something you didn’t teach it. Your fingerprint is on everything it says about those terms.
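A sketch of the cheatsheet idea, assuming each JSONL line looks like {"term": ..., "definition": ...}. Those field names and the `load_cheatsheet`/`define` helpers are hypothetical; the project’s actual cheatsheet schema may differ. A lookup either returns your definition verbatim or returns None, so the caller knows the term isn’t covered and can fall back to the model lane (and label it as such).

```python
import json
from typing import Iterable, Optional

def load_cheatsheet(lines: Iterable[str]) -> dict[str, str]:
    """Parse JSONL lines of the (assumed) form {"term": ..., "definition": ...}."""
    sheet: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        # Terms are matched case-insensitively; a later line for the same term wins.
        sheet[entry["term"].lower()] = entry["definition"]
    return sheet

def define(term: str, sheet: dict[str, str]) -> Optional[str]:
    """Return the user's definition verbatim, or None if the cheatsheet doesn't cover it."""
    return sheet.get(term.lower())
```

To load from disk, pass an open file: `with open("cheatsheet.jsonl", encoding="utf-8") as f: sheet = load_cheatsheet(f)`.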

… the user STILL has to verify that whatever is provided is coherent and a third party is actually a good source.

Yes, deliberately. That’s a feature.

Like I said, most LLM tools are trying to replace your thinking; this one isn’t. The human stays in the loop. The model’s limitations are visible. You decide what to trust. Maybe that’s enough, maybe it isn’t.

EDIT: Fair warning, it’s a giant wall of text, but see: https://codeberg.org/BobbyLLM/llama-conductor#some-problems-this-solves

utopiah@lemmy.ml · 6 hours ago

Isn’t “source: model” basically roulette? Then we’re back to the initial problem. Also, any label other than model might be hallucinated too, if at any point the string that produces “source:” passes through the model.

SuspciousCarrot78@lemmy.world · 6 hours ago

Nope.

Source: Model is not pretending otherwise.
It’s basically the “priors lane.” That’s the point of the label: explicit uncertainty, not fake certainty.
The source footer is harness-generated, not model-authored.
In this stack, footer normalization happens post-generation, in Python. I’ve specifically hardened this because of earlier bleed cases, so the model doesn’t get to self-award Wiki/Docs/Cheatsheets etc.
The model lane is controlled, not roulette:

deterministic-first routing where applicable
fail-loud behavior in grounded lanes
provenance downgrade when grounding didn’t actually occur
So yes: Source: Model means “less trustworthy, verify me.” Always do that. Don’t trust the stochastic parrot.

But also no: it’s not equivalent to a silent hallucination system pretending to be grounded. That’s exactly what the provenance layer is there to prevent.

JustinTheGM@ttrpg.network · 7 hours ago

Fair, but that’s the same problem human thinkers face. Faulty inputs == faulty outputs. You should always be validating your sources.

utopiah@lemmy.ml · 6 hours ago

Right, but if one person keeps giving me wrong answers, knowingly or not, my distrust in them is not linear. They’ll have to “earn” it back, and that’s going to be very challenging. If they do learn, though, it might come back faster. In this setup I have no guarantee of any progress. There’s no “one” in there trying to fix any mistake.

SuspciousCarrot78@lemmy.world · edit-2 5 hours ago

You’re describing trust dynamics correctly, and that’s exactly why this project doesn’t ask you to trust the model. It asks you to trust observable outputs: provenance labels, deterministic lanes, fail-loud behaviour.

When it fails, you can see exactly which layer failed and why. Then you can fix it yourself. That’s more than most tools give you right now (and the lack of it is part of why LLMs are considered toxic).

The correction mechanism is explicit rather than hoped for (“it learns”): encode the fix via cheatsheets, memory, or lane contracts and it sticks permanently.

The model can’t drift back to the wrong answer. That’s not the model earning trust back - it’s you patching the ground truth it reasons from. Progress is measured in artifacts, not vibes.
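“Patching the ground truth” can be sketched as an append-only JSONL cheatsheet plus a regression check that proves each fix stuck. The helper names and the {"term", "definition"} schema are hypothetical, not the project’s API. Appending rather than editing in place keeps the history of corrections, and letting the last entry per term win is what makes a fix override the earlier wrong definition.

```python
import json
import os
import tempfile

def append_fix(path: str, term: str, definition: str) -> None:
    """Record a correction as a new JSONL line; older lines are kept as history."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"term": term, "definition": definition}) + "\n")

def load(path: str) -> dict:
    """Read the cheatsheet; the last entry for a term wins, so later fixes override."""
    sheet = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                entry = json.loads(line)
                sheet[entry["term"].lower()] = entry["definition"]
    return sheet

def regressions(path: str, expected: dict) -> list:
    """Return the terms whose current definition no longer matches the recorded fix."""
    sheet = load(path)
    return [term for term, definition in expected.items()
            if sheet.get(term.lower()) != definition]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cheatsheet.jsonl")
        append_fix(path, "GIGO", "good input, good output")  # the wrong entry
        append_fix(path, "GIGO", "garbage in, garbage out")  # the later fix
        print(load(path)["gigo"])                             # garbage in, garbage out
        print(regressions(path, {"GIGO": "garbage in, garbage out"}))  # []
```

Re-running the regression check after every change is one way to keep “proving it stuck” as an artifact rather than a feeling.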

Until someone makes better AI, that’s all we’ve got. Generally, we don’t get even this much.

Sadly, AI isn’t “one mind learning”; it can’t. So trust is earned by shrinking failure classes and proving it stuck again and again and again (aka making sure the tool does what it should be doing). Whether that’s satisfying in the way a person earning trust back is satisfying - honestly, probably not. But it’s more auditable.

LLMs aren’t people and I’m ok with meeting them where they are.