• 4 Posts
  • 29 Comments
Joined 7 months ago
Cake day: August 27th, 2025


  • Can’t it source other LLM outputs as “verified source” and thus still say whatever sounds good, like any LLM?

    No. The footer tells you what the source is. Anything the model generates on its own is confidence: unverified | source: model - explicitly flagged by default. To get to source: docs or source: scratchpad, it needs direct, traceable, human-originated provenance. You control what goes in. The FAQ outlines the sources and strength rankings; it’s not vibes.

    Providing “technical” verification, e.g. SHA, gives no insurance about the content itself being from a reputable source.

    SHA verifies the document hasn’t been altered since it entered your stack. Source quality is your call. GIGO is always an issue, but if you scope the source correctly it won’t drift. And if it does, you’ll know, because the footer tells you exactly where the answer came from.
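    To make the SHA point concrete, here is a minimal sketch of a hash-then-verify provenance check. The file name and manifest shape are illustrative, not llama-conductor's actual layout:

    ```python
    import hashlib
    from pathlib import Path

    def sha256_of(path: Path) -> str:
        """Hash a document so later reads can prove it hasn't been altered."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(path: Path, manifest: dict) -> bool:
        """True only if the file still matches the hash recorded at ingest."""
        return manifest.get(path.name) == sha256_of(path)

    # Record hashes when documents enter the stack...
    doc = Path("notes.txt")
    doc.write_text("ground truth goes here")
    manifest = {doc.name: sha256_of(doc)}
    assert verify(doc, manifest)

    # ...and any later tampering is caught, not papered over.
    doc.write_text("silently edited")
    assert not verify(doc, manifest)
    ```

    None of this judges whether the document was worth trusting in the first place; it only guarantees the thing you scoped is the thing being read.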

    The cheatsheet system is the clearest example of how this works in practice: you define terms once in a JSONL file, the model pegs its reasoning to your definition forever. It can’t revert to something you didn’t teach it. That fingerprint is over everything.
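    A toy sketch of the cheatsheet idea, assuming a simple one-object-per-line JSONL file (the field names here are a guess, not the project's real schema):

    ```python
    import json

    # One definition per line; whatever you write here is pinned.
    cheatsheet_jsonl = """\
    {"term": "latency", "definition": "p95 response time in ms, not mean"}
    {"term": "GIGO", "definition": "garbage in, garbage out"}
    """

    # Load the user-authored definitions into a lookup table.
    cheatsheet = {}
    for line in cheatsheet_jsonl.splitlines():
        entry = json.loads(line)
        cheatsheet[entry["term"]] = entry["definition"]

    def define(term: str) -> str:
        """Pinned definition wins; anything else is flagged, not invented."""
        return cheatsheet.get(term, "confidence: unverified | source: model")

    print(define("latency"))  # the definition you taught, verbatim
    print(define("entropy"))  # falls back to the flagged default
    ```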

    … the user STILL has to verify that whatever is provided is coherent and a third party is actually a good source.

    Yes, deliberately. That’s a feature.

    Like I said, most LLM tools are trying to replace your thinking; this one isn’t. The human stays in the loop. The model’s limitations are visible. You decide what to trust. Maybe that’s enough, maybe it isn’t.

    EDIT: giant wall of text. See - https://codeberg.org/BobbyLLM/llama-conductor#some-problems-this-solves



  • Well, you know what they say - there’s no force quite like brute force :)

    But to reply in specific:

    [1] Decision tree + regex: correct, and intentional. The transparency is a feature, not a bug. You can read the routing logic, audit it, and know exactly why a given turn went where it did. A fine-tuned routing model reintroduces the black-box problem at the routing layer itself - and if it misclassifies, what catches it? You’ve pushed the problem one layer up, not solved it.
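    For the curious, a deliberately tiny sketch of what readable, auditable regex routing looks like. The patterns and lane names are illustrative, not the repo's actual rules:

    ```python
    import re

    # Ordered rules: first match wins, and you can read exactly why.
    ROUTES = [
        (re.compile(r"^\s*[\d\s+\-*/().]+\s*$"), "calculator"),
        (re.compile(r"\bsha(256)?\b|\bchecksum\b", re.I), "sha_check"),
        (re.compile(r"\bweather\b|\bforecast\b", re.I), "weather_lookup"),
    ]

    def route(turn: str) -> str:
        """Return the lane for a user turn; unmatched turns go to the model."""
        for pattern, lane in ROUTES:
            if pattern.search(turn):
                return lane
        return "model"

    print(route("2 + 2 * 10"))           # calculator
    print(route("verify the checksum"))  # sha_check
    print(route("tell me a story"))      # model
    ```

    When a turn misroutes, the failing rule is right there in the list; the fix is an edit, not a retrain.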

    [2] Deterministic-first doesn’t mean deterministic-only. Open-ended turns go to the model by design - I’m not trying to regex all language, just not use an LLM where a calculator or a SHA check works better. The model is still involved. Case in point - see the car wash test.

    [3] On edge cases - yep, and that’s what 8,764 benchmark runs were for. Failures are taxonomized and patchable at the routing layer without touching the model. If a rule fails, I can show the exact failure and patch it. Yeah, that’s going to be whack-a-mole for a while, but if a routing model fails, I’d need new training data and still may not know why. Models are inherently black box. Python code (as your robots have shown you) is the opposite.

    My way, I know where the fuck-up is and I can figure out a global-maximum solution myself, cheap and easy.

    [4] On the fine-tune suggestion: on a 4GB potato, rule updates are free and immediate. Retraining cycles are…not. Send money, we will buy a Strix or cloud GPU access :)

    [5] The hybrid direction is already on the roadmap! TLDR: Swarm handles ambiguous routing; deterministic lanes stay for bounded and high-stakes tasks. Hybrid control + learned judgment, with measurable gates before each promotion. That sequencing is deliberate.

    Slightly longer version of what that should look like:

    User turn

    → Classifier (labels intent)
    
    → Contradiction detector (user turn + last N turns)
    
    → Refusal/risk assessor (user turn + classifier label)
    
    → State tracker (full session summary from memory)
    
    → Synthesiser (user turn + all worker outputs as FACTS block)
    
    → Critic (hunts violations in synthesiser output)
    
    → Output or retry
    

    Each worker does one job with full attention on that task. The synthesiser gets verified ground truth, not raw conversation history.
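    The flow above as a runnable sketch, with every worker stubbed out. In the real design each worker would be a small model call; only the control flow here is meant to be faithful:

    ```python
    # Each worker does one job with full attention; the synthesiser never
    # sees raw history, only their outputs collected into a FACTS block.

    def classifier(turn):
        return {"intent": "question"}          # labels intent

    def contradiction_detector(turn, recent):
        return {"contradicts_history": False}  # user turn + last N turns

    def risk_assessor(turn, intent):
        return {"refuse": False}               # user turn + classifier label

    def state_tracker(memory):
        return {"summary": memory[-3:]}        # session summary from memory

    def synthesise(turn, facts):
        return f"FACTS: {facts}\nANSWER: ..."

    def critic(draft):
        return draft.startswith("FACTS:")      # hunts contract violations

    def run_turn(turn, history, memory):
        facts = {}
        facts.update(classifier(turn))
        facts.update(contradiction_detector(turn, history[-5:]))
        facts.update(risk_assessor(turn, facts["intent"]))
        facts.update(state_tracker(memory))
        for _ in range(2):                     # output or retry once
            draft = synthesise(turn, facts)
            if critic(draft):
                return draft
        return "I don't know"

    print(run_turn("what's the plan?", history=[], memory=["fact A"]))
    ```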

    I’m looking at sub-1B classifiers (SmolLM2-135M/360M, maybe Gemma-3-270M) for fast-gate roles. Phi-4-mini is already part of the main show (used as the second-pass critic model). Unlike everyone else, I am optimizing for us in the GPU peasant class.

    That’s the fine-tune direction, anyway. Scoped, staged, and applied only where the deterministic layer can’t do the job cleanly.

    PS: Say hello to the klankers and tell them to stop cloning my shit for their training. :)


  • Hmm. The post has swearing, a personal ASD disclosure, a Feynman quote, statistics, reference to Lawrence of Arabia and ends with “a meat popsicle wrote this,” with a link to a blog as proof and a scientific pre-print with almost 10,000 data points (with raw data and errata). If you have an LLM that can do that, kudos to you.

    If there are specific passages that pattern-match to LLM output for you, point them out and I’ll look.

    But “confident tone” and “LLM tone” aren’t the same thing - I’m just not apologetic about what the project does.

    The data is the data.

    I’m not going to alter the way I write to approximate Reddit Common.


  • Sure.

    It summarises short articles, translates between languages (LLM dependent), provides sentiment analysis, solves multi-step volume/overflow problems, detects positional bias in pairwise rankings, validates output behaviour across 8,764 benchmark runs designed to break things - premise reversals, theory-of-mind separation, evidence label discipline, retraction handling, contradiction adjudication, and hard refusal-floor checks where the only correct answer is “I don’t know” - manages deterministic memory without touching the model, adapts to tone and register, stores and recalls facts exactly, folds information you provide naturally into answers (with correct attribution provenance), pits two different model families against each other to catch hallucinations before the answer reaches you, OCRs, provides real-time currency and weather lookups, looks up Wikipedia and word etymology deterministically, reasons across multiple source documents simultaneously to find contradictions, verifies source provenance via SHA checksums, stops the model being a sycophant, condenses clinical note-taking, creates management plans, and tells you when it doesn’t know the answer instead of making something up.

    But yes, it summarises short articles.

    On a 4GB VRAM potato, no less.



  • Much obliged, but I need to do a little push back here. “Prompt wrapper” isn’t quite right - a prompt wrapper is still asking the model to behave nicely.

    This isn’t that. This is more like holding a gun to its head.

    Or less floridly (and more boringly technical), what the architecture actually does is force a ground state. The lane contracts define the admissible output space per task type. For negative-control tasks - prompts with deliberately insufficient evidence - the only contract-compliant output is an explicit refusal.

    Fabrication gets rejected by the harness. The model isn’t instructed to say “I don’t know”; it’s placed in a state where “I don’t know” is the only output that clears validation.

    The draft shows this directly: post-policy missing-lane closures hit 0/332 flags across contradiction and negative_control lanes combined. Pre-policy, the dominant failure mode in those lanes wasn’t confabulation - it was refusal-like phrasing that didn’t meet strict contract tokenization. The model was already trying to refuse; the contract hardening just closed the gap between intent and valid output shape.
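    A minimal sketch of that kind of contract gate, with an illustrative refusal token and lane name rather than the harness's real contract:

    ```python
    import re

    # For negative-control lanes, only an explicit refusal clears validation.
    REFUSAL = re.compile(r"^\s*I don't know\.?\s*$", re.I)

    def validate(lane: str, output: str) -> bool:
        """The harness, not the prompt, decides what counts as valid."""
        if lane == "negative_control":
            return bool(REFUSAL.match(output))
        return len(output.strip()) > 0

    # Fabrication is rejected structurally, however confident it sounds.
    assert not validate("negative_control", "The answer is 42.")
    # Refusal-like phrasing that misses the contract shape also fails...
    assert not validate("negative_control", "I'm not sure, but maybe 42.")
    # ...only the contract-compliant refusal passes.
    assert validate("negative_control", "I don't know.")
    ```

    The model never sees this code; it just lives in a world where nothing else gets through.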

    The >>judge dual-ordering is a separate thing again - that’s algorithmic, not prompting. Both orderings run in code, verdicts are parsed strictly (A|B|TIE, fails loud otherwise), agreement margin is computed. The model doesn’t know it’s being run twice. Positional bias gets caught structurally, not by asking nicely.
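    A sketch of the dual-ordering mechanic with a stubbed judge function (in the real system that call is a model). The strict verdict parsing and order-swap bookkeeping are the point:

    ```python
    import re

    VERDICT = re.compile(r"^(A|B|TIE)$")

    def parse_verdict(raw: str) -> str:
        """Strict parse: anything but A|B|TIE fails loud."""
        m = VERDICT.match(raw.strip())
        if not m:
            raise ValueError(f"unparseable verdict: {raw!r}")
        return m.group(1)

    def judge_pair(judge, answer_1, answer_2):
        """Run both orderings; a positionally biased judge contradicts itself."""
        first = parse_verdict(judge(answer_1, answer_2))
        # Swap order, then map the verdict back into answer_1/answer_2 terms.
        swapped = parse_verdict(judge(answer_2, answer_1))
        second = {"A": "B", "B": "A", "TIE": "TIE"}[swapped]
        return first, second, first == second

    # A judge that always prefers whatever is shown first:
    biased = lambda a, b: "A"
    print(judge_pair(biased, "good answer", "bad answer"))  # ('A', 'B', False)
    ```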

    So yes - it solves a lot but not everything. The bounded claims are in the paper too. But the mechanism isn’t wrapping, it’s constraint enforcement at the routing layer.

    PS: yes, it’s fully open source. AGPL-3.0 license. You can use it, fork it, modify it etc. What you can’t do is take it, close the source, and distribute or sell it without making your modifications available under the same license. Which means if you run it as a network service (i.e. a SaaS product built on it), you still have to share the source. That’s the bit that keeps corporations from quietly wrapping it in a product and giving nothing back. Theoretically, at least.






  • That’s exactly what I did. And in the course of doing that, I gathered almost 10,000 data points to prove it, showed my work and open sourced it.

    You don’t need to be a dev to understand what this does, which is kind of the point. I don’t consider myself a dev - I was just unusually pissed off at ShitGPT, but instead of complaining about it, I did something.

    Down-vote: dunno. Knee-jerk reaction to anything AI? It’s a known thing. Ironically, the thing I built is exactly against AI slop shit.

    To say I dislike ChatGPT would be to undersell it.





    I’m (mildly) concerned about that also…but bear in mind…the difference between Lemmy and Reddit is you can…defederate…from known bad instances. If Lemmy goes in that direction - and we undertake the idea I mentioned here - https://lemmy.world/post/44633911/22828600

    then we can basically recreate a blacklist / whitelist (à la AdBlock). Instance-wide crawlers can still scrape public data, but that’s an ActivityPub protocol constraint, not a Lemmy failure.

    Instance crawling with bots? Sorry, no soup for you.

    Spam bots on bad instances? Blocked from your feed.

    Peak “fine, I’ll do it myself” energy? Yes. But if you’re reading this, you’re (1) part of the resistance (lol) and (2) already here, so …