# HugstonOne (2026-03-30)

**HugstonOne Enterprise Edition**  
**Local, privacy-first AI workbench with CLI, server, RAG, agent, terminal, preview, and privacy controls**  


## What this build is

HugstonOne is an Electron desktop workbench backed by local **llama.cpp** runtimes. It now supports a practical dual-runtime workflow:

- **Load CLI** for local, privacy-first interactive work.
- **Load Server** for a local, API-compatible server workflow.
- **Memory by session and tab** so the same conversation can continue after a stop.
- **RAG mode** for local document/code/json retrieval.
- **Agent mode** for CLI continuity on long tasks.
- **Online Search mode** for opt-in live retrieval from user-defined sources only.
- **Editable terminal command bar** for runtime control without clicking the UI.

## Honest current status

This repository is in a much better place than before, but it is still a work in progress.

### In beta

- CLI soft-stop first, hard-kill fallback on Windows.
- Server memory persistence in the same session + tab.
- Faster CLI follow-up turns because the whole transcript is no longer blindly replayed each time.
- Raw extra flags and raw disabled flags for both CLI and server.
- RAG file/folder indexing.
- CLI-only Agent mode.
- Opt-in Online Search with user-defined sources.
- Local API bridge at `http://localhost:8000/v1` by default.

### Still partial or unfinished

- **MCP is not finished** in this build.
- **RAR and 7z are not fully extracted** yet.
- Several UI controls still exist visually but are **not fully wired** yet: the Decoding & Sampling panel, prompt-template save box, llama-server CORS checkbox, and embeddings checkbox.
- The app requires first-time users to set a new password.

## Quick start

1. Click **Pick model** and select your local model file.
2. Optionally click **MMProj** if your multimodal model needs a projector.
3. Click **Load CLI** for local chat or **Load Server** for local server mode.
4. Keep **Memory: on** if you want stop/resume continuity in the same tab.
5. Turn on **RAG mode** before adding files/folders if you want the app to retrieve from a local knowledge set.
6. Turn on **Agent mode** only when you want CLI continuity for long coding/research tasks.
7. Turn on **Online Search** only when you intentionally want live web context.
8. Use **Offline mode** when you want to block external network traffic and stay loopback-only.

## Runtime modes

### CLI

Use **Load CLI** when you want the most private and self-contained workflow. This is the best mode for:

- coding sessions
- agent continuity
- terminal-driven use
- strict local-only work

### Server

Use **Load Server** when you want a local HTTP runtime and easier tool integration. This is the best mode for:

- local API use
- multimodal server requests
- keeping a server profile loaded for repeated calls

### Local API

The built-in Local API exposes:

- `GET /health`
- `POST /v1/chat/completions`

It routes to **CLI first** if CLI is loaded, otherwise to **llama-server** if the server is ready.
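Assuming the chat endpoint follows the usual OpenAI-style schema that llama-server exposes (the payload field names and the placeholder model name below are assumptions, not confirmed by this build), a minimal Python client might look like:

```python
import json
from urllib import request

API_BASE = "http://localhost:8000/v1"  # default Local API bridge for this build


def build_chat_request(prompt: str, model: str = "local") -> tuple[str, bytes]:
    """Build an OpenAI-style chat completions request.

    The payload shape is an assumption based on the common
    llama-server-compatible schema; field names may differ here.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{API_BASE}/chat/completions", json.dumps(payload).encode("utf-8")


def send_chat(prompt: str) -> dict:
    """POST the request; requires the app's Local API to be running."""
    url, body = build_chat_request(prompt)
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)
```

A `GET /health` call against the same base URL can be used first to confirm the bridge is up before sending chat requests.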

## Supported files

| Type | Current behavior |
|---|---|
| Images | Best multimodal path. Previewed inline and sent to the runtime. |
| Audio / video | Accepted, previewed and stored, but no offline transcription pipeline is included yet. |
| Text / code / HTML / JSON / CSV | Previewed as text, sent to the model, and ideal for RAG. |
| PDF | Text extracted with local pdf.js if the PDF already has a text layer. |
| DOCX / XLSX / PPTX | Text-oriented extraction through OOXML / JSZip. |
| ZIP | Manifest + text-like content extraction when possible. |
| RAR / 7z | Accepted, but currently only recorded as stubs unless you add your own local unpacker. |
| Folders | Indexed recursively through **Add folder** for RAG. |

## Advanced controls 

### Custom llama.cpp flags (add extras, remove defaults)

The advanced workbench lets you:

- add **CLI extra flags**
- remove default **CLI flags**
- add **Server extra flags**
- remove default **Server flags**

Example flags that fit this app:

- CLI: `--flash-attn --n-gpu-layers -1 --ctx-size 131072 --batch-size 1024 --cache-type-k q8_0 --cache-type-v q8_0 --fit on`
- Server: `--host 127.0.0.1 --port 8080 --parallel 4 --cont-batching --threads-http 8 --no-mmap`

The app passes these flags through unmodified, so invalid flags fail exactly as they would in native llama.cpp.
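As a hypothetical illustration of this raw pass-through (not the app's actual implementation), the flag strings could be tokenized shell-style and merged with the defaults like so:

```python
import shlex


def build_argv(binary: str, default_flags: list[str],
               extra_flags: str, disabled: set[str]) -> list[str]:
    """Illustrative sketch of raw flag pass-through (not the app's code):
    disabled defaults are dropped, extra flags are split shell-style and
    appended verbatim, so a bad flag reaches llama.cpp unchanged."""
    kept = [f for f in default_flags if f not in disabled]
    return [binary, *kept, *shlex.split(extra_flags)]
```

For example, disabling a default `--no-mmap` while adding `--host 127.0.0.1 --port 8080` would yield an argv containing only the extra flags, exactly as typed.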

### Terminal commands

The terminal input accepts commands such as:

- `help`
- `status`
- `load-cli`
- `unload-cli`
- `load-server`
- `stop-server`
- `send <message>`
- `model <name>`
- `flags-cli <...>`
- `flags-server <...>`
- `disable-cli <...>`
- `disable-server <...>`
- `rag on` / `rag off`
- `rag-files`
- `rag-folder`
- `agent on` / `agent off`
- `coding on` / `coding off`
- `online on` / `online off`
- `clear`

## Privacy and offline behavior

- Offline mode blocks outbound HTTP(S) except loopback.
- Online Search is opt-in only.
- The app is local-first and privacy-first.
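A loopback-only filter of the kind Offline mode describes can be sketched as follows (illustrative only; the real enforcement lives in the app's network layer and may differ):

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Hypothetical sketch of an Offline-mode check: allow only URLs whose
    host is a loopback address. Non-literal hosts would need DNS resolution
    to classify, so this simple version blocks them."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False
```

With this rule, requests to `http://127.0.0.1:8080` or `http://localhost:8000/v1` pass, while anything addressed to an external host is rejected.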


## Recommended directory expectations

This patched build expects a local structure along these lines when packaged or arranged for development:

- `app/src/` for renderer assets and optional local libraries
- `app/runtimes/gpu/` for GPU llama.cpp binaries
- `app/runtimes/cpu/` for CPU llama.cpp binaries
- `app/models/` or any folder you pick for local models

The code switches to `llama-mtmd-cli.exe` automatically when a multimodal projector is in use, otherwise it uses `llama-cli.exe`. The server runtime uses `llama-server.exe`.



This build is now good enough for serious local work, especially for coding, document-assisted chat, and privacy-first workflows. 