# CoreNode-AI-Investor 12B — laptop quick-start (Windows, **8 GB VRAM** NVIDIA)

> **2026-06-17 update — the laptop GPU is 8 GB, not 12 GB.** Use the new
> **IQ4_XS** quant. The Q5_K_M / Q4_K_M files in this bundle are sized for a 12 GB
> card and will **not** full-offload into 8 GB (see "Which file?" below). They're
> kept only as reference / fallback for a bigger machine.

This bundle (copy the whole `laptop-transfer` folder to the laptop via USB):
- `CoreNode\CoreNode-AI-Investor-12B-GGUF\` — **`...-IQ4_XS.gguf` (6.2 GB) = the 8 GB daily driver**, plus Q4_K_M / Q5_K_M (12 GB-class, reference only). **For LM Studio (text + tools).**
- `mmproj-CoreNode-AI-Investor-12B-f16.gguf` (117 MB) — vision+audio projector. **For full multimodal via llama.cpp (Part B).**

**Version: 0.5× re-merge + v2 persona (2026-06-06), IQ4_XS quant (2026-06-17).** These
files answer the literal question — ask "what do you see?" of an image and it describes
the image plainly; it keeps full investor expertise (SEIS, valuation, etc.) when you ask
investor questions, and answers general questions directly instead of pivoting to fundraising.

The investor persona is **baked into the model** — you do NOT need to set a system prompt in any tool.
It runs on **CUDA** on this laptop (not ROCm/Vulkan like the Strix Halo).
**Images must be PNG or JPEG** — llama.cpp cannot decode WEBP/AVIF/HEIC (you'll get
"Failed to load image or audio file"). Convert webp → png first.

---

## Which file? (8 GB VRAM)

| File | Weights | Loaded @ 8K ctx (q8_0 KV) | Fits 8 GB? |
|------|---------|---------------------------|------------|
| **IQ4_XS** ⭐ | 6.2 GB | **~7.0 GB** (6312 weights + 323 KV + 519 compute MiB, measured) | ✅ ~1 GB headroom |
| Q4_K_M | 6.9 GB | ~7.7 GB | ⚠️ Too tight — OOM once the display takes its share |
| Q5_K_M | 8.0 GB | ~8.8 GB | ❌ No |

IQ4_XS is the dense 12B (the **best** CoreNode model, eval_loss 1.010) at 4.25 bits/weight.
Gemma 4's sliding-window attention keeps the KV cache tiny (~0.3 GB at 8K), which is what
lets the full model fit 8 GB.

---

## Part A — Text + tools in LM Studio (easy, recommended daily driver)

1. Install **LM Studio** for Windows (lmstudio.ai). It auto-detects the NVIDIA GPU (CUDA).
2. Find LM Studio's model folder: **App Settings → Model Directory** (default `C:\Users\<you>\.lmstudio\models`).
3. Copy the **`CoreNode`** folder from this bundle into that model directory, so you end up with:
   `...\.lmstudio\models\CoreNode\CoreNode-AI-Investor-12B-GGUF\CoreNode-AI-Investor-12B-IQ4_XS.gguf`
   (LM Studio needs the `publisher\model\file.gguf` structure — this bundle already has it.)
4. In LM Studio, select **CoreNode-AI-Investor-12B (IQ4_XS)** and load it with:
   - **GPU Offload = max** (all 48 layers)
   - **Flash Attention = ON**
   - **KV Cache Quantization = Q8_0** (recommended on 8 GB — halves the KV cache)
   - **Context = 8192** (drop to 4096 if you ever see an OOM)
5. **Turn OFF the "Code Interpreter" and "Chat with Files" tool chips** in the chat bar.
   With them on, the model emits a `run_javascript` call and you get
   `400 ... 'javascript' required`. (That's an LM Studio tool toggle, not a model fault.)
6. Just chat — persona is baked in, no system prompt needed.

If you ever move to a 12 GB+ machine, Q5_K_M is the higher-quality choice there.

---

## Part B — Full multimodal (vision + audio) via llama.cpp (CUDA)

LM Studio's text path above does NOT do images. For vision/audio you run llama.cpp's
multimodal server with the `mmproj`. (The Strix Halo build won't work here — different GPU
backend; you need a Windows **CUDA** build.)

1. Make sure you have a recent **NVIDIA driver** installed.
2. Download a **current** llama.cpp Windows CUDA build:
   https://github.com/ggml-org/llama.cpp/releases → `llama-b####-bin-win-cuda-x64.zip`
   **Get a recent one** — the Gemma 4 vision projector (`gemma4uv`) is new; if a build errors
   with `unknown projector type: gemma4uv`, grab a newer release (or build from source).
   The cuda zip bundles the needed CUDA DLLs.
3. Extract it (e.g. `C:\llamacpp\`). Put `CoreNode-AI-Investor-12B-IQ4_XS.gguf` and
   `mmproj-CoreNode-AI-Investor-12B-f16.gguf` somewhere handy (e.g. `C:\models\`).
4. Open **PowerShell** in the llama.cpp folder and run (one line):
   ```
   .\llama-server.exe --jinja -ngl 99 -fa on -c 8192 --cache-type-k q8_0 --cache-type-v q8_0 -m C:\models\CoreNode-AI-Investor-12B-IQ4_XS.gguf --mmproj C:\models\mmproj-CoreNode-AI-Investor-12B-f16.gguf --host 127.0.0.1 --port 8080
   ```
5. Open **http://127.0.0.1:8080** in a browser (llama.cpp's built-in chat UI — supports image
   upload), or point OpenWebUI / any OpenAI-compatible client at `http://127.0.0.1:8080/v1`.

### VRAM notes (8 GB)
- `-ngl 99` offloads all layers to the GPU. **IQ4_XS (6.2 GB) + mmproj (0.12 GB) + a q8_0 KV
  cache fits ~8K context on 8 GB** (~7.0 GB measured for the text model; vision adds the small
  mmproj + a transient image-encode buffer). If you get an out-of-memory error: lower context
  (`-c 4096`), keep `--cache-type-k/v q8_0`, and close other GPU apps (browsers eat VRAM).
- Quick CLI vision test instead of the server:
  ```
  .\llama-mtmd-cli.exe --jinja -ngl 99 -m C:\models\CoreNode-AI-Investor-12B-IQ4_XS.gguf --mmproj C:\models\mmproj-CoreNode-AI-Investor-12B-f16.gguf --image C:\path\to\image.png -p "Describe this image."
  ```

---

## What this model is
Dense Gemma 4 12B fine-tuned as an investor/fundraising advisor (CoreNode-AI-Investor).
Text + tool calling + vision + audio. eval_loss 1.010 / token-acc 0.749 (the best CoreNode model).
Source/recipe + the full deployment history: the `local-model-fine-tuning` repo and the Strix
Halo cheat sheet on the main machine. It leans investor-advisory on any input by design, and (like
any fine-tune) can have occasional factual slips — treat specifics as draft, verify the numbers.
