How I got a 230B-parameter open model running on desktop hardware using NVIDIA DGX Spark, Unsloth quantization, and llama.cpp, matching cloud API performance without cloud dependencies.