There are currently three ways to run the Inference Node; each is slightly different and better suited to particular uses. Choose the installation that fits you best by referencing the table below.
|                  | Fortytwo App    | Fortytwo Container | Fortytwo CLI    |
| ---------------- | --------------- | ------------------ | --------------- |
| Runner tier      | Beginner        | Docker user        | Console user    |
| Best for         | Personal device | Server/VM          | Personal device |
| Interaction type | GUI             | Console            | Guided console  |
| OS               |                 |                    |                 |
| Nvidia GPU       | Supported       | Required           | Supported       |
| Apple Silicon    | Supported      |                    | Supported       |

Features compared (each is described under "Features Explained"):
  • Manual Mode
  • Auto Mode
  • Editable KV Cache
  • Multi-GPU
  • Split GPUs
  • Load Custom GGUF

Features Explained

Manual Mode
You can maximize your node’s potential, but it requires knowledge and commitment:
  • You choose which model your node runs.
  • Performance depends on your choices.
  • This mode is intended for noderunners who are familiar with language models.
Auto Mode
Your node does all the work:
  • Models are selected automatically.
  • Performance is balanced.
  • You don’t need to know anything about language models.
Use the Fortytwo App for more complex, real-time model auto-management on your system. The Fortytwo CLI can recommend options based on your resources, but its recommendations will not be as optimal as the Fortytwo App's.
Editable KV Cache
By default, our applications use an adaptive KV Cache size, so the node can adapt to your hardware. When you launch the node, it analyzes your available resources and reserves the following:
  • GPU-based systems (primarily Windows, Linux) — reserves 90% of idle VRAM.
  • ARM-based systems with unified memory (primarily macOS) — reserves 80 to 85% of leftover RAM.
If KV Cache is editable, you can control the amount of resources taken by caching. Otherwise, it falls back to the default behavior. Read more here: ‘Performance Balancing’.
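The reservation rule above is a simple percentage of free memory. A minimal sketch follows; the function name and system labels are hypothetical, and only the percentages come from the defaults listed above.

```python
def kv_cache_reservation(free_memory_gb: float, system: str) -> float:
    """Illustrative adaptive KV Cache sizing (hypothetical helper).

    free_memory_gb -- idle VRAM (GPU systems) or leftover unified RAM (ARM).
    system         -- "gpu" or "unified" (labels chosen for this sketch).
    """
    if system == "gpu":          # e.g. Windows/Linux with a discrete GPU
        return free_memory_gb * 0.90
    elif system == "unified":    # e.g. macOS on Apple Silicon (80-85%; 80% shown)
        return free_memory_gb * 0.80
    raise ValueError(f"unknown system type: {system!r}")

# A machine with 24 GB of idle VRAM would reserve about 21.6 GB for the KV Cache.
print(kv_cache_reservation(24.0, "gpu"))
```

The real node measures idle memory itself at launch; here the free amount is passed in directly to keep the sketch self-contained.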
Multi-GPU
On systems with several GPUs installed, or when several GPUs are allocated to a single process, the node utilizes all of the available resources across these GPUs. For example: your system is equipped with 2 GPUs, each with 24 GB of VRAM. In this case, your node reads it as a total of 48 GB of VRAM and can run bigger models than a single GPU would allow.
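The pooling in the example above amounts to summing idle VRAM across devices. A minimal sketch (variable names are hypothetical):

```python
# Hypothetical per-device idle VRAM, in GB (two 24 GB GPUs, as in the example).
gpu_vram_gb = [24, 24]

# In Multi-GPU mode the node treats the pool as one memory budget.
total_vram_gb = sum(gpu_vram_gb)
print(total_vram_gb)  # 48 -- enough for models a single 24 GB GPU could not load
```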
Split GPUs
Allows you to assign a particular GPU, or several GPUs, from the available array to a single node. For example: if 8 GPUs are available, it is possible to run up to 8 nodes on this device.
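Conceptually, splitting means pinning each node to its own GPU index, so 8 GPUs can host up to 8 independent nodes. A toy sketch (names are hypothetical; Fortytwo's actual assignment mechanism is not described in this document):

```python
available_gpus = list(range(8))  # GPU indices 0..7 reported by the system

# One node per GPU: map each node id to the single GPU it may use.
node_assignments = {f"node-{i}": gpu for i, gpu in enumerate(available_gpus)}

print(len(node_assignments))  # 8 nodes, each pinned to a distinct GPU
```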
Load Custom GGUF
Allows you to select an externally downloaded model in GGUF format. Without this feature, the node only loads models from the Hugging Face repository, such as Strand-Rust-Coder 14B on Hugging Face. Note that not all GGUF models are immediately supported.