
Intro

By default, Fortytwo Node allocates resources automatically. It tries to use as much available memory as possible to deliver the fastest inference speed. This means that when the node starts, it may allocate most of your VRAM and RAM, depending on the model you selected. You can limit how much memory the node is allowed to use with a custom KV Cache configuration. This is handy if you plan to use your computer for other resource-intensive tasks in parallel, such as gaming, video rendering, or graphic design. A detailed explanation of what KV Cache is and how it works can be found here: Selecting a Model for Your Node.

What Is KV Cache?

KV Cache is a reserved memory area used to store the model’s internal key–value tensors during inference.
These tensors let the model reuse the keys and values already computed for previous tokens instead of recalculating them at every step, which significantly speeds up generation and enables longer context windows.
Because KV Cache grows as the conversation (or prompt) becomes longer, it directly affects:
  • Maximum context length — larger KV Cache allows the model to store more tokens.
  • Inference speed — cached states reduce repeated computation.
  • Total memory consumption — the node must hold both the model and its KV Cache at the same time.
  • System stability — if KV Cache grows too large, other applications may experience slowdowns.
KV Cache requires additional memory on top of the model itself. The node needs enough available RAM/VRAM to hold both at the same time.
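To get a feel for how KV Cache grows with context length, here is a rough estimate for a standard transformer decoder. This is a minimal sketch, not how Fortytwo Node computes its allocation: the function name `kv_cache_bytes` and the model dimensions (32 layers, 8 KV heads, head dimension 128, 16-bit values) are hypothetical and chosen only for illustration. Actual usage depends on the model architecture and on any cache quantization the runtime applies.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_value=2):
    """Approximate KV Cache size for a standard transformer decoder.

    Every cached token stores one key vector and one value vector
    per layer, hence the leading factor of 2.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * n_tokens


# Hypothetical model dimensions, for illustration only.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_tokens=8192)
print(f"{size / 2**30:.2f} GiB")  # prints "1.00 GiB"
```

Doubling the number of tokens doubles the estimate, which is why the cache size directly bounds the context length.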

How Custom KV Cache Limits Work

When you specify a value in GB or tokens, this number represents the maximum KV Cache size the node is allowed to use. It is not a total memory cap; it is the upper boundary designated specifically for the KV Cache. The model will always load using:
memory_used = model_size + kv_cache_size
If your system has enough available memory, the node will use the full value you provided.
For example:
  • available system memory: 20 GB
  • model size: 5 GB
  • custom KV Cache limit: 12 GB
The node will allocate 5 + 12 = 17 GB in total, because enough memory is available. However, if your system does not have enough memory to satisfy this maximum KV Cache value, the node will automatically reduce the KV Cache size so that the model can still start safely. This ensures two things:
  1. The node never exceeds physical memory limits.
  2. KV Cache grows to the selected size whenever possible, giving you the largest context window your hardware can support within that limit.
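The behavior described above can be sketched as follows. This is only an illustration: `plan_allocation` is a hypothetical helper, not part of Fortytwo Node, and it assumes the node shrinks the KV Cache to exactly the memory left after the model is loaded. The text does not specify the actual reduction strategy.

```python
def plan_allocation(available_gb, model_gb, kv_limit_gb):
    """Return the KV Cache size and total memory the node would use.

    Assumption: when the limit does not fit, the KV Cache is reduced
    to whatever memory remains after loading the model.
    """
    if model_gb > available_gb:
        raise ValueError("The model alone does not fit in available memory")
    kv_gb = min(kv_limit_gb, available_gb - model_gb)
    return {"kv_cache_gb": kv_gb, "total_gb": model_gb + kv_gb}


# The example from the text: the full 12 GB limit fits.
print(plan_allocation(available_gb=20, model_gb=5, kv_limit_gb=12))
# {'kv_cache_gb': 12, 'total_gb': 17}

# Not enough memory: the KV Cache is reduced so the model can still start.
print(plan_allocation(available_gb=10, model_gb=5, kv_limit_gb=12))
# {'kv_cache_gb': 5, 'total_gb': 10}
```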

Simple Rules

To ensure smooth noderunning:
  1. Determine how much memory you can afford to allocate to your node.
  2. Define a custom KV Cache value, e.g. 2 GB.
  3. Select a model whose size is no larger than the memory you allocate to your node minus the KV Cache value (2 GB in this example).
  4. Start your node.
  5. Start other applications. Your node should not take more resources than allocated to it.
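The sizing rule in steps 1 to 3 is simple arithmetic, shown here as a minimal sketch. The helper names and the model names and sizes are hypothetical placeholders, not real entries from the Fortytwo model list.

```python
def max_model_size_gb(node_budget_gb, kv_cache_gb):
    """Largest model size that leaves room for the chosen KV Cache."""
    if kv_cache_gb >= node_budget_gb:
        raise ValueError("KV Cache value must be smaller than the node budget")
    return node_budget_gb - kv_cache_gb


def models_that_fit(models, node_budget_gb, kv_cache_gb):
    """Filter a {name: size_gb} mapping down to models that fit the budget."""
    limit = max_model_size_gb(node_budget_gb, kv_cache_gb)
    return [name for name, size_gb in models.items() if size_gb <= limit]


# Hypothetical model sizes, for illustration only.
candidates = {"small-model": 3.5, "medium-model": 5.5, "large-model": 9.0}

# 8 GB allocated to the node, 2 GB reserved for KV Cache: models up to 6 GB fit.
print(models_that_fit(candidates, node_budget_gb=8, kv_cache_gb=2))
# ['small-model', 'medium-model']
```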

How To Set Up Custom KV Cache

Available modes

These modes control how aggressively the node will allocate memory for the model + KV Cache.
  • [0] Auto
    The node automatically determines the amount of memory to allocate based on your system and selected model.
  • [1] Min
    Limits the node to 33% of your available memory.
  • [2] Medium
    Allows the node to use 66% of your available memory.
  • [3] Max
    Gives the node full access to all available system memory.
  • [4] Custom
    Lets you manually define the exact amount of memory your node is allowed to consume — including a fully custom KV Cache size.
This mode is intended for advanced users who want precise control over:
  • model memory budget,
  • KV Cache size,
  • total resource usage,
  • balance between TPS and context window,
  • system performance while multitasking.
Custom mode is ideal if you:
  • want to dedicate a very specific memory amount (e.g. 2 GB KV Cache),
  • need the node to run alongside heavy applications (games, rendering, creative tools),
  • understand your hardware limits and want a fine-tuned setup.
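As a rough illustration of the preset modes, the sketch below maps Min, Medium, and Max to the fractions of available memory stated above. `MODE_FRACTIONS` and `memory_budget_gb` are hypothetical names, not the node's actual configuration keys. Auto and Custom are left out because their values depend on the system or on user input.

```python
# Fractions of available memory per preset mode, as listed above.
MODE_FRACTIONS = {"min": 0.33, "medium": 0.66, "max": 1.0}


def memory_budget_gb(mode, available_gb):
    """Memory the node may use in a preset mode (hypothetical helper)."""
    if mode not in MODE_FRACTIONS:
        raise ValueError(f"Unknown preset mode: {mode!r}")
    return available_gb * MODE_FRACTIONS[mode]


# Example with 24 GB of available memory.
for mode in ("min", "medium", "max"):
    print(mode, round(memory_budget_gb(mode, 24), 2))
```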
Fortytwo CLI
  1. Open the Terminal and run your CLI script.
  2. Type 0 on your keyboard to go to Settings.
  3. Type 1 on your keyboard to go to KV Cache management.
  4. Choose your preferred mode (auto, min, medium, or max), or select Custom Size and specify it in tokens or GB.
  5. Restart your node. Monitor system resources to see if any further adjustment is needed.