[CAPSULE:0.1.13] ⎯ 2025-03-20
Feature release adding a new embeddings model, flash attention, and improved ranking capabilities.
Added
- New embeddings model.
- Flash attention for faster processing.
- max_tokens support for ranking requests.
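As a rough sketch of how the new max_tokens option could be used, the snippet below builds a ranking request payload and applies the token cap client-side. The endpoint shape, field names, and model identifier are illustrative assumptions, not the documented CAPSULE API:

```python
# Hypothetical ranking request payload; every field name here is an
# assumption for illustration, not the documented CAPSULE API.
ranking_request = {
    "model": "capsule-rank",          # assumed model identifier
    "query": "best sci-fi novels",
    "documents": [
        "Dune by Frank Herbert",
        "A cookbook of pasta recipes",
    ],
    "max_tokens": 512,  # new in 0.1.13: cap on tokens considered per document
}

def truncate_tokens(text: str, max_tokens: int) -> str:
    """Naive whitespace tokenizer standing in for the real one."""
    return " ".join(text.split()[:max_tokens])

# Apply the cap to each candidate document before ranking.
capped_documents = [
    truncate_tokens(doc, ranking_request["max_tokens"])
    for doc in ranking_request["documents"]
]
```

Capping document length keeps ranking latency predictable when candidate documents vary widely in size.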
Changed
- Expanded context window for embeddings.
- Improved ranking logic.
Fixed
- LLM context window bug.
- General resource optimization.