Chat2Find-Instruct-v1 - Sri Lanka's First Open-Source Trilingual LLM

COLOMBO, SRI LANKA – May 31, 2026 – Chat2Find PVT LTD today announced the official open-source release of Chat2Find-Instruct-v1, a specialized 7-billion-parameter instruction-tuned Large Language Model (LLM) designed specifically for Sri Lanka’s unique linguistic and cultural landscape.

As the flagship model of the Chat2Find ecosystem, Chat2Find-Instruct-v1 is engineered to deliver native trilingual intelligence in Sinhala, Tamil, and English, with built-in support for agentic tool operations and deep chain-of-thought (CoT) reasoning.

The weights and dataset are now officially available on Hugging Face for developers, researchers, and enterprises worldwide.

The Evolution of Trilingual Intelligence

Most modern language models struggle with non-English languages, particularly when handling South Asian scripts, grammatical nuances, and colloquial speech. In Sri Lanka, daily communication frequently transitions between Sinhala, Tamil, and English.

Chat2Find-Instruct-v1 is designed to address this. Starting from the Chat2Find-CPT (Continued Pre-Trained) base model, it has been fine-tuned with modern alignment techniques to handle complex trilingual instructions, code-switching, and cultural context.

Key Capabilities & Features

Deep Chain-of-Thought (CoT) Reasoning
Chat2Find-Instruct-v1 reasons by default. It generates step-by-step logic inside <reasoning> tags before presenting its final output in <answer> tags. This makes it well suited to multi-step mathematical, logical, and structured queries in Sinhala, Tamil, and English.
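Applications usually need to separate the reasoning from the final answer before showing a response to users. The sketch below assumes the <reasoning> and <answer> tag names used in the system prompt under Getting Started; the helper name split_reasoning is hypothetical, and because a response may be cut off by max_new_tokens, it falls back gracefully when no complete <answer> block is present.

```python
import re
from typing import Optional, Tuple

# Non-greedy patterns; re.DOTALL lets "." match across multi-line reasoning.
_REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)
_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(response: str) -> Tuple[Optional[str], str]:
    """Split a tagged model response into (reasoning, answer).

    If no complete <answer> block is found (for example, generation hit
    max_new_tokens), the answer falls back to the response with any
    complete <reasoning> block removed.
    """
    reasoning_match = _REASONING_RE.search(response)
    reasoning = reasoning_match.group(1).strip() if reasoning_match else None
    answer_match = _ANSWER_RE.search(response)
    if answer_match:
        answer = answer_match.group(1).strip()
    else:
        answer = _REASONING_RE.sub("", response).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<reasoning>2 + 2 = 4</reasoning>\n<answer>4</answer>")
print(answer)  # 4
```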

Native Agentic Tool & Function Calling
Trained for function calling, the model natively supports tool execution. It can decide when to invoke external APIs, query databases, or run real-time searches (such as web search tools) to answer user questions, acting as an autonomous agent.
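This release does not document the exact tool-call format the model emits. As a minimal sketch, assuming the model returns a JSON object of the form {"name": ..., "arguments": {...}}, the application side can map tool names to Python functions and run the one requested. The web_search stub, the TOOLS registry, and dispatch_tool_call are hypothetical names used only for illustration.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical stub; a real deployment would call an actual search API here.
def web_search(query: str) -> str:
    return f"(stub) top results for: {query}"


# Registry mapping tool names the model may emit to local Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"web_search": web_search}


def dispatch_tool_call(raw: str) -> Any:
    """Parse a tool call in the assumed JSON format and run the matching tool."""
    call = json.loads(raw)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    # Keyword arguments come straight from the model's "arguments" object.
    return TOOLS[name](**call.get("arguments", {}))


print(dispatch_tool_call('{"name": "web_search", "arguments": {"query": "Colombo weather"}}'))
# (stub) top results for: Colombo weather
```

The tool's result would then be appended to the conversation so the model can use it in its final answer. Recent versions of transformers also accept tool definitions through the tools argument of tokenizer.apply_chat_template; whether this model's chat template renders them is not stated in the release.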

Fluent in Sinhala & Tamil
Unlike general-purpose models that often produce broken syntax when languages are blended, Chat2Find-Instruct-v1 understands and responds fluently to colloquial mixtures. Whether a user writes in Sinhala script, Tamil script, Romanized (phonetic Latin-script) spellings, or a mix of all three, the model keeps track of the context across them.
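The release does not describe how mixed input is handled internally. For developers assembling test prompts across the input styles above, a small helper such as the hypothetical scripts_in below can label which scripts a prompt contains, using the Unicode Sinhala (U+0D80 to U+0DFF) and Tamil (U+0B80 to U+0BFF) blocks. Script alone cannot tell Romanized Sinhala or Tamil apart from English; all three register as Latin.

```python
def scripts_in(text: str) -> set:
    """Return which of Sinhala, Tamil and Latin scripts appear in `text`.

    "latin" covers English as well as Romanized Sinhala/Tamil.
    Characters outside these ranges (spaces, digits, punctuation,
    zero-width joiners) are ignored.
    """
    found = set()
    for ch in text:
        cp = ord(ch)
        if 0x0D80 <= cp <= 0x0DFF:
            found.add("sinhala")
        elif 0x0B80 <= cp <= 0x0BFF:
            found.add("tamil")
        elif ch.isascii() and ch.isalpha():
            found.add("latin")
    return found


print(sorted(scripts_in("இலங்கை trip")))  # ['latin', 'tamil']
```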

Specialized Local Context & Nuance
Having been trained on datasets reflecting local geographical entities, government frameworks, cultural concepts, and localized e-commerce search dynamics, the model understands the cultural pulse of Sri Lankan users in ways general-purpose global LLMs often miss.

Under the Hood: The Training Pipeline

Chat2Find-Instruct-v1 is the product of a multi-stage training pipeline:

Foundational Knowledge Base: Continued Pre-Training (CPT) on the Chat2Find Data Corpus – a massive 1.38 GB dataset containing over 255 million words of high-quality unstructured trilingual text.
Instruction Fine-Tuning: Trained on over 279,000 conversational instruction pairs specifically designed to refine task execution, multi-turn reasoning, and trilingual safety.
Optimization: Fine-tuned using QLoRA (rank 32, alpha 32), which trains low-rank adapters on top of a 4-bit quantized base model with BF16 compute, keeping fine-tuning memory-efficient and the model within reach of consumer hardware.
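For readers unfamiliar with the QLoRA settings above: in standard LoRA, a frozen weight matrix W is updated as W + (alpha / r) B A, where A has shape r x d_in and B has shape d_out x r. With rank 32 and alpha 32, the scaling factor is exactly 1. The sketch below works through that arithmetic; the 4096 x 4096 projection size is an assumption common in 7B-class models, not a figure from this release, and which layers were adapted is not stated.

```python
def lora_scaling(alpha: float, rank: int) -> float:
    # Standard LoRA scales the low-rank update B @ A by alpha / rank.
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A is (rank x d_in) and B is (d_out x rank), so one adapted matrix
    # gains rank * (d_in + d_out) trainable parameters.
    return rank * (d_in + d_out)


# Rank and alpha as stated in the release.
print(lora_scaling(alpha=32, rank=32))  # 1.0
# Hypothetical 4096 x 4096 projection (hidden size assumed, not stated).
print(lora_params(d_in=4096, d_out=4096, rank=32))  # 262144
```

For that assumed matrix, the adapter trains 262,144 parameters, 1/64 (about 1.6%) of the 16,777,216 frozen weights it modifies, which is where the memory savings of LoRA-style fine-tuning come from.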

Getting Started

Developers can deploy Chat2Find-Instruct-v1 immediately using Hugging Face’s transformers library. The model supports standard Hugging Face chat templates out of the box:

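import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Chat2Find/chat2find-instruct-v1"

# Load the tokenizer and weights; device_map="auto" places layers on the available GPU(s) and CPU.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without BF16 support
    device_map="auto"
)

# System prompt asking for reasoning in <reasoning> tags and the answer in <answer> tags
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. Today's date is 2026-05-30. The user is located in Sri Lanka. Provide your reasoning inside <reasoning> tags and the final answer inside <answer> tags."
    },
    {
        "role": "user",
        "content": "ශ්‍රී ලංකාවේ ප්‍රධාන අපනයන බෝග මොනවාද?"  # "What are the main export crops of Sri Lanka?"
    }
]

# Render the conversation with the model's chat template. The rendered text typically
# already contains special tokens such as BOS, so avoid adding them a second time.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    # temperature only takes effect when sampling is enabled
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)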

Open-Source Democratization

Chat2Find PVT LTD is committed to democratizing artificial intelligence. By releasing the weights of both our foundational Chat2Find-CPT model and the Chat2Find-Instruct-v1 model, alongside our training datasets, we hope to empower Sri Lankan developers, startups, and academic institutions to build the next generation of intelligent systems.

Explore the models and datasets today on Hugging Face.