<aside>
🪨 What is Pebbles?
Pebbles is an on‑premise AI engine with a full Retrieval-Augmented Generation (RAG) framework that lets enterprises run large language models locally, integrate them securely with internal data, and deploy compliant AI at scale.
With Pebbles, you can:
- 🔍 Transform documents into answers — query and summarize hundreds of thousands of pages instantly using built-in RAG pipelines
- 📂 Manage your knowledge — index and organize PDFs, contracts, emails, and databases into a single, queryable workspace
- 🤖 Chat with your own AI — connect to the open source LLMs you like (LLaMA, Falcon, Mistral, etc.)
- 🔒 Stay compliant and private — all inference, retrieval, and embeddings are performed on‑premise; no data ever leaves your environment
What It Looks Like
Chat, Search, Discover
https://widgets.commoninja.com/iframe/7da45613-509f-4d57-a4b6-f7ae8bd02337
How Pebbles Runs Inference
Pebbles is our proprietary local inference engine, built to run large language models efficiently on edge hardware.
It uses a deterministic, tightly optimized inference path for transformer layers:
- Speculative decoding: predicts multiple token paths in parallel, committing the best one.
- Layer-wise quantization: chooses precision per layer, letting bigger models run on smaller machines.
- Cache-aware scheduling: organizes memory access to avoid GPU/CPU stalls.
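To make the speculative-decoding bullet concrete, here is a minimal toy sketch of the general technique: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in a single pass, committing the agreeing prefix. This is a hypothetical illustration with stand-in toy models, not Pebbles' actual implementation.

```python
# Toy speculative decoding sketch (hypothetical; not Pebbles' internal code).
# A cheap "draft" model guesses k tokens ahead; the "target" model verifies
# them and commits the longest agreeing prefix plus one corrected token.

def draft_model(context):
    # Cheap stand-in: predicts the next token as (last token + 1) mod 100.
    return (context[-1] + 1) % 100

def target_model(context):
    # Expensive stand-in: agrees with the draft except when the last
    # token is a multiple of 10, where it diverges.
    last = context[-1]
    return (last + 2) % 100 if last % 10 == 0 else (last + 1) % 100

def speculative_step(context, k=4):
    """Propose k draft tokens, verify against the target model, and
    commit the agreeing prefix (plus the target's correction on a miss)."""
    # 1. Draft phase: propose k tokens cheaply.
    proposed, ctx = [], list(context)
    for _ in range(k):
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # 2. Verify phase: accept draft tokens while the target agrees.
    committed, ctx = [], list(context)
    for tok in proposed:
        expected = target_model(ctx)
        if tok == expected:
            committed.append(tok)
            ctx.append(tok)
        else:
            committed.append(expected)  # commit the target's correction
            break
    return committed

print(speculative_step([7], k=4))  # → [8, 9, 10, 12]
```

With luck, most draft tokens are accepted, so each expensive verification pass commits several tokens instead of one.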
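Similarly, the layer-wise quantization bullet can be sketched as a simple per-layer bit-width search: each layer gets the smallest precision whose reconstruction error stays under a budget, so sensitive layers keep more bits. The function names, bit-width candidates, and error budget below are illustrative assumptions, not Pebbles' actual scheme.

```python
# Hypothetical layer-wise quantization sketch: choose a bit-width per
# layer based on how much quantization distorts that layer's weights.

def quantize(weights, bits):
    """Symmetric uniform quantization onto a 2^(bits-1) grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) * scale for w in weights]

def quant_error(weights, bits):
    """Mean squared error between original and quantized weights."""
    q = quantize(weights, bits)
    return sum((w - x) ** 2 for w, x in zip(weights, q)) / len(weights)

def choose_bits(weights, budget=1e-3, candidates=(4, 8, 16)):
    """Pick the smallest bit-width whose error fits within the budget."""
    for bits in candidates:
        if quant_error(weights, bits) <= budget:
            return bits
    return candidates[-1]

# Per-layer decision over a toy model: each layer may land on a
# different precision, letting bigger models fit on smaller machines.
layers = {"attn": [0.11, -0.52, 0.33], "mlp": [0.001, -0.002, 0.0015]}
plan = {name: choose_bits(w) for name, w in layers.items()}
print(plan)
```

Real systems weight this decision by layer sensitivity to the model's output, not just weight reconstruction error, but the shape of the trade-off is the same.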