<aside>
🪨 What is Pebbles?
Pebbles is an on‑premise AI engine with a full Retrieval-Augmented Generation (RAG) framework that lets enterprises run large language models locally, integrate them securely with internal data, and deploy compliant AI at scale.
With Pebbles, you can:
- 🔍 Transform documents into answers — query and summarize hundreds of thousands of pages instantly using built-in RAG pipelines
- 📂 Manage your knowledge — index and organize PDFs, contracts, emails, and databases into a single, queryable workspace
- 🤖 Chat with your own AI — connect to the open source LLMs you like (LLaMA, Falcon, Mistral, etc.)
- 🔒 Stay compliant and private — all inference, retrieval, and embeddings are performed on‑premise; no data ever leaves your environment
What It Looks Like
Chat, Search, Discover
https://widgets.commoninja.com/iframe/7da45613-509f-4d57-a4b6-f7ae8bd02337
How Pebbles Runs Inference
Pebbles is our proprietary local inference engine, built to run large language models efficiently on edge hardware.
It uses a deterministic, tightly optimized inference path for transformer layers:
- Speculative decoding: predicts multiple token paths in parallel, committing the best one.
- Layer-wise quantization: chooses precision per layer, letting bigger models run on smaller machines.
- Cache-aware scheduling: organizes memory access to avoid GPU/CPU stalls.
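To make the speculative-decoding bullet concrete, here is a minimal toy sketch of the general technique: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in a single pass, committing the agreeing prefix. This is a hypothetical illustration with stand-in toy models, not Pebbles' actual implementation.

```python
# Toy speculative decoding sketch (hypothetical; not Pebbles' internal code).
# A cheap "draft" model guesses k tokens ahead; the "target" model verifies
# them and commits the longest agreeing prefix plus one corrected token.

def draft_model(context):
    # Cheap stand-in: predicts the next token as (last token + 1) mod 100.
    return (context[-1] + 1) % 100

def target_model(context):
    # Expensive stand-in: agrees with the draft except when the last
    # token is a multiple of 10, where it diverges.
    last = context[-1]
    return (last + 2) % 100 if last % 10 == 0 else (last + 1) % 100

def speculative_step(context, k=4):
    """Propose k draft tokens, verify against the target model, and
    commit the agreeing prefix (plus the target's correction on a miss)."""
    # 1. Draft phase: propose k tokens cheaply.
    proposed, ctx = [], list(context)
    for _ in range(k):
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # 2. Verify phase: accept draft tokens while the target agrees.
    committed, ctx = [], list(context)
    for tok in proposed:
        expected = target_model(ctx)
        if tok == expected:
            committed.append(tok)
            ctx.append(tok)
        else:
            committed.append(expected)  # commit the target's correction
            break
    return committed

print(speculative_step([7], k=4))  # → [8, 9, 10, 12]
```

With luck, most draft tokens are accepted, so each expensive verification pass commits several tokens instead of one.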
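Similarly, the layer-wise quantization bullet can be sketched as a simple per-layer bit-width search: each layer gets the smallest precision whose reconstruction error stays under a budget, so sensitive layers keep more bits. The function names, bit-width candidates, and error budget below are illustrative assumptions, not Pebbles' actual scheme.

```python
# Hypothetical layer-wise quantization sketch: choose a bit-width per
# layer based on how much quantization distorts that layer's weights.

def quantize(weights, bits):
    """Symmetric uniform quantization onto a 2^(bits-1) grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = (max(abs(w) for w in weights) / qmax) or 1.0
    return [round(w / scale) * scale for w in weights]

def quant_error(weights, bits):
    """Mean squared error between original and quantized weights."""
    q = quantize(weights, bits)
    return sum((w - x) ** 2 for w, x in zip(weights, q)) / len(weights)

def choose_bits(weights, budget=1e-3, candidates=(4, 8, 16)):
    """Pick the smallest bit-width whose error fits within the budget."""
    for bits in candidates:
        if quant_error(weights, bits) <= budget:
            return bits
    return candidates[-1]

# Per-layer decision over a toy model: each layer may land on a
# different precision, letting bigger models fit on smaller machines.
layers = {"attn": [0.11, -0.52, 0.33], "mlp": [0.001, -0.002, 0.0015]}
plan = {name: choose_bits(w) for name, w in layers.items()}
print(plan)
```

Real systems weight this decision by layer sensitivity to the model's output, not just weight reconstruction error, but the shape of the trade-off is the same.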