Brave Browser Integrates RTX-Enhanced Local LLMs with Leo AI

Understanding AI Integration in Everyday Applications



From gaming and content creation to software development and productivity tools, AI is steadily enhancing user experiences and improving efficiency. Web browsing is a prime example: Brave, a privacy-focused web browser, recently introduced an AI assistant named Leo AI that not only provides search results but also helps users summarize articles and videos, extract insights from documents, answer questions, and much more.





The technology powering Brave and other AI-driven tools is a sophisticated blend of hardware, libraries, and ecosystem software optimized for the unique demands of artificial intelligence.



Why Software Is Crucial



NVIDIA’s GPUs are at the heart of AI operations globally, whether they are running in data centers or on local PCs. These GPUs are equipped with Tensor Cores, which are specifically engineered to accelerate AI applications like Leo AI through massively parallel number crunching. This means that they can handle a vast number of calculations simultaneously, instead of processing them one at a time.



However, superior hardware is only part of the equation. For applications to fully harness the power of this hardware, the software running on top of the GPUs is equally important. This software is critical for delivering the fastest and most responsive AI experience possible.



The first component is the AI inference library. This library acts as a translator, taking requests for common AI tasks and converting them into specific instructions for the hardware to execute. Some popular inference libraries include NVIDIA TensorRT, Microsoft’s DirectML, and the library used by Brave’s Leo AI via Ollama, known as llama.cpp.



Llama.cpp is an open-source library and framework. Through CUDA, NVIDIA's software application programming interface (API), developers can optimize it for GeForce RTX and NVIDIA RTX GPUs, gaining Tensor Core acceleration for hundreds of models, including well-known large language models (LLMs) such as Gemma, Llama 3, Mistral, and Phi.
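
To make this concrete, here is a minimal sketch of how an application might call llama.cpp directly, using the community llama-cpp-python bindings (not mentioned in the article; the model filename is a placeholder). It assumes llama.cpp was built with CUDA support, so layers offloaded to the GPU benefit from Tensor Core acceleration:

    # A minimal sketch using the llama-cpp-python bindings
    # (pip install llama-cpp-python), assuming a CUDA-enabled build
    # and a locally downloaded GGUF model file (placeholder path).
    from llama_cpp import Llama

    llm = Llama(
        model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
        n_gpu_layers=-1,  # offload all layers to the RTX GPU
        n_ctx=4096,       # context window size
    )

    out = llm("In one sentence, why can local inference help privacy?", max_tokens=128)
    print(out["choices"][0]["text"])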



On top of the inference library, applications often employ a local inference server to simplify integration. This server handles tasks such as downloading and configuring specific AI models, so the application itself doesn’t need to manage these complexities.



Ollama is an open-source project built on top of llama.cpp that provides easy access to the library's features and supports an ecosystem of applications delivering local AI capabilities.
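
Because Ollama runs as a local server, applications can reach it over a simple HTTP API instead of managing models themselves. A minimal sketch, assuming Ollama is listening on its default port (11434) and that the llama3 model has already been pulled:

    # A minimal sketch of querying a locally running Ollama server,
    # assuming the default endpoint http://localhost:11434 and that
    # the "llama3" model has already been downloaded.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3",
            "prompt": "In one sentence, what does an inference server do?",
            "stream": False,  # return a single JSON object instead of a stream
        },
    )
    print(resp.json()["response"])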



NVIDIA’s commitment to optimization spans the entire technology stack, from hardware to system software to the inference libraries and tools that enable applications to deliver faster and more responsive AI experiences on RTX.



Local vs. Cloud Processing



Brave’s Leo AI can operate either in the cloud or locally on a PC through Ollama. There are multiple advantages to processing AI tasks using a local model. For one, it ensures privacy and constant availability since prompts aren’t sent to an external server for processing. This means Brave users can seek help with sensitive topics such as finances or medical queries without sending any data to the cloud. Additionally, running AI processes locally eliminates the need to pay for unrestricted cloud access. Ollama allows users to utilize a wider variety of open-source models compared to most hosted services, which often support only a limited number of AI models.



Users can also interact with models that have various specializations, including bilingual models, compact-sized models, and code generation models, among others.



When running AI locally, RTX ensures a fast and responsive experience. For example, using the Llama 3 8B model with llama.cpp, users can expect responses of up to 149 tokens per second, roughly 110 words per second (a token corresponds to about three-quarters of an English word on average). This means that when using Brave with Leo AI and Ollama, users will experience quicker responses to inquiries, content summarizations, and more.
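
To check what those numbers look like on your own machine, Ollama's response metadata includes the generated token count and the generation time, which can be turned into a tokens-per-second figure. A rough sketch, assuming the same local setup as above:

    # A rough throughput check against a local Ollama server, assuming
    # the "llama3" model. Ollama reports eval_count (generated tokens)
    # and eval_duration (in nanoseconds) in its response metadata.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Write a haiku about GPUs.", "stream": False},
    ).json()

    tokens_per_second = resp["eval_count"] / resp["eval_duration"] * 1e9
    print(f"{tokens_per_second:.1f} tokens/s")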





Chart: NVIDIA’s internal throughput measurements on NVIDIA GeForce RTX GPUs, using a Llama 3 8B model with an input sequence length of 100 tokens, generating 100 tokens.



Getting Started with Brave, Leo AI, and Ollama



Installing Ollama is straightforward. Users can download the installer from the Ollama website and let it run in the background. From a command prompt, users can download and install a variety of supported models and then interact with the local model through the command line.
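
For those who prefer scripting these steps, the same pull-and-chat workflow is available through the ollama Python client (pip install ollama); a minimal sketch, with the model name as an example:

    # A minimal sketch using the ollama Python client (pip install ollama),
    # an alternative to the command-line workflow described above.
    import ollama

    ollama.pull("llama3")  # download the model, like `ollama pull llama3`

    reply = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": "Summarize this page in two sentences."}],
    )
    print(reply["message"]["content"])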



For simple instructions on how to add local LLM support via Ollama, users can refer to Brave’s blog. Once configured to point to Ollama, Leo AI will use the locally hosted LLM for prompts and queries. Users can easily switch between cloud and local models at any time.





Brave with Leo AI running on Ollama and accelerated by RTX is a great way to enhance your browsing experience. You can even summarize and ask questions about AI Decoded blogs!



Developers interested in learning more about how to use Ollama and llama.cpp can find more information on the NVIDIA Technical Blog.



Generative AI is revolutionizing gaming, videoconferencing, and various interactive experiences. Stay updated on what’s new and what’s next by subscribing to the AI Decoded newsletter.
