
November 25, 2024
Semantic search is a transformative approach to searching data, designed to understand the meaning behind the words rather than just exact matches. In this article, we explore building a semantic search feature using Rails, OpenAI, Langchain.rb, and PG Vector. This journey will cover everything from setting up the project to implementing the search function, integrating it into the UI, and planning for future enhancements.

Introduction and Project Setup
The current search implementation in our application is limited to exact matches within titles, content, bodies, or transcripts. Our goal is to enable semantic search, returning results based on the intent and context of the query rather than exact words.
Key Tools and Technologies
- Rails: The backbone of our web application.
- OpenAI: To generate embeddings, capturing the essence of content.
- Langchain.rb: A toolkit for managing embeddings and language models.
- PG Vector: A PostgreSQL extension for storing and querying vector data.
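For reference, a minimal Gemfile excerpt that pulls these pieces together might look like the following (these are the publicly available gem names; treat it as an assumption about the setup rather than the project's exact Gemfile):

```ruby
# Gemfile (illustrative)
gem "pg"           # PostgreSQL adapter; PG Vector lives in the same database
gem "neighbor"     # nearest-neighbor queries over vector columns (by Andrew Kane)
gem "langchainrb"  # Langchain.rb: chunking helpers and LLM wrappers
gem "ruby-openai"  # OpenAI client used by Langchain.rb's OpenAI wrapper
```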
Challenges and Inspirations
While an earlier attempt with Pinecone Vector DB fell short due to gaps in implementation, resources like Rabbit Hole and James Briggs provided valuable insights into the process.
Chunking Content for Semantic Search
Semantic search begins with breaking content into smaller, manageable chunks. Here’s how it’s structured:
- Content Chunking: Content is split into smaller pieces using the Langchain chunker, optimized with options like chunk size, overlap, and separators.
```ruby
chunker = Langchain::Chunker.new(text: content, chunk_size: 1536, overlap: 200)
chunks = chunker.split
```
- Storage Model: Each chunk is stored in the database alongside metadata, references to the original content, and embeddings. This enables both search and future functionalities like chatbot integration.
- Nearest Neighbor Tool: Neighbor, a gem by Andrew Kane, is utilized to find related chunks efficiently.
Generating and Storing Embeddings
Embeddings are vector representations of content in a high-dimensional space. Using OpenAI's text-embedding-ada-002 model, which produces 1,536-dimensional vectors:
- A new model Chunk is created with a polymorphic association back to the content type.
- A migration enables the PG Vector extension to store these embeddings in PostgreSQL.
```ruby
class Chunk < ApplicationRecord
  belongs_to :content, polymorphic: true

  has_neighbors :embedding # provided by the neighbor gem; enables nearest-neighbor queries
end
```
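The migration mentioned above could be sketched roughly like this; the table layout and column names are assumptions, but the `t.vector` column type and its `limit:` option come from the neighbor gem:

```ruby
class CreateChunks < ActiveRecord::Migration[7.1]
  def change
    enable_extension "vector"   # turn on the PG Vector extension

    create_table :chunks do |t|
      t.references :content, polymorphic: true, null: false
      t.text :text                      # the chunked slice of the original content
      t.jsonb :metadata, default: {}    # extra context about the chunk
      t.vector :embedding, limit: 1536  # matches the Ada 002 embedding size
      t.timestamps
    end
  end
end
```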
Chunking Logic Implementation
To make the chunking process reusable across models, a concern named Chunkable is created. It provides:
- A chunk method for breaking content based on separators.
- Flexibility to handle various content types, such as articles or podcast transcripts.
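A sketch of what such a concern could look like, reusing the chunker call shown earlier; the method names (`chunk!`, `chunkable_text`) are hypothetical:

```ruby
# app/models/concerns/chunkable.rb
module Chunkable
  extend ActiveSupport::Concern

  included do
    has_many :chunks, as: :content, dependent: :destroy
  end

  # Split this record's text into overlapping pieces and persist them as chunks.
  def chunk!(chunk_size: 1536, overlap: 200)
    chunker = Langchain::Chunker.new(text: chunkable_text, chunk_size: chunk_size, overlap: overlap)
    chunker.split.each { |piece| chunks.create!(text: piece) }
  end

  # Each including model decides what text to index, e.g. an article's body
  # or a podcast episode's transcript.
  def chunkable_text
    raise NotImplementedError
  end
end
```

A model then opts in with `include Chunkable` and overrides `chunkable_text`.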
Building the Search Function
The search function leverages embeddings and nearest neighbor searches to find related content:

- Query Embeddings: A query is transformed into an embedding vector using OpenAI via Langchain.
- Similarity Calculation: The inner product of the query and chunk vectors measures similarity; since OpenAI embeddings are normalized, this is equivalent to cosine similarity, and values closer to 1 indicate higher relevance.
- Results: The nearest neighbor method fetches the top 10 similar chunks and maps them back to the original content.
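Put together, the search might look like the sketch below. The class name is hypothetical; `Langchain::LLM::OpenAI#embed` and the neighbor gem's `nearest_neighbors` scope are the documented entry points assumed here:

```ruby
class SemanticSearch
  LLM = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])

  # Returns the original records whose chunks are closest to the query.
  def self.call(query, limit: 10)
    # 1. Turn the query into an embedding vector via OpenAI.
    embedding = LLM.embed(text: query).embedding

    # 2. Inner-product nearest-neighbor search over the stored chunk embeddings.
    Chunk
      .nearest_neighbors(:embedding, embedding, distance: "inner_product")
      .first(limit)
      .map(&:content) # 3. Map each chunk back to its original record.
      .uniq
  end
end
```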
UI Integration and Testing
With the search function ready, integrating it into the UI involves:
- Adding a search form to the articles index.
- Wiring the function through the articles controller.
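A rough version of that wiring (the controller action and form are illustrative, not the project's exact code):

```ruby
# app/controllers/articles_controller.rb
class ArticlesController < ApplicationController
  def index
    @articles =
      if params[:query].present?
        SemanticSearch.call(params[:query]) # semantic path
      else
        Article.order(created_at: :desc)
      end
  end
end
```

```erb
<%# app/views/articles/index.html.erb %>
<%= form_with url: articles_path, method: :get do |f| %>
  <%= f.text_field :query, value: params[:query], placeholder: "Search articles" %>
  <%= f.submit "Search" %>
<% end %>
```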
The result is a semantic search capable of retrieving relevant content, even when the query isn’t explicitly mentioned in the data.
Cost Optimization and Future Plans
Optimizing Costs
Currently, every search query hits OpenAI’s API, which could become costly. To address this:
- Cache results (or the query embeddings themselves) for repeated queries to reduce API calls, as sketched below.
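Since identical queries always map to the same vector, one simple approach is to cache the query embedding itself; here is a sketch using Rails.cache and the LLM client from the search sketch (the cache key and TTL are arbitrary choices):

```ruby
# Cache the OpenAI embedding for a query string so repeated
# searches skip the API round trip.
def embedding_for(query)
  Rails.cache.fetch(["query-embedding", Digest::SHA256.hexdigest(query)], expires_in: 1.day) do
    SemanticSearch::LLM.embed(text: query).embedding
  end
end
```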
Future Developments
- Chatbot Integration: Reusing the stored chunks, a chatbot will allow users to interact with content dynamically.
- Retrieval-Augmented Generation (RAG): Inject the most relevant stored chunks into the chatbot's prompt so its answers are grounded in the indexed content.
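As a rough illustration of the RAG idea (a sketch only; the prompt format and the `chat`/`chat_completion` calls are assumptions about how Langchain.rb would be used here):

```ruby
def answer(question)
  # Retrieve the chunks most relevant to the question.
  context = Chunk
    .nearest_neighbors(:embedding, embedding_for(question), distance: "inner_product")
    .first(5)
    .map(&:text)
    .join("\n---\n")

  # Ground the model's answer in the retrieved context.
  prompt = "Answer the question using only the context below.\n\n" \
           "Context:\n#{context}\n\nQuestion: #{question}"

  SemanticSearch::LLM.chat(messages: [{ role: "user", content: prompt }]).chat_completion
end
```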
Conclusion

This project demonstrates the power of combining Rails, OpenAI, Langchain.rb, and PG Vector to build a robust semantic search system. With scalability, cost optimization, and innovative features like chatbot integration on the horizon, this approach exemplifies the potential of modern web applications.
