Designing Search for Blogs (Beyond LIKE Queries)

So your blog is growing, and users are asking for a search feature. Your first instinct might be to throw together a quick SQL query with a LIKE operator and call it a day. I mean, how hard can search be, right?

Well, turns out pretty hard if you want it to actually work well. Let's talk about why that simple LIKE '%keyword%' approach is going to cause you headaches, and what you should do instead.

Why LIKE Queries Will Let You Down

Let's start with the obvious problem. When you write something like:

SELECT * FROM posts 
WHERE title LIKE '%nodejs%' OR content LIKE '%nodejs%';

You're basically telling your database to scan every single row in your table. No indexes can help you here because the % wildcard at the beginning makes it impossible for the database to use its index efficiently. With a few hundred posts? Sure, it might be fast enough. But once you hit thousands of posts, your search will start crawling.

But speed isn't even the worst part. Here are the real problems:

Case sensitivity issues. Depending on your database, LIKE might be case-sensitive, so "NodeJS" won't match "nodejs". You could use ILIKE in PostgreSQL or LOWER() functions, but that's just adding more overhead.

No relevance ranking. When someone searches for "react hooks tutorial", which result should come first? The post titled "React Hooks Tutorial" or the one that mentions "react" and "hooks" buried somewhere in the content? LIKE can't tell you. It's either a match or it isn't.

No fuzzy matching. If someone types "javscript" (notice the typo), they get zero results instead of JavaScript posts. Not great for user experience.

Phrase matching problems. Searching for "node package manager" won't find posts that say "package manager for Node" because the words aren't in that exact order.

What Actually Makes Search Work

Before we jump into solutions, let's understand what makes search feel good to users.

Ranking by Relevance

When you search Google, the best results appear first. That's not magic, that's ranking algorithms that calculate relevance scores using factors like term frequency (how often a word appears), inverse document frequency (how unique that word is across all documents), and field length normalization.

For a blog, you might want matches in the title to rank higher than matches in the content. A post titled "Getting Started with Docker" is probably more relevant to "docker" than a post that just mentions Docker once in passing.

Understanding Word Stems

When someone searches "running", they probably want results about "run", "runs", and "runner" too. This is called stemming, where search matches are based on a root form or stem of a word. Good search engines handle this automatically.

Handling Common Words

Common words like "the" or "an" are often ignored because they're called stop words. Nobody searching "the best way to learn Python" really cares about matching "the" in their results. Smart search knows to focus on "best", "way", "learn", and "Python" instead.

Full-Text Search Options

Alright, so LIKE queries suck. What should you use instead? You've got options.

Built-in Database Solutions

PostgreSQL has excellent full-text search capabilities built right in. You don't need to look farther than your database management system for a full-text search solution.

Here's a basic example:

-- Add a tsvector column for search
ALTER TABLE posts 
ADD COLUMN search_vector tsvector;

-- Update it with searchable content
UPDATE posts 
SET search_vector = 
  to_tsvector('english', title || ' ' || content);

-- Create a GIN index for fast searches
CREATE INDEX posts_search_idx 
ON posts USING GIN(search_vector);

-- Now search like this
SELECT title, 
       ts_rank(search_vector, query) AS rank
FROM posts, 
     to_tsquery('english', 'nodejs & tutorial') query
WHERE search_vector @@ query
ORDER BY rank DESC;

The document is parsed into tokens, which are small fragments like words and phrases from the document's text. PostgreSQL handles stemming, stop words, and ranking for you.

MySQL has similar capabilities with FULLTEXT indexes:

-- Create a fulltext index
ALTER TABLE posts 
ADD FULLTEXT(title, content);

-- Search with natural language mode
SELECT title, 
       MATCH(title, content) AGAINST('nodejs tutorial') AS relevance
FROM posts
WHERE MATCH(title, content) AGAINST('nodejs tutorial')
ORDER BY relevance DESC;

Dedicated Search Engines

For larger sites or more advanced needs, dedicated search engines might be worth considering.

Elasticsearch is probably the most popular choice. It's a distributed, RESTful search and analytics engine built on top of Apache Lucene, offering powerful full-text search capabilities, real-time indexing, and scalability. The downside? It's another service to maintain, and for a simple blog, it might be overkill.

MeiliSearch and Typesense are newer alternatives that are easier to set up and more resource-friendly. They're great middle-ground options if PostgreSQL's full-text search isn't cutting it but you don't need Elasticsearch's complexity.

Client-Side Search for Static Sites

If you're running a static blog (like with Next.js or Hugo), you might want to handle search entirely on the client side. You would generate a JSON document at build time containing each blog post's metadata, which contains the data needed to perform the search.

Libraries like Lunr.js can run searches right in the browser. Your search index gets downloaded once, then all searches happen locally without hitting your server. Perfect for small to medium-sized blogs.

Index Strategies That Matter

Whether you're using PostgreSQL, MySQL, or something else, how you index your data makes a massive difference.

Understanding Index Types

B-tree indexes are the default in most databases. They're best suited for handling equality and range queries on data that can be sorted into some ordering. Great for IDs, dates, and sorting, but not ideal for full-text search.

GIN indexes (Generalized Inverted Index) are designed to deal with data types that are subdividable and you want to search for individual component values like array elements or lexemes in a text document. This is what you want for full-text search in PostgreSQL.

Here's the thing about GIN indexes though: they're amazing for reads but slower for writes. GIN indexes are slower to write but faster to read, making them great for applications that query more than they update. For a blog where you write new posts occasionally but search all the time, that's perfect.

Composite Indexes for Better Performance

If you frequently filter searches by category or date, consider combining fields:

-- Instead of separate indexes
CREATE INDEX ON posts(category);
CREATE INDEX ON posts(published_at);
CREATE INDEX ON posts USING GIN(search_vector);

-- Consider a composite approach
CREATE INDEX ON posts(category, published_at);
CREATE INDEX ON posts USING GIN(search_vector);

This way, queries like "show me all JavaScript posts from last month" can use a single index lookup.

Maintaining Your Indexes

Indexes don't stay perfect forever. As your blog grows and posts get updated, indexes can become fragmented. In PostgreSQL, you can use REINDEX to rebuild them, though be careful on live sites since it locks the table.

A better approach for production is to create a new index concurrently, then swap:

-- Create new index without locking
CREATE INDEX CONCURRENTLY posts_search_idx_new 
ON posts USING GIN(search_vector);

-- Drop old, rename new
DROP INDEX posts_search_idx;
ALTER INDEX posts_search_idx_new 
RENAME TO posts_search_idx;

Making Search Feel Good (UX Stuff)

Technical implementation is only half the battle. How users interact with search matters just as much.

Autocomplete/Suggestions

Users expect search to guide them. Make sure results are returned from the first keystroke for maximum responsiveness and user control.

You'll want to show suggestions as they type. Include 6 to 8 query suggestions, and with modern autocomplete libraries, query suggestions will include recent searches.

Here's a simple approach with PostgreSQL:

-- Get search suggestions
SELECT DISTINCT 
  ts_headline('english', title, query) AS suggestion
FROM posts, 
     to_tsquery('english', 'javasc:*') query
WHERE search_vector @@ query
LIMIT 8;

The :* is a prefix match, so "javasc" matches "javascript", "javascript", etc.

Highlight Matches in Results

When showing search results, highlight what actually matched. Bolding search terms in autocomplete results allows users to quickly scan through relevant matches.

PostgreSQL can do this for you:

SELECT 
  ts_headline('english', content, query) AS excerpt
FROM posts, 
     to_tsquery('english', 'react & hooks') query
WHERE search_vector @@ query;

This returns snippets with your search terms wrapped in <b> tags.

Keep It Fast

Search needs to feel instant. Aim for responses under 200ms. If your searches are taking longer, check if:

Your indexes are actually being used (use EXPLAIN ANALYZE)
You're searching too much content at once (limit results)
Your search_vector column isn't being updated properly

Mobile Matters

More people probably read your blog on their phones than their computers. On a smaller device such as a mobile phone, which can accommodate fewer results, cap the list of suggestions at 4 to 8.

Also, make the search input big enough to tap easily. Nobody likes trying to hit a tiny search box on a phone screen.

Handle Typos Gracefully

When possible, suggest corrections. "Did you mean JavaScript?" can save a search. PostgreSQL has some fuzzy matching capabilities, but for better results, you might need to look at edit distance algorithms or libraries like fuzzywuzzy.

Putting It All Together

Here's a realistic implementation for a Node.js blog using PostgreSQL:

// Set up your search index
async function createSearchIndex() {
  await db.query(`
    ALTER TABLE posts 
    ADD COLUMN IF NOT EXISTS search_vector tsvector;
    
    UPDATE posts 
    SET search_vector = 
      setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
      setweight(to_tsvector('english', coalesce(content, '')), 'B') ||
      setweight(to_tsvector('english', coalesce(tags, '')), 'C');
    
    CREATE INDEX IF NOT EXISTS posts_search_idx 
    ON posts USING GIN(search_vector);
  `);
}

// Keep it updated with triggers
async function createSearchTrigger() {
  await db.query(`
    CREATE OR REPLACE FUNCTION posts_search_update() 
    RETURNS trigger AS $$
    BEGIN
      NEW.search_vector := 
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.content, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(NEW.tags, '')), 'C');
      RETURN NEW;
    END
    $$ LANGUAGE plpgsql;
    
    CREATE TRIGGER posts_search_update 
    BEFORE INSERT OR UPDATE ON posts
    FOR EACH ROW EXECUTE FUNCTION posts_search_update();
  `);
}

// Search function
async function searchPosts(query, limit = 10) {
  const result = await db.query(`
    SELECT 
      id,
      title,
      slug,
      ts_headline('english', content, query, 
        'MaxWords=30, MinWords=15') AS excerpt,
      ts_rank(search_vector, query) AS rank
    FROM posts, 
         to_tsquery('english', $1) query
    WHERE search_vector @@ query
    ORDER BY rank DESC
    LIMIT $2;
  `, [query.replace(/\s+/g, ' & '), limit]);
  
  return result.rows;
}

// Autocomplete suggestions
async function getSuggestions(partial, limit = 8) {
  const result = await db.query(`
    SELECT DISTINCT title
    FROM posts
    WHERE search_vector @@ to_tsquery('english', $1)
    ORDER BY title
    LIMIT $2;
  `, [partial + ':*', limit]);
  
  return result.rows.map(r => r.title);
}

The setweight() calls make title matches rank higher than content matches, which rank higher than tag matches.

With All Said

Search is one of those things that seems simple until you try to build it properly. LIKE queries are tempting because they're easy, but they'll cause performance and usability problems as your blog grows.

If you're using PostgreSQL or MySQL, start with their built-in full-text search. It's good enough for most blogs and you don't need to add another service to your stack. Focus on proper indexing, relevance ranking, and a decent user interface with autocomplete.

Only reach for Elasticsearch or similar tools when you've actually outgrown your database's capabilities. For most blogs, that day might never come.

And remember: fast search that returns decent results beats perfect search that takes forever. Start simple, measure what matters, and improve from there.