Building a Lightweight In-Memory Database in C#: Custom Storage and Querying

So you've been working with databases for a while now, right? You know how to connect to SQL Server, write some queries, maybe even work with Entity Framework. But have you ever stopped to think about what makes some databases insanely fast? I'm talking Redis-level fast, where you can store and retrieve data in microseconds.

The secret often lies in in-memory databases. These aren't your traditional disk-based databases - they keep everything in RAM, which makes them lightning quick for certain use cases. But here's the thing: building your own in-memory database isn't just about speed. It's about understanding the fundamental concepts that make all databases work. You'll learn about key-value storage, how indexing actually works under the hood, and why some operations are fast while others are slow.

Don't worry if this sounds intimidating. We'll start simple and build up gradually. By the end, you'll have your own working database that you can actually use in real projects. And trust me, understanding this stuff will make you a much better developer when you go back to working with "real" databases.

Why In-Memory Databases Matter - The Speed Story

Let me paint you a picture. You're building a web application, and you need to cache some frequently accessed data - maybe user sessions, or the results of expensive database queries. You could use Redis, which is great, but what if you want to understand exactly how it works? Or what if you need something custom for your specific use case?

The key insight is this: traditional databases store data on disk. Spinning hard drives are mechanical - they have platters and moving read/write heads, and that physical movement limits them to a few hundred random operations per second. RAM, on the other hand, has no moving parts and sits right next to your CPU. We're talking millions of operations per second.

But there's always a trade-off, right? In-memory databases lose data when your application restarts. That's why they're perfect for caching, temporary data, and scenarios where speed matters more than permanence. Think session storage, real-time game state, or processing large datasets that don't need to survive a server restart.

The interesting part is that many production systems combine both approaches. They use in-memory databases for speed and traditional databases for persistence. It's like having a super-fast cache backed by a slower but reliable store.

Starting with the Basics - Key-Value Storage

Every database, whether it's a massive SQL Server instance or our little in-memory store, needs a way to organize data. The simplest approach is key-value storage. You give it a key (like "user:123:name") and it gives you back a value (like "Alice"). No tables, no schemas, just keys and values.

This simplicity is actually a strength. It makes the database flexible - you can store any type of data, and you can structure your keys however you want. Want to store user data? Use keys like "user:123:name", "user:123:email". Want to store products? "product:456:price", "product:456:category". It's all up to you.

public interface IDatabase
{
    Task<T> GetAsync<T>(string key);
    Task SetAsync<T>(string key, T value);
    Task<bool> DeleteAsync(string key);
}

Here's our basic interface. Notice how it's generic - we can store any type T. That flexibility is really powerful. You could store strings, numbers, complex objects, even collections. The async methods make it friendly for modern C# applications.

public class InMemoryDatabase : IDatabase
{
    private readonly Dictionary<string, object> _store = new();

    public async Task<T> GetAsync<T>(string key)
    {
        await Task.Yield(); // Make it properly async

        if (_store.TryGetValue(key, out var value))
        {
            return (T)value;
        }

        throw new KeyNotFoundException($"Key '{key}' not found");
    }

    public async Task SetAsync<T>(string key, T value)
    {
        await Task.Yield();
        _store[key] = value;
    }

    public async Task<bool> DeleteAsync(string key)
    {
        await Task.Yield();
        return _store.Remove(key);
    }
}

This is basically a Dictionary<string, object> with async wrappers. The Task.Yield() calls might seem strange - without them the compiler warns that these async methods contain no await, and everything runs synchronously before the Task is returned. Task.Yield() schedules the rest of the method as a continuation, which keeps the behavior closer to a genuinely asynchronous store. (For a purely in-memory implementation you could also skip async entirely and return completed tasks.)

I know what you're thinking - this isn't thread-safe! You're right. In a real application, multiple threads might try to read and write at the same time. That's where ConcurrentDictionary comes in. But let's keep it simple for now and add thread safety when we need it.

Using Our Simple Database - Getting Hands-On

Let's see this in action. The beauty of a key-value store is how straightforward it is to use.

var db = new InMemoryDatabase();

// Store some user data
await db.SetAsync("user:123:name", "Alice");
await db.SetAsync("user:123:age", 28);
await db.SetAsync("user:123:hobbies", new List { "Reading", "Hiking" });

// Get it back
string name = await db.GetAsync("user:123:name");
int age = await db.GetAsync("user:123:age");
List hobbies = await db.GetAsync>("user:123:hobbies");

Console.WriteLine($"{name} is {age} years old and enjoys {string.Join(", ", hobbies)}");
// Output: Alice is 28 years old and enjoys Reading, Hiking

See how we're organizing the data with structured keys? "user:123:name", "user:123:age" - this creates a hierarchy without needing tables. It's a pattern you'll see in many NoSQL databases. The colon separators make it easy to understand the relationships between keys.

And look at that generic type safety! We store a List<string> and get it back as a List<string>. No casting, no runtime errors. That's the beauty of C#'s generics working with our interface.

Thread Safety - Handling Multiple Users

Now, let's address that thread safety issue. In a web application, you might have hundreds of users hitting your database simultaneously. If two threads try to modify the same key at the same time, you could end up with corrupted data or exceptions.

The solution is ConcurrentDictionary. It handles all the locking internally, so multiple threads can read and write safely without you having to worry about race conditions.

public class ThreadSafeDatabase : IDatabase
{
    // Protected so derived classes (indexing, persistence) can reach the store directly.
    protected readonly ConcurrentDictionary<string, object> _store = new();

    public async Task<T> GetAsync<T>(string key)
    {
        await Task.Yield();

        if (_store.TryGetValue(key, out var value))
        {
            return (T)value;
        }

        throw new KeyNotFoundException($"Key '{key}' not found");
    }

    // Virtual so derived classes can hook writes (for example, to persist them).
    public virtual async Task SetAsync<T>(string key, T value)
    {
        await Task.Yield();
        _store[key] = value;
    }

    public async Task<bool> DeleteAsync(string key)
    {
        await Task.Yield();
        return _store.TryRemove(key, out _);
    }
}

The API is identical, but now it's safe for concurrent access. ConcurrentDictionary uses sophisticated locking strategies internally - it can often read without locking at all, and writes are handled efficiently. This gives you thread safety without sacrificing performance.

One thing to watch out for: while individual operations are atomic, sequences of operations aren't. If you need to read a value, modify it, and write it back, you might still have race conditions. That's where transactions come in, but we'll get to that later.
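To make that concrete, here's a minimal sketch of the problem using a counter. The Counter class and its lock are illustrations built on top of our IDatabase interface, not part of it:

public class Counter
{
    private readonly IDatabase _db;
    private readonly SemaphoreSlim _lock = new(1, 1);

    public Counter(IDatabase db) => _db = db;

    // Racy: two callers can both read 5 and both write back 6, losing an update.
    public async Task IncrementUnsafeAsync(string key)
    {
        int current = await _db.GetAsync<int>(key);
        await _db.SetAsync(key, current + 1);
    }

    // Safer: serialize the whole read-modify-write so only one caller runs it at a time.
    public async Task IncrementSafeAsync(string key)
    {
        await _lock.WaitAsync();
        try
        {
            int current = await _db.GetAsync<int>(key);
            await _db.SetAsync(key, current + 1);
        }
        finally
        {
            _lock.Release();
        }
    }
}

This is essentially what a transaction will give us later in the article, just wrapped in a nicer API.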

Adding Expiration - Turning It Into a Cache

One of the most common uses for in-memory databases is caching. You want to store data temporarily so you don't have to fetch it from a slow source repeatedly. But cached data should expire - you don't want stale weather data or outdated user permissions hanging around forever.

The concept is simple: each piece of data gets a timestamp saying when it expires. When you try to read it, check if it's still valid. If not, pretend it doesn't exist.

// Internal wrapper that pairs a stored value with its expiration time.
class CacheItem
{
    public object Value { get; set; }
    public DateTime ExpiresAt { get; set; }
    public bool IsExpired => DateTime.Now > ExpiresAt; // DateTime.UtcNow would sidestep daylight-saving quirks
}

public class ExpiringDatabase : IDatabase
{
    private readonly ConcurrentDictionary<string, CacheItem> _store = new();

    // Satisfies the IDatabase contract; values stored this way never expire.
    public Task SetAsync<T>(string key, T value) => SetAsync(key, value, null);

    public async Task SetAsync<T>(string key, T value, TimeSpan? expiry)
    {
        await Task.Yield();

        var item = new CacheItem
        {
            Value = value,
            ExpiresAt = expiry.HasValue ? DateTime.Now.Add(expiry.Value) : DateTime.MaxValue
        };

        _store[key] = item;
    }

    public async Task<T> GetAsync<T>(string key)
    {
        await Task.Yield();

        if (_store.TryGetValue(key, out var item))
        {
            if (item.IsExpired)
            {
                _store.TryRemove(key, out _);
                throw new KeyNotFoundException($"Key '{key}' has expired");
            }

            return (T)item.Value;
        }

        throw new KeyNotFoundException($"Key '{key}' not found");
    }

    public async Task<bool> DeleteAsync(string key)
    {
        await Task.Yield();
        return _store.TryRemove(key, out _);
    }
}

Now each value is wrapped in a CacheItem that tracks when it expires. If expiry is null, we set it to DateTime.MaxValue, which means it never expires. Otherwise, we calculate the expiration time by adding the TimeSpan to the current time.

When reading, we check if the item has expired. If it has, we remove it from the store and throw an exception, just like the key wasn't there at all. This automatic cleanup prevents the cache from growing indefinitely.

// Cache weather data for 5 minutes
await db.SetAsync("weather:nyc", weatherData, TimeSpan.FromMinutes(5));

// Later...
try
{
    var weather = await db.GetAsync("weather:nyc");
    Console.WriteLine($"Cached: {weather.Temperature}°F");
}
catch (KeyNotFoundException)
{
    // Cache expired, fetch fresh data
    var freshWeather = await FetchWeatherFromAPI();
    await db.SetAsync("weather:nyc", freshWeather, TimeSpan.FromMinutes(5));
    Console.WriteLine($"Fresh: {freshWeather.Temperature}°F");
}

This pattern is incredibly useful. It prevents you from hammering external APIs or databases while keeping your data reasonably fresh. The try/catch might seem a bit clunky, but it's actually a clean way to handle cache misses. You attempt to use cached data, and if it's not available (or expired), you fetch fresh data.

Indexing - Making Queries Fast

So far, our database is great for retrieving data by key. But what if you want to find all users older than 25? Without an index, you'd have to scan through every single key-value pair, which is slow. That's where indexing comes in.

An index is essentially a lookup table. Instead of searching through all your data, you search through a smaller, organized structure that points to the relevant data. It's like the index at the back of a book - it helps you find what you're looking for without reading the entire book.

public class IndexedDatabase : ThreadSafeDatabase
{
    private readonly ConcurrentDictionary<string, HashSet<string>> _indexes = new();

    public async Task AddToIndexAsync(string indexName, string key, string indexValue)
    {
        await Task.Yield();

        var index = _indexes.GetOrAdd(indexName, _ => new HashSet<string>());

        // HashSet itself isn't thread-safe, so guard it with a lock.
        lock (index)
        {
            index.Add($"{indexValue}:{key}");
        }
    }

    public async Task<HashSet<string>> GetFromIndexAsync(string indexName, string indexValue)
    {
        await Task.Yield();

        if (_indexes.TryGetValue(indexName, out var index))
        {
            var prefix = $"{indexValue}:";

            lock (index)
            {
                // Keys can contain colons, so strip the prefix rather than splitting on ':'.
                return new HashSet<string>(
                    index.Where(entry => entry.StartsWith(prefix))
                         .Select(entry => entry.Substring(prefix.Length))
                );
            }
        }

        return new HashSet<string>();
    }
}

We're storing indexes as dictionaries where the key is the index name ("age", "city", etc.) and the value is a set of entries. Each entry is formatted as "indexValue:key", so "25:user:1", "30:user:2", etc.

When we query, we find all entries that start with our search value, then extract the actual keys. The HashSet ensures we don't have duplicates, and the LINQ operations make the filtering clean and readable.

// Index users by age
await db.SetAsync("user:1", new User { Name = "Alice", Age = 25 });
await db.SetAsync("user:2", new User { Name = "Bob", Age = 30 });
await db.SetAsync("user:3", new User { Name = "Charlie", Age = 25 });

await db.AddToIndexAsync("age", "user:1", "25");
await db.AddToIndexAsync("age", "user:2", "30");
await db.AddToIndexAsync("age", "user:3", "25");

// Find all users aged 25
var userKeys = await db.GetFromIndexAsync("age", "25");
// Returns: ["user:1", "user:3"]

foreach (var key in userKeys)
{
    var user = await db.GetAsync<User>(key);
    Console.WriteLine($"{user.Name} is {user.Age}");
}
// Output:
// Alice is 25
// Charlie is 25

Now finding users by age is fast, even with a lot of users. The dictionary lookup for the index itself is O(1); the prefix filter still walks the entries in that one index, but that's a far smaller set than scanning every key-value pair in the store. (A production-grade index would go further and key entries by value so the lookup itself is O(1).) Compare that to scanning all users, which would be O(n) - potentially millions of operations.

The trade-off is that maintaining indexes takes extra space and time when inserting data. You have to update the index every time you add or modify data. But for read-heavy workloads, this trade-off is usually worth it.
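To see what that maintenance looks like, here's a hypothetical SaveUserAsync helper that keeps the age index in step with every write (the helper and the "age" index name are illustrations, not part of IndexedDatabase itself):

public static class UserStore
{
    // Every write updates both the record and its index entry,
    // so GetFromIndexAsync("age", ...) stays consistent with the data.
    public static async Task SaveUserAsync(IndexedDatabase db, string key, User user)
    {
        await db.SetAsync(key, user);
        await db.AddToIndexAsync("age", key, user.Age.ToString());
    }
}

Note that this sketch only ever adds index entries; a fuller version would also remove the old entry when a user's age changes, which is exactly the bookkeeping real databases quietly do for you on every update.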

Pattern Matching - Finding Related Data

Structured keys are great, but sometimes you want to find all keys that match a pattern. Like "give me all data for user 123" or "find all product keys". That's where pattern matching comes in.

We'll add a simple wildcard system using the * character. It's not as sophisticated as Redis patterns, but it gets the job done for most use cases.

public async Task<List<string>> FindKeysAsync(string pattern)
{
    await Task.Yield();

    // Convert the wildcard pattern to a regex (System.Text.RegularExpressions):
    // "user:1:*" becomes "^user:1:.*$"
    var regex = new Regex("^" + Regex.Escape(pattern).Replace("\\*", ".*") + "$");
    return _store.Keys.Where(key => regex.IsMatch(key)).ToList();
}

We convert the wildcard pattern to a regular expression. The ^ and $ anchors ensure we match the entire string, and Replace("\\*", ".*") converts asterisks to "match any characters" in regex speak.

await db.SetAsync("user:1:name", "Alice");
await db.SetAsync("user:1:email", "[email protected]");
await db.SetAsync("user:2:name", "Bob");

// Find all data for user 1
var user1Keys = await db.FindKeysAsync("user:1:*");
// Returns: ["user:1:name", "user:1:email"]

// Find all user names
var nameKeys = await db.FindKeysAsync("*:name");
// Returns: ["user:1:name", "user:2:name"]

This makes it easy to work with hierarchical data. You can find all properties of a user, all users in a certain category, or any other pattern you can think of. It's not as fast as proper indexing, but it's much more flexible.

Persistence - Making Data Survive Restarts

In-memory data has a fatal flaw: it disappears when your application restarts. For important data, you need persistence. But here's the thing - adding persistence to an in-memory database creates some interesting challenges. You have to balance speed with durability.

The simplest approach is to save the entire dataset to disk periodically. It's not the most efficient, but it works for smaller datasets and gives you a good foundation to build on.

public class PersistentDatabase : ThreadSafeDatabase
{
    private readonly string _filePath;

    public PersistentDatabase(string filePath)
    {
        _filePath = filePath;
        LoadFromDisk();
    }

    public async Task SaveToDiskAsync()
    {
        // Snapshot the store so serialization isn't disturbed by concurrent writes.
        var data = new Dictionary<string, object>(_store);
        var json = JsonSerializer.Serialize(data); // System.Text.Json
        await File.WriteAllTextAsync(_filePath, json);
    }

    private void LoadFromDisk()
    {
        if (File.Exists(_filePath))
        {
            var json = File.ReadAllText(_filePath);
            var data = JsonSerializer.Deserialize<Dictionary<string, object>>(json)
                       ?? new Dictionary<string, object>();

            foreach (var kvp in data)
            {
                // Note: values come back as JsonElement, not their original types -
                // see the caveat below.
                _store[kvp.Key] = kvp.Value;
            }
        }
    }

    public override async Task SetAsync<T>(string key, T value)
    {
        await base.SetAsync(key, value);
        await SaveToDiskAsync(); // Auto-save on every change
    }
}

When the database starts, it tries to load data from disk. When you set a value, it automatically saves the entire dataset. This ensures data survives restarts, but it's not very efficient. In a real system, you'd probably save periodically (every 5 minutes) or use a write-ahead log for better performance.

The JSON serialization works well for simple data types, but there's a catch: when you deserialize into object, System.Text.Json hands back JsonElement values rather than your original types, so a production version needs to store type information alongside each value or deserialize into known types. System.Text.Json is fast and built in, but you might want to consider other serializers for more complex scenarios.
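Here's a rough sketch of the "save periodically" idea, assuming .NET 6+ for PeriodicTimer and that you drop the auto-save from SetAsync and start this loop once at application startup:

// Flush the store to disk every five minutes instead of on every write.
async Task RunAutoSaveAsync(PersistentDatabase db, CancellationToken token)
{
    using var timer = new PeriodicTimer(TimeSpan.FromMinutes(5));

    while (await timer.WaitForNextTickAsync(token))
    {
        await db.SaveToDiskAsync();
    }
}

You trade durability for throughput: a crash can now lose up to five minutes of writes, which is why serious systems pair periodic snapshots with a write-ahead log.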

Transactions - Making Changes Atomic

Sometimes you need to make multiple changes that succeed or fail together. Like transferring money between accounts - you don't want to deduct from one account if adding to the other fails. That's where transactions come in.

A transaction collects all your changes and applies them atomically. If anything goes wrong, none of the changes take effect. It's like Git: you stage a bunch of changes, then commit them all at once or discard them.

public interface ITransaction
{
    Task SetAsync<T>(string key, T value);
    Task DeleteAsync(string key);
    Task CommitAsync();
    Task RollbackAsync();
}

public class SimpleTransaction : ITransaction
{
    private readonly IDatabase _database;
    private readonly Dictionary<string, object> _changes = new();
    private readonly HashSet<string> _deletions = new();
    private bool _committed = false;

    public SimpleTransaction(IDatabase database)
    {
        _database = database;
    }

    public Task SetAsync<T>(string key, T value)
    {
        _changes[key] = value;
        _deletions.Remove(key);
        return Task.CompletedTask;
    }

    public Task DeleteAsync(string key)
    {
        _deletions.Add(key);
        _changes.Remove(key);
        return Task.CompletedTask;
    }

    public async Task CommitAsync()
    {
        if (_committed) return;

        foreach (var change in _changes)
        {
            await _database.SetAsync(change.Key, change.Value);
        }

        foreach (var deletion in _deletions)
        {
            await _database.DeleteAsync(deletion);
        }

        _committed = true;
    }

    public Task RollbackAsync()
    {
        _changes.Clear();
        _deletions.Clear();
        return Task.CompletedTask;
    }
}

The transaction keeps track of changes and deletions separately. If you set a key and then delete it within the same transaction, it correctly handles the conflict. The Commit method applies all changes at once, and Rollback discards them.

// Starting balances (written directly to the database)
await database.SetAsync("account:123:balance", 1000);
await database.SetAsync("account:456:balance", 500);

// Transfer $100 inside a transaction
var transaction = new SimpleTransaction(database);

await transaction.SetAsync("account:123:balance", 900);
await transaction.SetAsync("account:456:balance", 600);

await transaction.CommitAsync(); // All changes applied atomically

Because nothing touches the database until CommitAsync runs, an error between the two SetAsync calls simply means the transaction never commits, and both accounts keep their original values. Perfect for financial operations or any scenario where consistency matters.
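And if something does go wrong, you can roll back explicitly. In this sketch, ValidateTransferAsync is a hypothetical check that might throw:

var transaction = new SimpleTransaction(database);

try
{
    await transaction.SetAsync("account:123:balance", 900);
    await ValidateTransferAsync();   // hypothetical check that may throw
    await transaction.SetAsync("account:456:balance", 600);

    await transaction.CommitAsync(); // only now does anything touch the database
}
catch (Exception)
{
    await transaction.RollbackAsync(); // discard the staged changes
    throw;
}

One honest caveat: CommitAsync in our implementation still applies its changes one at a time, so a crash halfway through a commit could leave partial state. Real databases close that gap with write-ahead logging and locking, which is a whole topic of its own.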

When In-Memory Databases Shine

Now that you've built your own in-memory database, you might be wondering when to use it versus a traditional database. The answer depends on your specific needs, but here are the scenarios where in-memory databases really excel.

First, caching is the obvious one. If you're fetching data from a slow source - like an external API or a complex database query - caching it in memory can speed things up dramatically. The expiration feature we added makes it perfect for this.

Session storage is another great use case. User login data, shopping cart contents, or temporary preferences don't need to survive server restarts. Keeping them in memory means instant access without database round trips.

Real-time applications love in-memory databases. Think chat applications, online games, or collaborative editing tools. These need to handle lots of concurrent users with very low latency. The thread safety and speed of in-memory storage fit perfectly.

Processing large datasets is another scenario. If you're doing data analysis or ETL work, loading everything into memory can be much faster than streaming from disk. The indexing and querying features we built would help here too.

Finally, read-heavy workloads benefit greatly. If your application reads data much more often than it writes, the speed advantage of memory over disk becomes a huge win. Social media feeds, product catalogs, and configuration data are all examples.
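Circling back to the caching case, here's what a reusable cache-aside helper might look like on top of the ExpiringDatabase from earlier. GetOrSetAsync is a name I've made up for illustration; it just packages the try/catch pattern we wrote by hand:

public static class CacheExtensions
{
    // Return the cached value if present; otherwise fetch it, cache it
    // with an expiry, and return it.
    public static async Task<T> GetOrSetAsync<T>(
        this ExpiringDatabase db, string key, Func<Task<T>> fetch, TimeSpan expiry)
    {
        try
        {
            return await db.GetAsync<T>(key);
        }
        catch (KeyNotFoundException)
        {
            var value = await fetch();
            await db.SetAsync(key, value, expiry);
            return value;
        }
    }
}

// Usage: one line replaces the whole try/catch from the weather example.
var weather = await db.GetOrSetAsync("weather:nyc", FetchWeatherFromAPI, TimeSpan.FromMinutes(5));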

Performance Tips and Gotchas

Building a high-performance in-memory database requires attention to detail. The right data structures can make or break your performance.

Dictionary<string, object> gives you O(1) lookups, which is about as fast as it gets. But if you're storing millions of keys, memory usage becomes important. Each key-value pair has overhead, boxing value types into object costs extra allocations, and large object graphs can fragment memory.
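If you want a rough feel for that overhead on your own machine, a crude before-and-after measurement with GC.GetTotalMemory does the job (the numbers vary by runtime and key sizes, so treat them as ballpark only):

var before = GC.GetTotalMemory(forceFullCollection: true);

var db = new ThreadSafeDatabase();
for (int i = 0; i < 1_000_000; i++)
{
    await db.SetAsync($"user:{i}:name", $"User {i}");
}

var after = GC.GetTotalMemory(forceFullCollection: true);
Console.WriteLine($"~{(after - before) / (1024.0 * 1024.0):F1} MB for 1,000,000 entries");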

Minimize allocations where possible. Object pooling can help if you're creating and discarding lots of temporary objects. For very large datasets, consider memory-mapped files - they let you access disk storage with memory-like performance.

Watch out for the garbage collector. If you're constantly creating and discarding objects, GC pauses can hurt your performance. Keeping allocations down - reusing objects, avoiding needless copies and boxing - does more for this than any async trick.

Finally, remember that in-memory databases are not a replacement for proper persistence. They're a tool in your toolbox, not the only tool. Use them where speed matters, but always have a plan for data that needs to survive system failures.

Summary

Building your own in-memory database was quite the journey, wasn't it? We started with a simple key-value store using a Dictionary, then added thread safety with ConcurrentDictionary, expiration for caching, indexing for fast queries, pattern matching for flexible searches, persistence to survive restarts, and transactions for atomic operations. Each feature built on the previous ones, showing how database concepts layer on top of each other.

The real value here isn't just having a working database - it's understanding the fundamental principles that make databases work. Key-value storage as the foundation, indexing for performance, transactions for consistency, and persistence for durability. These concepts apply whether you're working with Redis, SQL Server, or building your own custom storage solution.

Remember, in-memory databases aren't about replacing traditional databases - they're about choosing the right tool for the job. When speed matters more than permanent storage, when you need to handle thousands of concurrent operations, or when you're building real-time features, in-memory storage becomes incredibly powerful. Just don't forget to persist the important stuff!

You've got the foundation now. Try extending it with features like pub/sub messaging, sorted sets, or geospatial queries. Or use what you've learned to better understand how Redis and other in-memory databases work under the hood. The possibilities are endless when you understand the core concepts.