Building a Log Store in Rust – Part 1

12 Jan

I’ve decided to embark on trying to build a log storage database in Rust. Why a log database? Because Elasticsearch isn’t really up for the task. There is a lot of stuff in Elasticsearch that just doesn’t need to be there for a simple log storage database. Elasticsearch is built on top of Lucene, and Lucene is an indexing engine. So Elasticsearch is really meant as a search engine, not a log storage database.

Why Rust? Because I’m a fan of low-level languages, and in my opinion, Rust is the best low-level language out there today. As their website says, “Rust is a systems programming language that runs blazingly fast, prevents segfaults, and guarantees thread safety.” Those are some pretty intriguing features when it comes to building a distributed database.

My First Attempt

This attempt isn’t my first with Rust, or this project. My first attempt was to build an On-Disk BTree in Rust: https://github.com/wspeirs/btree. That project, despite the 21 stars, didn’t go so well. The code doesn’t work, and I got confused about the OO principles in Rust. About a year has passed, the language has gotten better, and my understanding of Rust is a bit more refined.

So I’m taking another crack at it, and started with something smaller and easier: the partition logic for the servers. It might not sound super-simple, but it was actually a great place to start. I’m very familiar with the consistent hashing mechanism used in Cassandra, but I dislike it for a few reasons. I think rendezvous hashing is a much more interesting way to solve the problem, and Jump Consistent Hashing is even more interesting and efficient. So I started this attempt by building a library/function that, given a key, number of servers, and number of copies, returns a list of the servers to put the copies of the key (log message) on: https://github.com/wspeirs/jump-consistent-hashing. This code was actually surprisingly easy to get working, mostly because I was working with primitives and simple math operations. I actually never ran into the dreaded borrow checker during any of my coding. However, I did run into a few other things.

Overflow in Rust

They’re pretty serious about overflow in Rust. You cannot run off the end of an array, and you cannot silently overflow an integer (at least not in a debug build, where an overflowing operation panics). This sounds great, until you actually bump into it, or want it. The Jump Consistent Hash algorithm relies on 64-bit integer overflow to work properly. With my first attempt, Rust wasn’t having any of this, and complained about the overflow. A quick Google search revealed that I could solve the problem by calling the wrapping_mul() method on a u64: https://goo.gl/TkopHq. I did not realize that primitives in Rust had methods… learned something already!
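For anyone who wants to see the wrapping arithmetic in context, here is a minimal sketch of the Jump Consistent Hash loop as published by Lamping and Veach, translated to Rust. It is not the code from my repository; the function name jump_hash and the u32 bucket type are assumptions made for illustration.

```rust
/// Sketch of Jump Consistent Hash: maps `key` to a bucket in 0..num_buckets.
/// `jump_hash` is a hypothetical name, not the API of any crate.
fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < i64::from(num_buckets) {
        b = j;
        // The algorithm wants 64-bit wraparound here. Plain `*` and `+`
        // would panic on overflow in a debug build, so ask for wrapping.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        // (key >> 33) + 1 is at most 2^31, so `scale` is always >= 1.0
        // and `j` strictly increases on every pass.
        let scale = (1u64 << 31) as f64 / ((key >> 33) + 1) as f64;
        j = ((b + 1) as f64 * scale) as i64;
    }
    b as u32
}

fn main() {
    // The overflow-aware helpers side by side on a value that overflows.
    assert_eq!(u64::MAX.checked_mul(2), None);
    assert_eq!(u64::MAX.wrapping_mul(2), u64::MAX - 1);

    for key in 0..5u64 {
        println!("key {} -> server {}", key, jump_hash(key, 10));
    }
}
```

The nice property is that when the number of buckets grows from n to n + 1, a key either stays where it was or moves to the new bucket n, which is exactly what you want when adding a server.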

Type Casting

Rust is also pretty serious about working with similar types. If you aren’t, you have to cast them or face compiler errors. That’s fair… I get it. However, a usize on a 64-bit machine is the same width as a u64, but I was still forced to cast it: https://goo.gl/Tnrjcs. That doesn’t make a whole ton of sense to me, but I guess maybe it’s for 64-bit/32-bit compatibility?
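To make the usize/u64 dance concrete, here is a small sketch of the two conversion styles I know of: the "as" cast, which never fails but can truncate, and try_from(), which reports a value that does not fit. The helper names pick_slot and checked_index are made up for this example. As far as I can tell, the standard library deliberately has no infallible From conversion between usize and u64, because the width of usize depends on the target.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: reduce a 64-bit hash to an index into `slots` slots.
fn pick_slot(hash: u64, slots: usize) -> usize {
    assert!(slots > 0, "need at least one slot");
    // usize -> u64 is lossless on 32-bit and 64-bit targets, but the
    // compiler still wants the cast spelled out.
    let slots_wide = slots as u64;
    // The remainder is smaller than `slots`, so it always fits in a usize.
    (hash % slots_wide) as usize
}

/// Hypothetical helper: convert a u64 to a usize, or None if it does not
/// fit (which can happen on a 32-bit target).
fn checked_index(value: u64) -> Option<usize> {
    usize::try_from(value).ok()
}

fn main() {
    println!("slot = {}", pick_slot(10, 3));
    println!("index = {:?}", checked_index(5));
}
```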

Joining Integers as Strings

While testing my code, I wanted to simply print out a list of the servers that were returned for a given key, number of servers, and copies. In a lot of other languages, this would have been pretty easy. With Rust, it felt a little harder than it should have been: https://goo.gl/xVJyk1.

servers.unwrap().iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",");

The unwrap() call is there because I’m returning a Result (in case of an error, more on that in the last section). Unwrapping it gives me a Vec<u64>. I quickly realized that all the “functional” things you want to do in Rust are hidden behind a call to .iter(). However, this is where I ran into another issue… there’s also an into_iter(). I’m still not really sure which one to use, but they both seem to work interchangeably with my code. I was told on IRC that into_iter() takes ownership of the collection and hands out the values themselves, while iter() only borrows the collection and hands out references. I’m left with, “Who cares? And why should I?” For a language so particular about things, I’m surprised there is more than one way to do something as straightforward as this. I’m sure I’m missing some big implication.
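Here is a small sketch of where the difference between the two shows up, as I understand it. The function names join_borrowed and join_owned are invented for this example. Both produce the same string; what changes is whether the caller can keep using the vector afterwards.

```rust
/// Borrows the servers: iter() yields &u64, and the caller keeps the data.
fn join_borrowed(servers: &[u64]) -> String {
    servers.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
}

/// Takes ownership: into_iter() yields u64 values and consumes the Vec.
fn join_owned(servers: Vec<u64>) -> String {
    servers.into_iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
}

fn main() {
    let servers: Vec<u64> = vec![3, 1, 4];

    let first = join_borrowed(&servers);
    // Still fine: `servers` was only borrowed above.
    println!("{} ({} servers)", first, servers.len());

    let second = join_owned(servers);
    // Using `servers` after this point would be a compile error,
    // because the vector was moved into join_owned().
    println!("{}", second);
}
```

For a throwaway print statement it really does not matter, which is probably why both seemed interchangeable to me.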

The .map(|x| x.to_string()) is because I have a vector of numbers, and I need to convert them to strings. Again, who knew you could call functions on primitives? And good old to_string() to boot… dare I say I feel like I’m programming in Java?

At first I wasn’t 100% sure what the .collect::<Vec<_>>() function did. The documentation tells me that it “transforms an iterator into a collection.” I thought to myself, “What type of collection?” So during my first attempt at using this call, I neglected the “turbofish” (yup, what it’s actually called), and so the collect function couldn’t figure out into what type of collection I wanted to collect these numbers. Now it makes sense… the generic parameter in the turbofish is the type of collection to collect into. It’s also interesting that the Vec<_> is enough for the type-inference system to figure out it’s going to be a Vec<String>, since to_string() returns an owned String. (Strings in Rust still confuse me.)
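A short sketch of what “what type of collection?” means in practice: the same kind of iterator can be collected into different containers, and the target can be named either with the turbofish or with an ordinary type annotation. The function names below are made up for illustration.

```rust
use std::collections::BTreeSet;

/// Turbofish form: the target type is spelled out on collect() itself.
fn to_strings(servers: &[u64]) -> Vec<String> {
    servers.iter().map(|x| x.to_string()).collect::<Vec<String>>()
}

/// No turbofish: the return type tells collect() to build a sorted set.
fn unique_sorted(servers: &[u64]) -> BTreeSet<u64> {
    servers.iter().cloned().collect()
}

/// Collecting Strings straight into one String concatenates them.
fn glued(servers: &[u64]) -> String {
    servers.iter().map(|x| x.to_string()).collect()
}

fn main() {
    let servers = vec![2, 7, 2, 5];
    println!("{:?}", to_strings(&servers));
    println!("{:?}", unique_sorted(&servers));
    println!("{}", glued(&servers));
}
```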

The final piece, what I wanted from the start, is the .join() function where I specify my delimiter, and join the strings in the collection. This is straightforward enough, but kinda sucky that I have to go through all the conversion nonsense (vector -> iterator -> map -> vector) just to finally call join. It’s too bad that Vec doesn’t have a map function that produces a new Vec of the mapped type. Or that someone hasn’t created (maybe they did, didn’t search the crates) a macro that does this for you: join!(servers.unwrap(), ","). I also laughed to myself when the folks in IRC were marveling at how readable the above code is. I agree that it is readable; however, who wants to read that much? I just want to print some numbers in a list separated by a comma!
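The join! macro I was wishing for is not hard to sketch yourself. What follows is a hypothetical helper, not something from the standard library or a crate I have checked: a generic join_display() function for anything that implements Display, plus a thin join! macro on top of it.

```rust
use std::fmt::Display;

/// Hypothetical helper: join any slice of Display-able items with `sep`.
fn join_display<T: Display>(items: &[T], sep: &str) -> String {
    items
        .iter()
        .map(|item| item.to_string())
        .collect::<Vec<_>>()
        .join(sep)
}

/// Hypothetical macro so the call site reads like join!(servers, ",").
/// Works for anything that can be sliced with [..], such as Vec or arrays.
macro_rules! join {
    ($items:expr, $sep:expr) => {
        join_display(&$items[..], $sep)
    };
}

fn main() {
    let servers: Result<Vec<u64>, String> = Ok(vec![4, 0, 9]);
    println!("{}", join!(servers.unwrap(), ","));
}
```

It is the same vector -> iterator -> map -> vector chain underneath; it just hides it behind one name.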

Validating Function Parameters

Because Rust is so picky about everything, I figured they would have a great system for validating the parameters to functions. Turns out, they don’t. I did search the crates for something to do this, and this is all I really found: https://crates.io/crates/validate. The documentation is kinda light, and so maybe the bound() function would do what I’m looking for, but it returns only a boolean, not an error.

OK, so I have to do my own if statement for validation, not a huge deal. However, the language doesn’t even have a nice Error for bad arguments. I was looking/hoping for something like Java’s IllegalArgumentException.
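For what it is worth, the usual pattern seems to be to roll your own small error type and return it in a Result. Below is a minimal sketch of that, assuming the rules are “at least one server” and “no more copies than servers”. Those rules, the ArgError name, and the validate() function are all my own invention for this example. The closest thing I know of in the standard library is std::io::ErrorKind::InvalidInput, which is really meant for I/O.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type standing in for IllegalArgumentException.
#[derive(Debug, PartialEq)]
enum ArgError {
    NoServers,
    TooManyCopies { copies: u32, servers: u32 },
}

impl fmt::Display for ArgError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArgError::NoServers => write!(f, "number of servers must be at least 1"),
            ArgError::TooManyCopies { copies, servers } => {
                write!(f, "cannot place {} copies on {} servers", copies, servers)
            }
        }
    }
}

// Implementing Error lets callers box it or use it with `?`.
impl Error for ArgError {}

/// Plain if statements, but the caller gets a descriptive error back.
fn validate(num_servers: u32, copies: u32) -> Result<(), ArgError> {
    if num_servers == 0 {
        return Err(ArgError::NoServers);
    }
    if copies > num_servers {
        return Err(ArgError::TooManyCopies { copies, servers: num_servers });
    }
    Ok(())
}

fn main() {
    match validate(3, 5) {
        Ok(()) => println!("arguments look fine"),
        Err(e) => println!("bad arguments: {}", e),
    }
}
```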

Initial Thoughts on Rust, Round 2

At this point I realized that Rust, while amazing in the things it can do and all the code that people have written in it, is still a pretty young language. Some of the niceties/syntactic sugar in languages like Java and Python just don’t exist yet in Rust. However, to be fair, Java was created in 1995, and Python in 1989, but Rust only started in 2010… not bad for an 8-year-old language! I’m still hooked, and looking forward to leveraging the crates for logging, HTTP server/client, memory mapping, and JSON parsing.

Thanks for reading!!!
