How I structure my apps (in Rust and other languages)

This is going to be a quick overview of how I tend to write my application code. It might be a bit Rust-centric, but I apply similar methods in all programming languages I use.

I think it's an important subject, and in past online discussions about learning Rust and writing code "the Rust way", I was asked multiple times how I do it. I don't really have the time to write something longer and better structured, so please excuse anything that is confusing. You get what you pay for.

Also, I don't want to suggest this is some sacred, best way or anything like that. It is simply what I typically do: the result of years of professional and Open Source work and the experience I gained during that time. I'm always happy to learn about other points of view, so any feedback is welcome.

High-level methodology

I think I combine Clean/Hexagonal Architecture with Data Oriented Design. That might be too high-level for most readers, but I'll get into more details soon.

It might be worth saying upfront that I don't do OOP. I use interfaces, classes, and polymorphism, but the result is very different from the typical OOP code that I see in the wild.

"Ports" in Hexagonal Architecture

I advise you to read more about Clean/Hexagonal Architecture, but here are my main takes:

Every external thing (API, DB, cache, filesystem, event queue, etc.) should be abstracted away by one or more interfaces. If my app takes requests, does something to Users in the database, and sends messages somewhere, it will have:

trait RequestSource {
    fn get_request(&self) -> Result<Request>;
}

trait UserStore {
    fn get_user(&self, id: &UserId) -> Result<User>;
    // ...
}

trait MessageQueue {
    fn send_message(&self, message: Message) -> Result<()>;
}

or something similar. Note: a trait in Rust is like an interface in other programming languages. The details might differ a lot, but the point is - there's always an interface. This has a lot of benefits.

Interfaces make it easy to understand my application. In particular: interactions with the outside world are well described and structured.

Testing is much easier: I can easily write simple test implementations of these interfaces. I'd usually start with ChannelRequestSource, which allows me to send requests and responses via channels; InMemoryUserStore, which keeps users in a BTreeMap; and InMemoryMessageQueue, which just keeps messages in a Vec. My unit tests never require a database instance or anything like that. I also never need any mocking-framework hackery, which I dislike very much. I just pass "fake" implementations of these interfaces to the real code under test and check that the business logic does what it is intended to do.

The real implementations of these interfaces call real databases and APIs and are tested in integration and/or e2e tests separately.
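A minimal sketch of what such fakes might look like. The `User` and `Message` shapes, the `String` error type and the `greet_user` function are assumptions made up for illustration; a `Mutex` is used because the traits take `&self`.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Hypothetical domain types, for illustration only.
pub type UserId = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: UserId,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message(pub String);

// Assumed error type; a real application would likely use something richer.
pub type Result<T> = std::result::Result<T, String>;

pub trait UserStore {
    fn get_user(&self, id: &UserId) -> Result<User>;
}

pub trait MessageQueue {
    fn send_message(&self, message: Message) -> Result<()>;
}

/// Fake store: users live in a BTreeMap, no database involved.
#[derive(Default)]
pub struct InMemoryUserStore {
    pub users: BTreeMap<UserId, User>,
}

impl UserStore for InMemoryUserStore {
    fn get_user(&self, id: &UserId) -> Result<User> {
        self.users
            .get(id)
            .cloned()
            .ok_or_else(|| format!("no user with id {id}"))
    }
}

/// Fake queue: sent messages are simply collected in a Vec.
#[derive(Default)]
pub struct InMemoryMessageQueue {
    pub sent: Mutex<Vec<Message>>,
}

impl MessageQueue for InMemoryMessageQueue {
    fn send_message(&self, message: Message) -> Result<()> {
        self.sent.lock().unwrap().push(message);
        Ok(())
    }
}

/// Business logic under test: it depends only on the traits.
pub fn greet_user(store: &dyn UserStore, queue: &dyn MessageQueue, id: UserId) -> Result<()> {
    let user = store.get_user(&id)?;
    queue.send_message(Message(format!("Hello, {}!", user.name)))
}
```

A unit test would fill `InMemoryUserStore::users`, call `greet_user` with both fakes, and then inspect `InMemoryMessageQueue::sent`.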

Onion Architecture and Domain Modeling

I try to at least loosely keep track of layers like in Onion Architecture. This means - I identify early which pieces of logic are abstract and independent of the outside usage and rigorously make sure I don't introduce an outside dependency anywhere. For this reason, I often split my app into separate crates (Rust's library/compilation units), so I can't accidentally break this rule.

As an example, I can give cargo-crev, which I have split since the very beginning into:

crev-data - where abstract domain datatypes live; no IO etc.
crev-lib - generic, but no longer purely abstract logic
cargo-crev - the binary, where the CLI UI and the integration with external code live

In applications where I create new APIs etc., I'd have a separate crate for abstract types describing messages, responses and other parts of the API. I often write both the client and server crates (libraries) that depend on the common datatype crate.

I try not to use raw primitive types in business logic. At the very least I create type aliases like:

type UserId = u64;

even if just to make the code more self-describing.
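A type alias is documentation only: the compiler still treats it as the underlying type. The sketch below contrasts it with a newtype, which is a stricter alternative swapped in here for comparison and not something the text above prescribes. `OrderId` and `describe_order` are hypothetical names.

```rust
// Type alias: self-describing, but interchangeable with any other u64.
pub type UserId = u64;

// Newtype: a distinct type, so passing a bare u64 or a UserId
// where an OrderId is expected fails to compile.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct OrderId(pub u64);

/// Hypothetical function whose signature documents what each number means.
pub fn describe_order(user: UserId, order: OrderId) -> String {
    format!("user {} placed order {}", user, order.0)
}
```

Calling `describe_order(7, OrderId(42))` compiles, while `describe_order(7, 42)` is rejected by the type checker.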

Actors

First, I consider decomposing my app into many actors - basically, things that can run on their own (often as one or multiple threads) and handle messages. In some applications, there are a lot of natural actors. Example: if the job can be expressed as a series of steps in a pipeline-like fashion, then each step of the pipeline could be its own actor. A background task doing some maintenance work could be an actor, etc.

I love decomposing applications into actors because it breaks the application into smaller applications that can be run and tested separately. I use channels for message passing between actors, and tests can conveniently attach to those channels to drive an actor in isolation.

My main application code is then just an actor-coordinator: setting up the main actors, gluing them together with channels, starting them, then detecting exit conditions and/or critical failures and coordinating shutdown, etc.
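As a rough illustration, here is a tiny actor built on standard-library threads and channels. The `CounterMsg` enum and the `run_counter` and `spawn_counter` functions are invented for this sketch; a real actor would do something more useful than summing numbers.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Messages the hypothetical counter actor understands.
#[derive(Debug)]
pub enum CounterMsg {
    Add(u64),
    Shutdown,
}

/// The actor body: owns its state, reacts to messages one by one,
/// and reports the running total on the `out` channel.
pub fn run_counter(inbox: Receiver<CounterMsg>, out: Sender<u64>) {
    let mut total = 0;
    for msg in inbox {
        match msg {
            CounterMsg::Add(n) => {
                total += n;
                // Ignore send errors: the listener may already be gone.
                let _ = out.send(total);
            }
            CounterMsg::Shutdown => break,
        }
    }
}

/// What a coordinator would do: create the channels, start the actor
/// on its own thread, and keep the endpoints and the join handle.
pub fn spawn_counter() -> (Sender<CounterMsg>, Receiver<u64>, JoinHandle<()>) {
    let (msg_tx, msg_rx) = channel();
    let (out_tx, out_rx) = channel();
    let handle = thread::spawn(move || run_counter(msg_rx, out_tx));
    (msg_tx, out_rx, handle)
}
```

A test can hold both channel ends, send `CounterMsg::Add` values, and read the totals back without touching the rest of the application.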

Also, by using actors, one gets a safe and convenient speedup due to using multiple CPUs.

Data-oriented design

For each actor, I design its own data architecture. In my experience, a lot of people, when writing their code, think somewhat along the lines...

I have users. class User it is. The user can create an order. class Order added. (...) An order needs a reference back to User ... let's add a new member to the User class...

I fundamentally reject OOP. It's probably the worst way one could structure their data.

Instead - I keep my data in a database-like fashion. Depending on the task at hand - it might look like a slightly different type of database, but it is always a database and not a graph of objects referencing each other back and forth.

If I have users, I think about them as rows in a user table. I write struct User. Each User gets a UserId and goes into users_by_id: BTreeMap<UserId, User>, or something like that. Do I need to look them up by name? I add an index: user_id_by_name: BTreeMap<FullName, UserId> and so on. Again, similar to what I would do in an SQL database. Relationships are expressed by IDs, and not by direct references.
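The table-and-index idea might be sketched like this. The `Db` struct, the `Order` type, the ID allocation scheme and the assumption that full names are unique are all simplifications made for the example.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct UserId(pub u64);

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct FullName(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: UserId,
    pub name: FullName,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    pub user_id: UserId, // relationship expressed by ID, not by reference
    pub item: String,
}

/// Hypothetical in-memory "database": one primary table per entity plus indices.
#[derive(Default)]
pub struct Db {
    next_user_id: u64,
    users_by_id: BTreeMap<UserId, User>,
    user_id_by_name: BTreeMap<FullName, UserId>, // index; assumes unique names
    orders: Vec<Order>,
}

impl Db {
    pub fn insert_user(&mut self, name: FullName) -> UserId {
        let id = UserId(self.next_user_id);
        self.next_user_id += 1;
        // Keep the table and its index in sync in one place.
        self.user_id_by_name.insert(name.clone(), id);
        self.users_by_id.insert(id, User { id, name });
        id
    }

    pub fn user_by_name(&self, name: &FullName) -> Option<&User> {
        let id = self.user_id_by_name.get(name)?;
        self.users_by_id.get(id)
    }

    pub fn add_order(&mut self, user_id: UserId, item: &str) {
        self.orders.push(Order { user_id, item: item.to_string() });
    }

    /// A "join" on user_id instead of following object pointers.
    pub fn orders_of(&self, user_id: UserId) -> Vec<&Order> {
        self.orders.iter().filter(|o| o.user_id == user_id).collect()
    }
}
```

Because `Order` holds only a `UserId`, no `User` ever needs a back-reference to its orders, and no reference cycles arise.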

In essence: I don't bother trying to write some abstract generic OOP classes for everything. Instead - I treat the data for what it is: data. I store it in whatever form supports the operations I need, and write code around how it is stored. There's no one right way to store the data, and it is most definitely not a graph of OOP objects pointing at each other back and forth.

The big benefit for Rust users here is that it eliminates a lot of problems associated with the Rust ownership system: circular references etc.

Additionally, storing data in a tabular or any other data-oriented form leads to good cache utilization and natural performance benefits. Writing your code around the data structure prevents a lot of performance pitfalls.

The data-oriented design does not suffer from object-relational impedance mismatch. It's also easier to scale out. If the dataset no longer fits in memory, it's much easier to translate the code to work with data in a database, or to use techniques for scaling out databases. And the aforementioned standalone actors can be quite easily extracted into their own instances.

Ownership tree

My applications tend to be structured as a tree of ownership. The main module is the creator and owner of a couple of main resources, actors, etc., each of which is potentially the creator of a couple more, recursively, sometimes permanently, sometimes temporarily.

The data should naturally flow up and down the call-stack which follows the tree of ownership. When data needs to travel between different branches of this tree, it's a sign that maybe a channel or some other form of message passing is needed.

Data types

My data-types tend to fall into two categories:

plain data
resources

Plain data is just ... data. No strings attached data. Data that I can copy, modify, send to another thread, etc. No references to other objects inside, other than potentially some IDs representing some relationships.

Resources are things that carry "something else" with them. Like a database interface implementation, with a connection (or a pool of connections) to the underlying database instance. Or an open file. Or an actor with the thread it is running on, and a channel endpoint to send it messages. Resources are typically never cloned/copied; if sharing is required, they are usually shared via an Arc (an atomically reference-counted pointer). More often than not, resources require destructors, which in Rust are nicely handled by the ownership system and deterministic drops. The destructor of an actor handle would send a termination message and join on the thread handle, a file resource would call close, and so on.
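The actor-handle destructor described above could look roughly like this. `WorkerHandle`, `WorkerMsg` and the shared `processed` counter are hypothetical; the counter exists only so the effect of the drop can be observed from outside.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

enum WorkerMsg {
    Job(String),
    Terminate,
}

/// A resource: owns a running thread and the channel endpoint to talk to it.
/// It deliberately does not implement Clone.
pub struct WorkerHandle {
    tx: Sender<WorkerMsg>,
    // Wrapped in Option so that Drop can take the handle out and join it.
    thread: Option<JoinHandle<()>>,
}

impl WorkerHandle {
    pub fn spawn(processed: Arc<AtomicUsize>) -> Self {
        let (tx, rx) = channel();
        let thread = thread::spawn(move || {
            for msg in rx {
                match msg {
                    // Stand-in for real work: just count the job.
                    WorkerMsg::Job(_job) => {
                        processed.fetch_add(1, Ordering::SeqCst);
                    }
                    WorkerMsg::Terminate => break,
                }
            }
        });
        WorkerHandle { tx, thread: Some(thread) }
    }

    pub fn submit(&self, job: &str) {
        let _ = self.tx.send(WorkerMsg::Job(job.to_string()));
    }
}

impl Drop for WorkerHandle {
    fn drop(&mut self) {
        // Ask the actor to stop, then wait until its thread has finished.
        let _ = self.tx.send(WorkerMsg::Terminate);
        if let Some(thread) = self.thread.take() {
            let _ = thread.join();
        }
    }
}
```

Since the channel is first-in, first-out, every job submitted before the drop is processed before the termination message is seen, so letting the handle go out of scope is a clean shutdown.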

It's important to keep this distinction clear, to avoid creating weird things that are neither plain data nor a properly handled resource.

Narrowing down mutability

Side-effects should generally be limited to resources and data-stores. That's where locking, concurrency, and data integrity considerations are handled. It's OK to mutate private local variables etc., but in general, the majority of the code should operate on immutable data, passed by immutable references, IDs or straight copies. Business logic should calculate what side effects are needed first, then the corresponding resource/actor should apply that update.
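One possible shape for "calculate the side effects first, apply them later" is shown below. The `Effect` enum, `plan_withdrawal` and `Store` are assumptions invented for this sketch, not a prescribed design.

```rust
use std::collections::BTreeMap;

pub type UserId = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct Account {
    pub balance: i64,
}

/// A description of a side effect. It is plain data: nothing happens yet.
#[derive(Debug, Clone, PartialEq)]
pub enum Effect {
    SetBalance { user: UserId, balance: i64 },
    Notify { user: UserId, text: String },
}

/// Pure business logic: reads immutable data and returns the effects needed.
pub fn plan_withdrawal(user: UserId, account: &Account, amount: i64) -> Vec<Effect> {
    if amount <= 0 || amount > account.balance {
        return vec![Effect::Notify { user, text: "withdrawal rejected".to_string() }];
    }
    vec![
        Effect::SetBalance { user, balance: account.balance - amount },
        Effect::Notify { user, text: format!("withdrew {amount}") },
    ]
}

/// The data store: the only place in this sketch where mutation happens.
#[derive(Default)]
pub struct Store {
    pub accounts: BTreeMap<UserId, Account>,
    pub outbox: Vec<(UserId, String)>,
}

impl Store {
    pub fn apply(&mut self, effects: Vec<Effect>) {
        for effect in effects {
            match effect {
                Effect::SetBalance { user, balance } => {
                    self.accounts.insert(user, Account { balance });
                }
                Effect::Notify { user, text } => self.outbox.push((user, text)),
            }
        }
    }
}
```

The planning function can be tested with plain values alone, while locking and integrity concerns stay inside `Store::apply`.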

Summary

I think that's all for now. Thanks for reading, and for any feedback.

Edit: Examples

It was pointed out to me that some code examples would be really useful. If you're interested please see this r/rust comment with links to open source code that might be a decent example of the ideas described above.

#rust #programming
