Dawid Ciężarkiewicz aka dpc

The faster you unlearn OOP, the better for you and your software

Object-oriented programming is an exceptionally bad idea which could only have originated in California.

— Edsger W. Dijkstra

Maybe it's just my experience, but Object-Oriented Programming seems like a default, most common paradigm of software engineering. The one typically thought to students, featured in online material and for some reason, spontaneously applied even by people that didn't intend it.

I know how succumbing it is, and how great of an idea it seems on the surface. It took me years to break its spell, and understand clearly how horrible it is and why. Because of this perspective, I have a strong belief that it's important that people understand what is wrong with OOP, and what they should do instead.

Many people discussed problems with OOP before, and I will provide a list of my favorite articles and videos at the end of this post. Before that, I'd like to give it my own take.

Data is more important than code

At its core, all software is about manipulating data to achieve a certain goal. The goal determines how the data should be structured, and the structure of the data determines what code is necessary.

This part is very important, so I will repeat. goal -> data architecture -> code. One must never change the order here! When designing a piece of software, always start with figuring out what do you want to achieve, then at least roughly think about data architecture: data structures and infrastructure you need to efficiently achieve it. Only then write your code to work in such architecture. If with time the goal changes, alter the architecture, then change your code.

In my experience, the biggest problem with OOP is that encourages ignoring the data model architecture and applying a mindless pattern of storing everything in objects, promising some vague benefits. If it looks like a candidate for a class, it goes into a class. Do I have a Customer? It goes into class Customer. Do I have a rendering context? It goes into class RenderingContext.

Instead of building a good data architecture, the developer attention is moved toward inventing “good” classes, relations between them, taxonomies, inheritance hierarchies and so on. Not only is this a useless effort. It's actually deeply harmful.

Encouraging complexity

When explicitly designing a data architecture, the result is typically a minimum viable set of data structures that support the goal of our software. When thinking in terms of abstract classes and objects there is no upper bound to how grandiose and complex can our abstractions be. Just look at FizzBuzz Enterprise Edition – the reason why such a simple problem can be implemented in so many lines of code, is because in OOP there's always a room for more abstractions.

OOP apologists will respond that it's a matter of developer skill, to keep abstractions in check. Maybe. But in practice, OOP programs tend to only grow and never shrink because OOP encourages it.

Graphs everywhere

Because OOP requires scattering everything across many, many tiny encapsulated objects, the number of references to these objects explodes as well. OOP requires passing long lists of arguments everywhere or holding references to related objects directly to shortcut it.

Your class Customer has a reference to class Order and vice versa. class OrderManager holds references to all Orders, and thus indirectly to Customer's. Everything tends to point to everything else because as time passes, there are more and more places in the code that require referring to a related object.

You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.

Instead of a well-designed data store, OOP projects tend to look like a huge spaghetti graph of objects pointing at each other and methods taking long argument lists. When you start to design Context objects just to cut on the number of arguments passed around, you know you're writing real OOP Enterprise-level software.

Cross-cutting concerns

The vast majority of essential code is not operating on just one object – it is actually implementing cross-cutting concerns. Example: when class Player hits() a class Monster, where exactly do we modify data? Monster's hp has to decrease by Player's attackPower, Player's xps increase by Monster's level if Monster got killed. Does it happen in Player.hits(Monster m) or Monster.isHitBy(Player p). What if there's a class Weapon involved? Do we pass it as an argument to isHitBy or does Player has a currentWeapon() getter?

This oversimplified example with just 3 interacting classes is already becoming a typical OOP nightmare. A simple data transformation becomes a bunch of awkward, intertwined methods that call each other for no reason other than OOP dogma of encapsulation. Adding a bit of inheritance to the mix gets us a nice example of what stereotypical “Enterprise” software is about.

Object encapsulation is schizophrenic

Let's look at the definition of Encapsulation:

Encapsulation is an object-oriented programming concept that binds together the data and functions that manipulate the data, and that keeps both safe from outside interference and misuse. Data encapsulation led to the important OOP concept of data hiding.

The sentiment is good, but in practice, encapsulation on a granularity of an object or a class often leads to code trying to separate everything from everything else (from itself). It generates tons of boilerplate: getters, setters, multiple constructors, odd methods, all trying to protect from mistakes that are unlikely to happen, on a scale too small to mater. The metaphor that I give is putting a padlock on your left pocket, to make sure your right hand can't take anything from it.

Don't get me wrong – enforcing constraints, especially on ADTs is usually a great idea. But in OOP with all the inter-referencing of objects, encapsulation often doesn't achieve anything useful, and it's hard to address the constraints spanning across many classes.

In my opinion classes and objects are just too granular, and the right place to focus on the isolation, APIs etc. are “modules”/“components”/“libraries” boundaries. And in my experience, OOP (Java/Scala) codebases are usually the ones in which no modules/libraries are employed. Developers focus on putting boundaries around each class, without much thought which groups of classes form together a standalone, reusable, consistent logical unit.

There are multiple ways to look at the same data

OOP requires an inflexible data organization: splitting it into many logical objects, which defines a data architecture: graph of objects with associated behavior (methods). However, it's often useful to have multiple ways of logically expressing data manipulations.

If program data is stored e.g. in a tabular, data-oriented form, it's possible to have two or more modules each operating on the same data structure, but in a different way. If the data is split into objects with methods it's no longer possible.

That's also the main reason for Object-relational impedance mismatch. While relational data architecture might not always be the best one, it is typically flexible enough to be able to operate on the data in many different ways, using different paradigms. However, the rigidness of OOP data organization causes incompatibility with any other data architecture.

Bad performance

Combination of data scattered between many small objects, heavy use of indirection and pointers and lack of right data architecture in the first place leads to poor runtime performance. Nuff said.

What to do instead?

I don't think there's a silver bullet, so I'm going to just describe how it tends to work in my code nowadays.

First, the data-consideration goes first. I analyze what is going to be the input and the outputs, their format, volume. How should the data be stored at runtime, and how persisted: what operations will have to be supported, how fast (throughput, latencies) etc.

Typically the design is something close to a database for the data that has any significant volume. That is: there will be some object like a DataStore with an API exposing all the necessary operations for querying and storing the data. The data itself will be in form of an ADT/PoD structures, and any references between the data records will be of a form of an ID (number, uuid, or a deterministic hash). Under the hood, it typically closely resembles or actually is backed by a relational database: Vectors or HashMaps storing bulk of the data by Index or ID, some other ones for “indices” that are required for fast lookup and so on. Other data structures like LRU caches etc. are also placed there.

The bulk of actual program logic takes a reference to such DataStores, and performs the necessary operations on them. For concurrency and multi-threading, I typically glue different logical components via message passing, actor-style. Example of an actor: stdin reader, input data processor, trust manager, game state, etc. Such “actors” can be implemented as thread-pools, elements of pipelines etc. When required, they can have their own DataStore or share one with other “actors”.

Such architecture gives me nice testing points: DataStores can have multiple implementations via polymorphism, and actors communicating via messages can be instantiated separately and driven through test sequence of messages.

The main point is: just because my software operates in a domain with concepts of eg. Customers and Orders, doesn't mean there is any Customer class, with methods associated with it. Quite the opposite: the Customer concept is just a bunch of data in a tabular form in one or more DataStores, and “business logic” code manipulates the data directly.

Follow-up read

As many things in software engineering critique of OOP is not a simple matter. I might have failed at clearly articulating my views and/or convincing you. If you're still interested, here are some links for you:

Feedback

I've been receiving comments and more links, so I'm putting them here: