Organizing Clojure code with Functional Core, Imperative Shell

Shantanu Kumar
10 min read · May 14, 2021

At first glance, “Organizing Clojure code” might sound like a trivial problem, but it is subtle and therefore easy to get wrong. The real question is not just how to organize code, which is a static artifact, but how to organize the runtime manifestation of that code in production, tests, development, and the REPL. This blog post is inspired by the discussion at https://clojureverse.org/t/organizing-clojure-code-a-real-problem/7567 and explores the various aspects of organizing Clojure code. You need only elementary knowledge of Functional Programming and Clojure application development to follow along.

Almost all systems are distributed systems today. Even a simple application that reads from and writes to a database depends on the shape of the data in persistent storage, which is part of the implementation logic. Organizing code across Space (e.g. distributing code across namespaces, modules, projects, or microservices) and Time (progressively building up the runtime context and generating side effects in the environment as time passes) is a topic of software architecture and applies to almost all programming systems. Though we only see the combined symptoms of these two factors, one can take the effects apart and see how the Space and Time aspects influence the overall system. In this post, we restrict the scope and discuss my opinionated take on organizing Clojure code within a project using the “Functional Core, Imperative Shell” approach for situated programs.

Why care about code organization?

Code organization is the mental model of how a codebase is structured and wired up. A clean, orthogonal, and consistent code organization helps developers fluently anticipate the coding pattern throughout the codebase and bring new code into existence. This mental model is a crucial ingredient for developer productivity, and ought to be shared by the entire development team.

A bunch of confused developers is the last thing you want in your Clojure team. Programmers coming from a non-functional programming paradigm (e.g. Object-oriented or Structured programming) may have the inertia to reuse the code organization of their respective previous paradigm on a Clojure project. This may lead to a great deal of perturbation or friction in the team. A common, well-understood, and reliable code organization removes such impediments and greatly aids developer onboarding.

Functional Programming

Functional Programming is the art of making Pure Functions, Immutable Data and Side Effects work together in harmony. Clojure supports and embodies the philosophy of Functional Programming.

Pure Function: For any given input, a pure function (e.g. one that adds two given integers) returns the same result every time it is called, without causing any change to the environment. Pure functions are free of side effects.

Immutable Data: Immutable data is a pure value rather than a mutable variable or a container for data. Any change to the value (e.g. incrementing a counter) produces a new value, which is itself immutable.

Side Effect: A side effect causes a change to the environment the program runs in. It is the dual of pure functions and immutable data. When changes to a value need to be remembered, it is done via “state management”, which is a side effect. When you perform an I/O operation in the program, that is a side effect. When you read the system clock value (which changes with time) or generate a random number, those also introduce side effects. A function that causes side effects is called an Impure Function.

Immutable Data and Pure Functions are invariants. They do not change with time under any condition; you can test pure functions with various inputs in parallel. Code dealing with just pure functions and immutable data is much easier to reason about than equivalent code using side effects.
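
To make the three ingredients concrete, here is a minimal sketch. The names add, counter-v1, counter-v2, current-millis, hits, and record-hit! are made up for illustration and are not taken from any library.

```clojure
;; Pure function: same input, same output, no change to the environment.
(defn add [a b]
  (+ a b))

;; Immutable data: "changing" a value yields a new value; the old one is intact.
(def counter-v1 {:count 0})
(def counter-v2 (update counter-v1 :count inc))

;; Impure function: reads the system clock, so repeated calls may differ.
(defn current-millis []
  (System/currentTimeMillis))

;; State management is a side effect: the atom remembers changes over time.
(def hits (atom 0))

(defn record-hit! []
  (swap! hits inc))
```

Evaluating counter-v1 after defining counter-v2 still gives {:count 0}, whereas every call to record-hit! leaves the environment in a different state than before.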

Side effects do not compose. They are not guaranteed to be repeatable; they carry no guarantee of invariance. That is another way to say that side effects are not referentially transparent, or that side effects are infectious. Given functions foo, bar, and baz with the dependency relationship foo → bar → baz (i.e. foo calls bar, and bar calls baz), if the function baz is impure, it makes both bar and foo impure even though foo and bar do not cause any side effect other than calling their dependencies. Side effects are very often time-sensitive. You need to factor in time and context to reason about them, which is complex.
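
The foo, bar, and baz chain can be sketched as follows. The call-log atom is a stand-in I am assuming for "the environment"; any I/O would behave the same way.

```clojure
;; Stand-in for the environment that baz changes.
(def call-log (atom []))

;; Impure: appends to call-log on every call.
(defn baz [x]
  (swap! call-log conj x)
  (* 2 x))

;; bar and foo contain no side effects of their own,
;; yet calling them changes call-log through baz.
(defn bar [x]
  (inc (baz x)))

(defn foo [x]
  (str (bar x)))
```

Calling (foo 1) returns "3" each time, but you cannot replace the call with the value "3" without changing the program's behavior, because every call also grows call-log.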

Functional Core, Imperative Shell

We cannot achieve Functional Programming by having pure functions call impure functions. We saw why in the previous section. Because side effects are infectious and not referentially transparent, the only way to keep them isolated from pure functions is to have impure functions call pure functions, but not vice versa. Based on this idea, we can extract functionality as much as possible as pure functions and leave only the essential side effects in impure functions, such that one can write (impure) “workflow” functions to call both pure and impure functions to materialize use cases — this is called “pushing side effects to the edge”, and this entire approach is termed Functional Core, Imperative Shell. A similar approach has been discovered and practiced in the Object-oriented Programming world, called the Onion Architecture, and adapted for functional programming (using F#) in the excellent book Domain Modeling Made Functional.

Isolating impure functions as an outer shell for pure functions implies a layering. An outer layer can only call its inner layers; an inner layer does not know the outer layers' existence. The inner layers are made of pure functions (business logic) and immutable data models (domain model), forming a functional core. Outer layers interleave the impure function calls (i.e. side effects) with the functional core calls into a workflow (like a transaction script) to achieve the business use case. The workflow functions and the side effect functions are impure, but everything else remains functional and pure.
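
As a rough sketch of this layering, consider pricing an order. All names here (order-total, price-order, fetch-order!, save-order!, price-order!) are hypothetical, and an atom holding a map stands in for a real database.

```clojure
;; Functional core: pure business logic over immutable data.
(defn order-total [order]
  (reduce + (map (fn [{:keys [price qty]}] (* price qty))
                 (:items order))))

(defn price-order [order]
  (assoc order :total (order-total order)))

;; Side-effect functions: the store is an atom standing in for a database.
(defn fetch-order! [store id]
  (get @store id))

(defn save-order! [store order]
  (swap! store assoc (:id order) order)
  order)

;; Workflow (imperative shell): interleaves side effects with core calls.
(defn price-order! [store id]
  (->> (fetch-order! store id)
       price-order
       (save-order! store)))
```

Only price-order! and the two store functions touch state. The core functions order-total and price-order know nothing about the store, so they can be tested with plain maps.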

Isolating side effects in Clojure

The easiest way to isolate side effects is to not have them in the first place. No side effects mean no need to isolate them. Below are a few ways to minimize and disentangle side effects. The basic idea is to stop Side Effects from leaking into Space and to keep them well-isolated in the Time context.

Segregate side effects

Side Effects make the function calls referentially opaque. To achieve the goal of functional programming to increase referential transparency, extract the essential side effects into their respective impure functions. Whatever remains must be pure, which you can refactor using functional programming concepts. It may be a good idea to keep impure and pure functions in separate namespaces for easy identification. Upon splitting the two, you can write ‘workflow’ functions as described in the previous section. Business flows too complex to fit into the ‘transaction script’ model can be broken down into ‘workflow of workflows’. All state and stateful resources (even though singletons) are to be passed wherever required as parameters — this improves design flexibility and testability. Workflow functions accepting side-effecting impure functions as dependency arguments make it vastly easier to mock the behavior and simulate various failure conditions.
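
Here is one way a workflow might accept its side-effecting functions as dependency arguments. The names valid-email?, register-user!, and the :save-user! and :send-welcome! keys are assumptions made up for this sketch.

```clojure
;; Pure: validation logic with no dependencies.
(defn valid-email? [s]
  (boolean (re-matches #"[^@\s]+@[^@\s]+" s)))

;; Workflow: the impure functions arrive as arguments in a map,
;; so callers decide what "saving" and "sending" mean.
(defn register-user!
  [{:keys [save-user! send-welcome!]} email]
  (if-not (valid-email? email)
    {:status :invalid}
    (let [user (save-user! {:email email})]
      (send-welcome! user)
      {:status :registered :user user})))
```

In a test, the dependency map can hold stubs, for example a save-user! that assigns a fixed id or one that throws to simulate a database failure, with no need for a real database or mail server.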

Avoid redefining Vars

Redefining vars makes them stateful, global variables. ‘Global’ here means the side effect is extended into ‘Space’, which others share, hence impacting all ‘var references’ at once. Vars are defined and bound first when the namespace is loaded. Unless redefined, any reference to that var can safely assume referential transparency because it is immutable upon definition, which makes it easy to reason about. The alternative to redefining a var is to create a first-class state (or stateful resource) and have it passed as a parameter wherever required. However, there are situations where it is useful to redefine vars, e.g. instrumenting vars (without breaking referential transparency) for performance monitoring and input/output/spec validation during development, and redefining reference vars (in test scope) to initialized resources during tests.
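
The contrast between redefining a var in test scope and passing the dependency as a parameter can be sketched like this. The functions fetch-rate, convert, convert-in-test, and convert* are hypothetical; with-redefs is the standard clojure.core macro, which temporarily replaces the root binding of a var for all threads.

```clojure
;; Pretend this performs network I/O; here it simply fails.
(defn fetch-rate [currency]
  (throw (ex-info "network access not available" {:currency currency})))

;; Depends on the global var fetch-rate.
(defn convert [amount currency]
  (* amount (fetch-rate currency)))

;; Test scope only: temporarily rebind the var to a stub.
(defn convert-in-test []
  (with-redefs [fetch-rate (constantly 2)]
    (convert 10 :eur)))

;; Alternative: take the rate function as a parameter, no redefinition needed.
(defn convert* [rate-fn amount currency]
  (* amount (rate-fn currency)))
```

Both (convert-in-test) and (convert* (constantly 2) 10 :eur) evaluate to 20, but only the second one leaves every other reference to fetch-rate untouched while it runs.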

Avoid global state in Vars

Vars are global, so when you bind a var to some state (or stateful resource) it becomes a global state. It takes a conscious effort not to access the global state from pure functions. A better way is to not create a global state at all, and to have the stateful resources passed as parameters to functions. (This may raise the challenge of propagating state dependencies to functions, which some libraries can handle.) However, there are some valid use cases where a global state makes sense, e.g. state or resource access in the REPL, or in tests.

Vars are global, which implies there can be only one version at any given time. Consider an application talking to an SQL database using a connection pool as a stateful resource. The application is deployed at 10 sites. Since two out of 10 sites overload the database due to high transaction volume, it was decided to use the read-replica for ‘database reads’ at those two sites. If the connection pool (stateful resource) is bound to a var, how would you distinguish between the writable and read-only connection pools? Even if you decide to bind the var to {:write-pool pool1 :read-pool pool2} instead of a connection pool, how would you use an instrumented version of the connection pool for only selected use cases in the application? With more dynamic requirements the single-version limitation of ‘global’ vars starts leading to a concretional design that is hard to keep simple and scale.
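
To illustrate the read-replica scenario without a global var, here is a sketch in which the pools travel in a context map. Everything here is assumed for illustration: make-pool builds a fake pool, run-query! only records the SQL it receives, and site-context decides per site which pool serves reads.

```clojure
;; A fake connection pool: a name plus a log of the queries it received.
(defn make-pool [pool-name]
  {:name pool-name :queries (atom [])})

;; Stand-in for executing SQL; returns the name of the pool that was used.
(defn run-query! [pool sql]
  (swap! (:queries pool) conj sql)
  (:name pool))

;; Callers receive the pools as a parameter instead of reading a global var.
(defn load-report! [{:keys [read-pool]} id]
  (run-query! read-pool (str "SELECT * FROM report WHERE id = " id)))

(defn mark-report-seen! [{:keys [write-pool]} id]
  (run-query! write-pool (str "UPDATE report SET seen = true WHERE id = " id)))

;; Per-site wiring: only some sites route reads to a replica.
(defn site-context [{:keys [use-replica?]}]
  (let [primary (make-pool :primary)]
    {:write-pool primary
     :read-pool  (if use-replica? (make-pool :replica) primary)}))
```

Because the choice of pool is made where the context is built, an instrumented pool for selected use cases is just another value placed in the map passed to those use cases.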

Adopt first-class Resource init/de-init lifecycle

Side effects that happen during program code evaluation when loading a namespace are the worst. Loading of namespaces creates vars in namespaces, which is already a side effect, and once the program code also causes side effects, it creates a nightmare of a brittle, intractable mess. Have a first-class initialization phase in the application that creates and initializes the stateful resources, prepares the runtime, and then launches the application. It often helps if the initialization steps are systematic, composable, declarative, etc. Resource initialization also gets heavily involved at the REPL and in tests where one may want partial or selective resource initialization, de-initialization, or re-initialization that may be repeated many times during the development workflow.
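
A first-class lifecycle can be as small as an ordered list of steps. The sketch below is a simplified illustration under my own assumptions, not the API of any particular library: each step has an :id, an :init function that receives the context built so far, and a :halt function that receives the resource it created.

```clojure
;; Declarative, ordered initialization steps (hypothetical resources).
(def steps
  [{:id   :config
    :init (fn [_ctx] {:db-url "jdbc:example://localhost/app"})
    :halt (fn [_config] nil)}
   {:id   :db-pool
    :init (fn [ctx] {:url   (get-in ctx [:config :db-url])
                     :open? (atom true)})
    :halt (fn [pool] (reset! (:open? pool) false))}])

;; Build the context by running each step's :init in order.
(defn init-system [steps]
  (reduce (fn [ctx {:keys [id init]}]
            (assoc ctx id (init ctx)))
          {}
          steps))

;; De-initialize in reverse order, skipping steps that were never initialized.
(defn halt-system! [steps ctx]
  (doseq [{:keys [id halt]} (reverse steps)]
    (when (contains? ctx id)
      (halt (get ctx id)))))
```

Nothing happens at namespace load time. The application entry point would call (init-system steps), while a test or REPL session could initialize only a prefix such as (init-system (take 1 steps)) and halt it again as often as needed.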

Instrumenting core side effects

Core side effects (e.g. I/O operations or business transactions) frequently need more side effects such as logging, error reporting, monitoring, etc. To keep this permutation from devolving into a combinatorial explosion, you may write adapter functions or macros to uniformly instrument the core side effects.
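
One possible shape for such an adapter is a higher-order function that wraps any side-effecting function. The names instrument and report! are assumptions for this sketch; report! would typically forward to a logger or a metrics client.

```clojure
;; Wraps f so that every call reports its label, outcome, and elapsed time.
;; Exceptions are reported and then rethrown unchanged.
(defn instrument
  [label report! f]
  (fn [& args]
    (let [start (System/nanoTime)
          done! (fn [outcome]
                  (report! {:label      label
                            :outcome    outcome
                            :elapsed-ns (- (System/nanoTime) start)}))]
      (try
        (let [result (apply f args)]
          (done! :ok)
          result)
        (catch Exception e
          (done! :error)
          (throw e))))))
```

The core side effect stays unaware of logging and monitoring. The decision of what to instrument is made once, where the functions are wired together, for example (instrument :save-order report! save-order!) with a hypothetical save-order! function.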

Error handling and translation

Errors may emanate from both pure functions and side effects. In complex applications, errors appearing in one context often need to be translated into errors in another context. Clojure supports exceptions as the host’s mechanism to propagate errors. However, exceptions do not compose (they are not referentially transparent), nor are they values, so they are inadequate to support the functional programming endeavor we are discussing. There are libraries to represent and handle errors as data, which may be useful.
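
Without committing to any particular library, the idea of errors as data and of translating them between contexts can be sketched with plain maps. The {:ok ...} and {:error ...} shapes, and the functions parse-quantity, check-stock, then, and ->http-response, are conventions assumed for this example only.

```clojure
;; Each step returns {:ok value} or {:error details} instead of throwing.
(defn parse-quantity [s]
  (if (re-matches #"\d+" s)
    {:ok (Long/parseLong s)}
    {:error {:type :invalid-quantity :input s}}))

(defn check-stock [available qty]
  (if (<= qty available)
    {:ok qty}
    {:error {:type :out-of-stock :requested qty :available available}}))

;; Composes steps: runs f on success, passes an error through untouched.
(defn then [result f]
  (if (contains? result :error)
    result
    (f (:ok result))))

;; Translates a domain-level result into an HTTP-level response.
(defn ->http-response [result]
  (if-let [err (:error result)]
    {:status (case (:type err)
               :invalid-quantity 400
               :out-of-stock     409
               500)
     :body   err}
    {:status 200 :body (:ok result)}))
```

A request for "9" units with 5 in stock flows through (-> (parse-quantity "9") (then #(check-stock 5 %)) ->http-response) and comes out as a 409 response, with no exception involved at any point.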

Sometimes, exceptions really do represent exceptional circumstances. When it is not possible to recover from or respond to an error, e.g. a critical data store being unavailable or insufficient memory, it may be better to bail out of the execution with an exception. Such panic-mode cases may be handled better by catching exceptions at an outer layer.
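
A minimal sketch of catching such panic-mode exceptions at the outermost layer follows. The names load-settings!, run-app!, and read-store! are hypothetical, and the returned map with an :exit-code is just one way to report the outcome.

```clojure
;; Inner layer: lets the exception escape when the store is unavailable.
(defn load-settings! [read-store!]
  (read-store!))

;; Outer layer: the single place where unrecoverable failures are caught.
(defn run-app! [read-store!]
  (try
    (let [settings (load-settings! read-store!)]
      {:exit-code 0 :settings settings})
    (catch Exception e
      {:exit-code 1 :reason (.getMessage e)})))
```

The inner layers stay free of try/catch noise for failures they cannot handle anyway, and the entry point decides how to log the reason and shut down.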

Organizing the Imperative Shell

Once identified, Functional Core (owing to invariance) is relatively easier to organize than Imperative Shell. Organizing the latter involves managing Time and State. Invoking the application triggers the application entry point, which is also an entry point for the Imperative Shell. Alternatively, when you run application tests or invoke the REPL, even that hooks into the imperative shell. So, depending upon the context, the Imperative Shell may have different entry points and needs for building application context. However, there are generally two distinct parts of the Imperative Shell — Context initialization, and the Runtime. Each of these parts may be wholly or slightly different for each use case, for example running tests may require only some stateful resources to be initialized but running the application would require the entire initialization sequence to be completed. Similarly, in the REPL the runtime is decided by what the user wants to do at the REPL but running the application results in responding to the input at runtime.

Better Machines

In an imperative shell, the runtime is almost always modeled as a workflow. The section ‘Functional Core Imperative Shell’ mentions ‘Transaction Script’ as one of the techniques to build workflows, which works well for simple scenarios. However, there are complex workflows that need a sophisticated arrangement of pure functions and side effects. A network protocol implementation can abstract out network I/O and maintain state to respond contextually to the incoming bytes — the state can be handled in a State Machine, which itself may be largely implemented using pure functions. A workflow involving concurrency and asynchronous operations can maintain internal state machines to yield a well-composed workflow. A complex user interface may employ Functional Reactive Programming to implement the UI, and similarly, a complex application can use Event-Driven Architecture to implement its workflow. Behind all these sophisticated approaches, the fundamental principle remains the same — clean separation of pure functions and side effects in the most orthogonal way possible.
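
As a small illustration of keeping protocol state in pure functions, here is a sketch of a newline-delimited message framer. The feed function and the {:buffer ... :messages ...} state shape are assumptions for this example. The imperative shell would own the socket and simply call feed with each chunk it reads.

```clojure
(require '[clojure.string :as str])

;; Pure transition: given the previous state and a newly received chunk,
;; return the next state with the complete messages found so far in :messages
;; and the unfinished remainder in :buffer.
(defn feed [{:keys [buffer]} chunk]
  (let [data  (str buffer chunk)
        ;; limit -1 keeps a trailing empty string, so a chunk ending in
        ;; a newline leaves an empty buffer.
        parts (str/split data #"\n" -1)]
    {:buffer   (last parts)
     :messages (vec (butlast parts))}))
```

Feeding "PING\nPO" to the initial state {:buffer ""} yields the message "PING" and keeps "PO" buffered. Feeding "NG\n" next completes "PONG". Since feed is pure, the whole protocol state machine can be tested with strings alone, without any network I/O.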

Conclusion

Code organization is the crucial first step to develop, and more importantly to understand and join the ongoing development of, a software program. We discussed code organization techniques for Clojure using a Functional Programming approach that combines pure functions, immutable data, and side effects. The main challenge in this approach is to isolate and push ‘side effects’ to the edge, and the results are indeed quite rewarding — you get a fundamental simplicity in the codebase that you can leverage to take the application to higher levels of sophistication. I hope this post helps you look at Clojure code organization with a renewed perspective, and try Clojure programming with a reinforced sense of real-world Functional Programming.

Functional Programming is a vast treasure trove and we are barely scratching the surface here. To join my journey of exploring Clojure and Functional Programming, follow me on Twitter and GitHub. Discuss this story on Twitter, Hacker News, or Reddit r/Clojure.

Thanks to Ramakrishnan M, Vijay Mathew and Ravi Kant Sharma for reading the drafts of this post and for making valuable suggestions. (Disclosure: I am the author of Bract, Promenade and Dime linked to in this post.)
