Let’s start with some definitions: a data structure is simply a collection of values. An object is a collection of values together with the functions (or methods) associated with that collection.
Paraphrasing a chapter in “Clean Code”[1]: objects should adhere to the principle of encapsulation and hide their data (via private fields) behind functions and abstractions. Data structures, on the other hand, expose their data (via public fields) and have no meaningful functions or methods associated with them.
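To make the two definitions concrete, here is a minimal Go sketch. Go expresses visibility through capitalization: exported (capitalized) fields are public, while unexported (lowercase) fields are hidden from other packages. The Point and Counter types are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// Point is a data structure: its fields are exported (public)
// and it has no behaviour of its own.
type Point struct {
	X, Y int
}

// Counter is an object: its state is unexported and, outside this
// package, can only be read or changed through its methods.
type Counter struct {
	count int
}

// Increment is the only way for other packages to change the hidden state.
func (c *Counter) Increment() { c.count++ }

// Value exposes the state through an abstraction rather than a field.
func (c *Counter) Value() int { return c.count }

func main() {
	p := Point{X: 1, Y: 2}
	p.X = 10 // callers manipulate the data directly
	fmt.Println(p.X, p.Y)

	var c Counter
	c.Increment()
	c.Increment()
	fmt.Println(c.Value())
}
```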
We write functions (or procedures, to use the book’s parlance) that manipulate collections of values to do meaningful work. That work is driven by business logic, which changes regularly. Each change then translates into either a localized or a global code change, depending on whether we modelled our values as data structures or as objects. This impact on code changes is the focus of this post.
Let’s use an example to illustrate the two models. Suppose we want to model two kinds of events, OfflineEvent and LocationEvent, and we want to operate on them via procedures such as uploading them.
Modelling as data structure
Modelling events as data structures means the upload code has to switch on the event type. Using Golang, it might look like the following.
import "fmt"

type OfflineEvent struct{}
type LocationEvent struct{}

// Presumably in a different package/file.
func Upload(event interface{}) error {
	switch event.(type) {
	case OfflineEvent:
		// Do something
	case LocationEvent:
		// Do something else
	default:
		return fmt.Errorf("unsupported event type %T", event)
	}
	return nil
}
In an object-oriented language such as Java, the code might look like the following.
abstract class Event {}
class OfflineEvent extends Event {}
class LocationEvent extends Event {}
class EventUploader {
    public void Upload(Event event) {
        if (event instanceof OfflineEvent) {
            // Do something
        } else if (event instanceof LocationEvent) {
            // Do something else
        }
    }
}
Pay attention to how the code must change if we add a new event type or a new procedure that operates on the data structures. To add a new event type, we have to find every function that switches on the event type, most likely spread across multiple files, and update it. In other words, adding a type is a global change that touches every procedure on the data structures. Adding a new procedure, however, is a localized change: we write one new function (or class) whose switch statement handles all the existing cases.
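To see this trade-off in runnable form, here is a minimal Go sketch of the data-structure approach with two procedures. Describe is a hypothetical second procedure alongside Upload, the DeviceID, Lat and Lng fields are invented for illustration, and Upload returns a message instead of doing real network I/O so the behaviour is observable.

```go
package main

import "fmt"

// Plain data structures: exported fields, no behaviour attached.
type OfflineEvent struct{ DeviceID string }
type LocationEvent struct{ Lat, Lng float64 }

// Upload is one procedure; it switches on the concrete event type.
// It returns a description of the action instead of doing real I/O.
func Upload(event interface{}) (string, error) {
	switch e := event.(type) {
	case OfflineEvent:
		return "uploading offline event for " + e.DeviceID, nil
	case LocationEvent:
		return fmt.Sprintf("uploading location %.2f,%.2f", e.Lat, e.Lng), nil
	default:
		return "", fmt.Errorf("upload: unsupported event type %T", event)
	}
}

// Describe is a second, hypothetical procedure. Adding it touched no
// existing code: it is one new function with its own switch.
func Describe(event interface{}) string {
	switch event.(type) {
	case OfflineEvent:
		return "offline"
	case LocationEvent:
		return "location"
	default:
		return "unknown"
	}
}

func main() {
	events := []interface{}{OfflineEvent{DeviceID: "abc"}, LocationEvent{Lat: 1.5, Lng: 2.25}}
	for _, e := range events {
		msg, err := Upload(e)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(Describe(e), "->", msg)
	}
	// A value of a new type falls through to the default branch of every
	// switch until each procedure is updated by hand.
	_, err := Upload(struct{}{})
	fmt.Println(err)
}
```

Adding a third event type here would mean editing both Upload and Describe; adding a third procedure would mean writing one more function with its own switch.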
Modelling as objects
If we model Event as an object and encapsulate the Upload procedure, it will look something like this.
In Golang:
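// Dependencies is a placeholder for whatever the upload needs.
type Dependencies struct{}

type Uploadable interface {
	Upload(dep Dependencies) error
}

type OfflineEvent struct{}

func (*OfflineEvent) Upload(dep Dependencies) error { return nil } // OfflineEvent-specific upload

type LocationEvent struct{}

func (*LocationEvent) Upload(dep Dependencies) error { return nil } // LocationEvent-specific upload

func createUploadDependencies() Dependencies { return Dependencies{} }

// Upload delegates to the event itself; no type switch is needed.
func Upload(x Uploadable) error {
	d := createUploadDependencies()
	return x.Upload(d)
}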
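In Java:
class Dependencies {}
abstract class Event {
    public abstract void Upload(Dependencies dep);
}
class OfflineEvent extends Event {
    @Override
    public void Upload(Dependencies dep) { /* OfflineEvent-specific upload */ }
}
class LocationEvent extends Event {
    @Override
    public void Upload(Dependencies dep) { /* LocationEvent-specific upload */ }
}
class EventUploader {
    public void Upload(Event event) {
        Dependencies dep = createUploadDependencies();
        event.Upload(dep); // dynamic dispatch picks the right implementation
    }
    private Dependencies createUploadDependencies() {
        return new Dependencies();
    }
}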
Now consider how the code changes when we add a new event type or a new procedure. A new event type only has to satisfy the Uploadable interface, so the change stays localized to a single package or class. Adding a new procedure, however, means adding a new method to the interface, and every type implementing that interface has to change.
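Here is a runnable sketch of the object approach in Go, under a few assumptions: Dependencies is reduced to a hypothetical struct holding an endpoint name, BatteryEvent is a hypothetical third event type added to show that the change stays inside its own type, and each Upload returns a message instead of doing network I/O.

```go
package main

import "fmt"

// Dependencies is a hypothetical stand-in for whatever an upload needs.
type Dependencies struct{ Endpoint string }

// Uploadable is the abstraction callers depend on.
type Uploadable interface {
	Upload(dep Dependencies) (string, error)
}

type OfflineEvent struct{ DeviceID string }

func (e *OfflineEvent) Upload(dep Dependencies) (string, error) {
	return "offline " + e.DeviceID + " -> " + dep.Endpoint, nil
}

type LocationEvent struct{ Lat, Lng float64 }

func (e *LocationEvent) Upload(dep Dependencies) (string, error) {
	return fmt.Sprintf("location %.1f,%.1f -> %s", e.Lat, e.Lng, dep.Endpoint), nil
}

// BatteryEvent is a new, hypothetical event type. Supporting it only
// required this type and its method; the Upload function below is unchanged.
type BatteryEvent struct{ Percent int }

func (e *BatteryEvent) Upload(dep Dependencies) (string, error) {
	return fmt.Sprintf("battery %d%% -> %s", e.Percent, dep.Endpoint), nil
}

// Upload never inspects the concrete type: dispatch happens via the interface.
func Upload(x Uploadable) (string, error) {
	d := Dependencies{Endpoint: "events-api"}
	return x.Upload(d)
}

func main() {
	events := []Uploadable{
		&OfflineEvent{DeviceID: "abc"},
		&LocationEvent{Lat: 1.5, Lng: 2.5},
		&BatteryEvent{Percent: 80},
	}
	for _, e := range events {
		msg, err := Upload(e)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(msg)
	}
}
```

Conversely, adding a method such as Describe() string to Uploadable would make this program stop compiling until all three event types implement it, which is the global change described above.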
Some closing thoughts
Some programming languages make this idea of a “value container” succinct, or have evolved to do so. For example, Scala has case classes, which are immutable by default and come with conveniences such as decomposition via pattern matching. Python, through PEP 557, added data classes, which express the idea of using classes simply to store values while keeping the convenience of retrieving values via attribute lookup.
The examples above show that if the goal is to minimize code changes when requirements change, there is a trade-off however the code is structured: data structures make adding procedures cheap and adding types expensive, while objects do the opposite. Which design is better depends on which kind of change we expect to happen more often.
PS: Classes whose sole purpose is to hold values are also known as Data Transfer Objects, or DTOs.
Related: Anti pattern: turning struct into interface
[1] Clean Code, Robert C. Martin. Chapter on Objects and Data Structures.