Over my career I've worked with some very different languages - PHP (and its cooler cousin Hack), C, C++, Objective-C, JavaScript (and its cooler cousins TypeScript and Flow), Java, Haskell, Python, Rust, Dart, Lua, OCaml, Elixir, Go, etc. This was on projects of varying sizes - from a few hundred lines to multimillion-LOC behemoths.
I've noticed a trend in terms of what makes a particular type system easy to use. First off - for small projects, type systems are generally a pain and can very well be avoided. You don't need Haskell's type magic for your exploratory notebook or one-off scraper.
But for any project of sufficient size, a crucial question you have to ask yourself over and over again is: how do I know the code still works after my changes? With a bad type system, answering this question will slow you down a great deal.
The obvious answer is, of course, that you write a test for it. But tests come with a lot of overhead. You have to write them and then maintain them in the face of (typically) fast-changing product requirements. The more code you have, the worse off you are. Here's an exercise for you: think about the last time you had to investigate a broken test. How did you fix it? Was the code broken, or the test? In most cases, it's actually the test that's broken (e.g. due to changing requirements) rather than the code itself. That's not to say tests aren't valuable. They are immensely valuable, but they do come with considerable overhead that a good type system can alleviate.
So I think the role of a good type system should be to reduce the need for writing tests by making whole classes of mistakes impossible to represent. Your type system should be able to express the fact that a given object can never be null. It should be possible to encode in the types the fact that your container only holds objects of a particular type.
So let's look at a few things that have, in my experience, made languages a breeze to work with.
Null safety
Look at this piece of JavaScript:
```javascript
function logCommentCreation(comment) {
  logJson({
    "timestamp": new Date().valueOf(),
    "id": comment.id,
    "user_id": comment.user.id,
    "post_id": comment.post.id,
    "author_id": comment.post.author.id,
  });
}
```
This can easily break if any of comment, comment.user, comment.post or comment.post.author are null. This is why JavaScript, out of the box, has a very poor type system. The Java way of dealing with such issues is to either litter the code with null checks or risk NullPointerExceptions. And remember - even if you diligently verify that every post has an author, the product requirements can always change from underneath you, and suddenly your code is broken.
Java gives you the false confidence that its types are sound, but it's easy to forget that any. single. object. can actually be null.
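For contrast, here is a minimal TypeScript sketch of the same logging payload, assuming the compiler runs with "strict": true (which turns on strictNullChecks). The Author, Post and PostComment shapes are hypothetical, and the function returns the payload instead of calling logJson so it can run on its own. Because a post's author is declared as possibly null, the compiler won't let you read comment.post.author.id directly; it makes you decide what happens in that case.

```typescript
// Hypothetical shapes; under "strict": true, null has to be declared explicitly.
type Author = { id: number };
type Post = { id: number; author: Author | null }; // a post may have no author
type PostComment = { id: number; user: { id: number }; post: Post };

function commentLogPayload(comment: PostComment): Record<string, number | null> {
  return {
    timestamp: Date.now(),
    id: comment.id,
    user_id: comment.user.id,
    post_id: comment.post.id,
    // `comment.post.author.id` would not compile here, because author may be null.
    // Optional chaining plus ?? makes the "no author" case explicit.
    author_id: comment.post.author?.id ?? null,
  };
}
```

If the product later decides that posts can lose their author, changing author to Author | null turns every unchecked access into a compile error instead of a production crash.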
Opt in restrictions
Most IDs are either numbers or strings, and most languages let you create an alias for a type. Take for example this Go function, where both IDs are plain ints:
```go
func reportComment(userID int, commentID int) {
	// ...
}
```
If this function is used in enough places by enough people, someone is bound to mix up the arguments. Furthermore, if you want to refactor the code and change the argument order, you're in big trouble. This is why some languages let you give each kind of ID its own type:
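```typescript
// A plain alias (type UserID = number) is interchangeable with number in TypeScript,
// so each ID is "branded" to make UserID and CommentID incompatible with each other.
type UserID = number & { readonly __brand: "UserID" };
type CommentID = number & { readonly __brand: "CommentID" };

function reportComment(userId: UserID, commentId: CommentID) {
  // ...
}
```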
If you pass the arguments in the wrong order, the type checker will complain. What's even better is that if you decide to change the underlying type of, say, UserID from number to string, you only have to do it in a small number of places.
But there's still a problem here. Doing arithmetic on IDs makes no sense, yet TypeScript happily allows it. Although not very common, such a bug can creep in through repeated refactoring. Haskell goes a bit further and disallows any behaviour that the user hasn't explicitly opted into:
```haskell
newtype UserID = UserID Integer

-- this doesn't compile: UserID has no Num instance, so (+) isn't defined for it
-- addTwo :: UserID -> UserID
-- addTwo userId = userId + 2

-- still possible, but you have to be very explicit about it
addTwo :: UserID -> UserID
addTwo (UserID underlyingValue) = UserID (underlyingValue + 2)
```
The examples I've given above are about IDs, but the same thing applies to time values and other extremely common types as well.
More generally, a good type system should allow you to impose restrictions on your own code to ensure safety.
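The same opt-in idea applied to time values might look like this in TypeScript. This is a sketch with a hypothetical Duration class (not from any library). Wrapping the value in a class rather than aliasing number means the arithmetic operators no longer apply to it, so the only operations available are the ones the class defines, which is roughly what the Haskell newtype achieves. The trade-off is an extra object per value.

```typescript
// A hypothetical opaque Duration type. Since it is an object rather than a number,
// arithmetic operators don't apply to it: only the methods below are available.
class Duration {
  private readonly ms: number;

  // Private constructor: values can only be created through the unit-named factories.
  private constructor(ms: number) {
    this.ms = ms;
  }

  static fromSeconds(seconds: number): Duration {
    return new Duration(seconds * 1000);
  }

  static fromMilliseconds(ms: number): Duration {
    return new Duration(ms);
  }

  // Explicitly opted-in operation: adding two durations yields a duration.
  plus(other: Duration): Duration {
    return new Duration(this.ms + other.ms);
  }

  toMilliseconds(): number {
    return this.ms;
  }
}

const timeout = Duration.fromSeconds(5).plus(Duration.fromMilliseconds(250));
// timeout + 2; // rejected by the compiler: '+' isn't defined for Duration and number
```

Because the unit is part of the factory name, passing seconds where milliseconds were meant becomes much easier to spot in review.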
Refactorability
Not a real word. But what I mean is - let's say that you make some changes. For example, you add a new argument to a function or you change its type signature. A good type system will immediately show you errors for all the pieces of code that are now broken. You fix them one by one and your code now works. This is the joy that a good language can bring.
JavaScript, on the other hand, will happily run that code and let you call the function with its old signature, silently passing undefined for the missing argument. TypeScript fixes that problem for good.
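A small illustration, using a hypothetical formatPrice function that gained a currency parameter during a refactor. The stale call site is left commented out because TypeScript refuses to compile it; plain JavaScript would run it and produce "9.99 undefined".

```typescript
// Hypothetical function that just gained a `currency` parameter in a refactor.
function formatPrice(amount: number, currency: string): string {
  return `${amount.toFixed(2)} ${currency}`;
}

// A stale call site that wasn't updated. Plain JavaScript would run it and
// produce "9.99 undefined"; TypeScript reports it as an error at compile time:
// formatPrice(9.99);

// The updated call site.
const label = formatPrice(9.99, "EUR"); // "9.99 EUR"
```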
Greppability is also not a real word, but it's a very valuable thing to have. Ideally, grep should reveal every call site of a given function. That stops being true when the language lets you call a function by a computed name (e.g. object['some' + 'string']()) or define magic methods (e.g. __getattr__ in Python or method_missing in Ruby). Metaprogramming is an extremely powerful tool, and one that should be avoided at all costs. I realise this is not about the type system per se, but it is a huge hurdle for code safety.
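One greppable alternative to building method names at runtime is explicit dispatch over a closed set of cases. Here is a minimal TypeScript sketch; the moderation handlers and the ModerationAction union are hypothetical. Every handler is called by its literal name, and the never check makes the compiler flag any action that is added to the union but not handled.

```typescript
// Hypothetical handlers; each is called by its literal name, so grep finds every call site.
function approveComment(id: number): string {
  return `approved ${id}`;
}

function rejectComment(id: number): string {
  return `rejected ${id}`;
}

type ModerationAction = "approve" | "reject";

// Instead of something like handlers["do_" + action](id), dispatch explicitly.
function moderate(action: ModerationAction, id: number): string {
  switch (action) {
    case "approve":
      return approveComment(id);
    case "reject":
      return rejectComment(id);
    default: {
      // If a new action is added to the union but not handled above,
      // `action` is no longer `never` here and this line fails to compile.
      const unhandled: never = action;
      throw new Error(`unhandled action: ${unhandled}`);
    }
  }
}
```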
Expressiveness
Most languages have escape hatches that let you tell the type system to go away. TypeScript's any, Go's interface{} and C's void* all let you express the fact that the type system has failed you.
A good language should avoid the need for these at all costs. The solution is not obvious, and it likely involves multiple iterations of the language, each specific to its own quirks and needs.
As a salient example, Go's type system (at least before generics arrived in Go 1.18) can't express "container of type X", so containers have to work with interface{}. This is a very bad thing because (a) it lets you misunderstand the type of an object, and (b) changing a container's element type becomes extremely hard and unsafe.
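For comparison, here is a minimal generic container in TypeScript, where the element type is part of the container's type. Stack is a hypothetical class for illustration; an escape-hatch version would store unknown or any and force every caller to cast on the way out.

```typescript
// A hypothetical generic stack: the element type T is part of the container's type.
class Stack<T> {
  private items: T[] = [];

  push(item: T): void {
    this.items.push(item);
  }

  // Returns undefined when empty, and the return type says so.
  pop(): T | undefined {
    return this.items.pop();
  }

  get size(): number {
    return this.items.length;
  }
}

const commentIds = new Stack<number>();
commentIds.push(1);
commentIds.push(2);
// commentIds.push("3"); // rejected by the compiler: a string is not a number
const latest = commentIds.pop(); // typed as number | undefined, no cast needed
```

Changing the element type later is a one-line edit to Stack<number>, after which the compiler points at every call site that no longer fits.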
Explicit IO
Monads. The word is monads. But I don't want to scare anyone, so let's call it async/await, which seems to be a friendlier name.
This concept has only fairly recently gained wide popularity, and it lets you separate actions that are internal to the codebase from actions that interact with the outside world. This has a few benefits:
- it's informative - it tells you that a given action might take a few seconds to complete, for example
- it's efficient - it lets you be intentional about sequencing your IO and avoid wasted time, e.g. by issuing two unrelated IO calls together
- it can give you a lot of flexibility - some languages allow plugging in your own async/await implementation. This can be extremely useful for mocking the outside world in tests, benchmarking using production traffic and no side effects, replaying production traffic to find obscure bugs, etc.
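A minimal TypeScript sketch of the first two points, plus a simpler cousin of the third. CommentStore, describeComment and fakeStore are hypothetical names. Note that this swaps out the IO behind an injected interface rather than plugging in a custom async/await implementation, which is the more powerful mechanism described above.

```typescript
// Hypothetical interface for the outside world; the business logic only sees this.
interface CommentStore {
  fetchUserName(userId: number): Promise<string>;
  fetchPostTitle(postId: number): Promise<string>;
}

// The async signature tells every caller that this function performs IO.
async function describeComment(
  store: CommentStore,
  userId: number,
  postId: number,
): Promise<string> {
  // The two lookups are independent, so both are started before awaiting,
  // instead of waiting for one to finish before starting the other.
  const [userName, postTitle] = await Promise.all([
    store.fetchUserName(userId),
    store.fetchPostTitle(postId),
  ]);
  return `${userName} commented on "${postTitle}"`;
}

// An in-memory fake for tests: deterministic, no network, no side effects.
const fakeStore: CommentStore = {
  fetchUserName: async (id) => `user${id}`,
  fetchPostTitle: async (id) => `post${id}`,
};
```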
Easy containers
This is mostly about ergonomics. If you need to write a dozen lines of boilerplate to create a new container, you likely won't bother. Yes, I'm talking about Java. In the real world, there's a huge need for simple containers that are well typed. TypeScript has by far the best ergonomics here. For example:
```typescript
type User = {id: UserID, name: string, dateJoined: Date, posts: Array<Post>};
```
This single line would take 10+ lines in Java, once you count the getters and setters, curly braces, spacing, etc.
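To show what working with such a type looks like, here is a self-contained sketch. UserID and Post get assumed, simplified definitions, since the one-liner above references them without defining them; alice and summarize are illustrative names.

```typescript
// Assumed, simplified stand-ins for the types the one-liner references.
type UserID = number;
type Post = { id: number; title: string };
type User = { id: UserID; name: string; dateJoined: Date; posts: Array<Post> };

// Object literals are checked against the type: a missing or misspelled field is a compile error.
const alice: User = {
  id: 1,
  name: "Alice",
  dateJoined: new Date(2020, 0, 15),
  posts: [{ id: 10, title: "Hello, world" }],
};

// Destructuring reads fields directly; no getters required.
function summarize({ name, posts }: User): string {
  return `${name} has ${posts.length} post(s)`;
}
```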
Immutability
Or better yet, immutability by default. This comes in different flavours: the combo of not-really-immutability plus ergonomic and efficient object creation (e.g. with the spread ... operator in modern JavaScript), or actual immutability that propagates through the type system, like const in C++ or Rust.
The more, the better. As an industry, we've come to realise that limitations have a lot of merits when applied in the right places. Being able to be certain about the things a given codebase can't do allows you to allocate more of your limited human bandwidth to reasoning about what it can do.
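The first flavour, sketched in TypeScript with a hypothetical Settings type: readonly makes the compiler reject in-place mutation, and "updates" create new objects via the spread operator. Note that readonly is shallow and only enforced at compile time, which is exactly why it counts as not-really-immutability.

```typescript
// `readonly` makes the compiler reject in-place mutation (shallow, compile-time only).
type Settings = { readonly theme: "light" | "dark"; readonly fontSize: number };

const defaults: Settings = { theme: "light", fontSize: 14 };
// defaults.fontSize = 16; // rejected by the compiler: fontSize is read-only

// "Updating" means building a new object with the spread operator;
// `defaults` itself is left untouched.
const larger: Settings = { ...defaults, fontSize: 16 };
```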
—
There's way more than this, but these are some of the more salient features that exist in some, but not all, of the widely used languages.
Top wins
TypeScript - easily the biggest upgrade a language has ever had. With a strict linter, this would likely be my top language for writing arbitrary business code. It trades off some safety for ergonomics, which means that I wouldn't consider it for anything absolutely critical. The best thing about it, in my opinion, is the combination of non-nullable types, easy containers, object destructuring and async/await.
Java's declared exceptions - knowing how something can go wrong is a huge reassurance, because you now know how it can't go wrong. Except for that horrible NPE, of course.
Top fails
Go's lack of generics (until Go 1.18) meant that non-stdlib containers were a pain to use.
Java's null is an instance of what Tony Hoare called his billion-dollar mistake (the null reference), and it probably lives up to the name.
Ruby and Python code tends to turn into ungreppable magic that you can't comprehend.
Rust brags about safety, but a runtime panic is only a few characters away - dividing an integer by a value that turns out to be zero at runtime, or a stray .unwrap().
Haskell also brags about safety, but its standard Prelude contains a bunch of partial functions, like head, which throws on an empty list. The good news is that you can replace the Prelude with one that is actually safe, but it has to be a conscious decision. Not great.