Actor reentrancy in Swift explained

Published on: April 11, 2024

When you start learning about actors in Swift, you’ll find that explanations almost always contain something along the lines of “Actors protect shared mutable state by making sure the actor only does one thing at a time”. As a single-sentence summary of actors this is great, but it misses an important nuance: while it’s true that actors only do one thing at a time, they don’t always execute function calls atomically.

In this post, we’ll explore the following:

  • What actor reentrancy is
  • Why async functions in actors can be problematic

Generally speaking, you’ll use actors for objects that must hold mutable state while also being safe to pass around in tasks. In other words, objects that hold mutable state, are passed by reference, and have a need to be Sendable are great candidates for being actors.

Implementing a simple actor

A very simple example of an actor is an object that caches data. Here’s how that might look:

actor DataCache {
  var cache: [UUID: Data] = [:]
}

We can directly access the cache property on this actor without worrying about introducing data races. The actor makes sure that we won’t run into data races when we get and set values in our cache from multiple tasks in parallel.
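From outside the actor, every one of those accesses goes through an await. Here’s a minimal, hypothetical sketch (the semaphore is only there so the script waits for the task to finish):

```swift
import Foundation
import Dispatch

actor DataCache {
  var cache: [UUID: Data] = [:]
}

let cache = DataCache()
let sem = DispatchSemaphore(value: 0)
var sawEmptyCache = false

// Task.detached so the work runs off the main thread while we block on the semaphore
Task.detached {
  // reading actor state from outside the actor requires an await
  sawEmptyCache = await cache.cache.isEmpty
  sem.signal()
}
sem.wait()
```

Note that while we can read the property from outside the actor with an await, mutating it from outside isn’t allowed; writes have to happen on the actor itself.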

If needed, we can make the cache private and write separate read and write methods for our cache:

actor DataCache {
  private var cache: [UUID: Data] = [:]

  func read(_ key: UUID) -> Data? {
    return cache[key]
  }

  func write(_ key: UUID, data: Data) {
    cache[key] = data
  }
}

Everything still works perfectly fine in the code above. We’ve managed to limit access to our caching dictionary and users of this actor can interact with the cache through a dedicated read and write method.
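Callers use these methods with await. Here’s a hypothetical snippet that writes from several concurrent tasks and reads the values back (the keys and data are made up for illustration):

```swift
import Foundation
import Dispatch

actor DataCache {
  private var cache: [UUID: Data] = [:]

  func read(_ key: UUID) -> Data? {
    return cache[key]
  }

  func write(_ key: UUID, data: Data) {
    cache[key] = data
  }
}

let cache = DataCache()
let keys = (0..<5).map { _ in UUID() }
var valuesFound = 0
let sem = DispatchSemaphore(value: 0)

Task.detached {
  // five concurrent writes; the actor processes them one at a time
  await withTaskGroup(of: Void.self) { group in
    for (index, key) in keys.enumerated() {
      group.addTask {
        await cache.write(key, data: Data([UInt8(index)]))
      }
    }
  }

  // every write completed safely, so every key should read back a value
  for key in keys {
    if await cache.read(key) != nil {
      valuesFound += 1
    }
  }
  sem.signal()
}
sem.wait()
```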

Now let’s make things a little more complicated.

Adding a remote cache feature to our actor

Let’s imagine that our cached values can either exist in the cache dictionary or remotely on a server. If we can’t find a specific key locally, our plan is to send a request to a server to see if it has data for the cache key that we’re looking for. When we get data back we cache it locally, and if we don’t we return nil from our read function.

Let’s update the actor to have a read function that’s async and attempts to read data from a server:

actor DataCache {
  private var cache: [UUID: Data] = [:]

  func read(_ key: UUID) async -> Data? {
    print(" cache read called for \(key)")
    defer {
      print(" cache read finished for \(key)")
    }

    if let data = cache[key] {
      return data
    }

    do {
      print(" attempt to read remote cache for \(key)")
      let url = URL(string: "http://localhost:8080/\(key)")!
      let (data, response) = try await URLSession.shared.data(from: url)

      guard let httpResponse = response as? HTTPURLResponse,
              httpResponse.statusCode == 200 else {
        print(" remote cache MISS for \(key)")
        return nil
      }

      cache[key] = data
      print(" remote cache HIT for \(key)")
      return data
    } catch {
      print(" remote cache MISS for \(key)")
      return nil
    }
  }

  func write(_ key: UUID, data: Data) {
    cache[key] = data
  }
}

Our function is a lot longer now, but it does exactly what we set out to do: check whether data exists locally, attempt to read it from the server if needed, and cache the result.

If you run and test this code, it will most likely work exactly as you intended. Well done!

However, once you introduce concurrent calls to your read and write methods you’ll find that results can get a little strange…

For this post, I’m running a very simple webserver that I’ve pre-warmed with a couple of values. When I make a handful of concurrent requests to read a value that’s cached remotely but not locally, here’s what I see in the console:

 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 attempt to read remote cache for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 attempt to read remote cache for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 attempt to read remote cache for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 attempt to read remote cache for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 attempt to read remote cache for DDFA2377-C10F-4324-BBA3-68126B49EB00
 remote cache HIT for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 remote cache HIT for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 remote cache HIT for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 remote cache HIT for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 remote cache HIT for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00

As you can see, executing multiple read operations in parallel results in lots of requests to the server, even though the data exists remotely and you’d expect it to be cached locally after the first call.

Our code is written to always store a new value in our local cache after we grab it from the remote server, so we really shouldn’t expect to be going to the server this often.

Furthermore, we’ve made our cache an actor so why is it running multiple calls to our read function concurrently? Aren’t actors supposed to only do one thing at a time?

The problem with awaiting inside of an actor

The code that we’re using to grab information from a remote data source actually forces us into a situation where actor reentrancy bites us.

Actors only do one thing at a time; that’s a fact, and we can trust that actors protect our mutable state by never allowing concurrent read and write access to the mutable state they own.

That said, actors do not like to sit around and do nothing. When we call a synchronous function on an actor, that function will run from start to end with no interruptions; the actor only does one thing at a time.

However, when we introduce an async function that has a suspension point, the actor will not sit around and wait for that suspension point to resume. Instead, the actor will grab the next message in its “mailbox” and start making progress on that instead. When the thing we were awaiting returns, the actor continues working on our original function.

Actors don’t like to sit around and do nothing when they have messages in their mailbox. They will pick up the next task to perform whenever an active task is suspended.

The fact that actors can do this is called actor reentrancy and it can cause interesting bugs and challenges for us.
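We don’t even need a real server to see this happen. Here’s a hypothetical simulation where the “remote” read is replaced by a short sleep, and the actor counts how often that read runs:

```swift
import Foundation
import Dispatch

actor SimulatedCache {
  private var cache: [UUID: Data] = [:]
  var remoteReadCount = 0

  func read(_ key: UUID) async -> Data? {
    if let data = cache[key] {
      return data
    }

    remoteReadCount += 1
    // suspension point: the actor is free to start other read calls here
    try? await Task.sleep(nanoseconds: 100_000_000)

    let data = Data([0x01])
    cache[key] = data
    return data
  }
}

let cache = SimulatedCache()
let key = UUID()
var remoteReads = 0
let sem = DispatchSemaphore(value: 0)

Task.detached {
  // five concurrent reads for the same key
  await withTaskGroup(of: Void.self) { group in
    for _ in 0..<5 {
      group.addTask {
        _ = await cache.read(key)
      }
    }
  }
  remoteReads = await cache.remoteReadCount
  sem.signal()
}
sem.wait()
```

Because every call suspends before the value is written to the cache, reentrant calls that start during the sleep each see an empty cache and perform their own “remote” read.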

Solving actor reentrancy can be tricky. In our case, we can solve the issue by creating and retaining a task for each network call that we’re about to make. That way, reentrant calls to read can see that we already have an in-progress task that we’re awaiting, and those calls will await that same task’s result. This ensures we only make a single network call. The code below shows the entire DataCache implementation; notice how we’ve changed the cache dictionary so that it can hold either a fetch task or our Data object:

actor DataCache {
  enum LoadingTask {
    case inProgress(Task<Data?, Error>)
    case loaded(Data)
  }

  private var cache: [UUID: LoadingTask] = [:]
  private let remoteCache: RemoteCache

  init(remoteCache: RemoteCache) {
    self.remoteCache = remoteCache
  }

  func read(_ key: UUID) async -> Data? {
    print(" cache read called for \(key)")
    defer {
      print(" cache read finished for \(key)")
    }

    // we have the data, no need to go to the network
    if case let .loaded(data) = cache[key] {
      return data
    }

    // a previous call started loading the data
    if case let .inProgress(task) = cache[key] {
      return try? await task.value
    }

    // we don't have the data and we're not already loading it
    do {
      let task: Task<Data?, Error> = Task {
        guard let data = try await remoteCache.read(key) else {
          return nil
        }

        return data
      }

      cache[key] = .inProgress(task)
      if let data = try await task.value {
        cache[key] = .loaded(data)
        return data
      } else {
        cache[key] = nil
        return nil
      }
    } catch {
      // clear out the failed fetch so a future read can retry
      cache[key] = nil
      return nil
    }
  }

  func write(_ key: UUID, data: Data) async {
    print(" cache write called for \(key)")
    defer {
      print(" cache write finished for \(key)")
    }

    do {
      try await remoteCache.write(key, data: data)
    } catch {
      // failed to store the data on the remote cache
    }
    cache[key] = .loaded(data)
  }
}

I explain this approach more deeply in my post on building a token refresh flow with actors as well as my post on building a custom async image loader so I won’t go into too much detail here.

When we run the same test that we ran before, the result looks like this:

 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read called for DDFA2377-C10F-4324-BBA3-68126B49EB00
 attempt to read remote cache for DDFA2377-C10F-4324-BBA3-68126B49EB00
 remote cache HIT for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00
 cache read finished for DDFA2377-C10F-4324-BBA3-68126B49EB00

We start multiple cache reads; this is actor reentrancy in action. But because we’ve retained the loading task so it can be reused, we only make a single network call. Once that call completes, all of our reentrant cache reads receive the same output from the task created by the first call.

The point is that we can rely on actors doing one thing at a time to update some mutable state before we hit our await. This state will then tell reentrant calls that we’re already working on a given task and that we don’t need to make another (in this case) network call.
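As a self-contained illustration of this principle, here’s a hypothetical actor that stores its in-flight task before the first await, so every reentrant call reuses the same work instead of starting its own (the sleep stands in for a network call):

```swift
import Foundation
import Dispatch

actor DeduplicatingCache {
  enum LoadingTask {
    case inProgress(Task<Data, Never>)
    case loaded(Data)
  }

  private var cache: [UUID: LoadingTask] = [:]
  var remoteReadCount = 0

  func read(_ key: UUID) async -> Data {
    if case let .loaded(data) = cache[key] {
      return data
    }

    if case let .inProgress(task) = cache[key] {
      return await task.value
    }

    remoteReadCount += 1
    let task = Task<Data, Never> {
      // stand-in for a network call
      try? await Task.sleep(nanoseconds: 100_000_000)
      return Data([0x01])
    }

    // no await has happened in this call yet, so this write is visible
    // to every reentrant call that starts while the task runs
    cache[key] = .inProgress(task)

    let data = await task.value
    cache[key] = .loaded(data)
    return data
  }
}

let cache = DeduplicatingCache()
let key = UUID()
var remoteReads = -1
let sem = DispatchSemaphore(value: 0)

Task.detached {
  await withTaskGroup(of: Void.self) { group in
    for _ in 0..<5 {
      group.addTask {
        _ = await cache.read(key)
      }
    }
  }
  remoteReads = await cache.remoteReadCount
  sem.signal()
}
sem.wait()
```

Because the state update happens before the first suspension point, only the first call ever creates a task; the other four find the in-progress entry and await its value.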

Things become trickier when you try to make your actor into a serial queue that runs async tasks. In a future post, I’d like to dig into why that’s so tricky and explore possible solutions.

In Summary

Actor reentrancy is a feature of actors that can lead to subtle bugs and unexpected results. Due to actor reentrancy, we need to be careful when adding async methods to an actor, and we need to think about what can and should happen when there are multiple, reentrant calls to a specific function on an actor.

Sometimes this is completely fine; other times it’s wasteful but won’t cause problems. And sometimes you’ll run into problems that arise because state on your actor changed while your function was suspended. Every time you await something inside of an actor, it’s important to ask yourself whether you’ve made any state-related assumptions before the await that you need to re-verify after the await.
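A compact, hypothetical example of such a stale assumption: an increment that reads its state before an await and writes it back after, losing the updates that reentrant calls made in between:

```swift
import Foundation
import Dispatch

actor Counter {
  var value = 0

  func unsafeIncrement() async {
    // assumption made before the await...
    let current = value
    try? await Task.sleep(nanoseconds: 50_000_000)
    // ...is stale by the time we resume: other increments may have run
    value = current + 1
  }
}

let counter = Counter()
var finalValue = 0
let sem = DispatchSemaphore(value: 0)

Task.detached {
  await withTaskGroup(of: Void.self) { group in
    for _ in 0..<10 {
      group.addTask {
        await counter.unsafeIncrement()
      }
    }
  }
  finalValue = await counter.value
  sem.signal()
}
sem.wait()
```

Because the calls all read value before any of them writes it back, updates get lost and the counter ends up well short of ten. The fix is to either avoid suspending between the read and the write, or to re-read value after resuming.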

Step one to avoiding reentrancy-related issues is to understand what reentrancy is and to have a sense of how you can solve problems when they arise. Unfortunately, there’s no single solution that fixes every reentrancy-related issue. In this post you saw that holding on to a task that encapsulates the work can prevent multiple network calls from being made.

Have you ever run into a reentrancy-related problem yourself? And if so, did you manage to solve it? I’d love to hear from you on Twitter or Mastodon!