Running tasks in parallel with Swift Concurrency’s task groups

Published on: August 5, 2021

With Apple's overhaul of how concurrency will work in Swift 5.5 and newer, we need to learn a lot of things from scratch. While we might have used DispatchQueue.async or other mechanisms to kick off multiple asynchronous tasks in the past, we shouldn't use these older concurrency tools in Swift's new concurrency model.

Luckily, Swift Concurrency already comes with many features, which means that a new paradigm exists for a lot of our old use cases.

In this post, you will learn what Swift Concurrency's task groups are, and how you can use them to concurrently perform a lot of work.

Understanding when you should use a task group

Before I show you how you can use a task group, I'd like to explain when a task group is most likely the correct tool for your job. Or rather, I'd like to explain the problem that task groups were designed to solve.

Consider the following example.

Let's say that you fetched a list of ids from your server. These ids represent the ids of movies that your user has marked as a favorite. By returning ids instead of full-blown movie objects, your user can save a lot of data, assuming that clients can (and will) cache movie objects locally. This allows you to either look up a movie in your local cache, or to fetch the movie from the server if needed.

The code to fetch these movie ids might look a bit like this:

func getFavoriteIds(for user: User) async -> [UUID] {
    return await network.fetchUserFavorites(for: user)
}

func fetchFavorites(user: User) async -> [Movie] {
    // fetch Ids for favorites from a remote source
    let ids = await getFavoriteIds(for: user)

    // perform work to obtain `[Movie]`
}

So far so good. If you're somewhat familiar with Swift Concurrency's async/await concept this code shouldn't look too scary.

Now that we have an array of UUID, we need to somehow convert this array to Movie objects. In this case, I don't care about the order of the ids and the resulting movies matching. And I don't want to fetch movies one by one because that might take a while.

I'd like to fetch as many movies at the same time as I possibly can.

This sentence above is essentially the key to knowing when we should use a task group.

In this case, I want to run a variable number of tasks concurrently, and every task produces the same type of output. This use case is exactly what task groups are good at. They allow you to spawn as many tasks as you want, and all of these tasks will run concurrently. One constraint is that every task must produce the same output. In this case, that's not a problem. We want to convert from UUID to Movie every time, which means that our task will always produce the same output.

Let's take a look at an example.

Making use of task groups

Task groups can either be throwing or non-throwing. This might sound obvious, but the (non-)throwing nature of your task group has to be defined when you create it. In this case, I'm going to use a non-throwing task group. Let's see how a task group can be created.

Defining a task group

A task group can be created as follows:

await withTaskGroup(of: Movie.self) { group in

}

The withTaskGroup function is a global function in Swift that we call with two arguments here. The first argument specifies the type of result that your tasks produce. If your tasks don't have any output, you would write Void.self here since that would be the return type for each individual task. In this case, it's Movie.self because all tasks will produce a Movie instance.

If the tasks in a task group can throw errors, you should use withThrowingTaskGroup instead of withTaskGroup.

The second argument is a closure in which we'll schedule and handle all of our tasks. This closure receives an instance of TaskGroup<Output> as its only argument. The Output generic will correspond with your task output. So in this case the actual type would be TaskGroup<Movie>.

The withTaskGroup function is marked async, which means that we need to await its result. In this case, we don't return anything from the closure that we pass to withTaskGroup. If we did return something, the returned object would be the return value for the call to withTaskGroup and we could assign this output to a property or return it from a function.

In this case, we'll want to return something from fetchFavorites. Here's what that looks like:

func fetchFavorites(user: User) async -> [Movie] {
    // fetch Ids for favorites from a remote source
    let ids = await getFavoriteIds(for: user)

    // load all movies concurrently
    return await withTaskGroup(of: Movie.self) { group in
        var movies = [Movie]()

        // obtain movies

        return movies
    }
}

While this code compiles just fine, it's not very useful. Let's add some tasks to our task group so we can fetch movies.

Adding tasks to a task group

The TaskGroup object that is passed to our closure is used to schedule tasks in the group, and also to obtain the results of these tasks if needed. Let's see how we can add tasks to the group first, and after that I'll show you how you can obtain the results of your tasks by iterating over the group's results.

To load movies, we'll call the following async function from a new task. This function would be defined alongside fetchFavorites and getFavoriteIds:

func getMovie(withId id: UUID) async -> Movie {
    return await network.fetchMovie(withId: id)
}

To call this function from within a new task in the task group, we need to call addTask on the TaskGroup as follows:

func fetchFavorites(user: User) async -> [Movie] {
    // fetch Ids for favorites from a remote source
    let ids = await getFavoriteIds(for: user)

    // load all movies concurrently
    return await withTaskGroup(of: Movie.self) { group in
        var movies = [Movie]()

        // adding tasks to the group and fetching movies
        for id in ids {
            group.addTask {
                return await self.getMovie(withId: id)
            }
        }

        return movies
    }
}

I added a for loop to the task group closure to iterate over the ids that were fetched. For every fetched id I call group.addTask and pass it a closure that contains my task. This closure is async which means that we can await the result of some function call. In this case I want to await and return the result of self.getMovie. Note that I don't need to capture self weakly in the closure I pass to addTask. The reason for this is that the task I created can never outlive the scope it's defined in (more on that later), which means that no retain cycles are created here. Swift's structured concurrency guarantees that our tasks don't outlive the scope they're defined in, so we can be sure that our tasks don't create retain cycles.

Every task that's added to the task group with group.addTask must return a Movie instance because that's the task output type that we passed to withTaskGroup. As soon as a task is added to the task group it will begin running concurrently with any other tasks that I may have already added to the group.

You might notice that while I add a bunch of tasks to the group, I never actually await or return the output of my tasks. To do this, we need to iterate asynchronously over the task group and obtain the results of its tasks. The TaskGroup object conforms to AsyncSequence which means that we can iterate over it using for await as follows:

func fetchFavorites(user: User) async -> [Movie] {
    // fetch Ids for favorites from a remote source
    let ids = await getFavoriteIds(for: user)

    // load all favorites concurrently
    return await withTaskGroup(of: Movie.self) { group in
        var movies = [Movie]()
        movies.reserveCapacity(ids.count)

        // adding tasks to the group and fetching movies
        for id in ids {
            group.addTask {
                return await self.getMovie(withId: id)
            }
        }

        // grab movies as their tasks complete, and append them to the `movies` array
        for await movie in group {
            movies.append(movie)
        }

        return movies
    }
}

By using for await movie in group the task group will provide us with movies as soon as they are obtained. Note that the results will be gathered in completion order. In other words, whichever movie is fully fetched first will be returned first. The order in which we added tasks to the group does not determine the order of the results. For very small or quick tasks the completion order may happen to match the order in which we added the tasks, but this is never guaranteed. This is why I mentioned earlier that I didn't care about ordering.
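If ordering does matter for your use case, one possible approach is to have every child task return its position along with its result, and to put the results back in place as they arrive. The sketch below is not part of the movie example: loadTitle(for:) and loadTitlesInOrder(ids:) are hypothetical names, and the random sleep only simulates network calls that finish out of order.

```swift
// Hypothetical stand-in for an async lookup. It sleeps for a random duration
// so that completion order differs from the order in which tasks were added.
func loadTitle(for id: Int) async -> String {
    try? await Task.sleep(nanoseconds: UInt64.random(in: 1_000_000...20_000_000))
    return "Movie \(id)"
}

func loadTitlesInOrder(ids: [Int]) async -> [String] {
    await withTaskGroup(of: (Int, String).self) { group in
        // Each child task returns its position in `ids` along with its result.
        for (index, id) in ids.enumerated() {
            group.addTask {
                let title = await loadTitle(for: id)
                return (index, title)
            }
        }

        // Results arrive in completion order; the index tells us where each belongs.
        var titles = [String?](repeating: nil, count: ids.count)
        for await (index, title) in group {
            titles[index] = title
        }

        return titles.compactMap { $0 }
    }
}
```

The trade-off is that nothing can be used in order until the slots before it are filled, so this suits cases where you only need the complete, ordered array at the end.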

Whenever a task completes, the group provides us with the task output, and we can append this output to the movies array. Once all tasks are completed and we have appended all output to the movies array, we return this array from our task group closure.

This means that we can return the result of awaiting withTaskGroup from fetchFavorites since the output is an array of movies.

Note that we don't return from the closure that's provided to withTaskGroup until all tasks have completed due to the asynchronous for loop. This loop doesn't complete until all tasks in the group complete, and all output has been provided to us. Of course, we could exit our loop early with a break just like you can in a normal loop.
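Keep in mind that breaking out of the loop doesn't stop the remaining tasks. The group still waits for them before withTaskGroup returns. If you no longer need their results, you can call cancelAll() on the group to mark them as cancelled. Here's a minimal sketch of that idea, assuming a hypothetical slowSquare function that cooperates with cancellation:

```swift
// Hypothetical child work: sleeps for `value` * 10 milliseconds, then returns
// the square. Task.sleep throws when the task is cancelled; we return nil then.
func slowSquare(_ value: Int) async -> Int? {
    do {
        try await Task.sleep(nanoseconds: UInt64(value) * 10_000_000)
        return value * value
    } catch {
        return nil
    }
}

func firstSquare(of values: [Int]) async -> Int? {
    await withTaskGroup(of: Int?.self) { group in
        for value in values {
            group.addTask { await slowSquare(value) }
        }

        // Take the first non-nil result and leave the loop early.
        var first: Int?
        for await result in group {
            if let result = result {
                first = result
                break
            }
        }

        // Mark the remaining child tasks as cancelled so they can stop early.
        // The group still waits for them to finish before returning.
        group.cancelAll()
        return first
    }
}
```

Cancellation is cooperative, so cancelAll() only helps when the child tasks actually check for cancellation, like Task.sleep does here.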

The example you've seen so far follows a pretty happy path. Let's consider two additional situations, in which we'll have to deal with errors thrown by the tasks that were added to the group:

  1. One of the tasks throws an error
  2. The task group is cancelled

Task groups and throwing tasks

I already mentioned that a task group for tasks that can throw should be created with withThrowingTaskGroup. We'd need to do this if the getMovie function you saw earlier could throw an error. If it could, it would look like this:

func getMovie(withId id: UUID) async throws -> Movie {
    return try await network.fetchMovie(withId: id)
}

The code to fetch a user's favorite movies would in turn be updated as follows:

func fetchFavorites(user: User) async throws -> [Movie] {
    // fetch ids for favorites from a remote source
    let ids = await getFavoriteIds(for: user)

    // load all favorites concurrently
    return try await withThrowingTaskGroup(of: Movie.self) { group in
        var movies = [Movie]()
        movies.reserveCapacity(ids.count)

        // adding tasks to the group and fetching movies
        for id in ids {
            group.addTask {
                return try await self.getMovie(withId: id)
            }
        }

        // grab movies as their tasks complete, and append them to the `movies` array
        for try await movie in group {
            movies.append(movie)
        }

        return movies
    }
}

The changes we needed to make to handle throwing tasks are relatively small. All I had to do was to add try where appropriate, and use withThrowingTaskGroup instead of withTaskGroup. However, there's a huge difference here in terms of what might happen.

In this example, I'm fetching movies by calling try await self.getMovie(withId: id). This means that the getMovie operation might throw an error. When it does, it's not a big deal per se. A task can fail without impacting any of the other tasks in the task group. This means that failing to load one of the movies does not necessarily impact the other tasks in my task group. However, because I iterate over the fetched movies using for try await movie in group, a single failure does impact other tasks in my group.

As we iterate over the group's results, a failed task also counts as a result. However, when the group's next() function is called internally to obtain the next result, it will throw the error that was thrown by the failing task so we can inspect and handle it if needed. In a for loop, I can only write try await which means that when the group throws an error from its next() function, this error is thrown out from the withThrowingTaskGroup closure since we don't handle (or ignore) it.
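If you'd rather inspect every task's outcome yourself, ThrowingTaskGroup also has a nextResult() method that wraps each outcome in a Result instead of throwing. The following is a sketch under assumptions, not the approach used in the rest of this post: TitleError, fetchTitle(id:), and fetchTitles(ids:) are hypothetical names made up for this example.

```swift
struct TitleError: Error {
    let id: Int
}

// Hypothetical loader that fails for negative ids.
func fetchTitle(id: Int) async throws -> String {
    guard id >= 0 else { throw TitleError(id: id) }
    return "Movie \(id)"
}

func fetchTitles(ids: [Int]) async -> (titles: [String], failureCount: Int) {
    await withThrowingTaskGroup(of: String.self) { group in
        for id in ids {
            group.addTask { try await fetchTitle(id: id) }
        }

        var titles = [String]()
        var failureCount = 0

        // nextResult() returns nil once the group has no more results.
        while let result = await group.nextResult() {
            switch result {
            case .success(let title):
                titles.append(title)
            case .failure:
                // The error is handled here, so it never leaves the closure.
                failureCount += 1
            }
        }

        return (titles, failureCount)
    }
}
```

Because no error is thrown out of the closure, a failing task doesn't cause the other tasks in the group to be cancelled.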

When an error is thrown from the closure provided to withThrowingTaskGroup, the task group will fail with that error. Before this error is thrown, the task group will mark any unfinished tasks as cancelled to allow them to stop executing work as soon as possible in order to comply with Swift Concurrency's cooperative cancellation. Once all tasks have completed (either by finishing their work or throwing an error), the task group will throw its error and complete.

In the example we're working with here, we can prevent a single failure from cancelling all in progress work. The solution would be to make sure the closure I pass to addTask doesn't throw. I could handle the errors thrown by getMovie and return some kind of default movie which probably isn't the best solution, or I could return nil. Returning nil does mean that the task output type for the group becomes Movie? instead of Movie. If returning nil is reasonable for your use case, you could also write try? await self.getMovie(withId: id) to ignore the error and return nil instead of handling the error in a do {} catch {} block.
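Here's roughly what the try? approach could look like. This is a sketch with stand-ins: the Movie struct, the MovieNotFound error, and the known parameter are assumptions that replace the real model and network layer, which aren't shown in this post.

```swift
import Foundation

// Stand-ins for this post's types; the real Movie and network layer aren't shown.
struct Movie: Sendable {
    let id: UUID
}

struct MovieNotFound: Error {}

// Hypothetical loader that only succeeds for ids contained in `known`.
func getMovie(withId id: UUID, known: Set<UUID>) async throws -> Movie {
    guard known.contains(id) else { throw MovieNotFound() }
    return Movie(id: id)
}

func fetchAvailableMovies(ids: [UUID], known: Set<UUID>) async -> [Movie] {
    // The child task output is `Movie?`, so a failed fetch becomes nil and the
    // group no longer needs to be a throwing task group.
    await withTaskGroup(of: Movie?.self) { group in
        for id in ids {
            group.addTask {
                try? await getMovie(withId: id, known: known)
            }
        }

        // Skip the nil results and keep the movies that loaded successfully.
        var movies = [Movie]()
        for await movie in group {
            if let movie = movie {
                movies.append(movie)
            }
        }

        return movies
    }
}
```

The downside of this design is that the caller can't tell why a movie is missing, so it's only a good fit when a partial result is acceptable.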

Depending on how the tasks you add to your task group were written, cancelling one of your tasks might have a similar effect. In Swift Concurrency, it's perfectly acceptable to throw an error from a task when it's cancelled. This means that if your task throws a cancellation error, it could propagate through your task group in the exact same way that other thrown errors propagate through your task group if it ends up being thrown out of your withThrowingTaskGroup closure.
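To make the cancellation flow a bit more concrete, here's a sketch of a child task that cooperates with cancellation by calling Task.checkCancellation() between steps. The names FetchFailed, countSteps(upTo:), and runUntilFirstFailure() are hypothetical and only exist for this example.

```swift
struct FetchFailed: Error {}

// Hypothetical long-running work that checks for cancellation between steps.
func countSteps(upTo limit: Int) async throws -> Int {
    var completed = 0
    for _ in 0..<limit {
        // Throws CancellationError if this task has been marked as cancelled.
        try Task.checkCancellation()
        // Task.sleep also throws if the task is cancelled while sleeping.
        try await Task.sleep(nanoseconds: 5_000_000)
        completed += 1
    }
    return completed
}

// Returns the error that the task group ended up throwing, if any.
func runUntilFirstFailure() async -> Error? {
    do {
        try await withThrowingTaskGroup(of: Int.self) { group in
            group.addTask { try await countSteps(upTo: 1_000) }
            group.addTask { throw FetchFailed() }

            // The failing task finishes first, so its error is thrown out of
            // this loop and out of the closure. The group then marks the
            // long-running task as cancelled and waits for it to stop.
            for try await _ in group {}
        }
        return nil
    } catch {
        return error
    }
}
```

In this sketch the long-running task stops with a cancellation error of its own, but the error that comes out of the group is the one that was thrown from the closure.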

The bottom line here is that individual tasks throwing errors do not impact the task group and its enclosing task per se. It's only when this error ends up being thrown from your withThrowingTaskGroup closure that all unfinished tasks get cancelled, and the original error is thrown from the task group's task once all child tasks have finished. All this talk about errors and completing the task group's task segues nicely into the last topic I want to cover: the lifecycle of your task group's tasks.

Understanding the lifecycle of tasks in a task group

When you add tasks in a task group, you enter into a very important (explicit) contract. Swift's concurrency mechanisms are structured (pun intended) around the concept of Structured Concurrency. Async lets as well as task group child tasks both adhere to this idea.

The core idea behind structured concurrency is that a task cannot outlive the scope of its parent task. And similarly, no TaskGroup child task may outlive the scope of the withTaskGroup closure. This is achieved by implicitly awaiting all tasks before returning from the closure you pass to withTaskGroup.

When you know that tasks in a group cannot outlive the group they belong to, the error throwing / cancellation strategy I outlined above makes a lot of sense.

Once the task that manages the group throws an error, the scope of the task group has completed. If we still have running tasks at that time, the tasks would outlive their group which isn't allowed. For that reason, the task group will first wait for all of its tasks to either complete or throw a cancellation error before throwing its own error and exiting its scope.

When thinking of the code you've seen in this post, I've awaited the results of all child tasks explicitly by iterating over the group. This means that by the time we hit return movies all tasks are done already and no extra waiting is needed.

However, we don't have to await the output of our tasks in all cases. Let's say we have a bunch of tasks that don't return anything. We'd only write the following:

print("Before task group")
await withTaskGroup(of: Void.self) { group in
    for item in list {
        group.addTask {
            await doSomething()
            print("Task completed")
        }
    }
    print("For loop completed")
}
print("After task group")

Like I explained earlier, the task group's child tasks are always implicitly awaited before exiting the closure in which they were created in order to comply with the requirements of structured concurrency. This means that even if we don't await the result of our tasks, the tasks are guaranteed to be completed when we exit the withTaskGroup closure.

I've added some prints to the code snippet above to help you see this principle in action. When I run the code above with a list of three items, the output would look a bit like this:

Before task group
For loop completed
Task completed
Task completed
Task completed
After task group

The reason for that is the implicit awaiting of tasks in a group I just mentioned. The task group is not allowed to complete before all of the tasks it manages have also completed.
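If you'd like to verify this behavior without reading console output, you could count the completed tasks instead. The sketch below assumes a hypothetical CompletionCounter actor and a runAll(taskCount:) function; neither is part of the example above.

```swift
// Hypothetical actor that safely counts finished child tasks across tasks.
actor CompletionCounter {
    private(set) var count = 0

    func increment() {
        count += 1
    }
}

func runAll(taskCount: Int) async -> Int {
    let counter = CompletionCounter()

    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<taskCount {
            group.addTask {
                // Simulate some asynchronous work before reporting completion.
                try? await Task.sleep(nanoseconds: 10_000_000)
                await counter.increment()
            }
        }
        // We don't iterate over the group here; the child tasks are awaited
        // implicitly before withTaskGroup returns.
    }

    // By now, every child task has finished and incremented the counter.
    return await counter.count
}
```

Calling runAll(taskCount: 5) should give you 5, since the function can't reach its return statement before all five child tasks have completed.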

In Summary

In this post you learned a lot. You learned that task groups are a tool to concurrently perform an arbitrary number of tasks that produce the same output. I showed you how you can write a basic task group to concurrently fetch an arbitrary number of movies based on their ids as an example. You learned that task groups will run as many tasks at once as possible, and that you can obtain the results of these tasks using an async for loop.

After that, I explained how errors and cancellation work within a task group. You learned that whenever a task throws an error you can either handle or ignore this error. You also saw that if you throw an error from your task group closure, this will cause all unfinished tasks in the group to be marked as cancelled, and you learned that the original error will be thrown from the task group once all tasks have completed.

Lastly, I explained how tasks within a task group cannot outlive the task group due to the guarantees made by Swift Concurrency, and that a task group will implicitly await all of its child tasks before completing to make sure none of its tasks are still running by the time the task group completes.

Huge thanks to Konrad for reviewing this post and providing some important corrections surrounding errors and cancellation.