Swift Testing: ready for prime time?

March 9, 2025

I took the new Swift Testing framework out for a spin and used it seriously on a small project.

Conclusion: to me, Swift Testing isn’t ready for serious production use, because either it’s buggy in some important ways, or Apple’s documentation is not specific enough about Testing’s design philosophy and how you should (or should not) use certain features.

My time ratio for “writing tests” to “working out why Testing is crashing” is 1:2 (or worse) at this point.

Testing tools should be well understood and well presented; we trust them with a lot. I think if issues are found with a new framework like Testing, this should be mentioned in the framework documentation.¹

The problem: parameterised tests

One feature of Testing that caught my eye was parameterised tests. It offers a way to separate setting up a list of test-case-specific data from executing a test on that data:

    @Test("bool negation operator", arguments: [
        (true, true),   // control case (no negation used)
        (!true, false)  // negate
    ])
    func testBoolNegation(lhs: Bool, rhs: Bool) {
        #expect(lhs == rhs)
    }

This simple example verifies that the bool negation operator works as expected on the true value.

This test compiles and runs fine.

Now suppose we also want to check that negating true twice gives true:

    @Test("bool negation operator", arguments: [
        (true, true),     // control case (no negation used)
        (!true, false),   // negate once
        (!(!true), true), // negate twice    <----- new item
    ])
    func testBoolNegation(lhs: Bool, rhs: Bool) {
        #expect(lhs == rhs)
    }

But now we get a fatal crash:

    libswiftCore.dylib`_swift_runtime_on_report:
    Thread 5: Fatal error: Internal inconsistency: No test reporter for test case argumentIDs:
    Optional([Testing.Test.Case.Argument.ID(bytes: [102, 97, 108, 115, 101]),
      Testing.Test.Case.Argument.ID(bytes: [102, 97, 108, 115, 101])])
      in test ScratchTests/testBoolNegation(lhs:rhs:)/ScratchTests.swift:15:6

What on earth is causing this? How can we tell the problem from that crash info?

In my case, it took a lot of head scratching, experimentation, and web searching.

Eventually I found out that the problem is that the parameter list contains two lines that evaluate to the same tuple:

        (true, true)      <evaluates to: (true, true)>
        (!(!true), true)  <evaluates to: (true, true)>
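
A duplicate like this is easy to miss by eye once the expressions get more involved. As a rough sketch, a small helper can flag argument tuples that are equal after evaluation. The name duplicateIndexPairs is made up for illustration and is not part of Swift Testing:

```swift
// Hypothetical helper (not part of Swift Testing): returns the index pairs of
// two-element argument tuples that compare equal after evaluation.
func duplicateIndexPairs<A: Equatable, B: Equatable>(in arguments: [(A, B)]) -> [(Int, Int)] {
    var result: [(Int, Int)] = []
    for i in arguments.indices {
        for j in arguments.indices where j > i {
            // Tuples of Equatable elements can be compared with == directly.
            if arguments[i] == arguments[j] {
                result.append((i, j))
            }
        }
    }
    return result
}

// The argument list from the crashing test above.
let crashingArguments = [
    (true, true),      // control case
    (!true, false),    // negate once
    (!(!true), true),  // negate twice: evaluates to (true, true) again
]

let duplicates = duplicateIndexPairs(in: crashingArguments)
print(duplicates)  // items 0 and 2 are reported as equal
```

Running a check like this in a scratch file or playground before handing the list to @Test at least turns a runtime crash into something you can see.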

Oh. That’s weird. The implication of this, if it’s not a bug in Testing, seems to be:

You’re not meant to process arguments inside the argument list in any way; instead you must pass in a collection of unique basic data items and do all processing in the func body.

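So the @Test macro accepts a collection of arguments, but it seems to identify each test case by the value of its arguments (via hashing or some similar identity semantics), so two items with equal values collide.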
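
To illustrate why value-based identity would bite here, this sketch keys a dictionary by argument value. It is only an analogy under that assumption, not Testing's actual implementation; ArgumentKey and reporters are hypothetical names:

```swift
// Illustration only: this is NOT how Swift Testing is implemented.
// It shows what happens when per-case bookkeeping is keyed by argument value.
struct ArgumentKey: Hashable {
    let lhs: Bool
    let rhs: Bool
}

let cases = [
    ArgumentKey(lhs: true, rhs: true),
    ArgumentKey(lhs: !true, rhs: false),
    ArgumentKey(lhs: !(!true), rhs: true),  // same value as the first case
]

var reporters: [ArgumentKey: String] = [:]
for (index, key) in cases.enumerated() {
    // A later case with an equal key silently replaces the earlier entry.
    reporters[key] = "reporter for case \(index)"
}

print("cases: \(cases.count), distinct keys: \(reporters.count)")
```

Three cases end up sharing two keys, so any bookkeeping that expects one entry per case is left inconsistent.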

This is a bit awkward. Here’s how we’d rewrite our test to honour this idea, if we still wanted to use parameter lists:

    @Test("bool negation", arguments: [
        (true, false),
        (false, true)
    ])
    func testBoolNegation(lhs: Bool, rhs: Bool) {
        #expect(!lhs == rhs)
    }

    @Test("bool double negation", arguments: [
        (true, true),
        (false, false)
    ])
    func testBoolDoubleNegation(lhs: Bool, rhs: Bool) {
        #expect(!(!lhs) == rhs)
    }

But this seems pointlessly verbose. Better off just not using a parameter list in this case?
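
For comparison, a minimal sketch of the same three checks without a parameter list. The function name is invented here; only @Test and #expect come from Testing:

```swift
import Testing

// Sketch: the three checks from the crashing example written as one plain test.
// No argument collection is involved, so equal values cannot collide.
@Test("bool negation operator (no parameter list)")
func testBoolNegationWithoutArguments() {
    let value = true
    #expect(value == true)        // control case (no negation used)
    #expect(!value == false)      // negate once
    #expect(!(!value) == true)    // negate twice
}
```

The trade-off is that the three checks are reported as a single test rather than as separate test cases.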

Have I misunderstood how I’m supposed to use parameter lists in the Testing framework? I looked at the Apple docs and I can’t find any suggestion to avoid using operations when defining items in the parameter list.

From posts in the Swift Evolution forums I can see that this and other issues have been encountered, and apparently Apple are looking at it internally.

Slow type inference

In their docs for Testing, Apple show examples that use type inference (without any proviso that I can see).

But be careful: some parameter lists that rely on type inference can cause very slow compilation:

    @Test("CGPoint operation on .zero", arguments: [
        (CGPoint.zero, ...),
        (.zero, ...),
        (.zero, ...),
        ...

I gave up waiting for the compiler to come back when I tried compiling examples like this. If we switch to using CGPoint.zero explicitly on every line, the compilation becomes quick again.²

This is something that the Testing docs could also advise on (even if this is generally down to the type system rather than Testing itself).

Loose typing issues

The lack of explicit typing with tuples in the parameter list – which potentially gives the compiler lots of work to do, see above – also means that if you make mistakes like misplacing a bracket in the params list, or swapping two different types around on one line, you’ll get an unhelpful Any error:³

Type 'Any' does not conform to the 'Sendable' protocol; this is an error in the Swift 6 language mode

And this error isn’t tied down to any one line, of course, it just applies to the param list as a whole.
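
One alternative that may be worth trying for both the slow inference and the vague Any error is to give the argument collection an explicit type, so the compiler no longer has to infer the element type from the literal. This is a sketch with made-up data (pointCases and its values are hypothetical), and it is untested against the slow case above:

```swift
import Foundation  // provides CGPoint and CGFloat

// Hypothetical test data: each tuple is (input point, expected x + y).
// The explicit annotation tells the compiler the element type up front,
// so `.zero` resolves to CGPoint.zero without being spelled out.
let pointCases: [(CGPoint, CGFloat)] = [
    (.zero, 0),
    (CGPoint(x: 3, y: 4), 7),
    (CGPoint(x: -1, y: 1), 0),
    // (0, .zero),  // a swapped pair like this should now be rejected as not
    //              // matching the annotated type, rather than falling back to Any
]

for (point, expected) in pointCases {
    print(point.x + point.y == expected)
}
```

A collection declared like this could then be passed as the arguments: value of @Test instead of an inline literal.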

A hacky fix: generic tuple wrappers

Below I provide a bit of a hack to fix the issues:

it allows you to use the same (post-evaluation) value multiple times in the parameter list
it gives you a specific error and location for mistakes in the param list
type inference can be used

It uses generics to firm up the typing via some tuple wrappers.

    import Foundation  // for UUID

    // UniqueHash is a custom protocol; its original definition is not included here.

    public struct Single<T>: Sendable, UniqueHash where T: Sendable {
        let uniqueID = UUID()  // makes every instance distinct, even for equal values
        public let a: T

        public init(_ a: T) {
            self.a = a
        }
    }

    public struct Pair<T, U>: Sendable, UniqueHash where T: Sendable, U: Sendable {
        let uniqueID = UUID()
        public let a: T
        public let b: U

        public init(_ a: T, _ b: U) {
            self.a = a
            self.b = b
        }
    }

    public struct Triple<T, U, V>: Sendable, UniqueHash where T: Sendable, U: Sendable, V: Sendable {
        let uniqueID = UUID()
        public let a: T
        public let b: U
        public let c: V

        public init(_ a: T, _ b: U, _ c: V) {
            self.a = a
            self.b = b
            self.c = c
        }
    }

The key part here is the let uniqueID = UUID() line. This ensures the hash of every param list item is unique.
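
The UniqueHash protocol used above is not defined in the post. Purely as a guess at what it might look like, here is a minimal sketch in which equality and hashing depend only on uniqueID, so two wrappers holding the same values still count as different. The protocol body and the trimmed Pair below are assumptions, not the original code:

```swift
import Foundation

// ASSUMPTION: a hypothetical definition of UniqueHash; the original is not shown.
protocol UniqueHash: Hashable {
    var uniqueID: UUID { get }
}

extension UniqueHash {
    // Identity is decided by uniqueID alone, never by the wrapped values.
    static func == (lhs: Self, rhs: Self) -> Bool {
        lhs.uniqueID == rhs.uniqueID
    }

    func hash(into hasher: inout Hasher) {
        hasher.combine(uniqueID)
    }
}

// Trimmed copy of the Pair wrapper, conforming to the hypothetical protocol.
struct Pair<T: Sendable, U: Sendable>: Sendable, UniqueHash {
    let uniqueID = UUID()
    let a: T
    let b: U

    init(_ a: T, _ b: U) {
        self.a = a
        self.b = b
    }
}

let first = Pair(true, true)
let second = Pair(!(!true), true)

print(first == second)  // false: same values, different uniqueID
```

Note that this only shows the wrappers being distinct as Swift values; how Testing itself derives its argument IDs from them is a separate question.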

Here’s how to use Pair to fix the double-negation test from earlier:

    @Test("bool negation operator (fixed with wrapper)", arguments: [
        Pair(true, true),     // control case (no negation used)
        Pair(!true, false),   // negate once
        Pair(!(!true), true), // negate twice
    ])
    func testBoolNegation(boolPair: Pair<Bool, Bool>) {
        #expect(boolPair.a == boolPair.b)
    }

This test now passes. And if you make a mistake in the params list, the compiler will show you exactly which line.

You can also use inference again, e.g. .zero instead of writing CGPoint.zero everywhere.

💡

This hacky fix might stop working at any time.

And arguably you’re better off just not using parameter lists when they’re not a good fit. Hopefully the Testing team will eventually comment on, document or address this issue.

Removing the repetition of tuple wrappers

If having a tuple wrapper on every arg list line is annoying, you could use map to improve things:

    @Test("bool negation operator (fixed with wrapper)", arguments: [
        (true, true),
        (!true, false),
        (!(!true), true)
    ].map(Pair.init))
    func testBoolNegation(boolPair: Pair<Bool, Bool>) {
        #expect(boolPair.a == boolPair.b)
    }

This works because map can unpack tuples into individual arguments for init methods.
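
Written out longhand, the map step amounts to the following. This sketch uses a trimmed stand-in for Pair (no Sendable or UniqueHash) so that it runs on its own:

```swift
// Trimmed stand-in for the Pair wrapper, just enough to show the mapping.
struct Pair<T, U> {
    let a: T
    let b: U

    init(_ a: T, _ b: U) {
        self.a = a
        self.b = b
    }
}

let tuples = [
    (true, true),
    (!true, false),
    (!(!true), true),
]

// Each tuple is taken apart and its elements handed to the two-argument init.
let wrapped = tuples.map { tuple in Pair(tuple.0, tuple.1) }

for pair in wrapped {
    print(pair.a, pair.b)
}
```

Because the wrapping happens in one place for the whole array, a mistake in any single tuple surfaces against the literal or the map call rather than against the offending line.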

However, there’s a downside: if we use map, we lose the precise pinpointing of mistakes in the argument list.

CustomTestArgumentEncodable

The Testing framework provides this protocol and seems to hint that it can be used to uniquely identify test parameters, but I’ve not played with it yet.
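
For anyone who wants to experiment, a conformance might look roughly like this. The protocol and its encodeTestArgument(to:) requirement come from Testing; LabelledCase and its label property are hypothetical, and whether this actually avoids the crash described above is untested:

```swift
import Testing

// Hypothetical argument type: each case carries a label that is meant to be
// unique within the argument list, and only that label is encoded.
struct LabelledCase: CustomTestArgumentEncodable {
    let label: String
    let lhs: Bool
    let rhs: Bool

    func encodeTestArgument(to encoder: some Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(label)
    }
}

@Test("bool negation operator (labelled cases)", arguments: [
    LabelledCase(label: "control", lhs: true, rhs: true),
    LabelledCase(label: "negate once", lhs: !true, rhs: false),
    LabelledCase(label: "negate twice", lhs: !(!true), rhs: true),
])
func testBoolNegationLabelled(testCase: LabelledCase) {
    #expect(testCase.lhs == testCase.rhs)
}
```

The idea is that the first and third cases now differ in what they encode, even though their Bool values are equal.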

Update

I filed an issue for this problem and it’s a dupe of this: https://github.com/swiftlang/swift-testing/pull/1000. An Xcode bug is also related (rdar://121455205); I can’t find it on OpenRadar at this point.

¹ And I mean signposted up front in the API documentation or on the website, not buried in Swift Evolution forums and GitHub issues.
² The issue of using type inference wisely isn’t limited to this example, of course. Overuse of type inference can add a noticeable amount of time to your compilation; sometimes it’s just not worth giving the compiler that extra work for slightly terser code. You can investigate this stuff using the compilation time tools available in Instruments. But it might be nice if warnings about slow compilation had more granularity than just “The compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions”.
³ This Any error happens for a good reason: the compiler is trying to infer the element type of a homogeneous collection (all items of the same type), but if something is out of place, the best it can do is Any. This leaves you completely in the dark as to where your mistake is.