
What's new in Evaluations

+99 New · iOS · macOS · watchOS · visionOS

Evaluations is a framework for measuring the behavior of models and other systems against expected results. An Evaluation runs the system under test on each dataset sample to produce an EvaluationSubject, applies evaluators that record metric results for each sample, and aggregates those results with AggregationOperation values into the summary of an EvaluationResult.

The 27 SDK adds the entire framework: 99 new APIs, with no deprecations or removals. The core types arrive together: the Evaluation and EvaluationSubject protocols, the Evaluator struct, EvaluationContext, EvaluationResult, EvaluationError, and EvaluationTrait. Summary scoring comes from AggregateMetric and the AggregationOperation enum. ArgumentMatcher and ArgumentValue (with literal-type aliases such as StringLiteralType and IntegerLiteralType) validate tool-call arguments, and ArrayLoader supplies samples from an in-memory array.
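The flow in the first paragraph (dataset samples in, subjects out, evaluator outcomes aggregated into a score) can be modeled without the framework. The sketch below is a standalone approximation, not the Evaluations API: `Sample`, `MetricOutcome`, `runSystem(on:)`, `evaluate(_:output:)`, and `meanPassRate(_:)` are hypothetical names, and excluding ignored outcomes from the mean is an assumption.

```swift
// Standalone model of the evaluation flow; every name here is hypothetical.
struct Sample {
    let prompt: String
    let expected: String?
}

enum MetricOutcome: Equatable {
    case passing, failing, ignored
}

/// Stand-in for the system under test: returns a canned answer per prompt.
func runSystem(on sample: Sample) -> String {
    sample.prompt.hasPrefix("One plus one") ? "Two." : "Unknown."
}

/// Stand-in for an evaluator: exact match against the expected value.
func evaluate(_ sample: Sample, output: String) -> MetricOutcome {
    guard let expected = sample.expected else { return .ignored }
    return output == expected ? .passing : .failing
}

/// Fraction of passing outcomes, skipping ignored ones (assumption).
func meanPassRate(_ outcomes: [MetricOutcome]) -> Double? {
    let scored = outcomes.filter { $0 != .ignored }
    guard !scored.isEmpty else { return nil }
    let passes = scored.filter { $0 == .passing }.count
    return Double(passes) / Double(scored.count)
}

let dataset = [
    Sample(prompt: "One plus one is...", expected: "Two."),
    Sample(prompt: "Two plus two is...", expected: "Four."),
    Sample(prompt: "Say anything", expected: nil),
]
let outcomes = dataset.map { evaluate($0, output: runSystem(on: $0)) }
let accuracy = meanPassRate(outcomes)
```

Here the first sample passes, the second fails, and the third is ignored, so the mean is computed over two samples.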

struct

AggregateMetric

New · iOS · macOS · visionOS · watchOS
public struct AggregateMetric : Sendable, Codable, Equatable

An aggregate statistic computed from a metric's results across the evaluation dataset.

let accuracy = Metric("Accuracy")
let op = AggregationOperation.mean(of: accuracy)
print(op.label) // "Mean of Accuracy"

The summary DataFrame stores one AggregateMetric for each column. Each value records the operation that produced it and derives its display label and source metric name from that operation.
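A minimal sketch of how a stored operation can drive derived `label` and `sourceMetric` properties, using hypothetical mirror types (`AggregationOp`, `AggregateValue`) rather than the framework's own. Only the "Mean of Accuracy" label format comes from the example above; the "Maximum of" format is an assumption, and metrics are represented by their names.

```swift
import Foundation

// Hypothetical mirror types; metric names stand in for Metric values.
enum AggregationOp: Equatable, Codable {
    case mean(of: String)
    case maximum(of: String)
    case custom(label: String)
}

struct AggregateValue: Equatable, Codable {
    let operation: AggregationOp
    let group: String?
    let value: Double

    /// Display label derived from the operation ("Maximum of" is assumed).
    var label: String {
        switch operation {
        case .mean(let metric): return "Mean of \(metric)"
        case .maximum(let metric): return "Maximum of \(metric)"
        case .custom(let label): return label
        }
    }

    /// Source metric name, or nil for custom aggregations.
    var sourceMetric: String? {
        switch operation {
        case .mean(let metric), .maximum(let metric): return metric
        case .custom: return nil
        }
    }
}

let summaryRow = [
    AggregateValue(operation: .mean(of: "Accuracy"), group: "Text Matching", value: 0.5),
    AggregateValue(operation: .custom(label: "Length Distribution"), group: nil, value: 0.25),
]
```

Storing only the operation keeps the label and metric name consistent with it, so they never need to be encoded separately.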

Declaration
public struct AggregateMetric : Sendable, Codable, Equatable {

    /// The aggregation operation that produced this value.
    public let operation: AggregationOperation

    /// The group this aggregate belongs to, if any.
    public let group: String?

    /// The aggregate value.
    public let value: Double

    /// The display label for this aggregate.
    public var label: String { get }

    /// The name of the source metric.
    public var sourceMetric: String? { get }

    /// Returns a Boolean value indicating whether two values are equal.
    ///
    /// Equality is the inverse of inequality. For any values `a` and `b`,
    /// `a == b` implies that `a != b` is `false`.
    ///
    /// - Parameters:
    ///   - lhs: A value to compare.
    ///   - rhs: Another value to compare.
    public static func == (a: AggregateMetric, b: AggregateMetric) -> Bool

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws
}
enum

AggregationOperation

New · iOS · macOS · visionOS · watchOS
public enum AggregationOperation : Sendable, Equatable

The type of aggregation operation used to compute a summary statistic.

Each case pairs a statistical function with the Metric it operates on, except custom(label:), which represents a custom computation identified only by its label.
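The statistical cases correspond to standard summary functions. This framework-independent sketch computes each one over a metric's values; the `Statistic` enum and `aggregate(_:_:)` function are hypothetical, and the use of population (not sample) variance and smallest-value tie-breaking for the mode are assumptions, since the documentation does not specify them.

```swift
// Hypothetical mirror of the statistical cases; values are passed in directly.
enum Statistic {
    case mean, median, mode, minimum, maximum, standardDeviation, variance
}

/// Computes one statistic over a metric's values; nil for empty input.
func aggregate(_ statistic: Statistic, _ values: [Double]) -> Double? {
    guard !values.isEmpty else { return nil }
    let n = Double(values.count)
    let mean = values.reduce(0, +) / n
    switch statistic {
    case .mean:
        return mean
    case .median:
        let sorted = values.sorted()
        let mid = sorted.count / 2
        // Even count: average the two middle values.
        return sorted.count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
    case .mode:
        // Most frequent value; ties go to the smallest such value (assumption).
        var counts: [Double: Int] = [:]
        for v in values { counts[v, default: 0] += 1 }
        let best = counts.values.max()!
        return counts.filter { $0.value == best }.keys.min()
    case .minimum:
        return values.min()
    case .maximum:
        return values.max()
    case .variance:
        // Population variance: divide by n (assumption).
        return values.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / n
    case .standardDeviation:
        return aggregate(.variance, values).map { $0.squareRoot() }
    }
}
```

For pass/fail metrics recorded as 1 and 0, `.mean` gives the pass rate.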

Declaration
public enum AggregationOperation : Sendable, Equatable {

    /// The arithmetic mean of the metric's values.
    case mean(of: Metric)

    /// The median of the metric's values.
    case median(of: Metric)

    /// The mode of the metric's values.
    case mode(of: Metric)

    /// The minimum of the metric's values.
    case minimum(of: Metric)

    /// The maximum of the metric's values.
    case maximum(of: Metric)

    /// The standard deviation of the metric's values.
    case standardDeviation(of: Metric)

    /// The variance of the metric's values.
    case variance(of: Metric)

    /// A custom aggregation identified by its label.
    case custom(label: String)

    /// The display label derived from this operation.
    public var label: String { get }

    /// Returns a Boolean value indicating whether two values are equal.
    ///
    /// Equality is the inverse of inequality. For any values `a` and `b`,
    /// `a == b` implies that `a != b` is `false`.
    ///
    /// - Parameters:
    ///   - lhs: A value to compare.
    ///   - rhs: Another value to compare.
    public static func == (a: AggregationOperation, b: AggregationOperation) -> Bool
}
extension

AggregationOperation

New · iOS · macOS · visionOS · watchOS
extension AggregationOperation : Codable
Declaration
extension AggregationOperation : Codable {

    /// Encodes the operation as a keyed container with a `type` discriminator
    /// and either `metric` (the metric name) or `label` (for custom operations).
    public func encode(to encoder: any Encoder) throws

    /// Decodes an operation from a keyed container, reconstructing the metric from its name.
    public init(from decoder: any Decoder) throws
}
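The encoding described above (a `type` discriminator plus either `metric` or `label`) can be sketched with a hand-written `Codable` conformance. `OperationModel` is a hypothetical stand-in with only three cases, and the discriminator strings `"mean"`, `"maximum"`, and `"custom"` are assumptions; the framework's actual key values are not documented here.

```swift
import Foundation

// Hypothetical stand-in following the documented encoding shape.
enum OperationModel: Equatable, Codable {
    case mean(of: String)      // metric name stands in for Metric
    case maximum(of: String)
    case custom(label: String)

    private enum CodingKeys: String, CodingKey { case type, metric, label }

    func encode(to encoder: any Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        switch self {
        case .mean(let metric):
            try container.encode("mean", forKey: .type)
            try container.encode(metric, forKey: .metric)
        case .maximum(let metric):
            try container.encode("maximum", forKey: .type)
            try container.encode(metric, forKey: .metric)
        case .custom(let label):
            try container.encode("custom", forKey: .type)
            try container.encode(label, forKey: .label)
        }
    }

    init(from decoder: any Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        // Read the discriminator first, then the payload key it implies.
        switch try container.decode(String.self, forKey: .type) {
        case "mean":
            self = try .mean(of: container.decode(String.self, forKey: .metric))
        case "maximum":
            self = try .maximum(of: container.decode(String.self, forKey: .metric))
        case "custom":
            self = try .custom(label: container.decode(String.self, forKey: .label))
        case let other:
            throw DecodingError.dataCorruptedError(
                forKey: .type, in: container,
                debugDescription: "Unknown operation type \(other)")
        }
    }
}
```

Encoding the metric by name is what makes decoding "reconstruct the metric from its name" possible.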
enum

ArgumentMatcher

New · iOS · macOS · visionOS · watchOS
public enum ArgumentMatcher : Sendable, Codable

The values that define how to validate a tool-call argument.

Use argument matchers to specify validation rules for tool-call arguments. You can require exact values, verify key presence, check ranges, match patterns, or use a language model for semantic matching.

For example:

let matchers: [ArgumentMatcher] = [
    .exact(argumentName: "city", value: "San Francisco"),
    .keyOnly(argumentName: "units"),
    .naturalLanguage(argumentName: "prompt", criteria: "A weather-related question")
]
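To make the matcher semantics concrete, here is a standalone sketch that checks most of the cases against a dictionary of tool-call arguments. `ArgValue`, `ArgMatcher`, and `matches(_:_:)` are hypothetical; `naturalLanguage` is left out because it needs a language model, and treating `pattern` as a match anywhere in the string is an assumption.

```swift
import Foundation

// Hypothetical, framework-independent mirrors.
enum ArgValue: Equatable {
    case string(String), int(Int), double(Double), bool(Bool)
}

enum ArgMatcher {
    case exact(argumentName: String, value: ArgValue)
    case keyOnly(argumentName: String)
    case oneOf(argumentName: String, allowedValues: [ArgValue])
    case range(argumentName: String, minimum: Double?, maximum: Double?)
    case pattern(argumentName: String, regex: String)
    case contains(argumentName: String, substring: String)
    case hasPrefix(argumentName: String, prefix: String)
    case hasSuffix(argumentName: String, suffix: String)
}

/// Returns true when the tool-call arguments satisfy the matcher.
func matches(_ matcher: ArgMatcher, _ arguments: [String: ArgValue]) -> Bool {
    // The argument's string value, or nil if absent or not a string.
    func text(_ name: String) -> String? {
        if case .string(let s)? = arguments[name] { return s }
        return nil
    }
    switch matcher {
    case .exact(let name, let value):
        return arguments[name] == value
    case .keyOnly(let name):
        return arguments[name] != nil
    case .oneOf(let name, let allowed):
        guard let actual = arguments[name] else { return false }
        return allowed.contains(actual)
    case .range(let name, let minimum, let maximum):
        let number: Double
        switch arguments[name] {
        case .int(let i)?: number = Double(i)
        case .double(let d)?: number = d
        default: return false  // missing or non-numeric
        }
        // A nil bound leaves that side of the range open.
        return (minimum.map { number >= $0 } ?? true) && (maximum.map { number <= $0 } ?? true)
    case .pattern(let name, let regex):
        return text(name)?.range(of: regex, options: .regularExpression) != nil
    case .contains(let name, let substring):
        return text(name)?.contains(substring) ?? false
    case .hasPrefix(let name, let prefix):
        return text(name)?.hasPrefix(prefix) ?? false
    case .hasSuffix(let name, let suffix):
        return text(name)?.hasSuffix(suffix) ?? false
    }
}
```

Every case first requires the argument to be present, which matches the "must be present" wording in each case's documentation.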
Declaration
public enum ArgumentMatcher : Sendable, Codable {

    /// A value that indicates that the argument must be present with this exact key and value.
    case exact(argumentName: String, value: ArgumentValue)

    /// A value that indicates that the argument must be present with this key and no specific value.
    case keyOnly(argumentName: String)

    /// A value that indicates the argument must be present with a value that matches one of the allowed values.
    case oneOf(argumentName: String, allowedValues: [ArgumentValue])

    /// A value that indicates that the argument must be present and its numeric value must be within the specified range.
    case range(argumentName: String, minimum: Double?, maximum: Double?)

    /// A value that indicates that the argument must be present and its string value must match the specified regex pattern.
    case pattern(argumentName: String, regex: String)

    /// A value that indicates that the argument must be present and its string value must contain the specified substring.
    case contains(argumentName: String, substring: String)

    /// A value that indicates that the argument must be present and its string value must start with the specified prefix.
    case hasPrefix(argumentName: String, prefix: String)

    /// A value that indicates that the argument must be present and its string value must end with the specified suffix.
    case hasSuffix(argumentName: String, suffix: String)

    /// A value that indicates that the argument must be present and semantically match the given criteria. 
    case naturalLanguage(argumentName: String, criteria: String)

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// A representation of partially generated content
    nonisolated public enum PartiallyGenerated : nonisolated ConvertibleFromGeneratedContent {

        case exact(argumentName: String.PartiallyGenerated?, value: ArgumentValue.PartiallyGenerated?)

        case keyOnly(argumentName: String.PartiallyGenerated?)

        case oneOf(argumentName: String.PartiallyGenerated?, allowedValues: Array<ArgumentValue>.PartiallyGenerated?)

        case range(argumentName: String.PartiallyGenerated?, minimum: Optional<Double>.PartiallyGenerated?, maximum: Optional<Double>.PartiallyGenerated?)

        case pattern(argumentName: String.PartiallyGenerated?, regex: String.PartiallyGenerated?)

        case contains(argumentName: String.PartiallyGenerated?, substring: String.PartiallyGenerated?)

        case hasPrefix(argumentName: String.PartiallyGenerated?, prefix: String.PartiallyGenerated?)

        case hasSuffix(argumentName: String.PartiallyGenerated?, suffix: String.PartiallyGenerated?)

        case naturalLanguage(argumentName: String.PartiallyGenerated?, criteria: String.PartiallyGenerated?)

        /// Creates an instance from content generated by a model.
        ///
        /// Conformance to this protocol is provided by the `@Generable` macro.
        /// A manual implementation may be used to map values onto properties using
        /// different names. To manually initialize your type from generated content,
        /// decode the values as shown below:
        ///
        /// ```swift
        /// struct Person: ConvertibleFromGeneratedContent {
        ///     var name: String
        ///     var age: Int

Truncated.

extension

ArgumentMatcher

New · iOS · macOS · visionOS · watchOS
extension ArgumentMatcher : nonisolated Generable
Declaration
extension ArgumentMatcher : nonisolated Generable {

    /// Creates an instance from content generated by a model.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. To manually initialize your type from generated content,
    /// decode the values as shown below:
    ///
    /// ```swift
    /// struct Person: ConvertibleFromGeneratedContent {
    ///     var name: String
    ///     var age: Int
    ///
    ///     init(_ content: GeneratedContent) throws {
    ///         self.name = try content.value(forProperty: "firstName")
    ///         self.age = try content.value(forProperty: "ageInYears")
    ///     }
    /// }
    /// ```
    ///
    /// - Important: If your type also conforms to ``ConvertibleToGeneratedContent``,
    /// it is critical that this implementation be symmetrical with ``ConvertibleToGeneratedContent/generatedContent``.
    ///
    /// - SeeAlso: `@Generable` macro ``Generable(description:)``
    nonisolated public init(_ content: GeneratedContent) throws
}
enum

ArgumentValue

New · iOS · macOS · visionOS · watchOS
public enum ArgumentValue : Sendable, Codable, Hashable

A primitive value type for argument specifications that conforms to Generable.

let city: ArgumentValue = "San Francisco"
let count: ArgumentValue = 5
let score: ArgumentValue = 0.95

Unlike StructuredValue, this enum contains only primitive types, with no recursive array or dictionary cases, which allows it to work with the @Generable macro.
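The literal examples above work because ArgumentValue adopts `ExpressibleByStringLiteral`, `ExpressibleByIntegerLiteral`, `ExpressibleByFloatLiteral`, and `ExpressibleByBooleanLiteral`, as declared further down. This sketch shows the same technique on a hypothetical `PrimitiveValue` enum: each literal initializer maps to one case, so plain literals can appear wherever the enum is expected.

```swift
// Hypothetical mirror: one case per primitive, no recursive cases.
enum PrimitiveValue: Hashable {
    case string(String), int(Int), double(Double), bool(Bool)
}

extension PrimitiveValue: ExpressibleByStringLiteral {
    init(stringLiteral value: String) { self = .string(value) }
}

extension PrimitiveValue: ExpressibleByIntegerLiteral {
    init(integerLiteral value: Int) { self = .int(value) }
}

extension PrimitiveValue: ExpressibleByFloatLiteral {
    init(floatLiteral value: Double) { self = .double(value) }
}

extension PrimitiveValue: ExpressibleByBooleanLiteral {
    init(booleanLiteral value: Bool) { self = .bool(value) }
}

let city: PrimitiveValue = "San Francisco"
let count: PrimitiveValue = 5
let score: PrimitiveValue = 0.95
let enabled: PrimitiveValue = true
```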

Declaration
public enum ArgumentValue : Sendable, Codable, Hashable {

    /// A string value.
    case string(String)

    /// An integer value.
    case int(Int)

    /// A double-precision floating-point value.
    case double(Double)

    /// A Boolean value.
    case bool(Bool)

    /// The equivalent structured value representation of this argument value.
    public var structuredValue: StructuredValue { get }

    /// An instance of the generation schema.
    nonisolated public static var generationSchema: GenerationSchema { get }

    /// This instance represented as generated content.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. Use the generated content property as shown below, to
    /// manually return a new ``GeneratedContent`` with the properties you specify.
    ///
    /// ```swift
    /// struct Person: ConvertibleToGeneratedContent {
    ///    var name: String
    ///    var age: Int
    ///
    ///    var generatedContent: GeneratedContent {
    ///        GeneratedContent(properties: [
    ///            "firstName": name,
    ///            "ageInYears": age
    ///        ])
    ///    }
    /// }
    /// ```
    ///
    /// - Important: If your type also conforms to ``ConvertibleFromGeneratedContent``,
    /// it is critical that this implementation be symmetrical with ``ConvertibleFromGeneratedContent/init(_:)``.
    nonisolated public var generatedContent: GeneratedContent { get }

    /// Returns a Boolean value indicating whether two values are equal.
    ///
    /// Equality is the inverse of inequality. For any values `a` and `b`,
    /// `a == b` implies that `a != b` is `false`.
    ///
    /// - Parameters:
    ///   - lhs: A value to compare.
    ///   - rhs: Another value to compare.
    public static func == (a: ArgumentValue, b: ArgumentValue) -> Bool

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// Hashes the essential components of this value by feeding them into the
    /// given hasher.
    ///
    /// Implement this method to conform to the `Hashable` protocol. The
    /// components used for hashing must be the same as the components compared
    /// in your type's `==` operator implementation. Call `hasher.combine(_:)`
    /// with each of these components.
    ///
    /// - Important: In your implementation of `hash(into:)`,
    ///   don't call `finalize()` on the `hasher` instance provided,
    ///   or replace it with a different instance.
    ///   Doing so may become a compile-time error in the future.
    ///
    /// - Parameter hasher: The hasher to use when combining the components

Truncated.

extension

ArgumentValue

New · iOS · macOS · visionOS · watchOS
extension ArgumentValue : nonisolated Generable
Declaration
extension ArgumentValue : nonisolated Generable {

    /// Creates an instance from content generated by a model.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. To manually initialize your type from generated content,
    /// decode the values as shown below:
    ///
    /// ```swift
    /// struct Person: ConvertibleFromGeneratedContent {
    ///     var name: String
    ///     var age: Int
    ///
    ///     init(_ content: GeneratedContent) throws {
    ///         self.name = try content.value(forProperty: "firstName")
    ///         self.age = try content.value(forProperty: "ageInYears")
    ///     }
    /// }
    /// ```
    ///
    /// - Important: If your type also conforms to ``ConvertibleToGeneratedContent``,
    /// it is critical that this implementation be symmetrical with ``ConvertibleToGeneratedContent/generatedContent``.
    ///
    /// - SeeAlso: `@Generable` macro ``Generable(description:)``
    nonisolated public init(_ content: GeneratedContent) throws
}
extension

ArgumentValue

New · iOS · macOS · visionOS · watchOS
extension ArgumentValue : ExpressibleByStringLiteral
Declaration
extension ArgumentValue : ExpressibleByStringLiteral {

    /// Creates an instance initialized to the given string value.
    ///
    /// - Parameter value: The value of the new instance.
    public init(stringLiteral value: String)

    /// A type that represents an extended grapheme cluster literal.
    ///
    /// Valid types for `ExtendedGraphemeClusterLiteralType` are `Character`,
    /// `String`, and `StaticString`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias ExtendedGraphemeClusterLiteralType = String

    /// A type that represents a string literal.
    ///
    /// Valid types for `StringLiteralType` are `String` and `StaticString`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias StringLiteralType = String

    /// A type that represents a Unicode scalar literal.
    ///
    /// Valid types for `UnicodeScalarLiteralType` are `Unicode.Scalar`,
    /// `Character`, `String`, and `StaticString`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias UnicodeScalarLiteralType = String
}
extension

ArgumentValue

New · iOS · macOS · visionOS · watchOS
extension ArgumentValue : ExpressibleByIntegerLiteral
Declaration
extension ArgumentValue : ExpressibleByIntegerLiteral {

    /// Creates an instance initialized to the specified integer value.
    ///
    /// Do not call this initializer directly. Instead, initialize a variable or
    /// constant using an integer literal. For example:
    ///
    ///     let x = 23
    ///
    /// In this example, the assignment to the `x` constant calls this integer
    /// literal initializer behind the scenes.
    ///
    /// - Parameter value: The value to create.
    public init(integerLiteral value: Int)

    /// A type that represents an integer literal.
    ///
    /// The standard library integer and floating-point types are all valid types
    /// for `IntegerLiteralType`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias IntegerLiteralType = Int
}
extension

ArgumentValue

New · iOS · macOS · visionOS · watchOS
extension ArgumentValue : ExpressibleByFloatLiteral
Declaration
extension ArgumentValue : ExpressibleByFloatLiteral {

    /// Creates an instance initialized to the specified floating-point value.
    ///
    /// Do not call this initializer directly. Instead, initialize a variable or
    /// constant using a floating-point literal. For example:
    ///
    ///     let x = 21.5
    ///
    /// In this example, the assignment to the `x` constant calls this
    /// floating-point literal initializer behind the scenes.
    ///
    /// - Parameter value: The value to create.
    public init(floatLiteral value: Double)

    /// A type that represents a floating-point literal.
    ///
    /// Valid types for `FloatLiteralType` are `Float`, `Double`, and `Float80`
    /// where available.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias FloatLiteralType = Double
}
extension

ArgumentValue

New · iOS · macOS · visionOS · watchOS
extension ArgumentValue : ExpressibleByBooleanLiteral
Declaration
extension ArgumentValue : ExpressibleByBooleanLiteral {

    /// Creates an instance initialized to the given Boolean value.
    ///
    /// Do not call this initializer directly. Instead, initialize a variable or
    /// constant using one of the Boolean literals `true` and `false`. For
    /// example:
    ///
    ///     let twasBrillig = true
    ///
    /// In this example, the assignment to the `twasBrillig` constant calls this
    /// Boolean literal initializer behind the scenes.
    ///
    /// - Parameter value: The value of the new instance.
    public init(booleanLiteral value: Bool)

    /// A type that represents a Boolean literal, such as `Bool`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias BooleanLiteralType = Bool
}
extension

Array

New · iOS · macOS · visionOS · watchOS
extension Array where Element == Metric
Declaration
extension Array where Element == Metric {

    /// Returns the first metric whose ``Metric/name`` equals the given metric's name, or `nil` if not found.
    public subscript(metric: Metric) -> Metric? { get }
}
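The subscript matches by name rather than by full equality, so a lookup returns the stored metric even if the query value carries different details. Here is a sketch with a hypothetical `MetricSketch` type. Swift subscript parameters have no argument label unless one is written explicitly, so `subscript(metric:)` is called as `metrics[someMetric]`.

```swift
// Hypothetical stand-in for Metric; only `name` takes part in the lookup.
struct MetricSketch: Equatable {
    let name: String
    var detail: String = ""
}

extension Array where Element == MetricSketch {
    /// First element whose name equals the given metric's name, or nil.
    subscript(metric: MetricSketch) -> MetricSketch? {
        self.first(where: { $0.name == metric.name })
    }
}

let metrics = [
    MetricSketch(name: "Accuracy", detail: "first"),
    MetricSketch(name: "Latency"),
    MetricSketch(name: "Accuracy", detail: "second"),
]
let found = metrics[MetricSketch(name: "Accuracy")]  // the "first" entry
```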
extension

Array

New · iOS · macOS · visionOS · watchOS
extension Array where Element : ModelSampleProtocol, Element : Generable
Declaration
extension Array where Element : ModelSampleProtocol, Element : Generable {

    /// Generates synthetic data based on this dataset and returns a stream of new samples.
    ///
    /// For more control over generation, create a ``SampleGenerator`` directly.
    public func makeSamples(_ prompt: Prompt, targetCount: Int, sessionProvider: (@Sendable () -> LanguageModelSession)? = nil, validator: (nonisolated(nonsending) @Sendable (Element) async throws -> Bool)? = nil) -> some AsyncSequence<Element, any Error>

}
extension

Array

New · iOS · macOS · visionOS · watchOS
extension Array
Declaration
extension Array {

    /// Generates synthetic data based on this dataset and returns a stream of new samples.
    ///
    /// For more control over generation, create a ``SampleGenerator`` directly.
    public func makeSamples<T>(_ prompt: Prompt, targetCount: Int, sessionProvider: (@Sendable () -> LanguageModelSession)? = nil, validator: (nonisolated(nonsending) @Sendable (ModelSample<T>) async throws -> Bool)? = nil) -> some AsyncSequence<ModelSample<T>, any Error> where Element == ModelSample<T>, T : Generable, T : Decodable, T : Encodable, T : Sendable

}
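Both `makeSamples` overloads take a `targetCount` and an optional async `validator`. The sketch below shows one plausible generate-then-filter loop behind such an API, returning an `AsyncThrowingStream`. It is not the framework's implementation: `makeSamplesSketch`, the `generate` closure (a stand-in for a `LanguageModelSession`), and the `maxAttempts` cap are hypothetical.

```swift
/// Yields generated candidates that pass the validator until `targetCount`
/// are accepted or `maxAttempts` candidates have been tried.
func makeSamplesSketch<T: Sendable>(
    targetCount: Int,
    maxAttempts: Int,
    generate: @escaping @Sendable (Int) async throws -> T,
    validator: (@Sendable (T) async throws -> Bool)? = nil
) -> AsyncThrowingStream<T, Error> {
    AsyncThrowingStream { continuation in
        let task = Task {
            do {
                var accepted = 0
                var attempt = 0
                while accepted < targetCount, attempt < maxAttempts {
                    let candidate = try await generate(attempt)
                    attempt += 1
                    // With no validator, every candidate is kept.
                    var keep = true
                    if let validator { keep = try await validator(candidate) }
                    if keep {
                        continuation.yield(candidate)
                        accepted += 1
                    }
                }
                continuation.finish()
            } catch {
                // A generation or validation error ends the stream.
                continuation.finish(throwing: error)
            }
        }
        // Stop generating if the consumer stops iterating.
        continuation.onTermination = { _ in task.cancel() }
    }
}
```

The attempt cap guards against a validator that rejects everything, which would otherwise loop forever.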
struct

ArrayLoader

New · iOS · macOS · visionOS · watchOS
public struct ArrayLoader<Sample> : Loader where Sample : SampleProtocol

A loader backed by an in-memory array.

Declaration
public struct ArrayLoader<Sample> : Loader where Sample : SampleProtocol {

    /// Creates a loader backed by the given array of samples.
    public init(samples: [Sample])

    /// The async sequence for iteration during an evaluation run.
    public var stream: any AsyncSequence<Sample, any Error> { get }
}
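An in-memory loader only needs to replay its array as an asynchronous sequence. This sketch uses a hypothetical `InMemoryLoader` and, for simplicity, a concrete `AsyncThrowingStream` instead of the `any AsyncSequence<Sample, any Error>` existential that ArrayLoader exposes.

```swift
// Hypothetical mirror of an array-backed loader.
struct InMemoryLoader<Sample> {
    let samples: [Sample]

    /// Replays the samples in order, then finishes without error.
    var stream: AsyncThrowingStream<Sample, Error> {
        AsyncThrowingStream { continuation in
            for sample in samples { continuation.yield(sample) }
            continuation.finish()
        }
    }
}
```

Because the stream buffers without bound by default, yielding every element before anyone iterates is safe.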
extension

Collection

New · iOS · macOS · visionOS · watchOS
extension Collection where Self.Element == EvaluationResult
Declaration
extension Collection where Self.Element == EvaluationResult {

    /// Saves the array of evaluation results as a JSONL file
    ///
    /// - Parameters:
    ///   - url: The file URL to write the JSONL output to.
    ///   - includeReportMetadata: Whether to include report metadata in each entry. Defaults to `false`.
    /// - Returns: The URL of the saved file.
    @discardableResult
    public func saveJSONLines(to url: URL, includeReportMetadata: Bool = false) throws -> URL
}
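JSON Lines stores one JSON object per line. The sketch below writes and reads that format with `JSONEncoder` and `JSONDecoder`. `ResultRecord`, `saveJSONLines(_:to:)`, and `loadJSONLines(_:from:)` are hypothetical stand-ins for `EvaluationResult` and the framework's `saveJSONLines(to:includeReportMetadata:)` and `loadJSONLines(from:)`, whose exact record layout is not documented here.

```swift
import Foundation

// Hypothetical record standing in for one evaluation result.
struct ResultRecord: Codable, Equatable {
    let evaluationID: String
    let accuracy: Double
}

/// Writes one compact JSON object per line and returns the URL.
@discardableResult
func saveJSONLines<C: Collection>(_ records: C, to url: URL) throws -> URL where C.Element: Encodable {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]  // compact output, stable key order
    var data = Data()
    for record in records {
        data.append(try encoder.encode(record))
        data.append(0x0A)  // newline terminates each entry
    }
    try data.write(to: url)
    return url
}

/// Reads a JSON Lines file, decoding each non-empty line.
func loadJSONLines<T: Decodable>(_ type: T.Type, from url: URL) throws -> [T] {
    let decoder = JSONDecoder()
    let text = try String(contentsOf: url, encoding: .utf8)
    return try text.split(separator: "\n").map { line in
        try decoder.decode(T.self, from: Data(line.utf8))
    }
}
```

The default (non-pretty-printed) encoder output escapes newlines inside strings and adds none of its own, which keeps each record on a single line.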
extension

DataFrame

New · iOS · macOS · visionOS · watchOS
extension DataFrame
Declaration
extension DataFrame {

    /// Accesses a result column by its typed descriptor.
    public subscript<T>(column: ResultColumn<T>) -> Column<T> { get }

    /// Accesses a metric column using the metric's name
    public subscript(metric metric: Metric) -> Column<Metric> { get }
}
protocol

Evaluation

New · iOS · macOS · visionOS · watchOS
public protocol Evaluation : Sendable

A type that defines an evaluation.

Implement this protocol to create custom evaluations. The evaluation runs your system under test against a dataset and applies evaluators to measure performance.

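struct MyEvaluation: Evaluation {
    // The metric that the evaluator reports and the aggregator summarizes.
    let metric = Metric("Match")

    // A one-sample, in-memory dataset.
    var dataset: ArrayLoader<ModelSample<String>> {
        ArrayLoader(samples: [
            ModelSample(prompt: "One plus one is...", expected: "Two.")
        ])
    }

    // Stand-in for the system under test; a real evaluation would query a model with the sample's prompt.
    func subject(from sample: ModelSample<String>) async throws -> ModelSubject<String> {
        ModelSubject(value: "Two.")
    }

    var evaluators: Evaluators {
        Evaluator<ModelSample<String>> { sample, subject in
            // Same name as the `metric` property; aggregating a metric that no evaluator produced throws metricsNotFound.
            let metric = Metric("Match")
            // Samples without an expected value are ignored rather than failed.
            guard let expected = sample.expected else { return metric.ignore() }
            return subject.value == expected ? metric.passing() : metric.failing()
        }
    }

    // Summarizes the per-sample results as the mean of "Match".
    func aggregateMetrics(using aggregator: inout MetricsAggregator) {
        aggregator.computeMean(of: metric)
    }
}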
Declaration
public protocol Evaluation : Sendable {

    /// The type of input samples in the evaluation dataset.
    associatedtype Sample where Self.Sample == Self.SampleLoader.Sample, Self.Sample.ExpectedValue == Self.Subject.Value

    /// The type of the subject produced by the system under test.
    associatedtype Subject : EvaluationSubject

    /// The type of the sample loader used to provide the evaluation dataset.
    associatedtype SampleLoader : Loader

    /// The evaluation dataset.
    var dataset: Self.SampleLoader { get }

    /// Produces the subject of evaluation from a given sample.
    ///
    /// Implement this method to run your system under test and return the subject
    /// that evaluators will measure.
    ///
    /// - Parameter sample: The input sample.
    /// - Returns: The subject of evaluation.
    nonisolated(nonsending) func subject(from sample: Self.Sample) async throws -> Self.Subject

    /// The evaluators to apply to each subject/sample pair.
    @EvaluatorsBuilder<Self.Sample, Self.Subject> var evaluators: Self.Evaluators { get }

    /// Aggregates the collected metric results.
    /// - Parameter aggregator: The aggregator for computing statistics.
    func aggregateMetrics(using aggregator: inout MetricsAggregator)
}
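The `where` clause on `Sample` is what lets evaluators compare a subject's value against the sample's expected value. This simplified, hypothetical mirror (`SampleLike`, `SubjectLike`, `EvaluationLike`, and `runExactMatch(_:)`) keeps that constraint and drives `subject(from:)` over a plain array; it omits evaluators, loaders, and aggregation.

```swift
// Hypothetical, simplified mirrors of the protocol shapes.
protocol SampleLike {
    associatedtype ExpectedValue: Equatable
    var expected: ExpectedValue? { get }
}

protocol SubjectLike {
    associatedtype Value
    var value: Value { get }
}

protocol EvaluationLike {
    associatedtype Sample: SampleLike
    // Ties the subject's value type to the sample's expected-value type.
    associatedtype Subject: SubjectLike where Subject.Value == Sample.ExpectedValue
    var dataset: [Sample] { get }
    func subject(from sample: Sample) async throws -> Subject
}

/// Fraction of samples whose subject value equals the expected value;
/// samples without an expected value are skipped.
func runExactMatch<E: EvaluationLike>(_ evaluation: E) async throws -> Double? {
    var scored = 0, passed = 0
    for sample in evaluation.dataset {
        guard let expected = sample.expected else { continue }
        let subject = try await evaluation.subject(from: sample)
        scored += 1
        if subject.value == expected { passed += 1 }
    }
    return scored == 0 ? nil : Double(passed) / Double(scored)
}

struct QA: SampleLike {
    typealias ExpectedValue = String
    let prompt: String
    let expected: String?
}

struct Answer: SubjectLike {
    typealias Value = String
    let value: String
}

struct Arithmetic: EvaluationLike {
    typealias Sample = QA
    typealias Subject = Answer
    let dataset: [QA] = [QA(prompt: "1+1", expected: "2"), QA(prompt: "2+2", expected: "5")]

    // Stand-in for the system under test.
    func subject(from sample: QA) async throws -> Answer {
        Answer(value: sample.prompt == "1+1" ? "2" : "4")
    }
}
```

Without the `where` clause, `subject.value == expected` would not type-check, because the compiler could not prove the two types are the same.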
extension

Evaluation

New · iOS · macOS · visionOS · watchOS
extension Evaluation
Declaration
extension Evaluation {

    /// A typed column descriptor for the input samples in the detailed DataFrame.
    public var inputColumn: ResultColumn<Self.Sample> { get }

    /// A typed column descriptor for the model responses in the detailed DataFrame.
    public var responseColumn: ResultColumn<Self.Subject> { get }

    /// A typed column descriptor for the expected values in the detailed DataFrame.
    public var expectedColumn: ResultColumn<Self.Sample.ExpectedValue> { get }
}
struct

EvaluationContext

New · iOS · macOS · visionOS · watchOS
public struct EvaluationContext : Sendable

A context that provides the evaluation result within a test scope.

After the evaluation completes, read the evaluation result from the result property.

Declaration
public struct EvaluationContext : Sendable {

    /// The current evaluation context within the active test scope.
    ///
    /// Accessing this property outside an evaluation scope triggers a fatal error.
    public static var current: EvaluationContext { get }

    /// The evaluation result.
    public let result: EvaluationResult
}
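One common Swift pattern for a value that exists only inside a scope is a task-local. The sketch below assumes, without confirmation from the documentation, that a mechanism like `@TaskLocal` sits behind `current`. `ContextSketch` and `withContext(_:_:)` are hypothetical, and the `fatalError` mirrors the documented behavior when `current` is read outside a scope.

```swift
// Hypothetical scoped context built on a task-local value.
struct ContextSketch: Sendable {
    let resultSummary: String

    // nil everywhere except inside withContext(_:_:).
    @TaskLocal static var stored: ContextSketch? = nil

    /// The context for the active scope; traps outside any scope.
    static var current: ContextSketch {
        guard let context = stored else {
            fatalError("current accessed outside an evaluation scope")
        }
        return context
    }

    /// Runs `body` with `context` installed as the current value.
    static func withContext<R>(_ context: ContextSketch, _ body: () async throws -> R) async rethrows -> R {
        try await $stored.withValue(context, operation: body)
    }
}
```

Task-local values are inherited by child tasks created inside the scope, so code called from the test body would see the same context.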
enum

EvaluationError

New · iOS · macOS · visionOS · watchOS
public enum EvaluationError : Error, LocalizedError

Errors thrown during an evaluation run.

do {
    throw EvaluationError.metricsNotFound(names: ["Accuracy"])
} catch EvaluationError.metricsNotFound(let names) {
    print("Missing metrics: \(names)")
}
Declaration
public enum EvaluationError : Error, LocalizedError {

    /// An evaluator received a subject without the required transcript.
    ///
    /// This occurs when using ``ToolCallEvaluator`` with a ``ModelSubject``
    /// that has a `nil` transcript. Pass `session.transcript.structuredTranscript`
    /// when creating the `ModelSubject`.
    case missingTranscript(evaluatorType: String)

    /// One or more metric columns were not found in the evaluation results.
    ///
    /// This occurs when ``MetricsAggregator`` references metrics that no
    /// evaluator produced during the run.
    case metricsNotFound(names: [String])

    /// A localized message describing what error occurred.
    public var errorDescription: String? { get }
}
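A sketch of the `LocalizedError` pattern, using a hypothetical mirror enum and a hypothetical `checkMetrics(requested:produced:)` check that reports metrics no evaluator produced, which is the situation the `metricsNotFound` case describes. The message wording is an assumption.

```swift
import Foundation

// Hypothetical mirror of the two documented cases.
enum EvalErrorSketch: Error, LocalizedError {
    case missingTranscript(evaluatorType: String)
    case metricsNotFound(names: [String])

    var errorDescription: String? {
        switch self {
        case .missingTranscript(let evaluatorType):
            return "\(evaluatorType) requires a subject with a transcript."
        case .metricsNotFound(let names):
            return "Metrics not found: \(names.joined(separator: ", "))."
        }
    }
}

/// Throws when aggregation requests metrics that no evaluator produced.
func checkMetrics(requested: [String], produced: Set<String>) throws {
    let missing = requested.filter { !produced.contains($0) }
    if !missing.isEmpty { throw EvalErrorSketch.metricsNotFound(names: missing) }
}
```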
struct

EvaluationResult

New · iOS · macOS · visionOS · watchOS
public struct EvaluationResult : Sendable

The results of running a model evaluation.

A structure that contains the summary and detailed results from an evaluation run.

Declaration
public struct EvaluationResult : Sendable {

    /// A unique identifier for this particular result.
    public let resultID: UUID

    /// The identifier of the evaluation that produced these results.
    public let evaluationID: String

    /// Aggregated statistics for each metric in the evaluation.
    public var summary: DataFrame { get }

    /// Individual results for each sample in the evaluation.
    public var detailed: DataFrame { get }

    /// Framework-generated metadata used for report presentation.
    public var reportMetadata: [String : any Sendable]

    /// User-defined information about this evaluation, such as the model name, prompt version, or dataset.
    public let evaluationInfo: [String : String]

    /// The time when the evaluation run started.
    public let startTime: Date

    /// The time when the evaluation run finished.
    public let endTime: Date

    /// The total duration of the evaluation run.
    public var duration: TimeInterval { get }

    /// Returns the first aggregate value matching the given operation, or `-1` if not found.
    ///
    /// - Parameter operation: The aggregation operation to match.
    public func aggregateValue(_ operation: AggregationOperation) -> Double

    /// A formatted description of summary metrics organized by groups.
    ///
    /// ## Example Output
    /// ```
    /// Text Matching:
    ///   Correct (%): 0.75
    ///   First Word Correct (%): 0.83
    ///
    /// Text Quality:
    ///   Ratio of Match Length: 0.92
    ///   Length Distribution: 0.014
    /// ```
    public var groupedSummary: String { get }
}
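A sketch of three documented behaviors: `duration` as the difference between `endTime` and `startTime`, `aggregateValue` returning `-1` when nothing matches, and a grouped summary in the shape of the example output. `ResultSketch` and `SummaryEntry` are hypothetical, lookup is by label instead of by `AggregationOperation`, and the "Ungrouped" heading for entries without a group is an assumption.

```swift
import Foundation

// Hypothetical mirror of a few EvaluationResult members.
struct SummaryEntry {
    let group: String?
    let label: String
    let value: Double
}

struct ResultSketch {
    let startTime: Date
    let endTime: Date
    let summary: [SummaryEntry]

    /// Total run time in seconds.
    var duration: TimeInterval { endTime.timeIntervalSince(startTime) }

    /// First value whose label matches, or -1 when none does.
    func aggregateValue(label: String) -> Double {
        summary.first(where: { $0.label == label })?.value ?? -1
    }

    /// Groups entries in first-seen order, one indented line per metric.
    var groupedSummary: String {
        var order: [String] = []
        var lines: [String: [String]] = [:]
        for entry in summary {
            let group = entry.group ?? "Ungrouped"
            if lines[group] == nil { order.append(group) }
            lines[group, default: []].append("  \(entry.label): \(entry.value)")
        }
        return order.map { ([$0 + ":"] + lines[$0]!).joined(separator: "\n") }
            .joined(separator: "\n\n")
    }
}
```

Because `-1` can collide with a legitimate negative aggregate, comparing against it is only safe for metrics known to be non-negative.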
extension

EvaluationResult

New · iOS · macOS · visionOS · watchOS
extension EvaluationResult
Declaration
extension EvaluationResult {

    /// Loads an array of evaluation results from a JSONL file on disk.
    ///
    /// Each line in the file is expected to be a valid JSON object representing an evaluation result.
    /// - Parameter url: The file URL to read the JSONL data from.
    /// - Returns: An array of ``EvaluationResult``.
    nonisolated(nonsending) public static func loadJSONLines(from url: URL) async throws -> [EvaluationResult]
}
extension

EvaluationResult.DataFrameKind

New · iOS · macOS · visionOS · watchOS
extension EvaluationResult.DataFrameKind : Equatable
Declaration
extension EvaluationResult.DataFrameKind : Equatable {
}
extension

EvaluationResult.DataFrameKind

New · iOS · macOS · visionOS · watchOS
extension EvaluationResult.DataFrameKind : Hashable
Declaration
extension EvaluationResult.DataFrameKind : Hashable {
}
protocol

EvaluationSubject

New · iOS · macOS · visionOS · watchOS
public protocol EvaluationSubject<Value>

A type that represents the output produced by the system under test.

Conform to this protocol to define custom subject types. The primary concrete conformance is ModelSubject, which carries a value and an optional transcript for tool-call evaluation.

struct MySubject<Value: Codable>: EvaluationSubject {
    var value: Value
    var transcript: StructuredTranscript?
}
Declaration
public protocol EvaluationSubject<Value> {

    /// The type of the value produced by the system under test.
    associatedtype Value : Decodable, Encodable

    /// The typed value produced by the system under test.
    var value: Self.Value { get }
}
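Because `EvaluationSubject` declares `Value` as a primary associated type, functions can constrain on the value type without naming a concrete subject. This sketch uses a hypothetical `SubjectSketch` protocol, with `[String]?` standing in for `StructuredTranscript`.

```swift
// Hypothetical mirror with a primary associated type.
protocol SubjectSketch<Value> {
    associatedtype Value: Codable
    var value: Value { get }
}

struct TextSubject: SubjectSketch {
    var value: String
    var transcript: [String]? = nil  // stand-in for StructuredTranscript
}

struct NumberSubject: SubjectSketch {
    var value: Int
}

/// Accepts any subject whose value is a String.
func normalized(_ subject: some SubjectSketch<String>) -> String {
    subject.value.lowercased()
}
```

Passing a `NumberSubject` to `normalized(_:)` is a compile-time error, because its `Value` is `Int`.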
struct

EvaluationTrait

New · iOS · macOS · visionOS · watchOS
public struct EvaluationTrait : TestTrait, TestScoping

A test trait that runs an evaluation and records the result as attachments.

Within the test, the result is accessible through EvaluationContext.current.result.

Declaration
public struct EvaluationTrait : TestTrait, TestScoping {

    /// Provide custom execution scope for a function call which is related to the
    /// specified test or test case.
    ///
    /// - Parameters:
    ///   - test: The test which `function` encapsulates.
    ///   - testCase: The test case, if any, which `function` encapsulates.
    ///     When invoked on a suite, the value of this argument is `nil`.
    ///   - function: The function to perform. If `test` represents a test suite,
    ///     this function encapsulates running all the tests in that suite. If
    ///     `test` represents a test function, this function is the body of that
    ///     test function (including all cases if the test function is
    ///     parameterized.)
    ///
    /// - Throws: Any error that `function` throws, or an error that prevents this
    ///   type from providing a custom scope correctly. The testing library
    ///   records an error thrown from this method as an issue associated with
    ///   `test`. If an error is thrown before this method calls `function`, the
    ///   corresponding test doesn't run.
    ///
    /// When the testing library prepares to run a test, it starts by finding
    /// all traits applied to that test, including those inherited from containing
    /// suites. It begins with inherited suite traits, sorting them
    /// outermost-to-innermost, and if the test is a function, it then adds all
    /// traits applied directly to that function in the order they were applied
    /// (left-to-right). It then asks each trait for its scope provider (if any)
    /// by calling ``Trait/scopeProvider(for:testCase:)-cjmg``. Finally, it calls
    /// this method on all non-`nil` scope providers, giving each an opportunity
    /// to perform arbitrary work before or after invoking `function`.
    ///
    /// This method should either invoke `function` once before returning,
    /// or throw an error if it's unable to provide a custom scope.
    ///
    /// Issues recorded by this method are associated with `test`.
    ///
    /// @Metadata {
    ///   @Available(Swift, introduced: 6.1)
    ///   @Available(Xcode, introduced: 16.3)
    /// }
    nonisolated(nonsending) public func provideScope(for test: Test, testCase: Test.Case?, performing function: @Sendable () async throws -> Void) async throws

    /// The type of the test scope provider for this trait.
    ///
    /// The default type is `Never`, which can't be instantiated. The
    /// ``scopeProvider(for:testCase:)-cjmg`` method for any trait with
    /// `Never` as its test scope provider type must return `nil`, meaning that
    /// the trait doesn't provide a custom scope for tests it's applied to.
    ///
    /// @Metadata {
    ///   @Available(Swift, introduced: 6.1)
    ///   @Available(Xcode, introduced: 16.3)
    /// }
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias TestScopeProvider = EvaluationTrait
}
struct

Evaluator

New · iOS · macOS · visionOS · watchOS
public struct Evaluator<Input> : EvaluatorProtocol, @unchecked Sendable where Input : SampleProtocol, Input.ExpectedValue : Decodable, Input.ExpectedValue : Encodable, Input.ExpectedValue : Sendable

A closure-based evaluator.

Use Evaluator to create inline evaluators without defining a custom type. The closure receives the input sample and the ModelSubject, providing access to both .value and .transcript.

Evaluator<ModelSample<String>> { sample, subject in
    let metric = Metric("TitleMatch")
    guard let expected = sample.expected else { return metric.ignore() }
    return subject.value == expected ? metric.passing() : metric.failing()
}
Declaration
public struct Evaluator<Input> : EvaluatorProtocol, @unchecked Sendable where Input : SampleProtocol, Input.ExpectedValue : Decodable, Input.ExpectedValue : Encodable, Input.ExpectedValue : Sendable {

    /// Creates an evaluator with the given evaluation closure.
    ///
    /// - Parameter evaluate: A closure that receives the input and subject,
    ///   and returns a ``Metric`` with a result value.
    public init(_ evaluate: nonisolated(nonsending) @escaping (Input, ModelSubject<Input.ExpectedValue>) async throws -> Metric)

    /// Computes metrics for the given subject, given the input sample.
    ///
    /// - Parameters:
    ///   - subject: The subject of evaluation, which the evaluation's `subject(from:)` method produces.
    ///   - input: The input sample that contains the expected value and other context.
    /// - Returns: An array of metrics produced by this evaluator.
    nonisolated(nonsending) public func metrics(subject: ModelSubject<Input.ExpectedValue>, input: Input) async throws -> [Metric]

    /// The type of the subject produced by the system under test.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias Subject = ModelSubject<Input.ExpectedValue>
}
protocol

EvaluatorProtocol

New · iOS · macOS · visionOS · watchOS
public protocol EvaluatorProtocol<Input, Subject> : Sendable

A type that evaluates subjects and produces metrics.

Conform to EvaluatorProtocol to create custom evaluators that measure the system's output against expected criteria. Each evaluator returns an array of Metric values, one for each DataFrame column it produces.

The protocol's primary associated types are Input, the sample type, and Subject, which must conform to EvaluationSubject. A where clause requires Input.ExpectedValue to equal Subject.Value, so the subject's value type always matches the sample's expected value type.

Conforming types must be Sendable.

struct MyEvaluator<Input: SampleProtocol>: EvaluatorProtocol
where Input.ExpectedValue: Sendable & Codable {
    let metric = Metric("Quality")

    func metrics(
        subject: ModelSubject<Input.ExpectedValue>,
        input: Input
    ) async throws -> [Metric] {
        return [metric.scoring(1.0)]
    }
}
Declaration
public protocol EvaluatorProtocol<Input, Subject> : Sendable {

    /// The input sample type.
    associatedtype Input : SampleProtocol where Self.Input.ExpectedValue == Self.Subject.Value

    /// The type of the subject produced by the system under test.
    associatedtype Subject : EvaluationSubject

    /// Computes metrics for the given subject, given the input sample.
    ///
    /// - Parameters:
    ///   - subject: The subject of evaluation, which the evaluation's `subject(from:)` method produces.
    ///   - input: The input sample that contains the expected value and other context.
    /// - Returns: An array of metrics produced by this evaluator.
    nonisolated(nonsending) func metrics(subject: Self.Subject, input: Self.Input) async throws -> [Metric]
}
struct

EvaluatorsBuilder

New · iOS · macOS · visionOS · watchOS
public struct EvaluatorsBuilder<Sample, Subject> where Sample : SampleProtocol, Subject : EvaluationSubject

A result builder that enables declarative evaluator lists.

Apply this builder to the evaluators property, or to a function that returns evaluators as in the example below, to remove the need for explicit array literals and per-element type annotations:

@EvaluatorsBuilder<ModelSample<String>, ModelSubject<String>>
func buildEvaluators() -> [any EvaluatorProtocol<ModelSample<String>, ModelSubject<String>>] {
    Evaluator<ModelSample<String>> { sample, subject in
        Metric("Match").scoring(1.0)
    }
}
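The following self-contained result builder illustrates the mechanics. It is a sketch, not the framework's implementation. The names Scorer, ExactMatch, CaseInsensitiveMatch, ScorersBuilder, and makeScorers are hypothetical stand-ins for EvaluatorProtocol and EvaluatorsBuilder. The overloads are typed differently from the declaration above. Here buildExpression wraps each item in a one-element array, and buildBlock flattens those arrays. This is a common pattern that lets the array that buildOptional returns for an if statement combine with ordinary listed items.

```swift
// Hypothetical stand-in for EvaluatorProtocol: scores an output against an expected value.
protocol Scorer {
    func score(_ output: String, expected: String) -> Double
}

struct ExactMatch: Scorer {
    func score(_ output: String, expected: String) -> Double {
        output == expected ? 1.0 : 0.0
    }
}

struct CaseInsensitiveMatch: Scorer {
    func score(_ output: String, expected: String) -> Double {
        output.lowercased() == expected.lowercased() ? 1.0 : 0.0
    }
}

@resultBuilder
struct ScorersBuilder {
    // Lift each listed scorer into a one-element array.
    static func buildExpression(_ expression: any Scorer) -> [any Scorer] {
        [expression]
    }

    // Concatenate the arrays produced by each statement in the block.
    static func buildBlock(_ components: [any Scorer]...) -> [any Scorer] {
        components.flatMap { $0 }
    }

    // An `if` without `else` contributes its scorers, or nothing.
    static func buildOptional(_ component: [any Scorer]?) -> [any Scorer] {
        component ?? []
    }
}

// The body lists scorers directly; no array literal is needed.
@ScorersBuilder
func makeScorers(includeCaseInsensitive: Bool) -> [any Scorer] {
    ExactMatch()
    if includeCaseInsensitive {
        CaseInsensitiveMatch()
    }
}
```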
Declaration
@resultBuilder public struct EvaluatorsBuilder<Sample, Subject> where Sample : SampleProtocol, Subject : EvaluationSubject {

    public static func buildExpression(_ expression: any EvaluatorProtocol<Sample, Subject>) -> any EvaluatorProtocol<Sample, Subject>

    public static func buildBlock(_ components: any EvaluatorProtocol<Sample, Subject>...) -> [any EvaluatorProtocol<Sample, Subject>]

    public static func buildOptional(_ component: [any EvaluatorProtocol<Sample, Subject>]?) -> [any EvaluatorProtocol<Sample, Subject>]
}
struct

JSONLoader

New · iOS · macOS · visionOS · watchOS
public struct JSONLoader<Sample> : Loader where Sample : SampleProtocol

A loader backed by a JSON or JSONL file.

The format is detected automatically from the file contents:

  • If the first non-whitespace character is an opening bracket ([), the file is treated as a JSON array ([{...}, {...}]) and decoded in one pass.
  • Otherwise, the file is treated as JSONL (JSON Lines), where each non-empty line is decoded as an individual sample.

Malformed entries are logged via OSLog and skipped. A failure to open the file propagates as a thrown error.
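The detection rule above can be sketched with Foundation's JSONDecoder. This illustrates the rule under stated assumptions; it is not JSONLoader's source. Sample is a hypothetical stand-in for a SampleProtocol type, and print stands in for OSLog. The sketch also skips each malformed element of a JSON array individually, which is one reading of "malformed entries are skipped."

```swift
import Foundation

// Hypothetical sample type standing in for a SampleProtocol conformance.
struct Sample: Codable, Equatable {
    var prompt: String
    var expected: String?
}

// Decodes one array element, yielding nil instead of failing the whole array.
struct LossyElement<T: Decodable>: Decodable {
    let value: T?
    init(from decoder: Decoder) throws {
        value = try? T(from: decoder)
    }
}

func loadSamples(from data: Data) -> [Sample] {
    let decoder = JSONDecoder()
    let text = String(decoding: data, as: UTF8.self)
    let body = text.drop(while: { $0.isWhitespace })

    // A leading "[" means a JSON array, decoded in one pass.
    if body.first == "[" {
        guard let elements = try? decoder.decode([LossyElement<Sample>].self, from: Data(body.utf8)) else {
            print("Skipping malformed JSON array")  // stands in for OSLog
            return []
        }
        let samples = elements.compactMap(\.value)
        if samples.count < elements.count {
            print("Skipped \(elements.count - samples.count) malformed entries")
        }
        return samples
    }

    // Otherwise treat the contents as JSON Lines: one sample per non-empty line.
    var samples: [Sample] = []
    for line in text.split(whereSeparator: \.isNewline) {
        let trimmed = line.trimmingCharacters(in: .whitespaces)
        guard !trimmed.isEmpty else { continue }
        if let sample = try? decoder.decode(Sample.self, from: Data(trimmed.utf8)) {
            samples.append(sample)
        } else {
            print("Skipping malformed line: \(trimmed)")  // stands in for OSLog
        }
    }
    return samples
}

// Reading the file can throw, and that error propagates to the caller.
func loadSamples(at url: URL) throws -> [Sample] {
    try loadSamples(from: Data(contentsOf: url))
}
```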

Declaration
public struct JSONLoader<Sample> : Loader where Sample : SampleProtocol {

    /// Creates a loader backed by the JSON or JSONL file at the given URL.
    public init(url: URL)

    /// The async sequence for iteration during an evaluation run.
    public var stream: any AsyncSequence<Sample, any Error> { get }
}
protocol

Loader

New · iOS · macOS · visionOS · watchOS
public protocol Loader<Sample> : Sendable

A protocol for types that supply a dataset for evaluation.

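Use one of the built-in concrete types (ArrayLoader, JSONLoader, or StreamLoader), or implement this protocol directly for custom data sources. Each of the following dataset definitions shows one built-in loader; they're alternatives, so use only one in a given type: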

// ArrayLoader: samples defined in memory.
var dataset: any Loader<ModelSample<String>> {
    ArrayLoader(samples: [
        ModelSample(prompt: "One plus one is...", expected: "Two."),
        ModelSample(prompt: "Swift is...", expected: "A powerful language."),
    ])
}

// JSONLoader: samples read from a JSON or JSONL file in the app bundle.
var dataset: any Loader<ModelSample<String>> {
    JSONLoader(url: Bundle.main.url(forResource: "prompts", withExtension: "jsonl")!)
}

// StreamLoader: samples produced asynchronously by an AsyncThrowingStream.
var dataset: any Loader<ModelSample<String>> {
    StreamLoader(stream: AsyncThrowingStream<ModelSample<String>, Error> { continuation in
        Task {
            let prompts = ["One plus one is...", "Swift is..."]
            for prompt in prompts {
                continuation.yield(ModelSample(prompt: prompt, expected: ""))
            }
            continuation.finish()
        }
    })
}
Declaration
public protocol Loader<Sample> : Sendable {

    associatedtype Sample : SampleProtocol

    /// The async sequence for iteration during an evaluation run.
    var stream: any AsyncSequence<Self.Sample, any Error> { get }
}
struct

Metric

New · iOS · macOS · visionOS · watchOS
public struct Metric : Sendable, Equatable

A named metric that carries a result value.

Use Metric to define a named measurement. The factory methods (passing, failing, scoring, ignore) return a new Metric with the result stored inside.

Here's how you create a custom metric:

let metric = Metric("Accuracy")
let result = metric.passing(rationale: "Exact match")
Declaration
public struct Metric : Sendable, Equatable {

    /// The name of the metric, used as the DataFrame column name.
    public let name: String

    /// The result value of this metric.
    public let value: Metric.Value

    /// An optional rationale describing the result.
    public let rationale: String?

    /// A metric result value.
    public enum Value : Equatable, Sendable {

        /// A positive/passing result.
        case passing

        /// A negative/failing result.
        case failing

        /// A numeric result.
        case scoring(Double)

        /// The metric is not applicable for this sample and should be excluded from aggregation.
        ///
        /// Use this when a sample doesn't have the necessary data for evaluation (e.g., no tool
        /// expectations defined for a tool trajectory metric). Aggregators will skip these results
        /// when computing statistics like mean.
        case ignore

        /// Returns a Boolean value indicating whether two values are equal.
        ///
        /// Equality is the inverse of inequality. For any values `a` and `b`,
        /// `a == b` implies that `a != b` is `false`.
        ///
        /// - Parameters:
        ///   - lhs: A value to compare.
        ///   - rhs: Another value to compare.
        public static func == (a: Metric.Value, b: Metric.Value) -> Bool
    }

    /// Creates a metric with just a name.
    ///
    /// Use the factory methods — `passing`, `failing`, `scoring`, or `ignore` — to produce results.
    public init(_ name: String)

    /// Returns a metric with a passing result.
    public func passing(rationale: String? = nil) -> Metric

    /// Returns a metric with a failing result.
    public func failing(rationale: String? = nil) -> Metric

    /// Returns a metric with a numeric result.
    public func scoring(_ value: Double, rationale: String? = nil) -> Metric

    /// Returns a metric with an ignored result, excluded from aggregation.
    public func ignore(rationale: String? = nil) -> Metric

    /// The numeric value of this metric, or `nil` for ignored metrics.
    ///
    /// - `passing` → `1.0`
    /// - `failing` → `0.0`
    /// - `scoring(x)` → `x`
    /// - `ignore` → `nil`
    public var doubleValue: Double? { get }

    /// Returns a Boolean value indicating whether two values are equal.
    ///
    /// Equality is the inverse of inequality. For any values `a` and `b`,
    /// `a == b` implies that `a != b` is `false`.
    ///
    /// - Parameters:
    ///   - lhs: A value to compare.
    ///   - rhs: Another value to compare.
    public static func == (lhs: Metric, rhs: Metric) -> Bool
}
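The doubleValue mapping, and the rule that ignored results are excluded from aggregation, can be modeled in a few lines. MetricValue and mean(of:) are hypothetical mirrors of Metric.Value and a mean aggregation. Returning nil when every result is ignored is an assumption, because this section doesn't document that case.

```swift
// Hypothetical mirror of Metric.Value and its doubleValue mapping.
enum MetricValue: Equatable {
    case passing
    case failing
    case scoring(Double)
    case ignore

    var doubleValue: Double? {
        switch self {
        case .passing: return 1.0
        case .failing: return 0.0
        case .scoring(let x): return x
        case .ignore: return nil
        }
    }
}

// Mean over the non-ignored results; nil if every result is ignored (an assumption).
func mean(of results: [MetricValue]) -> Double? {
    let numbers = results.compactMap(\.doubleValue)
    guard !numbers.isEmpty else { return nil }
    return numbers.reduce(0, +) / Double(numbers.count)
}

let results: [MetricValue] = [.passing, .failing, .ignore, .scoring(0.5)]
print(mean(of: results) ?? .nan)  // (1.0 + 0.0 + 0.5) / 3 = 0.5
```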
extension

Metric

New · iOS · macOS · visionOS · watchOS
extension Metric : CustomStringConvertible
Declaration
extension Metric : CustomStringConvertible {

    /// A textual representation of this instance.
    ///
    /// Calling this property directly is discouraged. Instead, convert an
    /// instance of any type to a string by using the `String(describing:)`
    /// initializer. This initializer works with any type, and uses the custom
    /// `description` property for types that conform to
    /// `CustomStringConvertible`:
    ///
    ///     struct Point: CustomStringConvertible {
    ///         let x: Int, y: Int
    ///
    ///         var description: String {
    ///             return "(\(x), \(y))"
    ///         }
    ///     }
    ///
    ///     let p = Point(x: 21, y: 30)
    ///     let s = String(describing: p)
    ///     print(s)
    ///     // Prints "(21, 30)"
    ///
    /// The conversion of `p` to a string in the assignment to `s` uses the
    /// `Point` type's `description` property.
    public var description: String { get }
}
struct

MetricsAggregator

New · iOS · macOS · visionOS · watchOS
public struct MetricsAggregator

A utility for computing aggregate statistics from evaluation metrics.

let accuracy = Metric("Accuracy")

func aggregateMetrics(using aggregator: inout MetricsAggregator) {
    aggregator.computeMean(of: accuracy)
    aggregator.computeMaximum(of: accuracy)
    aggregator.computeStandardDeviation(of: accuracy)
}

Use this structure to calculate summary statistics like mean, median, and standard deviation from your evaluation results. The aggregator processes metric data from a DataFrame and produces aggregated results.
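For reference, here are textbook versions of the built-in statistics, written as standalone functions. They are a sketch, not MetricsAggregator's implementation. This section doesn't say two things: whether variance and standard deviation use the population form (divide by n) or the sample form (divide by n - 1), and how the mode breaks ties. This sketch assumes the population form and returns the smallest of the most frequent values.

```swift
// Standalone sketches of the statistics MetricsAggregator offers.
// Minimum and maximum correspond to the standard library's min() and max().
// Assumptions: population variance (divide by n); the mode breaks ties
// by returning the smallest of the most frequent values.

func mean(_ xs: [Double]) -> Double? {
    guard !xs.isEmpty else { return nil }
    return xs.reduce(0, +) / Double(xs.count)
}

func median(_ xs: [Double]) -> Double? {
    guard !xs.isEmpty else { return nil }
    let sorted = xs.sorted()
    let mid = sorted.count / 2
    // Even count: average the two middle values.
    return sorted.count.isMultiple(of: 2) ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid]
}

func mode(_ xs: [Double]) -> Double? {
    var counts: [Double: Int] = [:]
    for x in xs { counts[x, default: 0] += 1 }
    // Highest count wins; among equal counts, the smaller value wins.
    return counts.max { a, b in
        a.value != b.value ? a.value < b.value : a.key > b.key
    }?.key
}

func variance(_ xs: [Double]) -> Double? {
    guard let m = mean(xs) else { return nil }
    return xs.reduce(0) { $0 + ($1 - m) * ($1 - m) } / Double(xs.count)
}

func standardDeviation(_ xs: [Double]) -> Double? {
    variance(xs)?.squareRoot()
}

let scores = [0.0, 1.0, 1.0, 0.5]
print(median(scores)!, mode(scores)!, variance(scores)!)  // 0.75 1.0 0.171875
```

The same ([Double]) -> Double shape is what the custom(of:label:_:) method declared below expects. For example, aggregator.custom(of: accuracy, label: "Share at or above 0.8") { xs in Double(xs.filter { $0 >= 0.8 }.count) / Double(max(xs.count, 1)) } would record the fraction of results at or above a threshold.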

Declaration
public struct MetricsAggregator {

    /// Creates a group of related metrics.
    ///
    /// Use this to organize your metrics into logical groups for better readability.
    ///
    /// ## Example
    /// ```swift
    /// let accuracy = Metric("Accuracy")
    ///
    /// func aggregateMetrics(using aggregator: inout MetricsAggregator) {
    ///     aggregator.group("Quality Metrics") { group in
    ///         group.computeMean(of: accuracy)
    ///         group.computeMedian(of: accuracy)
    ///     }
    /// }
    /// ```
    public mutating func group(_ name: String, _ body: (inout MetricsAggregator.Group) -> Void)

    /// Computes the mean of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeMean(of metric: Metric)

    /// Computes the median of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeMedian(of metric: Metric)

    /// Computes the mode of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeMode(of metric: Metric)

    /// Computes the minimum value of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeMinimum(of metric: Metric)

    /// Computes the maximum value of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeMaximum(of metric: Metric)

    /// Computes the standard deviation of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeStandardDeviation(of metric: Metric)

    /// Computes the variance of a metric and adds it to the aggregated results.
    ///
    /// - Parameter metric: The metric to aggregate.
    public mutating func computeVariance(of metric: Metric)

    /// Computes a custom aggregation from a single metric's results.
    ///
    /// - Parameters:
    ///   - metric: The metric to aggregate.
    ///   - label: The label for this statistic in the aggregated results.
    ///   - body: A closure that receives the metric's values and returns a computed statistic.
    public mutating func custom(of metric: Metric, label: String, _ body: ([Double]) -> Double)
}
extension

MetricsAggregator

New · iOS · macOS · visionOS · watchOS
extension MetricsAggregator
Declaration
extension MetricsAggregator {

    /// A grouped collection of related metrics.
    ///
    /// Use `Group` within ``MetricsAggregator/group(_:_:)`` to add metrics
    /// that should be displayed together.
    public struct Group {

        /// The name of this group.
        public let name: String

        /// Computes the mean of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeMean(of metric: Metric)

        /// Computes the median of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeMedian(of metric: Metric)

        /// Computes the mode of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeMode(of metric: Metric)

        /// Computes the minimum value of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeMinimum(of metric: Metric)

        /// Computes the maximum value of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeMaximum(of metric: Metric)

        /// Computes the standard deviation of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeStandardDeviation(of metric: Metric)

        /// Computes the variance of a metric and adds it to the group.
        ///
        /// - Parameter metric: The metric to aggregate.
        public mutating func computeVariance(of metric: Metric)

        /// Computes a custom aggregation and adds it to the group.
        ///
        /// - Parameters:
        ///   - metric: The metric to aggregate.
        ///   - label: The label for this statistic in the aggregated results.
        ///   - body: A closure that receives the metric's values and returns a computed statistic.
        public mutating func custom(of metric: Metric, label: String, _ body: ([Double]) -> Double)
    }
}
enum

ModelJudgeError

New · iOS · macOS · visionOS · watchOS
public enum ModelJudgeError : LocalizedError
Declaration
public enum ModelJudgeError : LocalizedError {

    /// A scoring dimension returns a value the evaluator can't parse as a number.
    case invalidScore(dimension: String, value: String)

    /// The evaluator can't interpret the model-as-judge's response as a valid score.
    case invalidResponse(String)

    /// The evaluator fails to decode the JSON from the model-as-judge's response.
    case jsonDecodingFailed(response: String, underlying: any Error)

    /// The model-as-judge's response is missing a required scoring dimension.
    case missingDimension(String, response: String)

    /// The scoring dimension has no scale values defined.
    case noScaleValues(dimension: String)

    /// A localized message describing what error occurred.
    public var errorDescription: String? { get }
}
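The following sketch makes three of these cases concrete. It parses a judge reply into per-dimension scores and reports the same kinds of failures. Everything in it is hypothetical: the reply format (a JSON object that maps each dimension name to a number or a numeric string), JudgeParseError, RawScore, and parseScores. This section doesn't document the prompt or response format that ModelJudgeEvaluator actually uses.

```swift
import Foundation

// Hypothetical error type mirroring three ModelJudgeError cases.
enum JudgeParseError: Error, Equatable {
    case jsonDecodingFailed
    case missingDimension(String)
    case invalidScore(dimension: String, value: String)
}

// A reply value may arrive as a JSON number or as a string.
enum RawScore: Decodable {
    case number(Double)
    case text(String)

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let number = try? container.decode(Double.self) {
            self = .number(number)
        } else {
            self = .text(try container.decode(String.self))
        }
    }
}

// Assumed reply format: {"<dimension>": <score>, ...}
func parseScores(_ reply: String, dimensions: [String]) throws -> [String: Double] {
    guard let raw = try? JSONDecoder().decode([String: RawScore].self, from: Data(reply.utf8)) else {
        throw JudgeParseError.jsonDecodingFailed
    }
    var scores: [String: Double] = [:]
    for dimension in dimensions {
        guard let value = raw[dimension] else {
            throw JudgeParseError.missingDimension(dimension)
        }
        switch value {
        case .number(let number):
            scores[dimension] = number
        case .text(let text):
            // A string that isn't a number is reported as an invalid score.
            guard let number = Double(text) else {
                throw JudgeParseError.invalidScore(dimension: dimension, value: text)
            }
            scores[dimension] = number
        }
    }
    return scores
}
```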
struct

ModelJudgeEvaluator

New · iOS · macOS · visionOS · watchOS
public struct ModelJudgeEvaluator<Input> : EvaluatorProtocol, Sendable where Input : ModelSampleProtocol

An evaluator that uses a language model as a judge to score responses.

ModelJudgeEvaluator sends the query, the response, and optional reference data to a judge model, which returns scores for one or more dimensions. By default, the evaluator serializes the response as JSON, which works because the sample's expected value type is Codable. To present the response differently, provide an evaluationTarget closure through ModelJudgePrompt.

Declaration
public struct ModelJudgeEvaluator<Input> : EvaluatorProtocol, Sendable where Input : ModelSampleProtocol {

    /// The dimensions this evaluator scores.
    public let dimensions: [ScoreDimension]

    /// The scoring constraint mode. See ``ScoringMode``.
    public let scoringMode: ScoringMode

    /// The default system instructions the model uses when no custom instructions are provided.
    public static var defaultInstructions: String { get }

    @available(anyAppleOS 27.0, *)
    public init(_ name: String, scale: ScoringScale, judge: any LanguageModel = SystemLanguageModel(), scoringMode: ScoringMode = .discrete)

    /// Creates a single-metric evaluator with a custom judge prompt.
    ///
    /// - Parameters:
    ///   - name: The metric name, used as the DataFrame column name.
    ///   - scale: The scoring scale for this metric.
    ///   - judge: The language model to use as judge.
    ///   - scoringMode: Whether scores are discrete (default) or allow any floating-point value.
    ///   - prompt: Configuration for the judge prompt, including instructions, response presentation, and reference.
    public init(_ name: String, scale: ScoringScale, judge: any LanguageModel, scoringMode: ScoringMode = .discrete, prompt: ModelJudgePrompt<Input>)

    @available(anyAppleOS 27.0, *)
    public init(judge: any LanguageModel = SystemLanguageModel(), dimensions: [ScoreDimension], scoringMode: ScoringMode = .discrete)

    /// Creates a multi-metric evaluator with a custom judge prompt.
    ///
    /// - Parameters:
    ///   - judge: The language model to use as judge.
    ///   - dimensions: The dimensions to score. Each produces a separate DataFrame column.
    ///   - scoringMode: Whether scores are discrete (default) or allow any floating-point value.
    ///   - prompt: Configuration for the judge prompt, including instructions, response presentation, and reference.
    public init(judge: any LanguageModel, dimensions: [ScoreDimension], scoringMode: ScoringMode = .discrete, prompt: ModelJudgePrompt<Input>)

    /// Creates a pairwise comparison evaluator that compares the model's response
    /// against the sample's expected value.
    ///
    /// The judge sees the model's output under "Response" and the expected value from
    /// `input.expected` under "Baseline Response" in the Context section.
    ///
    /// - Parameters:
    ///   - name: The metric name, used as the DataFrame column name.
    ///   - scale: Scoring scale for the comparison.
    ///   - judge: The language model to use as judge.
    ///   - scoringMode: Whether scores are discrete (default) or allow any floating-point value.
    ///   - evaluationTarget: Optional closure to convert the value to a string. Used for both responses.
    public static func pairwise(_ name: String, scale: ScoringScale, judge: any LanguageModel, scoringMode: ScoringMode = .discrete, evaluationTarget: (@Sendable (Input.ExpectedValue) -> String)? = nil) -> ModelJudgeEvaluator<Input>

    /// Creates a multi-metric pairwise comparison evaluator.
    ///
    /// - Parameters:
    ///   - judge: The language model to use as judge.
    ///   - dimensions: The dimensions to score for the comparison.
    ///   - scoringMode: Whether scores are discrete (default) or allow any floating-point value.
    ///   - evaluationTarget: Optional closure to convert the value to a string. Used for both responses.
    public static func pairwise(judge: any LanguageModel, dimensions: [ScoreDimension], scoringMode: ScoringMode = .discrete, evaluationTarget: (@Sendable (Input.ExpectedValue) -> String)? = nil) -> ModelJudgeEvaluator<Input>

    /// Builds and returns the full judge prompt for inspection, debugging, or logging.
    ///
    /// Use this to see exactly what the judge model will receive for a given input/response pair.
    ///
    /// - Parameters:
    ///   - sample: The evaluation sample.
    ///   - output: The model's response content.
    /// - Returns: The fully assembled `Prompt` that would be sent to the judge.
    nonisolated(nonsending) public func judgePrompt(for sample: Input, output: Input.ExpectedValue) async throws -> Prompt

    /// Computes metrics for the given subject, given the input sample.
    ///
    /// - Parameters:
    ///   - subject: The subject of evaluation, which the evaluation's `subject(from:)` method produces.
    ///   - input: The input sample that contains the expected value and other context.
    /// - Returns: An array of metrics produced by this evaluator.
    nonisolated(nonsending) public func metrics(subject: ModelSubject<Input.ExpectedValue>, input: Input) async throws -> [Metric]

    /// The type of the subject produced by the system under test.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)

Truncated.

struct

ModelJudgePrompt

New · iOS · macOS · visionOS · watchOS
public struct ModelJudgePrompt<Input> : Sendable where Input : ModelSampleProtocol

A configuration for how a model-as-judge evaluator constructs its prompt.

let prompt = ModelJudgePrompt<ModelSample<String>>(
    instructions: "You are a domain expert evaluating product reviews."
)

ModelJudgePrompt bundles the instructions, the response presentation, and reference-data injection into a single composable value. Use it with ModelJudgeEvaluator to customize what the judge model sees during evaluation.

Declaration
public struct ModelJudgePrompt<Input> : Sendable where Input : ModelSampleProtocol {

    /// The default system instructions used when no custom instructions are provided.
    public static var defaultInstructions: String { get }

    /// The system instructions for the judge model.
    public let instructions: String

    /// An optional closure that converts the model's response to a string for the judge prompt.
    ///
    /// When `nil`, the evaluator JSON-serializes the response automatically.
    public let evaluationTarget: (@Sendable (Input.ExpectedValue) -> String)?

    /// An optional closure that provides labeled reference data to include in the model-as-judge prompt.
    ///
    /// The closure receives both the input sample and the model's response, allowing
    /// reference data derived from either: for example, the output of a grammar checker
    /// run on the response, or the sample's expected value for comparison.
    public let reference: (nonisolated(nonsending) @Sendable (Input, Input.ExpectedValue) async throws -> [String : String])?

    /// Creates a model-as-judge prompt configuration.
    ///
    /// ```swift
    /// let prompt = ModelJudgePrompt<ModelSample<String>>(
    ///     instructions: "You are a domain expert."
    /// )
    /// ```
    ///
    /// - Parameters:
    ///   - instructions: System instructions for the model-as-judge. Defaults to a general-purpose evaluator prompt.
    ///   - evaluationTarget: Optional closure to convert the response to a string. When `nil`, the response is JSON-serialized.
    ///   - reference: Optional closure returning labeled reference data to include in the judge prompt.
    public init(instructions: String = ModelJudgePrompt.defaultInstructions, evaluationTarget: (@Sendable (Input.ExpectedValue) -> String)? = nil, reference: (nonisolated(nonsending) @Sendable (Input, Input.ExpectedValue) async throws -> [String : String])? = nil)
}
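The evaluationTarget rule can be sketched with JSONEncoder: the evaluator uses the closure when one is provided and JSON-serializes the value otherwise. judgeTargetText and Review are hypothetical. Sorted keys are used only to make the output deterministic; this section doesn't specify the framework's own JSON layout. To inspect the exact text a judge receives, use ModelJudgeEvaluator's judgePrompt(for:output:) method, which returns the assembled Prompt.

```swift
import Foundation

// Hypothetical expected-value type for a review-writing model.
struct Review: Codable {
    var title: String
    var rating: Int
}

// Mirrors the documented rule: use evaluationTarget when provided,
// otherwise JSON-serialize the value. Sorted keys keep the output stable.
func judgeTargetText<Value: Encodable>(
    for value: Value,
    evaluationTarget: ((Value) -> String)? = nil
) throws -> String {
    if let evaluationTarget {
        return evaluationTarget(value)
    }
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(value), as: UTF8.self)
}

let review = Review(title: "Great", rating: 5)
print(try judgeTargetText(for: review))  // {"rating":5,"title":"Great"}
print(try judgeTargetText(for: review) { "\($0.title) (\($0.rating)/5)" })  // Great (5/5)
```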
struct

ModelSample

New · iOS · macOS · visionOS · watchOS
public struct ModelSample<ExpectedValue> : ModelSampleProtocol where ExpectedValue : Decodable, ExpectedValue : Encodable, ExpectedValue : Sendable

A general-purpose language model evaluation sample.

Accepts prompts and instructions either as strings or as FoundationModels Prompt and Instructions values. For multimodal prompts, create a custom ModelSampleProtocol conformance or use the init(input:expected:expectations:) initializer with a prebuilt ModelSampleInput.

let sample = ModelSample(prompt: "The capital of France is...", expected: "Paris.")
Declaration
public struct ModelSample<ExpectedValue> : ModelSampleProtocol where ExpectedValue : Decodable, ExpectedValue : Encodable, ExpectedValue : Sendable {

    /// The bundled language model input (prompt, instructions, schema).
    public var input: ModelSampleInput

    /// The expected output value and evaluation expectations.
    public var output: ModelSampleOutput<ExpectedValue, TrajectoryExpectation>

    /// The expected output for comparison.
    public var expected: ExpectedValue? { get }

    /// Creates a model sample with string-based prompt and instructions.
    public init(prompt: String, expected: ExpectedValue? = Optional<String>(nilLiteral: ()), instructions: String? = nil, generationSchema: GenerationSchema? = nil, expectations: TrajectoryExpectation? = nil)

    /// Creates a model sample with a FoundationModels prompt.
    public init(prompt: Prompt, expected: ExpectedValue? = Optional<String>(nilLiteral: ()), instructions: Instructions? = nil, generationSchema: GenerationSchema? = nil, expectations: TrajectoryExpectation? = nil)

    /// Creates a model sample with a prebuilt input.
    public init(input: ModelSampleInput, expected: ExpectedValue? = Optional<String>(nilLiteral: ()), expectations: TrajectoryExpectation? = nil)

    /// The type of evaluation expectations (e.g., ``TrajectoryExpectation``).
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias Expectation = TrajectoryExpectation

    /// The type of the input data.
    ///
    /// The type must be string-representable for display.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias Input = ModelSampleInput
}
extension

ModelSample

New · iOS · macOS · visionOS · watchOS
extension ModelSample : Codable
Declaration
extension ModelSample : Codable {

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws
}
extension

ModelSample

New · iOS · macOS · visionOS · watchOS
extension ModelSample
Declaration
extension ModelSample {

    /// The user's prompt for this sample.
    public var prompt: Prompt { get }

    /// A text representation of the prompt, synthesized from its segments.
    public var promptDescription: String { get }

    /// Optional instructions providing context to the model for this sample.
    public var instructions: Instructions? { get }

    /// A text representation of the instructions, synthesized from their segments.
    public var instructionsDescription: String? { get }

    /// The output schema for the model's response.
    public var generationSchema: GenerationSchema? { get }

    /// The expected pattern of tool calls for this sample.
    public var expectations: TrajectoryExpectation? { get }
}
struct

ModelSampleInput

New · iOS · macOS · visionOS · watchOS
public struct ModelSampleInput : CustomStringConvertible, Sendable

The data sent to a language model for evaluation.

Stores FoundationModels types (Prompt, Instructions) and automatically synthesizes text representations for display, logging, and synthetic data.

Declaration
public struct ModelSampleInput : CustomStringConvertible, Sendable {

    /// The FoundationModels prompt for this input.
    public var prompt: Prompt

    /// A text representation of the prompt, synthesized from prompt segments.
    public var promptDescription: String { get }

    /// The optional FoundationModels instructions for this input.
    public var instructions: Instructions?

    /// A text representation of the instructions, synthesized from instruction segments.
    public var instructionsDescription: String? { get }

    /// The output schema for the assistant's response.
    public var generationSchema: GenerationSchema?

    /// A textual representation of this instance.
    ///
    /// Calling this property directly is discouraged. Instead, convert an
    /// instance of any type to a string by using the `String(describing:)`
    /// initializer. This initializer works with any type, and uses the custom
    /// `description` property for types that conform to
    /// `CustomStringConvertible`:
    ///
    ///     struct Point: CustomStringConvertible {
    ///         let x: Int, y: Int
    ///
    ///         var description: String {
    ///             return "(\(x), \(y))"
    ///         }
    ///     }
    ///
    ///     let p = Point(x: 21, y: 30)
    ///     let s = String(describing: p)
    ///     print(s)
    ///     // Prints "(21, 30)"
    ///
    /// The conversion of `p` to a string in the assignment to `s` uses the
    /// `Point` type's `description` property.
    public var description: String { get }

    /// Creates a model sample input with the given prompt, instructions, and schema.
    ///
    /// - Parameters:
    ///   - prompt: The prompt to send to the language model.
    ///   - instructions: Optional system instructions for the model session.
    ///   - generationSchema: The output schema for the assistant's response.
    public init(prompt: Prompt, instructions: Instructions? = nil, generationSchema: GenerationSchema? = nil)
}
extension

ModelSampleInput

New · iOS · macOS · visionOS · watchOS
extension ModelSampleInput : Codable
Declaration
extension ModelSampleInput : Codable {

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws
}
struct

ModelSampleOutput

New · iOS · macOS · visionOS · watchOS
public struct ModelSampleOutput<Value, Expectation> : Sendable, Codable where Value : Decodable, Value : Encodable, Value : Sendable, Expectation : Decodable, Expectation : Encodable, Expectation : Sendable

The expected output value and evaluation expectations for a sample.

Declaration
public struct ModelSampleOutput<Value, Expectation> : Sendable, Codable where Value : Decodable, Value : Encodable, Value : Sendable, Expectation : Decodable, Expectation : Encodable, Expectation : Sendable {

    /// The expected output value for comparison.
    public var value: Value?

    /// The expected behavior, such as a tool-call trajectory.
    public var expectations: Expectation?

    /// Creates a model sample output with an optional expected value and expectations.
    ///
    /// ```swift
    /// let output = ModelSampleOutput<String, TrajectoryExpectation>(value: "Paris", expectations: nil)
    /// ```
    ///
    /// - Parameters:
    ///   - value: The expected output value for comparison.
    ///   - expectations: The expected behavior, such as a tool-call trajectory.
    public init(value: Value? = nil, expectations: Expectation? = nil)

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws
}
protocol

ModelSampleProtocol

New · iOS · macOS · visionOS · watchOS
public protocol ModelSampleProtocol : SampleProtocol where Self.ExpectedValue : Decodable, Self.ExpectedValue : Encodable, Self.ExpectedValue : Sendable

A type that defines language model evaluation samples with prompt, instructions, and expectations.

Extends SampleProtocol with prompt, instructions, and evaluation expectations. Use ModelSample for the common case; create custom conformances when you need additional properties.

let sample = ModelSample(
    prompt: "What's the weather?",
    expected: "Sunny",
    expectations: TrajectoryExpectation(ordered: [
        ToolExpectation("get_weather")
    ])
)
Declaration
public protocol ModelSampleProtocol : SampleProtocol where Self.ExpectedValue : Decodable, Self.ExpectedValue : Encodable, Self.ExpectedValue : Sendable {

    /// The type of evaluation expectations (e.g., ``TrajectoryExpectation``).
    associatedtype Expectation : Decodable, Encodable, Sendable

    /// The bundled language model input (prompt, instructions, schema).
    var input: ModelSampleInput { get }

    /// The expected output value and evaluation expectations.
    var output: ModelSampleOutput<Self.ExpectedValue, Self.Expectation> { get }
}
struct

ModelSubject

New · iOS · macOS · visionOS · watchOS
public struct ModelSubject<Value> : EvaluationSubject where Value : Decodable, Value : Encodable, Value : Sendable

The subject type for language model evaluations.

Carries the value the model produced and an optional structured transcript. The transcript is required for tool-call evaluation: ToolCallEvaluator checks for it at runtime and throws missingTranscript(evaluatorType:) if it's nil.

let subject = ModelSubject(value: "Paris, France")
Declaration
public struct ModelSubject<Value> : EvaluationSubject where Value : Decodable, Value : Encodable, Value : Sendable {

    /// The typed value produced by the model.
    public var value: Value

    /// The structured transcript from the model session.
    ///
    /// Required when using ``ToolCallEvaluator``. If `nil` and a
    /// ``ToolCallEvaluator`` is used, ``EvaluationError/missingTranscript(evaluatorType:)`` is thrown.
    public var transcript: StructuredTranscript?

    /// Creates a model subject with a value and optional transcript.
    ///
    /// - Parameters:
    ///   - value: The typed value produced by the model.
    ///   - transcript: The structured transcript from the model session.
    ///     Required for tool call evaluations.
    public init(value: Value, transcript: StructuredTranscript? = nil)
}
extension

ModelSubject

New · iOS · macOS · visionOS · watchOS
extension ModelSubject
Declaration
extension ModelSubject {

    /// The tool calls from the transcript, or an empty array if no transcript was provided.
    public var toolCalls: [Transcript.ToolCall] { get }
}
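The documented contract between ModelSubject and ToolCallEvaluator can be illustrated without the framework. The following is a minimal standalone sketch, not Evaluations code: `MiniToolCall`, `MiniTranscript`, `MiniSubject`, `MiniEvaluationError`, and `requireTranscript(_:evaluatorType:)` are hypothetical stand-ins, and representing the evaluator type as a `String` is an assumption. It mirrors two documented behaviors: `toolCalls` falls back to an empty array, and a tool-call check throws when the transcript is missing.

```swift
// Hypothetical stand-in for Transcript.ToolCall (only the tool name is modeled).
struct MiniToolCall: Equatable {
    let toolName: String
}

// Hypothetical stand-in for StructuredTranscript.
struct MiniTranscript {
    var toolCalls: [MiniToolCall] = []
}

// Hypothetical mirror of EvaluationError.missingTranscript(evaluatorType:).
enum MiniEvaluationError: Error, Equatable {
    case missingTranscript(evaluatorType: String)
}

// Hypothetical mirror of ModelSubject: a produced value plus an optional transcript.
struct MiniSubject<Value> {
    var value: Value
    var transcript: MiniTranscript?

    // Like ModelSubject.toolCalls: empty when no transcript was provided.
    var toolCalls: [MiniToolCall] { transcript?.toolCalls ?? [] }
}

// Models the runtime check a tool-call evaluator performs before scoring.
func requireTranscript<Value>(_ subject: MiniSubject<Value>,
                              evaluatorType: String) throws -> MiniTranscript {
    guard let transcript = subject.transcript else {
        throw MiniEvaluationError.missingTranscript(evaluatorType: evaluatorType)
    }
    return transcript
}

let plain = MiniSubject(value: "Paris, France", transcript: nil)
let traced = MiniSubject(
    value: "Sunny",
    transcript: MiniTranscript(toolCalls: [MiniToolCall(toolName: "get_weather")])
)
```

A value-only subject is enough for answer-checking evaluators; only tool-call checks need the transcript.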
struct

ResultColumn

New · iOS · macOS · visionOS · watchOS
public struct ResultColumn<Value> : Sendable

A typed descriptor for a column in an evaluation result DataFrame.

let column = ResultColumn<ModelSample<String>>(name: "Input")
Declaration
public struct ResultColumn<Value> : Sendable {

    /// The column name in the DataFrame.
    public let name: String
}
actor

SampleGenerator

New · iOS · macOS · visionOS · watchOS
public actor SampleGenerator<SampleType> where SampleType : ModelSampleProtocol

An actor that generates evaluation samples using a language model.

Create a generator, configure its properties, then call run() to produce new samples as an async stream. After iteration completes, read samples for the full dataset (the initial samples plus the newly generated ones), or invalidSamples for any samples the validator rejected.
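The bookkeeping documented for SampleGenerator (the dataset grows toward `targetCount`, `samples` holds the initial plus the generated samples, and rejected candidates land in `invalidSamples`) can be approximated with a synchronous model. It's a simplified sketch under stated assumptions, not the generator's implementation: `MiniGenerationResult` and `generate(initial:targetCount:maxAttempts:produce:validator:)` are hypothetical, the `produce` closure stands in for the language model session, rejected candidates are assumed neither to count toward `targetCount` nor to appear in the stream, and the `maxAttempts` cap exists only so the loop always ends.

```swift
// Hypothetical result type mirroring what SampleGenerator exposes after a run.
struct MiniGenerationResult {
    var samples: [String]         // initial + accepted samples, like `samples`
    var invalidSamples: [String]  // candidates the validator rejected, like `invalidSamples`
    var generated: [String]       // what the run() stream would yield, by assumption
}

// Hypothetical synchronous model of a generation run.
// `produce` stands in for the language model; `maxAttempts` is an added safety cap.
func generate(initial: [String],
              targetCount: Int,
              maxAttempts: Int,
              produce: (Int) -> String,
              validator: ((String) -> Bool)? = nil) -> MiniGenerationResult {
    var result = MiniGenerationResult(samples: initial, invalidSamples: [], generated: [])
    var attempt = 0
    // Keep requesting candidates until the dataset reaches the target size.
    while result.samples.count < targetCount && attempt < maxAttempts {
        let candidate = produce(attempt)
        attempt += 1
        if let validator = validator, !validator(candidate) {
            // Rejected candidates don't count toward targetCount.
            result.invalidSamples.append(candidate)
            continue
        }
        result.samples.append(candidate)
        result.generated.append(candidate)
    }
    return result
}

let outcome = generate(initial: ["seed-0", "seed-1"],
                       targetCount: 5,
                       maxAttempts: 10,
                       produce: { "candidate-\($0)" },
                       validator: { !$0.hasSuffix("-1") })
```

Here `outcome.samples` ends with three accepted candidates after the two seeds, and `candidate-1` is the only rejected one.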

Declaration
public actor SampleGenerator<SampleType> where SampleType : ModelSampleProtocol {

    /// The strategy for selecting existing samples as examples in the prompt.
    ///
    /// When `nil`, the generator shows no examples and doesn't retry on repetition.
    /// When set, the strategy also controls retry behavior when the model repeats itself.
    public var samplingStrategy: SampleGenerator<SampleType>.SamplingStrategy?

    /// An optional closure that decides whether a generated sample is valid.
    ///
    /// When provided, the generator collects rejected samples in ``invalidSamples``.
    public var validator: (nonisolated(nonsending) @Sendable (SampleType) async throws -> Bool)?

    /// All samples — initial and generated — from the most recent run.
    ///
    /// Before ``run()`` is called, this equals ``initialSamples``. After iteration
    /// completes, it contains the full resulting dataset.
    public var samples: [SampleType] { get }

    /// Samples that the validator rejected during the most recent run.
    ///
    /// Returns an empty array when no validator was provided.
    public var invalidSamples: [SampleType] { get }

    /// Creates a generator for ``ModelSample`` values whose expected value type is generable.
    ///
    /// - Parameters:
    ///   - prompt: The prompt the generator sends to the language model session.
    ///   - samples: The initial set of evaluation samples that provide context.
    ///   - targetCount: The total number of samples in the resulting dataset.
    ///   - sessionProvider: A closure that creates a new language model session.
    ///   - samplingStrategy: The strategy for selecting example samples.
    ///   - validator: An optional closure that decides whether a generated sample is valid.
    public init<T>(_ prompt: Prompt, samples: [SampleType], targetCount: Int, sessionProvider: (@Sendable () -> LanguageModelSession)? = nil, samplingStrategy: SampleGenerator<SampleType>.SamplingStrategy? = .random(), validator: (nonisolated(nonsending) @Sendable (SampleType) async throws -> Bool)? = nil) where SampleType == ModelSample<T>, T : Generable, T : Decodable, T : Encodable, T : Sendable

    /// Creates a generator for custom, generable evaluation samples.
    ///
    /// - Parameters:
    ///   - prompt: The prompt the generator sends to the language model session.
    ///   - samples: The initial set of evaluation samples that provide context.
    ///   - targetCount: The total number of samples in the resulting dataset.
    ///   - sessionProvider: A closure that creates a new language model session.
    ///   - samplingStrategy: The strategy for selecting example samples.
    ///   - validator: An optional closure that decides whether a generated sample is valid.
    public init(_ prompt: Prompt, samples: [SampleType], targetCount: Int, sessionProvider: (@Sendable () -> LanguageModelSession)? = nil, samplingStrategy: SampleGenerator<SampleType>.SamplingStrategy? = .random(), validator: (nonisolated(nonsending) @Sendable (SampleType) async throws -> Bool)? = nil) where SampleType : Generable

    /// Runs the generator and returns a stream of newly synthesized samples.
    ///
    /// Each element in the returned stream is a newly generated sample. After iteration
    /// completes, access ``samples`` to retrieve the full dataset (initial + generated), or
    /// ``invalidSamples`` to see samples the validator rejected.
    ///
    /// - Returns: An async throwing stream of individual samples.
    nonisolated public func run() -> some AsyncSequence<SampleType, any Error>


    /// The values that define how the generator selects existing samples as examples in the generation prompt.
    ///
    /// When a model repeats an inference, the strategy determines whether and how the generator retries with different examples.
    public enum SamplingStrategy : Sendable {

        /// A strategy that randomly picks a subset of samples each time a model repeats an inference.
        ///
        /// When the model repeats an inference, this strategy retries up to `retries` times,
        /// selecting a random subset to steer the model toward a new inference.
        case random(retries: Int = 5)

        /// A strategy that slides a window through the examples, advancing it each batch.
        ///
        /// When the model repeats an inference, this strategy continues retrying as long as there
        /// are new windows of examples to show the model.
        case slidingWindow
    }

    @objc deinit

    /// Retrieve the executor for this actor as an optimized, unowned
    /// reference.
    ///
    /// This property must always evaluate to the same executor for a

Truncated.
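The retry rules of the two sampling strategies can be modeled as an attempt budget. In this hypothetical sketch, `MiniSamplingStrategy` copies the shape of SampleGenerator.SamplingStrategy, and `attemptBudget(for:exampleCount:windowSize:)` and `firstNovelSample(strategy:exampleCount:windowSize:seen:draft:)` are invented helpers. The window size, the use of non-overlapping windows, and dropping a sample that still repeats after the last attempt are assumptions; the declaration above doesn't specify them.

```swift
// Hypothetical mirror of SampleGenerator.SamplingStrategy's shape.
enum MiniSamplingStrategy {
    case random(retries: Int = 5)
    case slidingWindow
}

// Hypothetical: how many drafts may be requested when the model repeats itself.
func attemptBudget(for strategy: MiniSamplingStrategy?, exampleCount: Int, windowSize: Int) -> Int {
    precondition(windowSize > 0, "window size must be positive")
    switch strategy {
    case nil:
        return 1  // no strategy: no retry on repetition
    case .random(retries: let retries)?:
        return 1 + retries  // first draft plus up to `retries` re-draws with new random subsets
    case .slidingWindow?:
        // One draft per window; retries stop once no new window of examples remains.
        // Non-overlapping windows are an assumption of this sketch.
        let windows = (exampleCount + windowSize - 1) / windowSize
        return max(1, windows)
    }
}

// Hypothetical retry loop: `draft(attempt)` stands in for prompting with a different example set.
func firstNovelSample(strategy: MiniSamplingStrategy?, exampleCount: Int, windowSize: Int,
                      seen: Set<String>, draft: (Int) -> String) -> String? {
    let budget = attemptBudget(for: strategy, exampleCount: exampleCount, windowSize: windowSize)
    for attempt in 0..<budget {
        let candidate = draft(attempt)
        if !seen.contains(candidate) {
            return candidate
        }
    }
    return nil  // in this sketch, a sample that still repeats is dropped
}
```

The contrast is the stopping rule: `.random` has a fixed retry count, while `.slidingWindow` scales with how many examples there are to show.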

protocol

SampleProtocol

New · iOS · macOS · visionOS · watchOS
public protocol SampleProtocol : Decodable, Encodable, Sendable

A type that defines evaluation samples.

struct MySample: SampleProtocol {
    var input: String
    var expected: String?
}

Conform to this protocol to define input samples for your evaluation datasets. Each sample has an input that's displayed in the DataFrame "Input" column, and an optional expected value for comparison.

For language model evaluations, use ModelSampleProtocol, which extends this protocol with language-model-specific properties: prompt, instructions, and expectations.

let samples = [
    ModelSample(prompt: "Classify: I love this!", expected: "positive"),
]
Declaration
public protocol SampleProtocol : Decodable, Encodable, Sendable {

    /// The type of the input data.
    ///
    /// The type must be string-representable for display.
    associatedtype Input : CustomStringConvertible

    /// The type of the expected output value.
    associatedtype ExpectedValue

    /// The input data for this sample, shown in the "Input" DataFrame column.
    var input: Self.Input { get }

    /// The expected output for comparison.
    var expected: Self.ExpectedValue? { get }
}
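To show how the two requirements feed a results table, the sketch below defines a hypothetical `MiniSampleProtocol` with the same shape as SampleProtocol, so it runs without the framework. `inputColumn(_:)` renders inputs with `String(describing:)`, matching the description of the "Input" column; `exactMatchRate(_:outputs:)` is an invented example comparison that, by assumption, skips samples whose expected value is `nil`.

```swift
// Hypothetical mirror of SampleProtocol's requirements.
protocol MiniSampleProtocol: Codable, Sendable {
    associatedtype Input: CustomStringConvertible
    associatedtype ExpectedValue
    var input: Input { get }
    var expected: ExpectedValue? { get }
}

struct MiniSample: MiniSampleProtocol {
    var input: String
    var expected: String?
}

// Renders each input with String(describing:), as the "Input" column is described.
func inputColumn<S: MiniSampleProtocol>(_ samples: [S]) -> [String] {
    samples.map { String(describing: $0.input) }
}

// Hypothetical comparison: the share of outputs equal to the expected value,
// skipping samples that have no expected value (an assumption).
func exactMatchRate<S: MiniSampleProtocol>(_ samples: [S], outputs: [S.ExpectedValue]) -> Double
    where S.ExpectedValue: Equatable {
    var scored = 0
    var matches = 0
    for (sample, output) in zip(samples, outputs) {
        guard let expected = sample.expected else { continue }
        scored += 1
        if expected == output { matches += 1 }
    }
    return scored == 0 ? 0 : Double(matches) / Double(scored)
}

let samples = [
    MiniSample(input: "Classify: I love this!", expected: "positive"),
    MiniSample(input: "Classify: Meh.", expected: "neutral"),
    MiniSample(input: "Classify: Anything goes", expected: nil),
]
let rate = exactMatchRate(samples, outputs: ["positive", "negative", "whatever"])
```

Because `expected` is optional, open-ended samples can share a dataset with gradable ones.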
struct

ScaleOption

New · iOS · macOS · visionOS · watchOS
public struct ScaleOption : Sendable

A single option in a scoring scale.

let option = ScaleOption(
    label: "Excellent",
    guideDescription: "The response is of exceptional quality.",
    value: 5.0
)

Each option defines a label, a guide description, and a numeric value. The judge model sees the options in the scoring-guide section of its prompt.

Declaration
public struct ScaleOption : Sendable {

    /// A short label for this option (e.g., "excellent", "pass", "5").
    public let label: String

    /// Rubric guidance shown to the judge for this option.
    public let guideDescription: String

    /// The numeric value for this option, used for metric aggregation.
    public let value: Double

    /// Creates a scale option.
    ///
    /// - Parameters:
    ///   - label: A short label for this option.
    ///   - guideDescription: Rubric guidance shown to the judge for this option.
    ///   - value: The numeric value for this option.
    public init(label: String, guideDescription: String, value: Double)
}
struct

ScoreDimension

New · iOS · macOS · visionOS · watchOS
public struct ScoreDimension : Sendable

A named scoring dimension for a model judge evaluator.

Each dimension defines a name (used as the DataFrame column), an optional description, and a ScoringScale that defines what each score means.

// Numeric scale
let _ = ScoreDimension("Grammar", scale: .numeric([
    5: "Flawless grammar throughout",
    3: "Some errors but generally readable",
    1: "Pervasive errors making text difficult to understand"
]))

// Pass/fail
let _ = ScoreDimension("Safe", scale: .passFail(
    passDescription: "The response is safe and appropriate",
    failDescription: "The response contains harmful content"
))

// Typed enum
enum SafetyLevel: ScoreLevel {
    case safe, unsafe
    var guideDescription: String { self == .safe ? "Safe" : "Unsafe" }
    var value: Double { self == .safe ? 1 : 0 }
}
let _ = ScoreDimension("Safety", scale: .custom(SafetyLevel.self))
Declaration
public struct ScoreDimension : Sendable {

    /// The name of the dimension, used as the DataFrame column name.
    public let name: String

    /// An optional description providing additional context for the judge about what this dimension measures.
    public let description: String?

    /// The scoring scale for this dimension.
    public let scale: ScoringScale

    /// A metric identifier derived from this dimension's name.
    ///
    /// Use this to reference the dimension's metric in ``MetricsAggregator``
    /// without repeating the name as a raw string:
    ///
    ///     let relevance = ScoreDimension("Relevance", scale: .numeric([...]))
    ///     aggregator.computeMean(of: relevance.metric)
    ///
    public var metric: Metric { get }

    /// Creates a scoring dimension.
    ///
    /// - Parameters:
    ///   - name: The dimension name, used as the DataFrame column name and for aggregation lookup.
    ///   - description: Optional description providing context about what this dimension measures.
    ///   - scale: The scoring scale for this dimension.
    public init(_ name: String, description: String? = nil, scale: ScoringScale)
}
protocol

ScoreLevel

New · iOS · macOS · visionOS · watchOS
public protocol ScoreLevel : CaseIterable, Hashable, Sendable

A type that defines individual levels within a scoring scale.

Conform an enum to ScoreLevel to create a typed, reusable scoring vocabulary. Each case represents one level the judge model can assign. A level's label defaults to its case name, produced by String(describing:); implement label yourself to supply a more human-readable form.

enum SafetyLevel: ScoreLevel {
    case safe, unsafe

    var guideDescription: String {
        switch self {
        case .safe: "The response is safe and appropriate"
        case .unsafe: "The response contains harmful content"
        }
    }

    var value: Double {
        switch self {
        case .safe: 1
        case .unsafe: 0
        }
    }
}

let dimension = ScoreDimension("Safety", scale: .custom(SafetyLevel.self))
Declaration
public protocol ScoreLevel : CaseIterable, Hashable, Sendable {

    /// A short judge-facing label for this level.
    ///
    /// Defaults to `String(describing: self)`, which for enums produces the case name.
    var label: String { get }

    /// Rubric guidance shown to the judge for this level.
    var guideDescription: String { get }

    /// The numeric value for this level, used for metric aggregation.
    var value: Double { get }
}
extension

ScoreLevel

New · iOS · macOS · visionOS · watchOS
extension ScoreLevel
Declaration
extension ScoreLevel {

    /// A short judge-facing label for this level.
    ///
    /// Defaults to `String(describing: self)`, which for enums produces the case name.
    public var label: String { get }
}
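Because `label` is both a protocol requirement and given a default in a protocol extension, a conforming enum can either inherit the case name or override it, and generic code calls the override. The standalone sketch below demonstrates that pattern with a hypothetical `MiniScoreLevel` protocol and a hypothetical `options(for:)` helper that, like ScoringScale.custom(_:), enumerates every case. Ordering the result highest value first follows the ScoringScale.options documentation; returning tuples is a simplification.

```swift
// Hypothetical mirror of ScoreLevel's requirements.
protocol MiniScoreLevel: CaseIterable, Hashable, Sendable {
    var label: String { get }
    var guideDescription: String { get }
    var value: Double { get }
}

extension MiniScoreLevel {
    // Default label: the case name, via String(describing:).
    var label: String { String(describing: self) }
}

// Relies on the default label.
enum SafetyLevel: MiniScoreLevel {
    case safe, unsafe
    var guideDescription: String { self == .safe ? "Safe" : "Unsafe" }
    var value: Double { self == .safe ? 1 : 0 }
}

// Overrides the default so the judge sees a human-readable label.
enum QualityLevel: MiniScoreLevel {
    case excellent, needsWork
    var label: String { self == .excellent ? "Excellent" : "Needs work" }
    var guideDescription: String { self == .excellent ? "Clear and correct" : "Has errors" }
    var value: Double { self == .excellent ? 1 : 0 }
}

// Hypothetical helper: enumerates every case, highest value first.
func options<Level: MiniScoreLevel>(for level: Level.Type) -> [(label: String, value: Double)] {
    Level.allCases
        .map { (label: $0.label, value: $0.value) }
        .sorted { $0.value > $1.value }
}
```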
enum

ScoringMode

New · iOS · macOS · visionOS · watchOS
public enum ScoringMode : Sendable

The scoring constraint mode for a model-as-judge evaluator.

let mode: ScoringMode = .discrete

Controls whether the judge model can return any floating-point score (continuous) or is structurally constrained to return exactly one of the defined scale values (discrete).
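The practical difference between the modes is which scores a judge can return. The framework enforces discrete mode during structured generation rather than by checking afterward, so the hypothetical `isAcceptable(_:mode:scaleValues:)` helper and `MiniScoringMode` enum below only illustrate the set of scores each mode permits. Rejecting non-finite values in continuous mode is an assumption.

```swift
// Hypothetical mirror of ScoringMode.
enum MiniScoringMode {
    case discrete
    case continuous
}

// Illustrates which scores each mode permits for a given scale.
func isAcceptable(_ score: Double, mode: MiniScoringMode, scaleValues: [Double]) -> Bool {
    switch mode {
    case .discrete:
        return scaleValues.contains(score)  // exactly one of the scale's values
    case .continuous:
        return score.isFinite               // any number; the scale is guidance only
    }
}

let scale: [Double] = [5, 3, 1]
```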

Declaration
public enum ScoringMode : Sendable {

    /// A mode that requires the model to return exactly one of the values defined in the
    /// scoring dimension's scale, enforced using structured generation.
    case discrete

    /// A mode that allows the model to return any floating-point value. The scale serves as a guide
    /// but is not enforced at the generation level.
    case continuous

    /// Returns a Boolean value indicating whether two values are equal.
    ///
    /// Equality is the inverse of inequality. For any values `a` and `b`,
    /// `a == b` implies that `a != b` is `false`.
    ///
    /// - Parameters:
    ///   - lhs: A value to compare.
    ///   - rhs: Another value to compare.
    public static func == (a: ScoringMode, b: ScoringMode) -> Bool

    /// Hashes the essential components of this value by feeding them into the
    /// given hasher.
    ///
    /// Implement this method to conform to the `Hashable` protocol. The
    /// components used for hashing must be the same as the components compared
    /// in your type's `==` operator implementation. Call `hasher.combine(_:)`
    /// with each of these components.
    ///
    /// - Important: In your implementation of `hash(into:)`,
    ///   don't call `finalize()` on the `hasher` instance provided,
    ///   or replace it with a different instance.
    ///   Doing so may become a compile-time error in the future.
    ///
    /// - Parameter hasher: The hasher to use when combining the components
    ///   of this instance.
    public func hash(into hasher: inout Hasher)

    /// The hash value.
    ///
    /// Hash values are not guaranteed to be equal across different executions of
    /// your program. Do not save hash values to use during a future execution.
    ///
    /// - Important: `hashValue` is deprecated as a `Hashable` requirement. To
    ///   conform to `Hashable`, implement the `hash(into:)` requirement instead.
    ///   The compiler provides an implementation for `hashValue` for you.
    public var hashValue: Int { get }
}
extension

ScoringMode

New · iOS · macOS · visionOS · watchOS
extension ScoringMode : Equatable
Declaration
extension ScoringMode : Equatable {
}
extension

ScoringMode

New · iOS · macOS · visionOS · watchOS
extension ScoringMode : Hashable
Declaration
extension ScoringMode : Hashable {
}
struct

ScoringScale

New · iOS · macOS · visionOS · watchOS
public struct ScoringScale : Sendable

A scoring scale that defines the set of options a judge can assign.

Use the factory methods to create scales from numeric dictionaries, pass/fail pairs, or typed ScoreLevel enums:

// Numeric scale
let _ = ScoringScale.numeric([5: "Flawless", 3: "Readable", 1: "Incomprehensible"])

// Pass/fail
let _ = ScoringScale.passFail(passDescription: "Safe", failDescription: "Unsafe")

// Typed enum
enum SafetyLevel: ScoreLevel {
    case safe, unsafe
    var guideDescription: String { self == .safe ? "Safe" : "Unsafe" }
    var value: Double { self == .safe ? 1 : 0 }
}
let _ = ScoringScale.custom(SafetyLevel.self)
Declaration
public struct ScoringScale : Sendable {

    /// The scale options, ordered from highest to lowest value.
    public let options: [ScaleOption]

    /// Creates a scoring scale with explicit options.
    ///
    /// - Parameter options: The scale options. The ``options`` property stores them sorted by value, in descending order.
    public init(options: [ScaleOption])

    /// Creates a scoring scale from a numeric dictionary.
    ///
    /// Each key-value pair maps a numeric score to rubric guidance. The label for each
    /// option is derived from the numeric value (e.g., `5` becomes `"5"`).
    ///
    /// - Parameter scale: A dictionary mapping numeric scores to rubric guidance.
    public static func numeric(_ scale: [Double : String]) -> ScoringScale

    /// Creates a binary pass/fail scoring scale.
    ///
    /// - Parameters:
    ///   - passDescription: Rubric guidance for what constitutes a pass.
    ///   - failDescription: Rubric guidance for what constitutes a fail.
    public static func passFail(passDescription: String, failDescription: String) -> ScoringScale

    /// Creates a scoring scale from a typed score level enum.
    ///
    /// All cases are enumerated and converted to ``ScaleOption`` values.
    ///
    /// - Parameter level: The score level type.
    public static func custom<Level>(_ level: Level.Type) -> ScoringScale where Level : ScoreLevel
}
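The numeric and pass/fail factory methods can be approximated in a few lines of plain Swift. This is a hypothetical sketch, not the framework's code: `MiniOption`, `scoreLabel(_:)`, `numericScale(_:)`, and `passFailScale(pass:fail:)` are invented names. The descending sort and the rule that `5` becomes `"5"` come from the documentation above; the label format for non-integral scores and the pass/fail labels and values (`"pass"` as 1, `"fail"` as 0) are assumptions.

```swift
// Hypothetical mirror of ScaleOption.
struct MiniOption: Equatable {
    let label: String
    let guideDescription: String
    let value: Double
}

// Derives a label from a numeric score: whole numbers drop the ".0".
func scoreLabel(_ value: Double) -> String {
    value == value.rounded() && abs(value) < 1e15 ? String(Int(value)) : String(value)
}

// Mirrors ScoringScale.numeric(_:): one option per entry, highest value first.
func numericScale(_ scale: [Double: String]) -> [MiniOption] {
    scale
        .map { MiniOption(label: scoreLabel($0.key), guideDescription: $0.value, value: $0.key) }
        .sorted { $0.value > $1.value }
}

// Mirrors ScoringScale.passFail(passDescription:failDescription:) with assumed labels and values.
func passFailScale(pass: String, fail: String) -> [MiniOption] {
    [MiniOption(label: "pass", guideDescription: pass, value: 1),
     MiniOption(label: "fail", guideDescription: fail, value: 0)]
}

let grammar = numericScale([5: "Flawless", 3: "Readable", 1: "Incomprehensible"])
```

Sorting matters because a Swift dictionary has no stable order; the scale's options need one before they're shown to the judge.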
struct

StreamLoader

New · iOS · macOS · visionOS · watchOS
public struct StreamLoader<Sample> : Loader where Sample : SampleProtocol

A loader backed by a custom async sequence.

Declaration
public struct StreamLoader<Sample> : Loader where Sample : SampleProtocol {

    /// Creates a loader backed by the given async sequence.
    public init(stream: some Sendable & AsyncSequence<Sample, any Error>)

    /// The async sequence for iteration during an evaluation run.
    public var stream: any AsyncSequence<Sample, any Error> { get }
}
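Since StreamLoader wraps an async sequence, samples can be produced lazily rather than loaded up front. The standalone sketch below builds such a sequence with the standard library's `AsyncThrowingStream` and drains it the way an evaluation run would iterate a loader. `MiniSample`, `makeSampleStream(from:)`, `collect(_:)`, and the `input|expected` line format are hypothetical; the framework's own iteration logic isn't documented here.

```swift
// Hypothetical sample type; a real dataset would conform to SampleProtocol.
struct MiniSample: Sendable, Equatable {
    var input: String
    var expected: String?
}

// Hypothetical source that yields samples one at a time from "input|expected" lines.
func makeSampleStream(from lines: [String]) -> AsyncThrowingStream<MiniSample, any Error> {
    AsyncThrowingStream { continuation in
        for line in lines {
            let parts = line.split(separator: "|", maxSplits: 1).map(String.init)
            guard let input = parts.first else { continue }  // skip empty lines
            continuation.yield(MiniSample(input: input, expected: parts.count > 1 ? parts[1] : nil))
        }
        continuation.finish()
    }
}

// Drains the stream, the way an evaluation run iterates its loader.
func collect(_ stream: AsyncThrowingStream<MiniSample, any Error>) async throws -> [MiniSample] {
    var collected: [MiniSample] = []
    for try await sample in stream {
        collected.append(sample)
    }
    return collected
}

let loaded = try await collect(makeSampleStream(from: [
    "Classify: I love this!|positive",
    "",
    "Classify: Hmm",
]))
```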
struct

StructuredTranscript

New · iOS · macOS · visionOS · watchOS
public struct StructuredTranscript
Declaration
public struct StructuredTranscript {

    /// The tool calls extracted from the transcript.
    public var toolCalls: [Transcript.ToolCall]

    /// The tool outputs extracted from the transcript.
    public var toolOutputs: [Transcript.ToolOutput]

    /// The system instruction text from the transcript.
    public var instructionText: String

    /// The user prompt strings from the transcript.
    public var prompts: [String]

    /// The model responses from the transcript.
    public var responses: [Transcript.Response]

    /// Creates a structured transcript.
    ///
    /// - Parameters:
    ///   - toolCalls: The tool calls from the session.
    ///   - toolOutputs: The tool outputs from the session.
    ///   - instructionText: The system instructions text.
    ///   - prompts: The user prompts.
    ///   - responses: The model responses.
    public init(toolCalls: [Transcript.ToolCall] = [], toolOutputs: [Transcript.ToolOutput] = [], instructionText: String = "", prompts: [String] = [], responses: [Transcript.Response] = [])
}
enum

StructuredValue

New · iOS · macOS · visionOS · watchOS
public enum StructuredValue : Sendable, Hashable

A type-safe representation of JSON values.

let name: StructuredValue = "Alice"
let score: StructuredValue = 4.5
let tags: StructuredValue = ["swift", "evaluation"]

This type is not @Generable due to its recursive array/dictionary structure. For generable argument specifications, use ArgumentValue instead.

Declaration
public enum StructuredValue : Sendable, Hashable {

    /// A string value.
    case string(String)

    /// An integer value.
    case int(Int)

    /// A double-precision floating-point value.
    case double(Double)

    /// A Boolean value.
    case bool(Bool)

    /// A null value.
    case null

    /// An array of `StructuredValue` instances.
    case array([StructuredValue])

    /// A dictionary with string keys and `StructuredValue` instances as values.
    case dictionary([String : StructuredValue])

    /// The underlying value.
    public var value: Any { get }

    /// Returns a Boolean value indicating whether two values are equal.
    ///
    /// Equality is the inverse of inequality. For any values `a` and `b`,
    /// `a == b` implies that `a != b` is `false`.
    ///
    /// - Parameters:
    ///   - lhs: A value to compare.
    ///   - rhs: Another value to compare.
    public static func == (a: StructuredValue, b: StructuredValue) -> Bool

    /// Hashes the essential components of this value by feeding them into the
    /// given hasher.
    ///
    /// Implement this method to conform to the `Hashable` protocol. The
    /// components used for hashing must be the same as the components compared
    /// in your type's `==` operator implementation. Call `hasher.combine(_:)`
    /// with each of these components.
    ///
    /// - Important: In your implementation of `hash(into:)`,
    ///   don't call `finalize()` on the `hasher` instance provided,
    ///   or replace it with a different instance.
    ///   Doing so may become a compile-time error in the future.
    ///
    /// - Parameter hasher: The hasher to use when combining the components
    ///   of this instance.
    public func hash(into hasher: inout Hasher)

    /// The hash value.
    ///
    /// Hash values are not guaranteed to be equal across different executions of
    /// your program. Do not save hash values to use during a future execution.
    ///
    /// - Important: `hashValue` is deprecated as a `Hashable` requirement. To
    ///   conform to `Hashable`, implement the `hash(into:)` requirement instead.
    ///   The compiler provides an implementation for `hashValue` for you.
    public var hashValue: Int { get }
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : ExpressibleByStringLiteral
Declaration
extension StructuredValue : ExpressibleByStringLiteral {

    /// Creates an instance initialized to the given string value.
    ///
    /// - Parameter value: The value of the new instance.
    public init(stringLiteral value: String)

    /// A type that represents an extended grapheme cluster literal.
    ///
    /// Valid types for `ExtendedGraphemeClusterLiteralType` are `Character`,
    /// `String`, and `StaticString`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias ExtendedGraphemeClusterLiteralType = String

    /// A type that represents a string literal.
    ///
    /// Valid types for `StringLiteralType` are `String` and `StaticString`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias StringLiteralType = String

    /// A type that represents a Unicode scalar literal.
    ///
    /// Valid types for `UnicodeScalarLiteralType` are `Unicode.Scalar`,
    /// `Character`, `String`, and `StaticString`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias UnicodeScalarLiteralType = String
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : ExpressibleByIntegerLiteral
Declaration
extension StructuredValue : ExpressibleByIntegerLiteral {

    /// Creates an instance initialized to the specified integer value.
    ///
    /// Do not call this initializer directly. Instead, initialize a variable or
    /// constant using an integer literal. For example:
    ///
    ///     let x = 23
    ///
    /// In this example, the assignment to the `x` constant calls this integer
    /// literal initializer behind the scenes.
    ///
    /// - Parameter value: The value to create.
    public init(integerLiteral value: Int)

    /// A type that represents an integer literal.
    ///
    /// The standard library integer and floating-point types are all valid types
    /// for `IntegerLiteralType`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias IntegerLiteralType = Int
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : ExpressibleByFloatLiteral
Declaration
extension StructuredValue : ExpressibleByFloatLiteral {

    /// Creates an instance initialized to the specified floating-point value.
    ///
    /// Do not call this initializer directly. Instead, initialize a variable or
    /// constant using a floating-point literal. For example:
    ///
    ///     let x = 21.5
    ///
    /// In this example, the assignment to the `x` constant calls this
    /// floating-point literal initializer behind the scenes.
    ///
    /// - Parameter value: The value to create.
    public init(floatLiteral value: Double)

    /// A type that represents a floating-point literal.
    ///
    /// Valid types for `FloatLiteralType` are `Float`, `Double`, and `Float80`
    /// where available.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias FloatLiteralType = Double
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : ExpressibleByBooleanLiteral
Declaration
extension StructuredValue : ExpressibleByBooleanLiteral {

    /// Creates an instance initialized to the given Boolean value.
    ///
    /// Do not call this initializer directly. Instead, initialize a variable or
    /// constant using one of the Boolean literals `true` and `false`. For
    /// example:
    ///
    ///     let twasBrillig = true
    ///
    /// In this example, the assignment to the `twasBrillig` constant calls this
    /// Boolean literal initializer behind the scenes.
    ///
    /// - Parameter value: The value of the new instance.
    public init(booleanLiteral value: Bool)

    /// A type that represents a Boolean literal, such as `Bool`.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias BooleanLiteralType = Bool
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : ExpressibleByArrayLiteral
Declaration
extension StructuredValue : ExpressibleByArrayLiteral {

    /// The type of the elements of an array literal.
    public typealias ArrayLiteralElement = StructuredValue

    /// Creates an instance initialized with the given elements.
    public init(arrayLiteral elements: StructuredValue...)
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : ExpressibleByDictionaryLiteral
Declaration
extension StructuredValue : ExpressibleByDictionaryLiteral {

    /// The key type of a dictionary literal.
    public typealias Key = String

    /// The value type of a dictionary literal.
    public typealias Value = StructuredValue

    /// Creates an instance initialized with the given key-value pairs.
    public init(dictionaryLiteral elements: (String, StructuredValue)...)
}
extension

StructuredValue

New · iOS · macOS · visionOS · watchOS
extension StructuredValue : Codable
Declaration
extension StructuredValue : Codable {

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws
}
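This listing doesn't show StructuredValue's cases or wire format. The sketch below therefore uses a hypothetical `SketchValue` enum to illustrate what these conformances provide. Nested literals build a whole tree without calling any initializer directly, and `Codable` lets that tree round-trip through JSON. The case layout and the untagged JSON encoding are assumptions, not the real type's behavior.

```swift
import Foundation

// Hypothetical stand-in for StructuredValue. The real type's cases and JSON
// encoding are not shown in this listing; this layout is an assumption.
enum SketchValue: Equatable {
    case string(String)
    case int(Int)
    case double(Double)
    case bool(Bool)
    case array([SketchValue])
    case object([String: SketchValue])
}

// Literal conformances: the compiler calls these initializers for you.
extension SketchValue: ExpressibleByStringLiteral, ExpressibleByIntegerLiteral,
    ExpressibleByFloatLiteral, ExpressibleByBooleanLiteral,
    ExpressibleByArrayLiteral, ExpressibleByDictionaryLiteral {
    init(stringLiteral value: String) { self = .string(value) }
    init(integerLiteral value: Int) { self = .int(value) }
    init(floatLiteral value: Double) { self = .double(value) }
    init(booleanLiteral value: Bool) { self = .bool(value) }
    init(arrayLiteral elements: SketchValue...) { self = .array(elements) }
    init(dictionaryLiteral elements: (String, SketchValue)...) {
        // If a key repeats, the last value wins in this sketch.
        self = .object(Dictionary(elements, uniquingKeysWith: { _, last in last }))
    }
}

// Untagged JSON encoding (assumption): each case is written as a plain JSON value.
extension SketchValue: Codable {
    init(from decoder: any Decoder) throws {
        let container = try decoder.singleValueContainer()
        // Try Bool before Int so `true` is not read as a number. Whole-number
        // doubles such as 3.0 may come back as .int in this sketch.
        if let b = try? container.decode(Bool.self) {
            self = .bool(b)
        } else if let i = try? container.decode(Int.self) {
            self = .int(i)
        } else if let d = try? container.decode(Double.self) {
            self = .double(d)
        } else if let s = try? container.decode(String.self) {
            self = .string(s)
        } else if let a = try? container.decode([SketchValue].self) {
            self = .array(a)
        } else {
            self = .object(try container.decode([String: SketchValue].self))
        }
    }

    func encode(to encoder: any Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .string(let s): try container.encode(s)
        case .int(let i): try container.encode(i)
        case .double(let d): try container.encode(d)
        case .bool(let b): try container.encode(b)
        case .array(let a): try container.encode(a)
        case .object(let o): try container.encode(o)
        }
    }
}

// Nested literals build the whole tree without calling an initializer directly.
let arguments: SketchValue = [
    "location": "Paris, France",
    "days": 3,
    "metric": true,
    "temperatures": [21.5, 19.5],
]

let data = try! JSONEncoder().encode(arguments)
let decoded = try! JSONDecoder().decode(SketchValue.self, from: data)
print(decoded == arguments)  // prints "true"
```

The values deliberately avoid whole-number doubles. With this untagged encoding, JSON can't distinguish `3` from `3.0`, so a real type likely needs a tagged format or a documented normalization rule.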
extension

TestTrait

New · iOS · macOS · visionOS · watchOS
extension TestTrait where Self == EvaluationTrait
Declaration
extension TestTrait where Self == EvaluationTrait {

    /// Creates a trait that runs a single evaluation and makes its result available
    /// through the current evaluation context.
    public static func evaluates(_ evaluation: any Evaluation, info: [String : String] = [:]) -> Self
}
struct

ToolCallEvaluator

New · iOS · macOS · visionOS · watchOS
public struct ToolCallEvaluator<Input> : EvaluatorProtocol, Sendable where Input : ModelSampleProtocol, Input.Expectation == TrajectoryExpectation

An evaluator that verifies agentic tool calls against an expected trajectory.

Produces both a strict and a partial result from a single evaluation pass.

Supports ordered sequences, unordered expectations, disallowed tool checks, and group steps.

let toolsAllPass = Metric("Tools All Pass")
let toolsPercentagePass = Metric("Tools Percentage Pass")

let evaluator = ToolCallEvaluator<ModelSample<String>>(
    allPass: toolsAllPass, percentagePass: toolsPercentagePass
)
Declaration
public struct ToolCallEvaluator<Input> : EvaluatorProtocol, Sendable where Input : ModelSampleProtocol, Input.Expectation == TrajectoryExpectation {

    /// The metric for the strict pass or fail result.
    public let allPass: Metric

    /// The metric for the partial score result.
    public let percentagePass: Metric

    /// Creates a new tool call expectations evaluator.
    ///
    /// The evaluator evaluates expectations once and produces two columns:
    /// a strict score (pass/fail) and a partial score (proportion matched).
    ///
    /// - Parameters:
    ///   - allPass: The metric for the strict pass or fail result.
    ///   - percentagePass: The metric for the partial score result.
    @available(anyAppleOS 27.0, *)
    @available(watchOS, unavailable)
    public init(allPass: Metric, percentagePass: Metric)

    /// Creates a new tool call expectations evaluator with a custom language model
    /// for semantic matching of `.naturalLanguage` argument matchers.
    ///
    /// - Parameters:
    ///   - allPass: The metric for the strict pass or fail result.
    ///   - percentagePass: The metric for the partial score result.
    ///   - argumentMatchModel: The language model to use for semantic matching.
    public init(allPass: Metric, percentagePass: Metric, argumentMatchModel: any LanguageModel)

    /// Computes metrics for the given subject, given the input sample.
    ///
    /// - Parameters:
    ///   - subject: The subject of evaluation, which the evaluation's `subject(from:)` method produces.
    ///   - input: The input sample that contains the expected value and other context.
    /// - Returns: An array of metrics produced by this evaluator.
    nonisolated(nonsending) public func metrics(subject: ModelSubject<Input.ExpectedValue>, input: Input) async throws -> [Metric]

    /// The type of the subject produced by the system under test.
    @available(anyAppleOS 27.0, *)
    @available(tvOS, unavailable)
    public typealias Subject = ModelSubject<Input.ExpectedValue>
}
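ToolCallEvaluator reports one strict score and one partial score from the same pass. The helper below is a hypothetical sketch of how those two columns relate. It assumes each expectation has already resolved to matched or unmatched, and that the partial score is the matched fraction. This listing does not document how the real evaluator weighs ordered, unordered, and disallowed expectations.

```swift
// Hypothetical model of the two columns ToolCallEvaluator produces.
// Assumption: each expectation has already been resolved to matched (true) or not (false).
struct ToolCallScores: Equatable {
    let allPass: Double         // strict: 1.0 only when every expectation matched
    let percentagePass: Double  // partial: fraction of expectations matched
}

func toolCallScores(matches: [Bool]) -> ToolCallScores {
    // Assumption: with no expectations there is nothing to fail, so both scores are 1.0.
    guard !matches.isEmpty else {
        return ToolCallScores(allPass: 1, percentagePass: 1)
    }
    let matchedCount = matches.filter { $0 }.count
    return ToolCallScores(
        allPass: matchedCount == matches.count ? 1 : 0,
        percentagePass: Double(matchedCount) / Double(matches.count)
    )
}

// Three of four expectations met: the strict score fails, the partial score is 0.75.
let example = toolCallScores(matches: [true, true, false, true])
print(example.allPass, example.percentagePass)  // prints "0.0 0.75"
```

Reporting both numbers from a single pass means a run that misses one of many expectations still shows up as nearly correct in the partial column. The strict column still flags it as a failure.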
struct

ToolExpectation

New · iOS · macOS · visionOS · watchOS
public struct ToolExpectation : Sendable, Codable

A specification for an expected tool call, or a group of expectations that can be satisfied in any order.

Most commonly, a ToolExpectation identifies a single tool by name and optionally validates its arguments:

ToolExpectation("getWeather", arguments: [
    .exact(argumentName: "location", value: "Paris, France")
])
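Matching a single expectation amounts to comparing the tool name and then checking each argument matcher. The sketch below models only `.exact(argumentName:value:)` with string values. It uses hypothetical `ToolCallSketch` and `ToolExpectationSketch` types rather than the framework's own. Whether the real matcher ignores arguments the expectation doesn't mention is an assumption here.

```swift
// Hypothetical stand-ins; the real ToolExpectation and ArgumentMatcher are richer.
struct ToolCallSketch {
    let name: String
    let arguments: [String: String]
}

struct ToolExpectationSketch {
    let name: String
    let exactArguments: [String: String]  // plays the role of [.exact(argumentName:value:)]

    init(_ name: String, exactArguments: [String: String] = [:]) {
        self.name = name
        self.exactArguments = exactArguments
    }

    func matches(_ call: ToolCallSketch) -> Bool {
        // The tool name must match first.
        guard call.name == name else { return false }
        // Each listed argument must be present with an equal value.
        // Assumption: arguments the expectation doesn't mention are ignored.
        return exactArguments.allSatisfy { call.arguments[$0.key] == $0.value }
    }
}

let getWeather = ToolExpectationSketch("getWeather", exactArguments: ["location": "Paris, France"])
let call = ToolCallSketch(name: "getWeather", arguments: ["location": "Paris, France", "units": "metric"])
print(getWeather.matches(call))  // prints "true"
```

The same name-then-arguments check explains the documented rule for disallowed expectations in TrajectoryExpectation. A disallowed expectation that carries argument matchers rejects only the calls whose arguments match.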

For ordered sequences where multiple tools must all be called at the same position but their relative order doesn't matter, use anyOrder(_:):

ToolExpectation.anyOrder([
    ToolExpectation("fetchData"),
    ToolExpectation("fetchMetadata"),
])
Declaration
public struct ToolExpectation : Sendable, Codable {

    /// The name of the tool that the evaluation expects the model to call.
    ///
    /// This is only valid for single expectations. Accessing this on an ``anyOrder(_:)``
    /// group is a programming error.
    public var name: String { get }

    /// The argument matchers to validate against the tool call.
    ///
    /// Returns an empty array for ``anyOrder(_:)`` groups.
    public var arguments: [ArgumentMatcher] { get }

    /// A Boolean value that indicates whether this expectation represents a group of expectations
    /// that can be satisfied in any order.
    public var isAnyOrderGroup: Bool { get }

    /// Creates a new tool expectation.
    /// - Parameters:
    ///   - name: The name of the tool.
    ///   - arguments: The argument matchers to validate against the tool call.
    public init(_ name: String, arguments: [ArgumentMatcher] = [])

    /// Creates a group of expectations that must all be satisfied at the same
    /// sequential position, but can occur in any relative order.
    ///
    /// Only valid within the `ordered` array of a ``TrajectoryExpectation``.
    ///
    /// - Parameter expectations: The expectations that must all be satisfied.
    public static func anyOrder(_ expectations: [ToolExpectation]) -> ToolExpectation

    /// Creates a new instance by decoding from the given decoder.
    ///
    /// This initializer throws an error if reading from the decoder fails, or
    /// if the data read is corrupted or otherwise invalid.
    ///
    /// - Parameter decoder: The decoder to read data from.
    public init(from decoder: any Decoder) throws

    /// Encodes this value into the given encoder.
    ///
    /// If the value fails to encode anything, `encoder` will encode an empty
    /// keyed container in its place.
    ///
    /// This function throws an error if any values are invalid for the given
    /// encoder's format.
    ///
    /// - Parameter encoder: The encoder to write data to.
    public func encode(to encoder: any Encoder) throws

    /// An instance of the generation schema.
    nonisolated public static var generationSchema: GenerationSchema { get }

    /// This instance represented as generated content.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. Use the generated content property as shown below, to
    /// manually return a new ``GeneratedContent`` with the properties you specify.
    ///
    /// ```swift
    /// struct Person: ConvertibleToGeneratedContent {
    ///    var name: String
    ///    var age: Int
    ///
    ///    var generatedContent: GeneratedContent {
    ///        GeneratedContent(properties: [
    ///            "firstName": name,
    ///            "ageInYears": age
    ///        ])
    ///    }
    /// }
    /// ```
    ///
    /// - Important: If your type also conforms to ``ConvertibleFromGeneratedContent``,
    /// it is critical that this implementation be symmetrical with ``ConvertibleFromGeneratedContent/init(_:)``.
    nonisolated public var generatedContent: GeneratedContent { get }

    /// A representation of partially generated content.
    nonisolated public enum PartiallyGenerated : nonisolated ConvertibleFromGeneratedContent {

Truncated.

extension

ToolExpectation

New · iOS · macOS · visionOS · watchOS
extension ToolExpectation : nonisolated Generable
Declaration
extension ToolExpectation : nonisolated Generable {

    /// Creates an instance from content generated by a model.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. To manually initialize your type from generated content,
    /// decode the values as shown below:
    ///
    /// ```swift
    /// struct Person: ConvertibleFromGeneratedContent {
    ///     var name: String
    ///     var age: Int
    ///
    ///     init(_ content: GeneratedContent) throws {
    ///         self.name = try content.value(forProperty: "firstName")
    ///         self.age = try content.value(forProperty: "ageInYears")
    ///     }
    /// }
    /// ```
    ///
    /// - Important: If your type also conforms to ``ConvertibleToGeneratedContent``,
    /// it is critical that this implementation be symmetrical with ``ConvertibleToGeneratedContent/generatedContent``.
    ///
    /// - SeeAlso: `@Generable` macro ``Generable(description:)``
    nonisolated public init(_ content: GeneratedContent) throws
}
struct

TrajectoryExpectation

New · iOS · macOS · visionOS · watchOS
public struct TrajectoryExpectation : Sendable, Codable

The expected pattern of tool calls for an evaluation.

TrajectoryExpectation(ordered: [
    ToolExpectation("authenticate"),
    ToolExpectation("processResults"),
])

TrajectoryExpectation specifies expected tool-calling behavior across three axes:

  • Ordered: Tool calls that must occur in a specific sequence. Use ToolExpectation for single sequential steps, or anyOrder(_:) when multiple tools must all be called at a given position but their relative order doesn't matter.

  • Unordered: Tool calls that must occur at some point, regardless of when.
  • Disallowed: Tool calls that must NOT occur.

Ordered steps with an any-order group, rejecting any tool call that doesn't match an expectation:

TrajectoryExpectation(ordered: [
    ToolExpectation("authenticate"),
    .anyOrder([
        ToolExpectation("fetchData"),
        ToolExpectation("fetchMetadata"),
    ]),
    ToolExpectation("processResults"),
], allowsAdditionalToolCalls: false)

Ordered and unordered requirements, plus a tool that must never be called:

TrajectoryExpectation(
    ordered: [
        ToolExpectation("findActivities"),
        ToolExpectation("estimateTravelTime"),
    ],
    unordered: [ToolExpectation("getWeather")],
    disallowed: [ToolExpectation("deleteData")]
)

A single expected tool call with an argument check:

TrajectoryExpectation(expected: "getWeather", arguments: [
    .exact(argumentName: "location", value: "Paris, France")
])
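The ordered, unordered, and disallowed rules, together with the additional-calls policy, can be modeled on tool names alone. The following is a hypothetical sketch: `StepSketch` and `TrajectorySketch` are illustrative names, and argument matching is left out. The greedy left-to-right matching strategy is an assumption, not the framework's documented algorithm.

```swift
// Hypothetical, names-only model of a trajectory; not the framework's implementation.
enum StepSketch {
    case tool(String)        // like ToolExpectation("name")
    case anyOrder([String])  // like ToolExpectation.anyOrder([...])
}

struct TrajectorySketch {
    var ordered: [StepSketch] = []
    var unordered: [String] = []
    var disallowed: [String] = []
    var allowsAdditionalCalls = true

    func isSatisfied(by calls: [String]) -> Bool {
        var used = Set<Int>()  // indices of calls matched to some expectation
        var cursor = 0         // the next ordered step must start at or after this index

        // First call at or after `start` with the given name that isn't matched yet.
        func nextIndex(of name: String, from start: Int) -> Int? {
            guard start < calls.count else { return nil }
            return (start..<calls.count).first(where: { calls[$0] == name && !used.contains($0) })
        }

        // Ordered steps: each step must come after the previous one.
        for step in ordered {
            switch step {
            case .tool(let name):
                guard let i = nextIndex(of: name, from: cursor) else { return false }
                used.insert(i)
                cursor = i + 1
            case .anyOrder(let names):
                // Every member must occur after the previous step, in any relative order;
                // the next step must then come after the last member.
                var last = cursor - 1
                for name in names {
                    guard let i = nextIndex(of: name, from: cursor) else { return false }
                    used.insert(i)
                    last = max(last, i)
                }
                cursor = last + 1
            }
        }

        // Unordered expectations: any not-yet-matched call with the right name counts.
        for name in unordered {
            guard let i = nextIndex(of: name, from: 0) else { return false }
            used.insert(i)
        }

        // Disallowed tools fail the trajectory whenever they appear.
        if calls.contains(where: { disallowed.contains($0) }) { return false }

        // With the blanket policy off, every call must have matched an expectation.
        if !allowsAdditionalCalls && used.count < calls.count { return false }
        return true
    }
}

let fetchFlow = TrajectorySketch(
    ordered: [.tool("authenticate"), .anyOrder(["fetchData", "fetchMetadata"]), .tool("processResults")],
    allowsAdditionalCalls: false
)
print(fetchFlow.isSatisfied(by: ["authenticate", "fetchMetadata", "fetchData", "processResults"]))  // prints "true"
```

In the framework, the initializer that takes `disallowed:` always allows additional calls. The sketch keeps the two settings independent only for brevity.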
Declaration
public struct TrajectoryExpectation : Sendable, Codable {

    /// Tool call steps that must be satisfied in sequential order.
    ///
    /// Each entry is either a single ``ToolExpectation`` or a ``ToolExpectation/anyOrder(_:)``
    /// group where multiple tools must all be called at that position (in any relative order).
    public var ordered: [ToolExpectation]

    /// Tool calls that must occur at some point, regardless of position.
    public var unordered: [ToolExpectation]

    /// Tools that the model must NOT call.
    ///
    /// If a disallowed expectation includes argument matchers, only calls matching
    /// those specific arguments trigger a failure — the model can still call the tool with
    /// different arguments.
    public var disallowed: [ToolExpectation]

    /// A Boolean value that indicates whether to allow tool calls that don't match any expectation.
    ///
    /// When `false`, any unmatched tool call causes evaluation to fail.
    /// When `true` (the default), unmatched calls are ignored as long as
    /// all expectations are met.
    public var allowsAdditionalCalls: Bool

    /// Creates a trajectory expectation with ordered and unordered requirements,
    /// and controls whether unmatched tool calls are permitted.
    ///
    /// Use this initializer when you want to control the blanket policy for
    /// unexpected tool calls. To forbid specific tools instead, use
    /// ``init(ordered:unordered:disallowed:)``.
    ///
    /// - Parameters:
    ///   - ordered: Steps that must be satisfied in sequential order.
    ///   - unordered: Tool calls that must occur at some point, regardless of position.
    ///   - allowsAdditionalToolCalls: A Boolean value indicating whether to allow tool calls that don't match any expectation; defaults to `true`.
    public init(ordered: [ToolExpectation] = [], unordered: [ToolExpectation] = [], allowsAdditionalToolCalls: Bool = true)

    /// Creates a trajectory expectation with ordered and unordered requirements,
    /// plus specific tools that must not be called.
    ///
    /// Additional tool calls beyond the expected ones are always allowed when
    /// using disallowed expectations — the disallowed list targets specific tools
    /// while permitting everything else. To disallow *all* unexpected calls instead,
    /// use ``init(ordered:unordered:allowsAdditionalToolCalls:)`` with
    /// `allowsAdditionalToolCalls: false`.
    ///
    /// - Parameters:
    ///   - ordered: Steps that must be satisfied in sequential order.
    ///   - unordered: Tool calls that must occur at some point, regardless of position.
    ///   - disallowed: Tools that must NOT be called.
    public init(ordered: [ToolExpectation] = [], unordered: [ToolExpectation] = [], disallowed: [ToolExpectation])

    /// Creates a trajectory expectation with only unordered requirements.
    ///
    /// Additional calls are always allowed for unordered-only expectations.
    ///
    /// - Parameters:
    ///   - unordered: Tool calls that must occur at some point, regardless of position.
    public init(unordered: [ToolExpectation])

    /// Creates a trajectory expectation for a single expected tool call.
    ///
    /// - Parameters:
    ///   - toolName: The name of the tool expected to be called.
    ///   - arguments: The argument matchers to validate.
    public init(expected toolName: String, arguments: [ArgumentMatcher] = [])

    /// An instance of the generation schema.
    nonisolated public static var generationSchema: GenerationSchema { get }

    /// This instance represented as generated content.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. Use the generated content property as shown below, to
    /// manually return a new ``GeneratedContent`` with the properties you specify.
    ///
    /// ```swift
    /// struct Person: ConvertibleToGeneratedContent {

Truncated.

extension

TrajectoryExpectation

New · iOS · macOS · visionOS · watchOS
extension TrajectoryExpectation : nonisolated Generable
Declaration
extension TrajectoryExpectation : nonisolated Generable {

    /// Creates an instance from content generated by a model.
    ///
    /// Conformance to this protocol is provided by the `@Generable` macro.
    /// A manual implementation may be used to map values onto properties using
    /// different names. To manually initialize your type from generated content,
    /// decode the values as shown below:
    ///
    /// ```swift
    /// struct Person: ConvertibleFromGeneratedContent {
    ///     var name: String
    ///     var age: Int
    ///
    ///     init(_ content: GeneratedContent) throws {
    ///         self.name = try content.value(forProperty: "firstName")
    ///         self.age = try content.value(forProperty: "ageInYears")
    ///     }
    /// }
    /// ```
    ///
    /// - Important: If your type also conforms to ``ConvertibleToGeneratedContent``,
    /// it is critical that this implementation be symmetrical with ``ConvertibleToGeneratedContent/generatedContent``.
    ///
    /// - SeeAlso: `@Generable` macro ``Generable(description:)``
    nonisolated public init(_ content: GeneratedContent) throws
}
extension

Transcript

New · iOS · macOS · visionOS · watchOS
extension Transcript
Declaration
extension Transcript {

    /// A structured representation of this transcript, with tool calls, outputs, and responses collected into typed arrays.
    public var structuredTranscript: StructuredTranscript { get }
}
typealias

ArgumentValue.BooleanLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias BooleanLiteralType = Bool

A type that represents a Boolean literal, such as Bool.

typealias

ArgumentValue.ExtendedGraphemeClusterLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias ExtendedGraphemeClusterLiteralType = String

A type that represents an extended grapheme cluster literal.

Valid types for ExtendedGraphemeClusterLiteralType are Character, String, and StaticString.

typealias

ArgumentValue.FloatLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias FloatLiteralType = Double

A type that represents a floating-point literal.

Valid types for FloatLiteralType are Float, Double, and Float80 where available.

typealias

ArgumentValue.IntegerLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias IntegerLiteralType = Int

A type that represents an integer literal.

The standard library integer and floating-point types are all valid types for IntegerLiteralType.

typealias

ArgumentValue.StringLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias StringLiteralType = String

A type that represents a string literal.

Valid types for StringLiteralType are String and StaticString.

typealias

ArgumentValue.UnicodeScalarLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias UnicodeScalarLiteralType = String

A type that represents a Unicode scalar literal.

Valid types for UnicodeScalarLiteralType are Unicode.Scalar, Character, String, and StaticString.

typealias

EvaluationTrait.TestScopeProvider

New · iOS · macOS · visionOS · watchOS
public typealias TestScopeProvider = EvaluationTrait

The type of the test scope provider for this trait.

The default type is Never, which can't be instantiated. The scopeProvider(for:testCase:) method for any trait with Never as its test scope provider type must return nil, meaning that the trait doesn't provide a custom scope for tests it's applied to. Because EvaluationTrait names itself as its test scope provider, it can supply a custom scope for the tests it's applied to.

The TestScopeProvider requirement is available starting in Swift 6.1 and Xcode 16.3.

typealias

Evaluator.Subject

New · iOS · macOS · visionOS · watchOS
public typealias Subject = ModelSubject<Input.ExpectedValue>

The type of the subject produced by the system under test.

init

ModelJudgeEvaluator.init

New · iOS · macOS · visionOS
public init(_ name: String, scale: ScoringScale, judge: any LanguageModel = SystemLanguageModel(), scoringMode: ScoringMode = .discrete)
init

ModelJudgeEvaluator.init

New · iOS · macOS · visionOS
public init(judge: any LanguageModel = SystemLanguageModel(), dimensions: [ScoreDimension], scoringMode: ScoringMode = .discrete)
init

ModelJudgeEvaluator.init

New · watchOS
public init(_ name: String, scale: ScoringScale, judge: any LanguageModel, scoringMode: ScoringMode = .discrete)
init

ModelJudgeEvaluator.init

New · watchOS
public init(judge: any LanguageModel, dimensions: [ScoreDimension], scoringMode: ScoringMode = .discrete)
typealias

ModelJudgeEvaluator.Subject

New · iOS · macOS · visionOS · watchOS
public typealias Subject = ModelSubject<Input.ExpectedValue>

The type of the subject produced by the system under test.

typealias

ModelSample.Expectation

New · iOS · macOS · visionOS · watchOS
public typealias Expectation = TrajectoryExpectation

The type of evaluation expectations (e.g., TrajectoryExpectation).

typealias

ModelSample.Input

New · iOS · macOS · visionOS · watchOS
public typealias Input = ModelSampleInput

The type of the input data.

The type must be string-representable for display.

typealias

StructuredValue.BooleanLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias BooleanLiteralType = Bool

A type that represents a Boolean literal, such as Bool.

typealias

StructuredValue.ExtendedGraphemeClusterLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias ExtendedGraphemeClusterLiteralType = String

A type that represents an extended grapheme cluster literal.

Valid types for ExtendedGraphemeClusterLiteralType are Character, String, and StaticString.

typealias

StructuredValue.FloatLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias FloatLiteralType = Double

A type that represents a floating-point literal.

Valid types for FloatLiteralType are Float, Double, and Float80 where available.

typealias

StructuredValue.IntegerLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias IntegerLiteralType = Int

A type that represents an integer literal.

The standard library integer and floating-point types are all valid types for IntegerLiteralType.

typealias

StructuredValue.StringLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias StringLiteralType = String

A type that represents a string literal.

Valid types for StringLiteralType are String and StaticString.

typealias

StructuredValue.UnicodeScalarLiteralType

New · iOS · macOS · visionOS · watchOS
public typealias UnicodeScalarLiteralType = String

A type that represents a Unicode scalar literal.

Valid types for UnicodeScalarLiteralType are Unicode.Scalar, Character, String, and StaticString.

init

ToolCallEvaluator.init

New · iOS · macOS · visionOS
public init(allPass: Metric, percentagePass: Metric)

Creates a new tool call expectations evaluator.

The evaluator checks the expectations in a single pass and produces two columns: a strict score (pass/fail) and a partial score (the proportion of expectations matched).

Parameters

allPass
The metric for the strict pass or fail result.
percentagePass
The metric for the partial score result.
typealias

ToolCallEvaluator.Subject

New · iOS · macOS · visionOS · watchOS
public typealias Subject = ModelSubject<Input.ExpectedValue>

The type of the subject produced by the system under test.
