AI systems need to be safe, reliable, and trustworthy, and their operation needs to be explainable. However, although the AI Act has now been enacted, the harmonized standard that defines how an audit of an AI system (a conformity assessment) should be conducted is still awaited. A frequently heard argument is that, in the absence of such standards, carrying out meaningful AI audits is difficult. We contend, however, that auditors can already play their role: AI systems can be audited effectively through a structured, risk-based approach, illustrated here with a high-risk AI example.
Imagine applying for a loan and being rejected by an AI system – without knowing why. This is the reality for many users of systems like SmartLoan-App, a fictional AI tool used by a fintech company to assess creditworthiness. This article explores how auditors can provide assurance for such opaque systems, especially now that the AI Act's requirements are taking effect.
Introduction
Background
Artificial intelligence (AI) has moved from science fiction into everyday reality. Today, AI systems are increasingly used to make autonomous decisions, in fields such as healthcare (for diagnostics and treatment recommendations), finance (for fraud detection and algorithmic trading), and transportation (for traffic management and autonomous driving). However, as reliance on these systems grows, so does the risk of things going wrong, especially with so-called black box models. These are AI models whose internal workings are not transparent, making it difficult to understand how decisions are made or to identify why errors occur. As these opaque systems are deployed in more critical domains, the potential impact of incorrect or biased outcomes becomes significantly more serious.
To ensure that AI systems are safe, reliable, and trustworthy, auditors have a vital role in independently reviewing these systems and fostering public trust. However, to effectively fulfil this role, auditors must develop robust and adaptive assurance strategies. This is challenging, as there is still a common perception that no clear standard exists for AI audits, which makes some auditors hesitant to engage in this area. At the same time, the few emerging standards that aim to fill this gap face criticism ([ICAE25]). The recent ISO 42006:2025 ([ISO25]) is a valuable step that sets requirements for auditing AI management systems (ISO 42001 [ISO23]), but it is still new, not yet widely adopted, and narrowly scoped. It does not cover the assurance of individual AI systems and has not yet been aligned with broader assurance frameworks such as ISAE 3000 ([IAAS15]) or SOC 2 ([AICP17]). As a result, ISO 42001 or ISO 42006 alone cannot provide the level of assurance required under the EU AI Act.
In this article, we argue that ambiguity in applicable standards should not stop auditors from getting involved in AI assurance. Auditors can rely on their creativity, generic IT audit experience, and professional judgment to assess AI systems, using practical techniques such as adversarial testing and evaluating the consistency of outputs across different scenarios. Join us as we explore the complexities of AI assurance and share strategies to help uphold accountability in an AI-driven world. With the AI Act set to change how AI is built and used, this article, including a practical example (SmartLoan-App), offers clear insights to support auditors in adapting to these new demands.
To bring the key concepts to life, this article features a fictional case study: SmartLoan-App, an AI-driven system developed by the fintech company SmartLoan to evaluate customer eligibility for financial loans.
Running example case study
As described, SmartLoan is a financial technology company that uses an AI system (SmartLoan-App) to help decide which customers can get a financial loan. The system looks at the financial details of an applicant, such as their income, recurring expenses (e.g. mortgage, rent), payment behavior and debts. Based on this information, it determines a credit risk score that is used to:
- Decide if an applicant can get a loan;
- Determine the risk of the loan;
- Set the interest rate for the loan.
The system also gives explanations for its decisions using the method of counterfactuals. In this way, applicants can understand why their application was approved or denied.
How the SmartLoan-App system works
The system is based on supervised learning technology, using historical data on past applicants (including bad payers) to identify similar patterns in new applicants. In overview, the system works as follows:
- Data handling: SmartLoan-App collects and processes the data provided by loan applicants, like salaries, monthly costs and outstanding loans.
- Data verification: with the help of open sources and paid subscriptions to data providers, SmartLoan-App tries to verify that the submitted data is accurate.
- Risk prediction: using machine learning, the system predicts how likely it is that someone will pay back the loan within the set timeframe of the request. Based on this, the system calculates a risk score.
- Decision process: based on the risk score and a threshold, applicants are divided into three groups:
- Approved automatically;
- Rejected automatically;
- Flagged for manual review by SmartLoan staff.
- Customer explanation with counterfactuals: because the underlying model is opaque to a certain degree, SmartLoan-App uses counterfactual reasoning instead of direct explanations. This means the system informs customers about the changes they could make to improve their risk score and increase their chances of obtaining the loan. For example, it might say, “if your monthly income were EUR 500 higher, or your mortgage EUR 50,000 lower, you would have been approved”. A simple illustration of this mechanism follows after this list.
- System transparency: on top of the individual explanations, the workings of SmartLoan-App are also described in some detail on the company’s website.
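To make the decision process and the counterfactual explanations more concrete, the sketch below maps a risk score to the three decision groups and searches for the smallest income increase that would flip a rejection into an approval. All thresholds, the scoring rule and the search step are hypothetical assumptions; SmartLoan’s actual model and business rules are, by design of the case, not visible to us.

```python
from dataclasses import dataclass, replace

# Assumed decision thresholds (illustrative only).
APPROVE_BELOW = 0.30
REJECT_ABOVE = 0.70

@dataclass
class Applicant:
    income: float    # monthly income in EUR
    mortgage: float  # outstanding mortgage in EUR

def toy_score(a: Applicant) -> float:
    """Placeholder risk score: higher income lowers risk. Not SmartLoan's model."""
    return max(0.0, min(1.0, 0.9 - a.income / 6000))

def decide(risk_score: float) -> str:
    """Split applicants into the three groups described above."""
    if risk_score < APPROVE_BELOW:
        return "approved"
    if risk_score > REJECT_ABOVE:
        return "rejected"
    return "manual review"

def income_counterfactual(a: Applicant, max_extra: int = 2000, step: int = 100) -> str:
    """Grid-search the smallest income increase that yields approval."""
    for extra in range(step, max_extra + 1, step):
        if decide(toy_score(replace(a, income=a.income + extra))) == "approved":
            return f"If your monthly income were EUR {extra} higher, you would have been approved."
    return "No counterfactual found within the searched range."

applicant = Applicant(income=3200, mortgage=250_000)
print(decide(toy_score(applicant)))      # manual review
print(income_counterfactual(applicant))  # ... EUR 500 higher ...
```

Real counterfactual generators search over multiple features and minimize the size of the suggested change, but even this toy version shows why such explanations are attractive: they are actionable for the applicant without exposing the model’s internals.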
SmartLoan-App relevance
The system is used for financial decisions that are important for both customers and SmartLoan itself. SmartLoan decided to use this AI-based system for efficiency purposes. The company wants to minimize costs in order to stay competitive with large banks, which can offer lower interest rates because of their size and scale. On top of this, SmartLoan tries to be more user friendly by providing clear guidance on approvals or denials of loan requests.
Rules and regulations
SmartLoan as a company operates in the highly regulated financial sector. There are some important regulations that the company (and the system) should comply with, including at least the following:
- Data Protection Laws (e.g. GDPR [EC16]): Decisions need to be fair and explainable to customers. Customers have to be informed properly about the system, and there needs to be an alternative route for processing requests.
- Financial Regulations: Financial companies are obliged to follow various regulations, for example in relation to Customer Due Diligence (CDD) and various types of transaction monitoring; SmartLoan must comply with these and with other regulations in the financial domain.
- AI Act ([EC24]) (introduced in more detail later in this article): SmartLoan-App is a clear example of a high-risk AI system under the AI Act, due to its alignment with the Act’s definition of AI, its inclusion in the high-risk application areas outlined in Annex III, and the fact that it is both developed and deployed in-house. The AI Act specifically classifies AI systems used to assess creditworthiness or establish credit scores of individuals as high-risk, given their significant impact on access to essential services such as housing, electricity and telecommunications (see [Idem24] for the relationship between AI and privacy). These systems can pose risks of discrimination by reinforcing biases based on factors like race, gender, age, or socioeconomic status. As SmartLoan-App processes sensitive financial data to make automated credit decisions, it falls directly under this category and must comply with the stringent regulatory requirements of the AI Act. Additionally, since the organization acts as both the provider and deployer of the system, it bears the responsibility of ensuring compliance throughout the AI lifecycle – from development to deployment. This dual role requires adherence to key obligations such as transparency, risk management and conformity assessments to prevent potential harm and ensure fair, unbiased outcomes.
What is AI Assurance?
AI Assurance is the process of evaluating an AI system against predetermined audit criteria by an independent auditor, to establish a certain level of confidence in the system’s compliance with quality standards. Our initial focus is on systems that function as black boxes (like our running example, SmartLoan-App). An unbiased assessment is urgently needed to ensure that AI systems are developed, used and monitored properly, even without direct access to the inner workings of the system, especially in relation to the upcoming legislation; as auditors, we have to take on this responsibility. In the case of SmartLoan-App, assurance means verifying that the system’s credit decisions are fair, reliable, and secure, even if the model’s inner workings are not fully transparent.
Breaking through the uncertainty in AI Assurance
As briefly mentioned in the introduction of this article, we believe the need for AI Assurance will increase significantly, especially as the AI Act’s obligations take effect. However, we observe that two obstacles are typically cited as hindering the broad adoption of AI assurance practices.
- Complexity of Black Box Models – Many AI systems, particularly those involving black box models, add a layer of complexity. These models often lack transparency, making it challenging to understand and evaluate their decision-making processes and to apply existing audit frameworks. Without direct access to the inner workings of these systems, auditors struggle to provide assurance that these AI applications operate within ethical boundaries and meet quality standards.
- Ambiguity of Auditing standards for AI – A key challenge is the perception that no clear standard yet exists for AI audits, making auditors hesitant to take on such work. Existing standards that aim to fill this gap also face criticism. For example, the new ISO 42006:2025, which provides guidance on auditing AI management systems (ISO 42001), is seen as narrowly scoped, as it does not cover all aspects of AI assurance ([ICAE25]).
Qualitative and quantitative testing
Despite the challenges, providing assurance over AI systems is both possible and necessary. In the absence of comprehensive standards, auditors can adopt a proactive and creative approach to assess these systems. A useful distinction, introduced by Algorithm Audit ([Algo25]), can be made between qualitative and quantitative testing. In this article, we interpret this distinction as follows.
Qualitative testing focuses on the control environment around the AI system: how it is developed, governed, and monitored. This includes evaluating areas such as governance structures, data quality, model lifecycle management, and human oversight. It aligns with the scope of ISO 42001, which introduces the concept of an AI Management System. By assessing these surrounding processes, auditors can form a view of whether the system has been developed responsibly and ethically.
Quantitative testing, on the other hand, is outcome-based and involves directly examining the behavior of the AI system itself using statistical or mathematical methods. This includes techniques such as adversarial testing – where the system is exposed to challenging inputs to test its robustness – and output consistency checks across diverse scenarios. Methods and techniques such as SHAP ([Lund17]) or LIME ([Tuli16]) can be used to explore model explainability, while metrics for bias detection, performance benchmarking, and fairness provide further insight into how the system operates in practice.
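As a minimal illustration of the explainability techniques just mentioned, the sketch below trains a stand-in credit model on synthetic data and uses SHAP to estimate which features drive its predictions. The model, feature names and data distributions are assumptions for illustration; in a real engagement the auditor would run this against the audited system’s actual inputs and outputs.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for applicant data (illustrative names and distributions).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "income": rng.normal(3500, 800, 500),
    "mortgage": rng.normal(200_000, 60_000, 500),
    "existing_debt": rng.normal(5_000, 2_000, 500),
})
# Toy label: a better income-to-debt position makes repayment more likely.
y = ((X["income"] / 1_000 - X["existing_debt"] / 5_000
      - X["mortgage"] / 200_000 + rng.normal(0, 0.5, 500)) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Model-agnostic SHAP explanation of the predicted repayment probability.
predict_fn = lambda data: model.predict_proba(data)[:, 1]
explainer = shap.Explainer(predict_fn, X)
shap_values = explainer(X.iloc[:50])

# Mean absolute SHAP value per feature as a rough global-importance proxy.
for name, imp in zip(X.columns, np.abs(shap_values.values).mean(axis=0)):
    print(f"{name}: {imp:.3f}")
```

Consistent, plausible attributions (for example, income and debt dominating rather than an irrelevant proxy variable) give the auditor evidence about how the system behaves in practice, even without access to its internals.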
Together, these two forms of testing allow auditors to deliver meaningful assurance even when the inner workings of AI models remain opaque. By combining process-focused evaluation with data-driven analysis, auditors can help ensure that AI systems are developed, used, and monitored in a responsible and trustworthy way.
AI Assurance and the AI Act
The European Union’s AI Act introduces significant regulatory measures, particularly for AI systems that are either classified as prohibited – such as social scoring and predictive policing – or as high-risk. High-risk AI systems, as defined by the legislation, have the potential to impact fundamental human rights, necessitating stringent risk management measures. These systems are not only critical from a compliance standpoint but also present considerable interest from an audit and assurance perspective due to their far-reaching impact and associated risks for the organizations that deploy these systems.
Before we discuss how audits can help to ensure compliance and increase trust across an organization, it is crucial to understand the criteria that classify an AI system as high risk. Without summarizing the entire legislation, three key aspects should be considered:
- Definition of AI: The AI Act adopts a definition based on the OECD formulation, which characterizes an AI system as “a machine-based system that is designed to operate with varying levels of autonomy and that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments” [1]. Notably, the definition’s broad scope means that many AI systems will fall under the Act’s jurisdiction.
- Scope of Application: The AI Act’s broad definition is further refined in Annex III [2], which delineates specific areas of application that determine whether an AI system is considered high-risk. If an AI system does not fall within one of these categories, it may either be prohibited (if it involves one of the banned practices) or classified as lower risk.
- Provider vs. Deployer: It is essential for organizations to determine whether they are merely deploying a high-risk AI system or also acting as its provider. In both cases, organizations must be aware of the associated risks and implement appropriate measures. However, providers carry additional responsibilities in ensuring compliance.
Requirements of the AI Act
The AI Act is an extensive piece of legislation, with the bulk of its provisions specifically targeting high-risk AI systems. Given the classification of SmartLoan-App as high risk, it is imperative for the organization to familiarize itself with the relevant requirements and take proactive measures to ensure compliance. While an exhaustive discussion of all requirements is beyond the scope of this article, the following key areas relevant to audit and assurance efforts deserve particular attention:
- Transparency: High-risk AI systems must adhere to strict transparency obligations, ensuring that users and stakeholders have clear insights into the AI system’s functionality and decision-making processes.
- Registry Obligations: Organizations deploying high-risk AI systems are required to register them in an EU-wide database, facilitating regulatory oversight and public accountability.
- Conformity Assessment: This process is akin to CE marking in other industries, requiring organizations to evaluate their AI systems’ compliance with the AI Act and other applicable regulations. In certain cases, third-party conformity assessments performed by notified bodies may be mandated.
The role of AI Assurance in ensuring compliance
While specific details are still being refined through harmonized standards, conformity assessments serve as a key compliance mechanism under the AI Act. They allow organizations to systematically evaluate whether their AI systems meet regulatory requirements. From an audit perspective, the conformity assessment process can provide valuable insights into:
- Identifying gaps in compliance and recommending corrective actions.
- Assessing documentation and evidence to ensure transparency and accountability.
- Validating risk management frameworks to mitigate potential harm.
The next sections of this article will provide an in-depth exploration of how AI audits should be structured, leveraging the expected requirements of conformity assessments and aligning them with established quality criteria for effective AI audits.
In conclusion, organizations deploying or developing high-risk AI systems must remain on top of their game to understand and implement the AI Act’s requirements. Proactive engagement with audit and assurance processes will not only ensure regulatory compliance but also build trust and credibility in the responsible use of AI technologies.
Key elements for AI Assurance
Using the SmartLoan-App example, we will first describe how the structure of an AI Assurance engagement compares to regular assurance work. Subsequently, we will run through the aspects that are most distinctive for AI systems and, in practice, most closely related to AI Act requirements.
The AI Assurance process
An AI assurance engagement follows the same fundamental structure as traditional IT assurance audits (see Figure 1). It begins with defining the scope, identifying the AI system to be assessed, its key components, objectives and relevant evaluation criteria, which may differ from standard IT audits. Next, a risk analysis is conducted based on the identified criteria, determining the key risks and shaping the audit procedures. Since the auditor’s goal is to assess compliance with the selected criteria, not all risks may be included in the final scope. Based on the risk analysis, an audit approach is established, and a work program is created, outlining the necessary procedures. The execution phase follows, where the auditor performs assessments similar to traditional IT audits, but with additional complexity due to the nature of AI models. Finally, the audit concludes with validating findings and reporting results. While AI audits pose unique challenges, their overall structure remains aligned with IT assurance engagements. In the next paragraphs, we will dive into some of these challenges and their proposed solutions.
Figure 1. Generic AI audit approach (KPMG).
Audit/quality criteria
The key to choosing the correct audit criteria is to know what the audit aims to achieve. For example, the aim could be to assess how trustworthy the system is, or to perform a conformity assessment of the AI system to comply with AI Act requirements.
The first step in choosing the right audit criteria should be to decide which aspects of trustworthiness are relevant to the audit. The second step is to determine which control objectives and controls could be established to cover the audit criteria from an assurance perspective. These criteria may vary in their importance or relevance depending on the type, domain and purpose of the AI system. For example, explainability may be more crucial for a high-stakes AI system that affects human lives or rights, than for a low-stakes AI system that provides entertainment or convenience. Similarly, compliance may be more stringent for an AI system that operates in a regulated industry or market, than for an AI system that operates in an unregulated or emerging field.
Therefore, the criteria are not fixed or universal, but rather dynamic and adaptable and should be tailored and adjusted to the specific characteristics and objectives of the AI system, as well as the expectations and preferences of the stakeholders. The auditor should use these criteria as a starting point and reference, not as a checklist or a prescription, and should exercise professional judgment and discretion when applying and evaluating them.
To mitigate the risk that such adaptable criteria introduce inconsistency between audits, organizations should adopt recognized frameworks (e.g., ISO/IEC 42001, ISO/IEC 42006, NIST AI Risk Management Framework [NIST23]) and clearly define context-specific criteria aligned with risk levels and use cases. Transparency in how criteria are selected and applied is essential, as is involving multidisciplinary stakeholders in their development. Regular reviews and updates of criteria ensure they remain relevant as AI systems and regulatory expectations evolve.
As an example of how to use these criteria in practice, we have selected the criteria of fairness, reliability and security, applied them to the SmartLoan case study, and examined how they can help assess the quality and performance of the AI system, as well as identify potential issues and risks.
- Fairness: This criterion refers to the AI system’s ability to avoid bias and discrimination and to treat all users fairly and equitably. Fairness is important for ensuring the ethical and social responsibility of the AI system, as well as complying with human rights and anti-discrimination laws. In the case of SmartLoan-App, fairness is essential, as the AI system may have an impact on the customers’ access to credit and financial opportunities, and may create or exacerbate social inequalities. Therefore, the auditor should evaluate how the AI system defines and measures fairness, how it identifies and mitigates bias in its data and models and how it ensures diversity and inclusion in its design and development.
- Reliability: This criterion refers to the ability of the AI system to perform consistently and accurately, according to its specifications and intended purposes. In the case of SmartLoan-App, reliability is crucial, as the AI system determines the eligibility and affordability of the loans and may affect the company’s reputation and profitability. Therefore, the auditor should evaluate how the AI system handles data quality and completeness, how it validates and verifies its models and outputs and how it monitors and corrects its performance over time.
- Security: This criterion refers to the ability of the AI system to prevent unauthorized access, use, modification, or destruction of its data, models, or outputs. In the case of SmartLoan-App, security is critical, as the AI system handles sensitive and valuable information and may be exposed to cyberattacks and fraud. Therefore, the auditor should evaluate how the AI system implements security policies and standards, how it encrypts and authenticates its data and models and how it detects and responds to security incidents and breaches.
Scope of the audit
In this section, we outline how to scope an AI audit by introducing three elements (see Figure 2):
- AI Management System;
- AI Lifecycle Management; and
- AI Operations.
These three components form the backbone of a robust AI assurance framework, and their evaluation is essential to provide comprehensive insights into the AI system’s trustworthiness, reliability and compliance.
Figure 2. AI governance & assurance framework ([Hout24]).
AI Management System
The AI Management System (AI-MS) encompasses strategic and organizational considerations that ensure trustworthy AI. This includes governance, roles and responsibilities, AI policy, compliance, audit and certification. Additionally, it covers communication and training related to AI. To audit this component, auditors should assess the establishment and implementation of governance structures, review the AI policies that define the organization’s ethical and operational standards, and evaluate the training programs provided to employees. The auditor must prioritize areas such as organizational governance, risk assessment and compliance with regulatory frameworks (e.g. AI Act).
AI Lifecycle Management
The AI Lifecycle Management domain focuses on the development and management of AI systems from conception to decommissioning. Key phases include requirements, design, build, test/deploy, monitor and decommission. Each phase should be audited to ensure adherence to best practices and standards. For example, during the design phase, the auditor should evaluate data quality and management, while the build phase should include rigorous model validation and testing. The auditor should prioritize phases based on their impact on the overall AI system, with a particular emphasis on testing and monitoring for ongoing reliability and accuracy.
AI Operations
AI Operations cover the functionality of AI systems, focusing on security, data quality and algorithm validation. This includes operational controls to safeguard systems from unauthorized access and ensure data integrity. Auditing this domain requires a detailed examination of security measures, model validation processes and data quality controls. The auditor should prioritize security policies, including encryption and access controls, and ensure that robust validation and testing protocols are in place to maintain the AI system’s integrity and performance.
By thoroughly evaluating these components, auditors can provide a comprehensive assessment of the AI system’s assurance. The depth of this examination is determined by the complexity of the AI system and the potential risks associated with its deployment, as detailed in the section “Audit procedures”.
AI system
Not included in the diagram is the AI system itself. The components described above primarily relate to the qualitative scope, focusing on how the system was developed, governed, and monitored. However, for quantitative testing, it is equally important to consider the technical aspects of the AI system. This involves understanding characteristics such as the model type, complexity, training data, input-output behavior, and explainability. Only by combining an understanding of both the technical design of the AI system and the real-world processes it supports can auditors effectively identify risks, define appropriate testing strategies, and determine the required level of assurance. This comprehensive scoping is essential for delivering meaningful and trustworthy AI assurance.
Audit procedures
To conduct an effective AI assurance engagement, auditors should apply a combination of qualitative and quantitative audit procedures. These procedures are adapted from traditional IT audit methodologies ([Boer22]) but are tailored to address the unique complexities and risks associated with AI systems, especially those involving opaque or black-box models.
Qualitative procedures
Qualitative procedures focus on evaluating the control environment and organizational context in which the AI system operates. These include assessing the governance, policies, documentation and internal controls that guide and monitor AI development, deployment and use.
Common qualitative techniques include:
- Entity-Level Control Reviews – Assessment of overarching AI governance, ethical/fairness policies, roles and responsibilities and organizational readiness.
- Documentation Inspection – Review of model development artifacts, risk assessments, version control logs and operational documentation.
- Control Testing – Evaluation of the design and operational effectiveness of specific controls related to data governance, change management and access control.
- Code Review (non-substantive) – High-level examination of programming and documentation practices to confirm adherence to internal development standards.
Quantitative Procedures
Quantitative procedures involve direct technical validation of the AI system’s behavior. These methods aim to verify the integrity, consistency and fairness of model outputs using data-driven and statistical approaches.
Key quantitative techniques include:
- Substantive Testing – Independent testing of model predictions using curated datasets to validate decision accuracy, fairness and output stability.
- Reperformance – Re-executing model training or scoring using the same input data and configurations to validate reproducibility and control over non-deterministic behavior.
- Independent Testing – Evaluation of the AI model using novel or adversarial inputs to observe its robustness and bias mitigation effectiveness.
- Functionality Replication – Development of a benchmark or reference model to serve as a comparative control, highlighting discrepancies in output logic or performance.
The selection of techniques should be aligned with the audit objectives, risk profile and technical maturity of the AI system. High-risk systems, particularly those under the scope of the AI Act, warrant a deeper focus on quantitative assurance methods to verify compliance and detect unintended discriminatory impacts.
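As a sketch of what independent or adversarial-style testing from the list above could look like in practice, the example below perturbs one input feature slightly and measures how many decisions flip. The model and data are synthetic stand-ins; a high flip rate under tiny perturbations would point at an unstable decision boundary worth investigating.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Stand-in model and data (the audited system would replace these).
rng = np.random.default_rng(1)
X = pd.DataFrame({"income": rng.normal(3500, 800, 300),
                  "debt": rng.normal(5_000, 2_000, 300)})
y = (X["income"] - X["debt"] / 4 > 2_200).astype(int)  # toy label
model = RandomForestClassifier(random_state=1).fit(X, y)

baseline = model.predict(X)
# Nudge income by +/-1% and count how many decisions flip.
for pct in (0.01, -0.01):
    perturbed = X.assign(income=X["income"] * (1 + pct))
    flip_rate = (model.predict(perturbed) != baseline).mean()
    print(f"income {pct:+.0%}: {flip_rate:.1%} of decisions flip")
```

In practice, the auditor would run such probes against the production scoring pipeline or a sandboxed copy of the model, under conditions agreed with the auditee.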
Running example SmartLoan
To illustrate how AI assurance procedures can be applied, we return to SmartLoan-App – a high-risk AI system used for credit scoring. The system is assessed using three core audit criteria: fairness, reliability and security. For each, we apply one qualitative and one quantitative audit procedure to demonstrate a balanced approach.
Fairness
Objective: Ensure that the AI system does not exhibit bias or result in unjustified discrimination.
- Qualitative Procedure – Control Testing: Review whether SmartLoan has implemented formal fairness policies during model development, such as guidelines for inclusive training data and documented bias mitigation steps.
- Quantitative Procedure – Substantive Testing: Test a sample dataset to detect disparate impact by comparing approval rates across different demographic groups (e.g. gender or age), using fairness metrics such as statistical parity difference (a worked sketch follows below).
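A minimal sketch of this substantive fairness test: compute approval rates per demographic group and their difference, the statistical parity difference (SPD). Column names and the tiny dataset are illustrative assumptions; in a real audit this would run over a representative sample of actual decisions.

```python
import pandas as pd

# Hypothetical sample of decisions with a demographic attribute.
decisions = pd.DataFrame({
    "approved":  [1, 1, 1, 1, 0, 1, 0, 0, 0, 1],
    "age_group": ["<40"] * 5 + [">=40"] * 5,
})

rates = decisions.groupby("age_group")["approved"].mean()
spd = rates["<40"] - rates[">=40"]
print(f"approval rate <40: {rates['<40']:.0%}, >=40: {rates['>=40']:.0%}")
print(f"statistical parity difference: {spd:+.2f}")
# A common rule of thumb flags |SPD| above ~0.1 for follow-up, but the
# appropriate threshold remains a matter of the auditor's judgment.
```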
Reliability
Objective: Confirm that the AI model performs consistently and accurately across varying conditions.
- Qualitative Procedure – Documentation Inspection: Assess whether SmartLoan maintains up-to-date performance monitoring logs, including error rates and (logging of) drift detection thresholds.
- Quantitative Procedure – Reperformance: Re-run the model using the same input data and configurations to confirm consistent credit scoring results, validating that the model is stable and reproducible. Ideally, this is combined with substantive work: performing manual calculations alongside the model to determine whether the same outcome is derived.
In the case of SmartLoan-App, reperformance testing could involve inputting a set of previously processed applications (using the same financial data such as income, mortgage size, or debt levels) and confirming that the system generates the same credit risk scores and loan decisions (approved, rejected, or flagged for review) as it did originally. To enhance reliability, auditors should perform manual calculations for a sample of these applications, applying the same business rules and thresholds to determine whether they lead to the same results. This dual approach helps ensure the technical consistency of the system and provides assurance that no unintended changes have affected the model’s behavior.
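The reperformance step could look like the sketch below: re-score archived applications and flag any divergence from the originally recorded scores. The model, column names and tolerance are hypothetical stand-ins for whatever the auditor actually receives from SmartLoan.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-in model (in practice: the audited production model).
rng = np.random.default_rng(0)
X = pd.DataFrame({"income": rng.normal(3500, 800, 200),
                  "debt": rng.normal(5_000, 2_000, 200)})
y = (X["income"] > 3_500).astype(int)  # toy label
model = LogisticRegression().fit(X, y)

# Archive of originally recorded scores, as the auditor would receive it.
archive = X.assign(recorded_score=model.predict_proba(X)[:, 1])

# Reperformance: re-score the same inputs and flag divergences.
rescored = model.predict_proba(archive[["income", "debt"]])[:, 1]
diverging = np.abs(rescored - archive["recorded_score"]) > 1e-9
print(f"{int(diverging.sum())} of {len(archive)} re-scored applications diverge")
```

Any divergence beyond the agreed tolerance would prompt follow-up: it may indicate non-determinism, an undocumented model update, or tampering with the decision log.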
Security
Objective: Ensure the AI system safeguards sensitive data and resists unauthorized access or manipulation.
- Qualitative Procedure – Control Testing: Verify the implementation of access controls and encryption protocols for model assets and applicant data, including user role separation.
- Quantitative Procedure – Inspection with Technical Validation: Examine a small sample of data handling operations (approx. 5-10) to validate that encryption and masking techniques are applied correctly and that data is securely stored and transmitted (see the sketch below). In addition, perform adversarial testing to observe the system’s behavior under manipulation attempts.
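Part of this technical validation can be automated, as in the sketch below, which checks a small sample of stored records against an assumed masking convention (all but the last four characters of the IBAN hidden). The field names and the convention itself are illustrative assumptions, not SmartLoan specifics.

```python
import re

# Hypothetical sample of stored records pulled for inspection.
sample_records = [
    {"id": 1, "iban": "**************4300"},
    {"id": 2, "iban": "NL91ABNA0417164300"},  # unmasked: should be flagged
]

MASKED_IBAN = re.compile(r"^\*+[A-Z0-9]{4}$")  # assumed masking convention

for rec in sample_records:
    status = "OK" if MASKED_IBAN.match(rec["iban"]) else "FINDING: unmasked value"
    print(f"record {rec['id']}: {status}")
```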
This focused application of one qualitative and one quantitative audit procedure per criterion provides a practical, risk-based approach to AI assurance. It also supports clarity in reporting and can be scaled depending on system complexity and assurance objectives.
Reporting
Reporting the results of an AI assurance engagement presents unique challenges, particularly when assessing black box models. Unlike traditional IT audits, AI systems often involve probabilistic behavior, limited interpretability and ongoing model evolution, all of which affect how findings should be communicated.
Auditors must clearly define the scope of the audit, including which components of the AI system were reviewed (e.g., data inputs, model outputs), the evaluation criteria used (such as fairness, reliability, or security) and the time period under consideration. The audit report should explicitly distinguish between procedures that support reasonable assurance (such as substantive testing) and those that support limited assurance (such as control or documentation reviews).
Given the dynamic nature of AI, auditors should avoid statements that imply certainty about future performance. Instead, they should focus on whether the system met defined quality criteria at the time of assessment. Any assumptions or limitations – such as restricted access to training data or proprietary algorithms – should be disclosed to ensure transparency.
While AI audits may include techniques like stress testing or independent model validation, these cannot guarantee future behavior. Therefore, the report should manage expectations accordingly and emphasize the retrospective nature of the assurance.
To support decision makers, the report should present a clear summary of key findings, associated risks and actionable recommendations, while also highlighting areas that may require ongoing monitoring. In doing so, the auditor provides not only an assessment of current compliance but also valuable guidance for future AI system governance.
Discussion
As AI assurance continues to evolve, two central issues emerge:
- the need for dedicated standards and
- the skills required to perform effective AI audits.
Dedicated standards on AI Assurance
While existing IT assurance frameworks provide a useful starting point, they do not fully capture the unique characteristics of AI systems, particularly those that operate as opaque or dynamic “black boxes.” The absence of clear and widely accepted standards creates hesitation among auditors and uncertainty for organizations. Recent developments, such as ISO 42006:2025, are a step forward by setting requirements for auditing AI management systems (ISO 42001), but their scope remains limited and they do not yet address the assurance of individual AI systems.
Looking ahead, purpose-built AI assurance standards will likely be introduced to complement existing approaches. Such standards can provide practical guidance on how to assess quality criteria like fairness and robustness, on how to set appropriate audit boundaries, and on what constitutes “sufficient” evidence in a probabilistic and evolving context. Clear thresholds for auditability and greater consistency in reporting would also strengthen comparability and confidence across AI audits.
That said, all of the above should not be seen as a reason for inaction. Auditors should already apply their creativity, experience, and professional skill set to provide meaningful AI assurance today by adapting proven IT audit practices and using techniques such as adversarial testing and consistency checks across different scenarios.
Skills required to perform an AI Assurance engagement
At the same time, the complexity of AI systems demands a broader skill set than traditional IT auditing. While foundational audit skills remain essential, auditors must also understand key concepts in machine learning, data governance and algorithmic bias. This raises the issue of whether standard IT auditors can conduct AI assurance independently, or whether multidisciplinary teams – including data scientists and legal experts – are required.
In practice, auditors must balance profound technical knowledge with sound professional judgment. As the field matures, training programs, certifications and shared methodologies will be critical to closing this skills gap and ensuring consistency in how AI systems are assessed. Ultimately, the development of effective AI assurance depends not only on tools and standards, but also on collaboration and knowledge-sharing across the audit profession.
Conclusion
As the SmartLoan-App example shows, AI assurance is both feasible and necessary, even for complex, high-risk black box systems. While the limited scope of emerging standards such as ISO 42006:2025 and the evolving regulatory landscape present clear challenges, auditors already have access to a range of techniques – both qualitative and quantitative – that can be adapted from traditional IT assurance practices. By applying a structured, risk-based approach and using appropriate criteria such as fairness, reliability and security, meaningful assurance can be provided today.
The AI Act reinforces the urgency of this task, especially for high-risk systems where compliance and ethical use are critical. Auditors must take an active role in assessing these systems, even in the absence of perfect transparency, and support organizations in managing AI-related risks responsibly.
Looking ahead, the profession must continue to invest in the development of skills, methods and – potentially – new standards to strengthen the reliability and comparability of AI audits. Just as importantly, audit professionals should remain engaged in the ongoing debate, sharing insights and practical experiences to collectively shape the future of AI assurance.
References
[AICP17] AICPA (2017). SOC 2® – SOC for Service Organizations: Trust Services Criteria. Retrieved from: https://www.aicpa-cima.com/topic/audit-assurance/audit-and-assurance-greater-than-soc-2
In Dutch: NOREA (2021, December). NOREA Handreiking – Handreiking voor SOC 2® en SOC 3® op basis van ISAE 3000 / Richtlijn 3000A. Retrieved from: https://www.norea.nl/uploads/bfile/87b07304-10b1-4b36-9976-fd34165c3513
[Algo25] Algorithm Audit (2025). A European knowledge platform for AI standards and AI bias testing. Retrieved from: https://algorithmaudit.eu/
[Boer22] Boer, A., De Beer, L. & Van Praat, F. (2022). Algorithm Assurance: Auditing Applications of Artificial Intelligence. In: Advanced Digital Auditing (Springer Nature). Retrieved from: https://link.springer.com/chapter/10.1007/978-3-031-11089-4_7
[EC16] European Commission (2016). General Data Protection Regulation. Regulation 2016/679. Retrieved from: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng
[EC24] European Commission (2024). The EU Artificial Intelligence Act – Up-to-date developments and analyses of the EU AI Act. Regulation (EU) 2024/1689. Retrieved from: https://artificialintelligenceact.eu/
[Hout24] Van Houten, P. (2024). Opening the AI black box – A comprehensive framework for AI assurance. Retrieved from: https://www.linkedin.com/posts/pietervanhouten_opening-the-ai-black-box-a-framework-for-activity-7162751195101515777-sjdR/
[IAAS15] The International Auditing and Assurance Standards Board, IAASB (2015). International Standard on Assurance Engagements (ISAE) 3000 Revised, Assurance Engagements Other than Audits or Reviews of Historical Financial Information. Retrieved from: https://www.iaasb.org/publications/international-standard-assurance-engagements-isae-3000-revised-assurance-engagements-other-audits-or
[ICAE25] Institute of Chartered Accountants in England and Wales, ICAEW (2025). New standard for firms certifying AI management systems. Retrieved from: https://www.icaew.com/insights/viewpoints-on-the-news/2025/aug-2025/new-standard-for-firms-certifying-ai-management-systems
[Idem24] Idema, S.X. & Gozales Riedel, D. (2024). Understanding intersection between EU’s AI Act and privacy compliance. Compact 2024/2. Retrieved from: https://www.compact.nl/pdf/C-2024-9-Idema.pdf
[ISO23] ISO (2023). ISO/IEC 42001:2023, version 1. Information technology – Artificial intelligence – Management system. Retrieved from: https://www.iso.org/standard/42001
[ISO25] ISO (2025). ISO/IEC 42006:2025, version 1. Information technology – Artificial intelligence – Requirements for bodies providing audit and certification of artificial intelligence management systems. Retrieved from: https://www.iso.org/standard/42006
[Lund17] Lundberg, S. & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions (SHapley Additive exPlanations, SHAP). In arXiv:1705.07874v2. Retrieved from: https://arxiv.org/abs/1705.07874v2 and used for this article: https://shap.readthedocs.io/en/latest/
[NIST23] NIST (2023). AI Risk Management Framework. Retrieved from: https://www.nist.gov/itl/ai-risk-management-framework
[Tuli16] Tulio Ribeiro, M., Singh, S. & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier (Local Interpretable Model-Agnostic Explanations, LIME). In arXiv:1602.04938. Retrieved from: https://arxiv.org/abs/1602.04938 and used for this article: https://c3.ai/glossary/data-science/lime-local-interpretable-model-agnostic-explanations/

