International Test Commission

Introduction

Over the past few years the International Test Commission (ITC) has adopted a policy of promoting good practice in areas of testing where international coordination of effort is most important. For example, the ITC has devised guidelines to promote good practice in test adaptations (Hambleton, 1994; Van de Vijver & Hambleton, 1996) and good practice in test use (ITC, 2001). In recent years substantial and rapid developments have occurred in the provision of stand-alone and Internet-delivered computer-based testing. These developments raise a number of issues in relation to standards of administration, security of the tests and test results, and control over the testing process. As the market for such testing grows and the technological sophistication of the products increases, it will become increasingly important to ensure that those developing, distributing, using and taking such tests and assessment tools follow good practice. In response, the ITC Council decided to invest in a program of research, consultation, and conferences designed to develop internationally agreed guidelines specifically aimed at computer/Internet-based testing.

Aims and Objectives

The ultimate aims of this project were:

  • to produce a set of internationally developed and recognised guidelines that highlight good practice issues in computer-based (CBT) and Internet-delivered testing; and
  • to raise awareness among all stakeholders in the testing process of what constitutes good practice.

The aim was not to ‘invent’ new guidelines but to draw together common themes that run through existing guidelines, codes of practice, standards, research papers and other sources, and to create a coherent structure within which these guidelines can be used and understood. Contributions to the guidelines have been made by psychological and educational testing specialists, including test designers, test developers, test publishers and test users drawn from a number of countries.

Further, the aim was to focus on developing guidelines specific to CBT/Internet-based testing, not to reiterate good practice in testing in general. Clearly, any form of testing and assessment should conform to good practice regardless of the method of presentation. These guidelines are intended to complement the ITC Guidelines on Test Use (2001), with a specific focus on CBT/Internet testing.

Development of the Guidelines

As with previous ITC guidelines, the present guidelines can be seen as a benchmark against which existing local standards can be compared, or as a basis for the development of locally applicable standards or codes of practice. Comparing local standards against these guidelines allows their coverage to be checked and promotes consistency across national boundaries.

The project commenced with an initial literature search and review of existing references and guidelines on computer-based testing and Internet testing from a number of different countries (see Appendix). A number of these sources were particularly influential in the development of the guidelines:

  • Bartram, D. (2001). The impact of the Internet on testing: Issues that need to be addressed by a Code of Good Practice. Internal report for SHL Group plc.
  • British Psychological Society Psychological Testing Centre (2002). Guidelines for the Development and Use of Computer-based Assessments.
  • European Federation of Psychologists’ Associations (EFPA). Review model for the description and evaluation of psychological tests (Bartram, 2002).
  • British Standards Institution (BSI). BS 7988 (2001). A code of practice for the use of information technology for the delivery of assessments.
  • Association of Test Publishers (ATP). Guidelines for Computer Based Testing.

The next stage involved a small-scale survey of United Kingdom test publishers, examining good practice issues in Internet-delivered personality tests. Further examples of good practice were highlighted from this survey.

As a third method of obtaining relevant information, the ITC organised a conference on Computer-based Testing and the Internet in Winchester, England, in June 2002. The goal of this conference was to bring together people working in the field of computer/Internet testing (e.g., practitioners, scholars, industry leaders and others) from around the world and to extract common issues and themes that would inform the guidelines. In total, 254 delegates from 21 countries attended. The conference comprised workshops, keynote presentations and themed papers, posters and symposia on a number of topics concerning computer/Internet testing. A review of the material from this conference, coupled with the survey data and literature review, provided the basis for the development of the draft guidelines for initial consultation (version 0.3).

Four general issues emerged from the information gathering process and these formed the basis of the development of an initial draft version. The four issues were:

  • Technology – ensuring that the technical aspects of CBT/Internet testing are considered, especially in relation to the hardware and software required to run the testing.
  • Quality – ensuring and assuring the quality of testing and test materials and ensuring good practice throughout the testing process.
  • Control – controlling the delivery of tests, test-taker authentication and prior practice.
  • Security – security of the testing materials, privacy, data protection and confidentiality.

These four issues were treated as high-level issues and were further broken down into second-level specific guidelines. A third-level set of accompanying examples is provided for each relevant stakeholder. The guidelines are primarily written to provide advice to test developers, test publishers and test users; however, they also provide a useful source of reference for test-takers. Given these intended applications, the guidelines are structured as a three (main stakeholders) by three (levels of guideline) matrix.

Following development of the initial draft by the two authors, a consultation process was undertaken. The draft was circulated to all those who attended the ITC Conference in Winchester and all those on the ITC circulation list for Testing International. A copy was also placed on the ITC web site. Comments on the draft guidelines were received and version 0.4 was produced. In addition, the report of the APA Internet Task Force was published (Naglieri et al., 2004). This was reviewed in detail and elements from the report were included in version 0.5 of the draft guidelines.

Another cycle of consultation was then carried out, including those people contacted in the first consultation process. The revisions and edits from this process were completed and version 0.6 of the draft guidelines was produced. After final revisions, the final draft version (1.0) was produced. The current guidelines (version 2005) were officially launched in July 2005 after approval by the ITC Council.

Timeline

The following shows the timeline in the design and development of the guidelines.

  • Completion of first draft and first consultation initiated: March 2003
  • End of first consultation period: June 2003
  • Revisions completed and second consultation initiated: February 2004
  • End of second consultation period: April 2004
  • Symposium on CBT and Internet testing at the International Congress of Psychology in Beijing: August 2004
  • Final version for approval: January 2005
  • Development of final version and design of web-based version: March 2005
  • Approval by ITC Council and formal launch: July 2005

    Scope

    As with the ITC Guidelines on Test Use (2001), the current guidelines use the terms ‘test’ and ‘testing’ in their broadest sense and include psychological and educational tests used in clinical, health, educational, and work and organisational assessment settings. CBT/Internet tests should be supported by evidence of their technical adequacy for their intended purpose. These guidelines are aimed at tests conducted both online and onscreen (offline), which can include testing via a CD-ROM or a downloaded executable. The document includes guidance for fully computerised testing and for part-computerised testing, and the reader can refer to the most appropriate elements. For example, only the sending and scoring of assessment papers may be computerised, with the rest of the process remaining paper and pencil. In such cases, the guidelines dealing with security and confidentiality of data are particularly relevant.

    In general, the guidelines can apply to both high stakes and low stakes assessment. High stakes assessments are, for example, those where a third party requires the results of the test in order to make an important decision about a test-taker; they may also include tests used to make decisions about groups of test-takers, such as a school class. By contrast, an example of low stakes assessment would be one where the test-taker obtains the information for his or her own interest. Where a guideline applies only to high stakes testing environments, this is made clear within the text itself.

    Again, unless otherwise specified in the text, the guidelines presented here should be considered as applying to a number of modes of supervision and across a number of testing scenarios. Four modes of test administration are considered:

    • Open mode – Where there is no direct human supervision of the assessment session and hence there is no means of authenticating the identity of the test-taker. Internet-based tests without any requirement for registration can be considered an example of this mode of administration.
    • Controlled mode – No direct human supervision of the assessment session is involved but the test is made available only to known test-takers. For Internet tests, this requires test-takers to obtain a logon username and password. These are often designed to operate on a one-time-only basis.¹
    • Supervised (Proctored) mode – Where there is a level of direct human supervision over test-taking conditions. In this mode test-taker identity can be authenticated. For Internet testing this would require an administrator to log in a candidate and confirm that the test had been properly administered and completed.
    • Managed mode – Where there is a high level of human supervision and control over the test-taking environment. In CBT this is normally achieved by the use of dedicated testing centres, where there is a high level of control over access, security, the qualification of test administration staff and the quality and technical specifications of the test equipment.²
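The four modes above can be read as a small lookup table. The following is only an illustrative sketch, not part of the guidelines: the class `AdministrationMode`, its field names and the helper `modes_meeting` are hypothetical, and some flags are inferred rather than stated in the text (for example, that controlled mode offers no supervisor-confirmed authentication and that managed mode includes everything supervised mode does). The standardisation values follow the two notes on standardisation of the testing environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdministrationMode:
    """Hypothetical summary of one mode of test administration."""
    name: str
    direct_supervision: bool      # a person directly supervises the session
    known_test_takers: bool       # test is made available only to known test-takers
    identity_authenticated: bool  # a supervisor can confirm who is taking the test
    standardisation: str          # standardisation of the testing environment


MODES = (
    AdministrationMode("open", False, False, False, "not possible"),
    # Assumption: a logon alone is not treated as supervisor-confirmed identity.
    AdministrationMode("controlled", False, True, False, "often not possible"),
    # Assumption: supervised sessions are run only for known test-takers.
    AdministrationMode("supervised", True, True, True, "possible"),
    # Assumption: managed mode has at least the safeguards of supervised mode.
    AdministrationMode("managed", True, True, True, "possible"),
)


def modes_meeting(require_supervision=False, require_authentication=False):
    """Return the names of the modes that provide the requested safeguards."""
    return [
        mode.name
        for mode in MODES
        if (mode.direct_supervision or not require_supervision)
        and (mode.identity_authenticated or not require_authentication)
    ]


if __name__ == "__main__":
    # List the modes in which test-taker identity can be confirmed.
    print(modes_meeting(require_authentication=True))
```

A table of this kind could help a test user check which modes are candidates for a given scenario, although the choice in practice depends on the detailed guidelines and on local context.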

    Application of these guidelines needs to be considered in terms of their relevance to a range of different testing scenarios (e.g., some guidelines are more appropriate for the higher stakes scenarios). For example, in relation to testing in work and organisational settings, four main scenarios can be identified:

    • Guidance – personnel development or career guidance, where the test-taker requires the information for his/her own interest.
    • Pre-screening recruitment – covers assessment carried out on people up to the point at which they are sifted to form a short-list.
    • Post-sift selection – assessments on a known set of applicants who have been previously short-listed.
    • Post-hire assessment – assessments carried out on employees of an organisation by or on behalf of the organisation. This may be either high or low stakes assessment.

    Additionally, in clinical/counselling settings, four scenarios could be:

    • Development and decision-making purposes – where the information is used by the client and therapist/counsellor to identify aspects of functioning that require development or to make decisions (e.g., career assessment).
    • Screening – to get a global picture of the client’s functioning.
    • Diagnostic purposes – to identify specific strengths and weaknesses which can guide intervention planning.
    • Planning and evaluating intervention/therapy.

    Each of these raises different issues regarding control and security.

     

    ¹ Standardisation of the testing environment is not possible with open mode testing, and often not possible in the controlled mode of testing.
    ² Standardisation is possible with supervised mode and managed mode.

    Who are the Guidelines for?

    The guidelines apply to the use of CBT and Internet tests in professional practice. Thus they are directed towards test users who:

    • purchase and use CBT/Internet tests;
    • are responsible for selecting tests and determining the use to which tests will be put;
    • administer, score, or interpret tests (invigilators/proctors);
    • provide advice to others on the basis of test results (e.g., recruitment consultants, educational and career counsellors, educational and school psychologists, trainers, succession planners, organisational development consultants);
    • are concerned with the process of reporting test results and providing feedback to people who have been tested.

    These guidelines also specifically address three other main stakeholders in the testing process:

    • developers of CBT and Internet tests,
    • publishers of CBT and Internet tests (who also may be involved in test development), and
    • consultants to developers and publishers of CBT and Internet tests.

    The guidelines are relevant to others involved in the use of CBT and Internet tests. These include:

    • those involved in the training of test users,
    • those who take tests and their relevant others (e.g., parents, spouses, partners),
    • professional bodies and other associations with an interest in the use of psychological and educational testing, and
    • policy makers and legislators.

     

    Contextual factors

    The guidelines are intended to be applicable internationally. Many factors may affect how standards may be managed and realised in practice. These contextual factors have to be considered at the local level when interpreting these guidelines and defining what they would mean in practice within any particular setting.

    The factors that need to be considered for turning the guidelines into specific standards include:

    • social, political, institutional, linguistic, and cultural differences between assessment settings;
    • laws, statutes, policy and other legal documentation that addresses testing issues;
    • laws applying to the various countries through which test data may pass or be stored;
    • existing national guidelines and performance standards set by professional psychological societies and associations;
    • differences relating to individual versus group assessment;
    • differences related to the test setting (educational, clinical, work-related and other assessment);
    • who the primary recipients of the test results are (e.g., the test-takers, their parents or guardians, the test developer, an employer or other third party);
    • differences relating to the use of test results (e.g., for decision-making, as in selection screening, or for providing information to support guidance or counselling); and
    • variations in the degree to which opportunity exists for the accuracy of interpretations to be checked in light of subsequent information and amended if needed.