Authors Lonneke A. L. de Meijer and Marise Ph. Born
Erasmus University Rotterdam, The Netherlands
Uploaded June 2009
When citing this reading, please reference it as follows:
de Meijer, L. A. L., & Born, M. Ph. The situational judgment test: Advantages and disadvantages. In M. Born, C. D. Foxcroft, & R. Butter (Eds.), Online Readings in Testing and Assessment. International Test Commission, http://www.intestcom.org/Publications/ORTA.php
In the field of personnel selection, the situational judgment test (SJT) has become increasingly popular since the 1920s. The SJT presents a series of problem situations that people can encounter at work, and job applicants choose the best of several alternative responses. SJTs can be presented in written, verbal, or visual form. Several characteristics have caused the revival of the SJT. Among these is the fact that the SJT improves the prediction of job performance over and above cognitive ability tests and personality questionnaires taken together. SJTs also have less adverse impact against ethnic minority groups than cognitive ability tests. We will discuss several issues in the present paper, namely: 1) the main characteristics of SJTs, 2) the similarities and dissimilarities of SJTs compared to other types of assessment for personnel selection, 3) how to develop a SJT, and finally, 4) the newest type of SJT.
Situational judgment tests (SJTs) have been in use since the 1920s. They have become increasingly popular in personnel selection during the last two decades. Several characteristics of the SJT have caused its revival.
First, the SJT has a predictive validity comparable to that of the cognitive ability test. McDaniel, Morgeson, Bruhn Finnegan, Campion, and Braverman (2001) conducted a meta-analysis – a statistical integration and summary of a series of separate studies – and showed a mean corrected predictive validity of .34. Furthermore, McDaniel et al. (2007) meta-analytically showed that SJTs are able to predict job performance over and above a composite of cognitive ability tests and personality questionnaires. They therefore suggest that SJTs capture a unique aspect of job performance that is not captured by more traditional measures such as cognitive ability tests and personality questionnaires.
Second, SJTs have less adverse impact against ethnic minority groups than more traditionally used cognitive ability tests. Although SJTs show a wide variety in score differences between ethnic groups, in almost all cases these differences are smaller than the score differences between ethnic groups typically reported for cognitive ability tests (Sackett & Wilk, 1994). In a meta-analysis in the United States, Nguyen, McDaniel, and Whetzel (2005) showed that score differences on SJTs between Blacks and Whites were around .38 standard deviation (SD) favoring Whites. This score difference is smaller than on most cognitive ability tests.
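The ".38 standard deviation" figure is a standardized mean difference (Cohen's d): the gap between two group means divided by a pooled standard deviation. As a minimal sketch of that computation – the scores below are invented for illustration and are not data from any of the cited studies:

```python
from statistics import mean, stdev

def cohens_d(group1, group2):
    """Standardized mean difference between two groups, using a pooled SD."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = (((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical SJT total scores for two subgroups (illustrative numbers only)
group_a = [24, 26, 28, 30, 32]
group_b = [22, 24, 26, 28, 30]
d = cohens_d(group_a, group_b)
```

With real selection data one would use the full applicant samples; the point here is only that "a .38 SD difference" refers to this standardized metric, not to a raw score gap.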
Third, the face validity of the typical SJT, that is, whether the SJT looks like it is measuring something relevant to the job, can also be an important benefit, especially for selection procedures. SJTs therefore seem to be acceptable to applicants and may even offer the benefit of providing realistic job previews, that is, a realistic view of what the job will look like (Weekley & Ployhart, 2006).
Finally, new technology has made it possible to develop SJTs based on video material. A video-based SJT appears to have several advantages compared to the paper-and-pencil SJT, such as a higher validity in predicting job performance (Lievens & Sackett, 2006), less adverse impact, and a higher realism of the test, leading to more favorable reactions by applicants (Chan & Schmitt, 1997; Richman-Hirsch, Olson-Buchanan, & Drasgow, 2000).
Even though SJTs have a series of advantages, important questions still persist. A critical issue is that it is sometimes unclear which underlying constructs SJTs are actually measuring, if any at all, and that it is difficult to develop a SJT that measures one specific construct.
In the research literature, a substantial debate exists concerning what SJTs actually measure. Broadly, two movements can be distinguished. On the one hand, there is the viewpoint that there is one single construct, namely job knowledge, which is measured by situational judgment tests (Schmidt & Hunter, 1993). On the other hand, there is a group of researchers who argue that SJTs are nothing but measurement methods (Chan & Schmitt, 1997; McDaniel et al., 2001; McDaniel & Nguyen, 2001; McDaniel & Whetzel, 2005; Weekley & Jones, 1999). This latter view implies that, similar to other measurement methods such as the employment interview, SJTs can be built to measure any of a variety of constructs. For example, to assess conscientiousness, one could build a SJT in which conscientiousness is the major determinant of individual differences in item responding. Yet, these researchers also argue that there are limits to what constructs a SJT can or cannot measure. Summarizing the empirical literature, McDaniel and Nguyen (2001) and McDaniel et al. (2001) showed that SJTs are not unidimensional construct tests, but should be considered a measurement method capable of measuring a series of constructs. According to these authors, empirical evidence indicates that the constructs measured by SJTs include cognitive ability (g), conscientiousness, agreeableness, and emotional stability. However, other researchers (e.g., Becker, 2005; De Meijer, Born, Van Zielst, & Van der Molen, in press) recently have argued that SJTs can also be built to measure other constructs, for instance integrity.
In the following, we will first describe the characteristics of SJTs in general, followed by a comparison of the SJT to other types of assessment. Then, how to develop a SJT is discussed. Finally, the newest type of SJT is introduced.
SJTs typically consist of hypothetical scenarios describing a work situation in which a problem has arisen. An example is a conflict between two employees at work. The work situation may be a possible actual situation on the target job or a situation constructed in such a manner that it is psychologically identical to an actual work situation (Chan & Schmitt, 1997). Work situations within the test are usually developed on the basis of a critical-incident analysis involving subject matter experts (SMEs). When developing a SJT aimed at measuring one specific construct only, SMEs are asked for critical incidents in terms of this specific construct, for instance integrity, instead of the more general work context.
All SJTs have similarities, such as the fact that they consist of hypothetical scenarios, as was described above. Yet, they can vary in terms of format, namely from paper-and-pencil tests with written descriptions of situations (Chan & Schmitt, 2002) to video-based tests consisting of multimedia scenarios (Lievens, Buyse, & Sackett, 2005; Olson-Buchanan et al., 1998; Weekley & Jones, 1997). Paper-and-pencil SJTs represent SJTs in written form: both the hypothetical scenarios and the response options are in a written format. Video-based SJTs consist of video clips of hypothetical scenarios; the response options can also be video-based. With video-based SJTs, it is possible for respondents to see and hear people interacting and to perceive (or fail to perceive) their emotions and stress. With this format, one may be able to better understand how respondents interpret verbal and nonverbal behavior of others in the workplace and how they choose to respond (Olson-Buchanan & Drasgow, 2006). Potential challenges of video-based SJTs are high development costs and high maintenance costs in terms of the technology and content of the videos (Olson-Buchanan & Drasgow, 2006).
SJTs can also vary in terms of their response instructions. McDaniel and Nguyen (2001) identified two categories of response instructions: knowledge response instructions and behavioral tendency response instructions. Knowledge response instructions typically ask respondents to rate the effectiveness of responses or to select the most effective and/or least effective response. Behavioral tendency response instructions ask respondents to rate how likely they would be to display each behavior or to select the response they would most likely and/or least likely perform.
Nguyen, Biderman, and McDaniel (2005) examined which of the two response formats was more resistant to faking by respondents. They found that knowledge response instructions were more resistant to faking and had a stronger relationship with cognitive ability than did behavioral tendency instructions. McDaniel et al. (2007) meta-analytically showed that knowledge instructions had higher criterion-related validities and a higher correlation with cognitive ability than behavioral tendency instructions. A limitation of McDaniel et al.'s meta-analysis is that the results are based on so-called concurrent-validity studies, in which the SJTs were filled out by employees, not by applicants. As Weekley and Jones (1999) argued, the question arises whether these results are generalizable to the personnel-selection setting. It seems unlikely that an applicant who wishes to be accepted would select, under behavioral tendency instructions, an option other than the one (s)he believed to be the best. Research in applicant settings is, therefore, needed to shed more light on the impact of response instructions on SJT measurement properties.
SJTs show similarities and dissimilarities with other types of assessment. The SJT shares similarities with the so-called situational interview (Latham & Saari, 1984), the so-called work sample test (Asher & Sciarrino, 1974), and the assessment center (AC; Thornton & Byham, 1982). In the situational interview, applicants are presented with job-related situations. The interviewer evaluates the effectiveness of their responses. The situational interview shows similarities with the SJT both in form and in validity (Weekley & Ployhart, 2006). The primary differences between the situational interview and most SJTs are the way in which they are presented to respondents (verbally vs. in writing or by means of videos); how responses are given (verbally vs. selecting from among a closed-ended set of options); and how responses are scored (interviewer judgment vs. comparison to a scoring key; Weekley & Ployhart, 2006).
SJTs also show similarities with work sample tests and ACs. Work samples and ACs, however, go well beyond the SJT format in that they confront the respondent with a real situation rather than presenting a description or a video of a situation. This difference has also been expressed in terms of 'fidelity', in the sense that SJTs are low-fidelity tests, ACs are higher-fidelity tests, and work sample tests are the highest-fidelity tests (Ployhart, Schneider, & Schmitt, 2005). A second difference between SJTs on the one hand, and work samples and ACs on the other hand, is that the latter methods require an assessor to evaluate respondents, whereas SJTs can be scored mechanically (Weekley & Ployhart, 2006).
In sum, the SJT shows similarities with the situational interview, with the work sample test, and with the AC, but there are a number of important differences. These differences include that SJTs may be easier to score and implement in large-scale testing programs, making them attractive options for the early stages of recruitment and selection (Weekley & Ployhart, 2006).
We will now discuss the development of SJTs. We describe the development procedure used by De Meijer et al. (in press). The SJT in the present example is a video-based SJT aimed at measuring integrity. It has a knowledge response instruction. The SJT was developed to select potential police officers in The Netherlands. First, we will discuss the construct of integrity. Then, the development process of the video-based SJT is outlined. We will conclude with some of the features of the final form of this SJT.
Integrity is difficult to define and appears to consist of various sub-dimensions (Jones, Brasher, & Huff, 2002; Van Iddekinge, Taylor, & Eidson, 2005). Often-found examples of sub-dimensions of integrity are honesty, drug avoidance, work values, and customer service. In this case, we focus on integrity as defined within the police context (Naeyé, Huberts, Van Zweden, Busato, & Berger, 2004, p. 19):
“Police integrity refers to whether the performance in police jobs is in accordance with the applicable values, norms, and the rules that are involved. Values are moral principles or standards, such as legitimacy and brotherhood, which should be of importance during decision-making. Norms are more concrete and direct. Norms are action rules, which give a clear guidance in what is allowed in a specific situation and what is not.”
Violations of integrity within the police force involve, among other things, corruption, fraud and theft, accepting dubious gifts and services, misuse of authority, and misuse of information (Naeyé et al., 2004), which can be viewed as sub-dimensions of police integrity. Because of the impact that these integrity violations may have on the police organization, it is important to determine an applicant's integrity by means of a police officer selection measure.
The SJT typically includes scenarios representing problematic interpersonal situations. For an example of one of the SJT items, see the Appendix. We will now describe a development approach, analogous to the approach used in the study of De Meijer et al. (in press). The example refers to a SJT developed to measure integrity for police officer jobs.
First, realistic critical incidents regarding interactions between police officers and civilians, or among police colleagues, need to be collected from ten to fifteen experienced police officers. Both policemen and policewomen need to be present in this group of experts. Also, if possible, police officers from different ethnic groups should be consulted. The police officers should have substantial work experience, for example about fifteen years.
All incidents need to focus on integrity violations and potential reactions to these violations. An example incident could be: resisting fraudulent people or situations. Second, critical incidents that are similar to each other are grouped, and scenarios are written for each of these groups of critical incidents. The experienced police officers who were interviewed to collect the critical incidents need to check these scenarios for realism.
At the same time, with the help of these experienced police officers, four response options are derived for each scenario. This results in a number of SJT items (a scenario including its four response options is called an 'item'). These items are then pilot tested among a sample of police officer applicants, for example in a written version of the test.
Third, after examining the statistical results, that is, the descriptives and the factor-analytic results of the pilot-study data, the SJT items are edited to be ready for filming. Fourth, professional actors are trained to act in the scenarios. After that, the scenarios are videotaped. Police officer experts need to be present during the video shoot and are asked, again, to assess the filmed scenarios in terms of their realism.
Finally, a panel of experts is asked to fill out the video-based SJT in order to develop a scoring key. The expert panel consists of a minimum of ten experienced police officers. Each response option has to be evaluated on its effectiveness given the situation presented in the scenario. Agreement among the experts in effectiveness ratings is calculated with intraclass correlations (ICCs). If agreement among the experts is satisfactory, the scoring key is set at the mode of the total expert group. The applicant score for a given response option is the absolute difference between the scoring key for that option and the applicant's rating.
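The scoring-key step can be sketched in code. The sketch below is illustrative only: the ratings are invented, and a one-way random-effects ICC(1) is used as one common choice, since the text does not specify which ICC form the authors computed.

```python
from statistics import mean, mode

def icc_oneway(ratings):
    """One-way random-effects ICC(1) for a raters-by-options rating matrix.

    ratings: one row per expert; each row holds that expert's
    effectiveness ratings for the same set of response options.
    """
    k = len(ratings)             # number of raters
    n = len(ratings[0])          # number of rated response options
    cols = list(zip(*ratings))   # ratings grouped per option
    grand = mean(x for row in ratings for x in row)
    col_means = [mean(c) for c in cols]
    # Between-options and within-option mean squares
    msb = k * sum((m - grand) ** 2 for m in col_means) / (n - 1)
    msw = sum((x - m) ** 2 for c, m in zip(cols, col_means) for x in c) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def scoring_key(ratings):
    """Key = the mode of the experts' ratings for each response option."""
    return [mode(c) for c in zip(*ratings)]

def applicant_score(key, responses):
    """Sum of absolute distances to the key; smaller = closer to the experts."""
    return sum(abs(k - r) for k, r in zip(key, responses))

# Hypothetical effectiveness ratings (1-5) from five experts
# for the four response options of one SJT item.
expert_ratings = [
    [1, 4, 2, 1],
    [1, 5, 2, 1],
    [2, 4, 3, 1],
    [1, 4, 2, 2],
    [1, 4, 2, 1],
]
icc = icc_oneway(expert_ratings)            # agreement among experts
key = scoring_key(expert_ratings)           # mode per option
score = applicant_score(key, [2, 4, 1, 1])  # distance of one applicant
```

With these invented ratings the experts agree strongly (ICC around .91), the key becomes [1, 4, 2, 1], and an applicant rating the options [2, 4, 1, 1] receives a distance of 2. Whether low distances are then rescaled into high test scores is a separate scaling decision.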
In its final form, the video-based SJT consists of short, videotaped scenarios of key integrity issues that police officers are likely to encounter with civilians or with police colleagues. A narrator introduces each scenario. Per SJT item, the scene freezes at an important point and the applicant has to evaluate the response options related to the scene presented. The items have four response options each. Applicants are instructed to evaluate each response option in terms of its effectiveness within the given situation. This response instruction is generally known as a knowledge response instruction.
Recently, in The Netherlands (Geertsma & De Meijer, 2008; Van der Maesen, 2005) and in the U.S. (Walker & Goldenberg, 2004), a new and innovative type of SJT has been developed for use in personnel-selection settings. Here, applicants are shown videos depicting highly realistic and important work situations. The scenarios are obtained by means of critical-incident interviews with experienced employees. At a critical point of the scenario, the video freezes and the applicant must then react orally to the given situation in his or her own words. This response is videotaped with a webcam. To score the videotaped responses, assessors experienced in personnel selection are used. Rubrics for scoring responses are developed to help the assessors evaluate them.
The most innovative feature of this assessment lies in its open-ended response format. Test-wise applicants cannot scrutinize multiple-choice options and deduce the option that the test developer will score as correct. Instead, applicants must create their own answers in real time (Olson-Buchanan & Drasgow, 2006). Much interesting research can and should be conducted on this response format (see for instance Oostrom, Born, Serlie, & Van der Molen, in press). Issues such as the predictive validity of this type of assessment, the incremental validity of the open-ended response format over and above the multiple-choice format, and potential differences in latent traits between open-ended and multiple-choice formats should be explored in the future.
A good overview of situational judgment testing is given by the following book:
J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application. Mahwah, NJ: Lawrence Erlbaum Associates.
The more detailed references used in the text are mostly journal articles, which may be less easy to get hold of:
Asher, J. J., & Sciarrino, J. A. (1974). Realistic work sample tests: A review. Personnel Psychology, 27(4), 519-533.
Becker, T. E. (2005). Development and validation of a situational judgment test of employment integrity. International Journal of Selection and Assessment, 13(3), 225-232.
Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perception. Journal of Applied Psychology, 82(1), 143-159.
Chan, D., & Schmitt, N. (2002). Situational judgment and job performance. Human Performance, 15(3), 233-254.
Chan, D., & Schmitt, N. (2005). Situational judgment tests. In A. Evers, N. Anderson, & O. Voskuijl (Eds.), Handbook of personnel selection (pp. 219-246). Oxford: Blackwell.
Clevenger, J., Pereira, G. M., Wiechmann, D., Schmitt, N., & Harvey, V. S. (2001). Incremental validity of situational judgment tests. Journal of Applied Psychology, 86(3), 410-417.
Dalessio, A. T. (1994). Predicting insurance agent turnover using a video-based situational judgment test. Journal of Business and Psychology, 9(1), 23-32.
De Meijer, L. A. L., Born, M. Ph., Van Zielst, J., & Van der Molen, H. T. (in press). The construct-driven development of a video-based situational judgment test measuring integrity: A study in a multi-ethnical setting. European Psychologist.
Geertsma, J., & De Meijer L. A. L. (2008, July). The validity and reliability of a newly developed web cam test. Presented at the 6th conference of the International Test Commission (ITC), Liverpool, UK.
Jones, J. W., Brasher, E. E., & Huff, J. W. (2002). Innovations in integrity-based personnel selection: Building a technology-friendly assessment. International Journal of Selection and Assessment, 10(1/2), 87-97.
Latham, G. P., & Saari, L. M. (1984). Do people do what they say? Further studies of the situational interview. Journal of Applied Psychology, 69(4), 569-573.
Lievens, F., Buyse, T., & Sackett, P. R. (2005). The operational validity of a video-based situational judgment test for medical college admission: Illustrating the importance of matching predictor and criterion construct domains. Journal of Applied Psychology, 90(3), 442-452.
Lievens, F., & Sackett, P. R. (2006). Video-based versus written situational judgment tests: A comparison in terms of predictive validity. Journal of Applied Psychology, 91(5), 1181-1188.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L., III (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60(1), 63-91.
McDaniel, M. A., Morgeson, F. P., Bruhn Finnegan, E., Campion, M. A., & Braverman, E. P. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86(4), 730-740.
McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9(1/2), 103-113.
McDaniel, M. A., & Whetzel, D. L. (2005). Situational judgment test research: Informing the debate on practical intelligence theory. Intelligence, 33(5), 515-525.
Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 75(6), 640-647.
Naeyé, J., Huberts, L., Van Zweden, C., Busato, V., & Berger, B. (2004). Integriteit in het dagelijkse politiewerk [Integrity in daily police work]. Amsterdam: Vrije Universiteit.
Nguyen, N. T., Biderman, M., & McDaniel, M. A. (2005). Effects of response instructions on faking in a situational judgment test. International Journal of Selection and Assessment, 13(4), 250-260.
Nguyen, N. T., & McDaniel, M. A. (2003). Response instructions and racial differences in a situational judgment test. Applied H.R.M. Research, 8(1), 33-44.
Nguyen, N. T., McDaniel, M. A., & Whetzel D. L. (2005, April). Subgroup differences in situational judgment test performance: A meta-analysis. Paper presented at the 20th Annual Conference of the Society of Industrial and Organizational Psychology, Los Angeles, CA.
O’Connell, M. S., Hartman, N. S., McDaniel, M. A., Grubb, W. L., III., & Lawrence, A. (2007). Incremental validity of situational judgment tests for task and contextual performance. International Journal of Selection and Assessment, 15(1), 19-29.
Olson-Buchanan, J. B., & Drasgow, F. (2006). Multimedia situational judgment tests: The medium creates the message. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 253-278). Mahwah, NJ: Lawrence Erlbaum Associates.
Olson-Buchanan, J. B., Drasgow, F., Moberg, P. J., Mead, A. D., Keenan, P. A., & Donovan, M. A. (1998). An interactive video assessment of conflict resolution skills. Personnel Psychology, 51(1), 1-24.
Oostrom, J.K., Born, M.Ph., Serlie, A.W., & Van der Molen, H.T. (in press). Webcam testing: Validation of an innovative open-ended multimedia test. European Journal of Work and Organizational Psychology.
Ployhart, R. E., Schneider, B., & Schmitt, N. (2005). Organizational staffing: Contemporary practice and theory. Mahwah, NJ: Lawrence Erlbaum Associates.
Richman-Hirsch, W. L., Olson-Buchanan, J. B., & Drasgow, F. (2000). Examining the impact of administration medium on examinee perceptions and attitudes. Journal of Applied Psychology, 85(6), 880-887.
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemployment testing. American Psychologist, 49(11), 929-954.
Schmidt, F. L., & Hunter, J. E. (1993). Tacit knowledge, practical intelligence, general mental ability and job knowledge. Current Directions in Psychological Science, 2(1), 8-9.
Thornton, G. C., & Byham, W. C. (1982). Assessment centers and managerial performance. New York: Academic Press.
Van der Maesen, P. E. A. M. (2005). Webcamtest voor actieve sociale vaardigheden [Webcam test for active social competencies]. Van der Maesen Advies, April, 1-5.
Van Iddekinge, C. H., Taylor, M. A., & Eidson, C. E., Jr. (2005). Broad versus narrow facets of integrity: Predictive validity and subgroup differences. Human Performance, 18(2), 151-177.
Walker, D., & Goldenberg, R. (2004, June). Bringing selection into the 21st century: A look at video-based testing within U.S. Customs and Border Protection. Paper presented at the annual conference of the International Public Management Association-Assessment Council, Seattle, WA.
Weekley, J. A., & Jones, C. (1997). Video-based situational judgment testing. Personnel Psychology, 50(1), 25-49.
Weekley, J. A., & Jones, C. (1999). Further studies of situational tests. Personnel Psychology, 52(3), 679-700.
Weekley, J. A., & Ployhart, R. E. (2006). An introduction to situational judgment testing. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 1-10). Mahwah, NJ: Lawrence Erlbaum Associates.
Description of situation:
A police officer (police officer 1) comes to work on his motorbike. When he enters the parking garage of the police station he accidentally hits a police car, causing a big scratch on the police car. Shortly after, he meets a colleague (police officer 2) and tells her what happened.
Police officer 1:
“Hi! Listen: I just entered the parking garage with my motorbike and caused a big scratch on one of the police cars. I feel really bad about it and, actually, I don’t know what to do.”
Possible reactions of police officer 2:
1. Don’t worry about it! Police cars are covered with scratches.
2. O... I’m sorry. If I were you, I would report it to the chief.
3. Well, that’s pretty stupid of you!! You have to report it to the chief!
4. The only thing you can do is to report it to the chief! And if you’re not going to do it, I will!!
1. What are the advantages and disadvantages of video-based situational judgment tests (SJTs) compared to paper-and-pencil SJTs?
2. Which instrument is most closely related to the SJT?
3. Why can the research results about the difference between knowledge instructions and behavioral tendency instructions not be generalized to personnel-selection settings?
4. Which technique is most often used to develop scenarios of SJTs?
5. What is the advantage of the newest type of SJTs?
Lonneke A. L. de Meijer (corresponding author)
Erasmus University Rotterdam
P.O. Box 1738, Woudestein T13-24
3000 DR Rotterdam, The Netherlands
Marise Ph. Born
Erasmus University Rotterdam
P.O. Box 1738, Woudestein T13-15
3000 DR Rotterdam, The Netherlands