After the five cognitive abilities that the RIOT measures were chosen, our team scoured the literature for subtests that measure these abilities. Thirty-seven possible subtests were considered for inclusion. These subtests were evaluated on the basis of interface needs, portability to a variety of devices, the degree to which the subtest was a good measure of g (as indicated by its g loading in prior research), the difficulty of creating items, the format of examinee responses, the availability of other subtests for the same broad ability, and any other advantages or disadvantages of note. When selecting subtests, the RIOT team preferred to use existing formats that had proven to be effective measures of intelligence on earlier tests. Indeed, many of the subtests considered have a long history of inclusion on intelligence tests (Gibbons & Warne, 2019).
After weighing these pros and cons, 15 subtests were chosen: 3 for each broad cognitive ability. As detailed below, one of the subtests was found to be unsuitable for the RIOT. It was replaced by another subtest. Therefore, 16 subtests were piloted, of which 15 were retained for version 1.0 of the RIOT. These subtests are described in the rest of this section.
The Vocabulary subtest measures examinees' knowledge of word definitions. On the RIOT, each item stem uses the following phrasing: "Which word has a meaning closest to '[target word]'?" Only the target word varies across stems. Each item has five multiple choice options. For example, one item under consideration for inclusion in the RIOT was:
Which word has a meaning closest to beverage?
Fruit
Salad
Meal
Drink
Silverware
Vocabulary was chosen as a subtest because experts have noted that it has a very high g loading (e.g., Gottfredson, 1997b; Jensen, 1980, 1998; Johnson et al., 2004; Lichtenberger & Kaufman, 2013), the items are easy to present online, and the items are easy to create. A downside is that they are very susceptible to cheating; it is very easy for a person to conduct an internet search to quickly find the definition of the target word.
Gottfredson (1997b) observed that easier Vocabulary items have target words that refer to concrete objects, while more difficult items have target words that refer to complex or abstract ideas. Jensen (1980) argued that Vocabulary subtests measure intelligence because more intelligent people infer the meanings of words they encounter more accurately and efficiently than other people do. Therefore, the frequency with which a target word is used in English was treated as an indicator of difficulty. [ADD INFO FROM BRYSBAERT ET AL., 2019]
With these characteristics in mind, a set of potential target words was generated by a human with some assistance from ChatGPT-3.5. The target words were required to be used worldwide and to have at least one synonym. Words that were regional in use, obsolete, technical terminology, offensive, or derogatory were not included in the list of potential target words.
After a set of target words was identified, the Oxford English Dictionary (OED) was consulted to identify the frequency with which the words are used. The OED reports the frequency with which a word occurs in written English (per million words). This information was then used to classify words into one of 8 frequency bands. These frequency bands are numbered (in ascending order of word use frequency) from 1 to 8. The words for the draft Vocabulary subtest were chosen to span frequency Bands 1 through 6. The distribution of words was purposely chosen to form an approximately normal distribution across the frequency bands, with almost two-thirds belonging to Bands 4 and 5. Among words in the draft subtest, the word frequency per million written words ranged from < .001 to 94, with a median of 0.76. The target word in the example item above (i.e., "beverage") has a frequency of 5.2 words per million and belongs to Band 5.
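For illustration, the minimal Python sketch below classifies a frequency into a band under the assumption that the bands follow decade (power-of-ten) cutoffs. The exact OED band boundaries are not reproduced here, but this assumption is consistent with the examples above (e.g., "beverage" at 5.2 per million falls in Band 5, and a frequency below 0.001 per million falls in Band 1).

```python
import math

def oed_band(freq_per_million: float) -> int:
    """Classify a word into an OED-style frequency band.

    Assumes decade (power-of-ten) cutoffs between bands, which is an
    assumption consistent with the examples in the text.
    """
    if freq_per_million < 0.001:
        return 1
    # log10(1) = 0 maps to Band 5, log10(10) = 1 maps to Band 6, etc.
    band = int(math.floor(math.log10(freq_per_million))) + 5
    return max(1, min(band, 8))

print(oed_band(5.2))    # 5  (the target word "beverage")
print(oed_band(0.76))   # 4  (the median frequency in the draft subtest)
print(oed_band(94))     # 6  (the most frequent word in the draft subtest)
```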
The correct option for a Vocabulary item was always a synonym that occurs about as often as, or more frequently than, the target word in English. The incorrect options (called distractors) were brainstormed to be the same part of speech as the target word and the correct option. Distractors were selected if they were plausible incorrect options. Some distractors were thematically related to the target word (such as in the example item above, where all the distractors are related to eating and drinking). Other distractors were based on misunderstandings of the target word's meaning or were words that served the same function (e.g., describing behavior). If a target word had a prefix or suffix, then some distractors were chosen to include plausible meanings for the prefix or suffix. However, no distractors used the same prefix or suffix.
The Information subtest measures an examinee's general knowledge about the world. Like Vocabulary, items on the Information subtest measure an examinee's breadth of knowledge (Jensen, 1970b/1973). On the RIOT, each Information item takes the form of a question that has five multiple choice options. For example, one item under consideration for inclusion in the RIOT was:
Which continent is the Sahara Desert located in?
Australia
Africa
South America
Europe
Asia
Information was chosen as a subtest for the RIOT for the same reasons as Vocabulary: on previous intelligence test batteries, Information items tend to have a high (e.g., Johnson et al., 2004) or moderate (Lichtenberger & Kaufman, 2013) g loading, the items are easy to present online, and the items are easy to create.
Jensen (1980, p. 148) argued that Information items measure breadth of knowledge, most of which the person acquired during schooling or in everyday life. Therefore, Information items function well on intelligence tests because smarter people generally know more facts about the world. However, a problem with Information items is that there is no objective way to decide which items belong on an intelligence test (Jensen, 1980; Schroeders et al., 2021; Watrin et al., 2023), and the choice of items can have important consequences. This is apparent in the study of sex differences. Males tend to have higher levels of knowledge about science and technology, while females generally know more about the humanities and the arts, and one can create an Information subtest that favors one sex or the other by oversampling items from a domain that shows a strong sex difference in knowledge (Schroeders et al., 2016). For many tests, one can ensure that Information subtest items produce equal scores for males and females. However, this is difficult for the RIOT because the RIOT is intended to be adapted to other countries and languages in the future, and items that function well in other countries tend to have content that is objective or scientific (e.g., physical sciences), which males tend to answer correctly more often (Watrin et al., 2023). Therefore, the RIOT team faced a trade-off: an Information subtest that would be portable across cultures may artificially inflate males' scores, while a subtest that produces no mean sex differences will likely not be adaptable to other countries without heavy revision.
The arbitrary decision was made to identify a set of domains, some of which would likely favor one sex or the other, and others of which had no clear advantage for males or females. The chosen domains were the arts; history; leisure and recreation; philosophy, psychology, religion, and mythology; physical, life, and earth science; technology and engineering; and world geography. Items were written (some with the assistance of ChatGPT-3.5) so that each of these domains was represented approximately equally. Because of the future prospect of adapting the test to other cultures, items were written so that they would not measure knowledge about American culture, and objective "scholastic" information available in a wide variety of cultures was favored. When items did refer to culturally specific knowledge, an attempt was made to use information from different parts of the world. Two areas of knowledge were rejected: business, and health and medicine. Business items were found to be difficult to write because they often relied on jargon or were specific to the United States. Health and medicine items were incorporated into the physical, life, and earth science category.
The Analogies subtest measures an examinee's ability to infer the relationship between two words and then apply that same relationship to a second pair of words. Answering an Analogies item correctly requires using inductive reasoning to understand the relationship between the first pair and then using deductive reasoning to apply that relationship to the second pair of words. On the RIOT, items on the Analogies subtest present the first word pair and the first word of the second pair, along with five multiple choice options. An example is:
Cat : Kitten :: Butterfly :
Caterpillar
Flower
Worm
Housefly
Raincloud
Analogies was chosen as a substitute for the Logical Reasoning subtest that was rejected after the initial piloting phase (see below). Analogies subtests have a long history on intelligence tests (Gibbons & Warne, 2019) and have high g loadings (Jensen, 1980).
Like the other Verbal Reasoning subtests (i.e., Vocabulary and Information), Analogies items are very easy to write and present online. They have the downside, however, of relying strongly on an examinee's word knowledge, which makes them overlap more with the Vocabulary subtest than is ideal.
ChatGPT-3.5 was used to generate a list of potential relationships between pairs of words. Items were written (with some assistance from ChatGPT-3.5) so that as many relationships as possible were represented in the initial item pool. Twenty relationships, plus a "miscellaneous" category, were represented in the original item pool; no relationship was represented by more than four draft items. As with the Information subtest, Analogies items were drafted so that they would not be culturally specific.
Items on the Matrix Reasoning subtest are designed to measure fluid intelligence with visual, geometric stimuli. In a matrix reasoning item, a series of images is shown in a 3 x 3 grid, with the bottom right image missing. The examinee must identify the pattern in the grid and determine which of eight options would complete the pattern. An example of a Matrix Reasoning item is here:
Matrix Reasoning was chosen as a subtest because it is one of the most researched tasks found on intelligence tests (e.g., Armstrong et al., 2016; Kunda et al., 2016; Lichtenberger & Kaufman, 2013), and the item format has a lengthy history of successfully measuring intelligence (dating back to Penrose & Raven, 1936, and Raven, 1939). Matrix items are highly regarded and are more g-loaded than average, though they are probably not the most g-loaded task available (Gignac, 2015).
Carpenter et al. (1990) catalogued a set of rules that govern the patterns in items on the Raven's Progressive Matrices (RPM) test, which is the most popular test using Matrix Reasoning items. These rules were used as a foundation to create Matrix Reasoning items for the RIOT. Note that some items on the RPM do not fit the rules identified by Carpenter et al. (1990). These unusual RPM items were used to brainstorm additional items for the RIOT. Additionally, the RIOT's Chief Scientist created new rules that could serve as a basis for new items for the subtest's initial item pool.
The Visual Puzzles subtest is designed to measure fluid reasoning with visual stimuli. On the RIOT, items on the Visual Puzzles subtest consist of a target image and a set of 8 options. Between two and four of the options can be assembled to create an image that is identical to the target image in size, color, and shape. An example of a Visual Puzzles item is shown below.
Visual Puzzles was chosen as a subtest because it does not contain verbal stimuli and its g loadings are somewhat high (e.g., Wechsler, 2005). The format of the Visual Puzzles subtest can be traced back to puzzle assembly tasks in the 1910s (e.g., the manikin test) and later Object Assembly subtests (Lichtenberger & Kaufman, 2013). But the current format (in which a person merely sees the "pieces" of a puzzle, instead of manipulating them as objects) has grown in popularity recently. Tests developed in the 21st century, such as the fourth edition of the Wechsler Adult Intelligence Scale and the Kinder IQ Test + (KIQT+), use this item format. The RIOT follows the lead of these instruments.
The RIOT team created a set of items designed to have a range of complexity. Less complex items use simple cuts to divide the shape into components that are easy to recognize in the target image. Colors in the target image can aid the examinee on these items. More complex items use complex cuts and/or combinations of options that assemble to form the target image in unusual and unexpected ways. Requiring examinees to use mental rotation to identify correct options also adds to the complexity. As with the Vocabulary subtest, complexity is surmised to be a determining factor in item difficulty.
The final subtest that measures Fluid Reasoning is the Figure Weights subtest. This subtest measures abstract reasoning by asking examinees to discern the quantitative relationships among objects and to mentally manipulate that information to draw a conclusion about a new relationship. In the RIOT, Figure Weights items show one to four balance scales holding a variety of shapes, with the rightmost pan empty. The examinee must use the visual information to identify which of five options would balance the final scale. An example of a Figure Weights item is here:
Figure Weights as a subtest format originates from the fourth edition of the WAIS (Lichtenberger & Kaufman, 2013) as a subtest that measures quantitative reasoning. Like the Visual Puzzles item format, it also appears on the KIQT+ and other modern tests.
To create items on the Figure Weights subtest, the RIOT team identified rules that govern the quantitative relationships between colored objects. As was the case for Visual Puzzles and Vocabulary, complexity was a guiding principle: simple items had one rule that could be discerned by counting (such as in the example above), while more complex items required examinees to combine information from different balance scales to identify the correct option. It was later discovered that certain design principles could increase the difficulty of an item.
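To illustrate the underlying logic (this is a sketch, not the RIOT's item-generation code), the Python snippet below solves a hypothetical Figure Weights item by brute force: it finds every small-integer weight assignment consistent with the displayed scales and returns the options that balance the final scale under all of them. The shape names and the weight range are assumptions for illustration.

```python
from itertools import product

def solve_figure_weights(scales, left_final, options, max_weight=6):
    """Brute-force solver for a hypothetical Figure Weights item.

    `scales` is a list of (left, right) pairs of shape strings shown in
    balance; e.g., ("AA", "B") means two A-shapes balance one B-shape.
    `left_final` is the left pan of the final scale, and each option is
    a string of shapes proposed for the empty right pan. Shape weights
    are assumed to be small positive integers.
    """
    shapes = sorted({s for l, r in scales for s in l + r}
                    | set(left_final) | {s for o in options for s in o})

    def weight(pan, w):  # total weight of a pan under assignment w
        return sum(w[s] for s in pan)

    valid = []
    for combo in product(range(1, max_weight + 1), repeat=len(shapes)):
        w = dict(zip(shapes, combo))
        if all(weight(l, w) == weight(r, w) for l, r in scales):
            valid.append(w)

    # Keep the options that balance the final scale under every weight
    # assignment consistent with the displayed scales.
    return [o for o in options
            if all(weight(left_final, w) == weight(o, w) for w in valid)]

# One scale shows two A-shapes balancing one B-shape; which option
# balances a pan holding two B-shapes?
print(solve_figure_weights([("AA", "B")], "BB", ["AAA", "AAAA", "AB"]))
# ['AAAA']
```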
The Object Rotation subtest uses a classic format to measure people's understanding of objects in three-dimensional space. Each Object Rotation item on the RIOT shows a target shape and five options. The examinee must identify which option is the same as the target shape after it has been rotated in two or three dimensions. An example of an Object Rotation item for the RIOT is shown below.
The Object Rotation task was selected for the RIOT because it often has a strong loading onto a spatial ability factor (e.g., Johnson et al., 2004; Ramful et al., 2017), and it consistently appears as a useful task for measuring general intelligence (Carroll, 1993). The Object Rotation subtest is an intrinsic dynamic task in Uttal et al.'s (2013) typology of spatial reasoning tasks.
The RIOT team used the Quotid platform's rotation item generation tool (https://quotid.io) to create items for the Object Rotation subtest. In this tool, the user can create a target image, rotate it to create the correct option, and then alter it (and rotate it, if needed) to make a distractor. This is an efficient way of creating Object Rotation items. Again, complexity was a proxy for the anticipated difficulty of the items. In this case, the amount of rotation, the number of axes of rotation, and the complexity of the object were manipulated to create items of varying difficulty.
The SToVeS task is a subtest designed to measure an examinee's understanding of cardinal directions and orientation in two-dimensional space. On the RIOT, almost all of the items are presented as written items; three items include a visual stimulus to aid examinees. All items require an examinee to identify which of five multiple choice options is the correct answer. An example of a SToVeS item is as follows:
If you were facing west and turned 90 degrees to your left, which direction would you be facing?
North
East
West
Northwest
South
The task is based on the Verbal Test of Spatial Ability designed for air traffic controllers and described by Ackerman and Kanfer (1993). Despite being administered solely in written form and without any images, the test, according to Lohman (2005, p. 137), ". . . was one of the best measures of spatial ability in the selection battery." The SToVeS subtest is an example of an extrinsic dynamic task in Uttal et al.'s (2013) theory of spatial abilities.
Items for the SToVeS subtest were written to follow the same template: a starting direction, one or more turns (to the left and/or right), and a final question that requires a response in the form of a direction. The complexity of an item on the SToVeS subtest can be adjusted by increasing the number of turns, varying the description of how far each turn goes, and changing the final question.
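The logic for scoring such items is straightforward. A minimal Python sketch (with a hypothetical function name) that computes the correct answer for the example item above is:

```python
# Cardinal directions in clockwise order.
DIRECTIONS = ["north", "east", "south", "west"]

def final_facing(start: str, turns: list[tuple[str, int]]) -> str:
    """Return the direction faced after a series of turns.

    Each turn is a (side, degrees) pair, with degrees a multiple of 90.
    Right turns move clockwise through DIRECTIONS; left turns move
    counterclockwise.
    """
    idx = DIRECTIONS.index(start.lower())
    for side, degrees in turns:
        steps = degrees // 90
        idx = (idx + steps) % 4 if side == "right" else (idx - steps) % 4
    return DIRECTIONS[idx]

# The example item: facing west, then a 90-degree turn to the left.
print(final_facing("west", [("left", 90)]))  # south
```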
The RIOT's Spatial Orientation subtest consists of items that require an examinee to use information from maps to visualize the relative location of objects in two- or three-dimensional space. All of the items on the Spatial Orientation subtest have a visual stimulus of some sort and an accompanying question that the examinee must answer by selecting the correct option from a series of five choices. An example of an item from the Spatial Orientation subtest is:
Which building is located southwest of the car dealership?
Library
Restaurant #1
Restaurant #2
Hotel
Gas station
The item content of the Spatial Orientation subtest is more heterogeneous than that of other subtests on the RIOT. Some items take the form of what Carroll (1993, p. 323) called "Perspective Tasks," in which a person must imagine what the view of a person would be if they were at a specific place on a map. Others ask examinees to answer questions about navigating to different landmarks or points on a map (see Juan-Espinosa et al., 2000, for a description of a similar task that was classified as a spatial orientation task). Other items on the Spatial Orientation subtest require a person to use information from a map to answer questions about three-dimensional space. What all of these items share is the need to translate a two-dimensional representation of the world into a space that the examinee can imagine navigating. Most of the items are examples of the extrinsic static category in Uttal et al.'s (2013) system of organizing spatial abilities.
Because of its unique graphic design needs, the Spatial Orientation subtest was the most difficult one to create for the RIOT. A series of maps was created to represent different types of geographic locations. Some of these were drawn by a graphic designer, and others were created by rendering a virtual environment in three dimensions and moving the "camera" to an overhead position to create two-dimensional maps. Text to accompany the maps was written so that the items could ask examinees to interpret information in a map. The complexity of items was manipulated by adjusting the cues available in a map, adding obstacles to a navigation route, and asking for information that required interpretation in two or three dimensions.
Computation Span is a subtest (described by Salthouse & Babcock, 1991) designed to measure examinees' working memory. In this task, examinees receive a few simple mathematical problems and are asked to remember the answers. Immediately after finishing the sequence, the examinee must recall the answers to the math problems in the same sequence. An example of a Computation Span item is as follows:
5 - 0 =
0
4
5
7 - 1 =
6
3
4
3 + 1 =
4
6
2
3 + 5 =
4
8
0
What is the answer to the first problem?
What is the answer to the second problem?
What is the answer to the third problem?
What is the answer to the fourth problem?
The arithmetic problems are displayed one at a time (for a maximum of 3 seconds), and the recall questions are all displayed on the same screen (for 6 seconds per digit for the first two items and 4 seconds per digit for all other items). Examinees cannot return to a previous screen.
The Computation Span task is based on the classical digit span task, which has been heavily researched by psychologists for over 100 years (Carroll, 1993; Richardson, 2007; Wongupparaj et al., 2017). The Computation Span version of the task was chosen because the items were easy to generate, needed only text to display, and were less amenable to strategies, such as chunking, than the classical digit span task.
Computation Span items were created by compiling all arithmetic problems that satisfied the conditions listed by Salthouse and Babcock (1991) and then randomly selecting them (without replacement) for each sequence of arithmetic problems.
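A minimal sketch of this procedure appears below. The specific constraints in the code (single-digit operands, addition or subtraction, and non-negative single-digit answers) are assumptions consistent with the example item above, not a reproduction of Salthouse and Babcock's (1991) exact conditions.

```python
import random

def generate_computation_span(n_problems, seed=None):
    """Sketch of a Computation Span sequence generator.

    Builds a pool of simple arithmetic problems under assumed
    constraints, then draws a sequence without replacement.
    """
    rng = random.Random(seed)
    pool = []
    for a in range(10):
        for b in range(10):
            if a + b <= 9:                      # single-digit answers
                pool.append((f"{a} + {b} =", a + b))
            if a - b >= 0:                      # non-negative answers
                pool.append((f"{a} - {b} =", a - b))
    problems = rng.sample(pool, n_problems)     # without replacement
    answers = [ans for _, ans in problems]      # to be recalled in order
    return problems, answers

problems, answers = generate_computation_span(4, seed=1)
for stem, ans in problems:
    print(stem, "->", ans)
print("Recall in order:", answers)
```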
Exposure Memory is a subtest that measures recognition of visual stimuli. For each item on this subtest, the examinee is shown a series of images at the rate of 1 per second. Immediately afterwards, a set of eight options is presented, and the examinee must select the one(s) that were shown in the sequence. An example of an Exposure Memory item is as follows:
In this example, the top row is the sequence of images that are shown individually (in random order). The numbered set of images afterwards is the group of option images. Examinees cannot repeat the sequence of images or return to a previous screen on this subtest. When prompted to respond, examinees have a 30-second time limit.
The Exposure Memory task is a type of visual memory task described by Carroll (1993), and similar tasks were recently used on the Reynolds Intellectual Assessment Scales and the openpsychometrics.org intelligence test. It was chosen because item creation was easy, and it measures a type of short-term memory that other subtests in the Working Memory domain do not: recognition. In previous research, the g loading for this task is moderate (e.g., Dombrowski et al., 2009) or weak (Beaujean et al., 2009; J. M. Nelson et al., 2007). A recognition memory task is not expected to have a strong g loading because recognition is a more basic mental task than the more complex transformations required by other memory tasks (Jensen, 1980), such as the RIOT's Computation Span and Visual Reversal subtests.
To create the items for the Exposure Memory subtest, the RIOT team identified simple artwork that was in the public domain online. Most of these images were software icons. These were then programmed to be shown in the different sequences and arrays for each item.
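A sketch of how such an item could be assembled follows; the function and file names are hypothetical.

```python
import random

def exposure_memory_item(icon_pool, n_shown, n_options=8, seed=None):
    """Assemble one hypothetical Exposure Memory item.

    Samples the images to be shown one per second, then builds an
    eight-image option array containing the shown images plus
    distractors drawn from the rest of the pool.
    """
    rng = random.Random(seed)
    shown = rng.sample(icon_pool, n_shown)
    distractors = rng.sample([i for i in icon_pool if i not in shown],
                             n_options - n_shown)
    options = shown + distractors
    rng.shuffle(options)
    correct = {options.index(img) + 1 for img in shown}  # 1-based positions
    return shown, options, correct

icons = [f"icon_{i:02d}.png" for i in range(20)]  # placeholder file names
shown, options, correct = exposure_memory_item(icons, n_shown=4, seed=7)
```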
The final working memory task for the RIOT is the Visual Reversal subtest. In this task, examinees are shown a 3 x 3 grid of squares, eight of which are white and one of which is blue. At the rate of 1 per second, the blue square changes to white and a randomly selected white square changes to blue. After a series of color changes, a random grid of numbers is displayed. At that point, the examinee must report the sequence of numbers that corresponds to the reverse of the order in which the squares changed color. An example of a Visual Reversal item is as follows:
What is the reverse order of the numbers for the squares that changed color?
As with the other Working Memory subtests, examinees cannot return to a previous screen or have an item repeated on the Visual Reversal subtest. Each square in a sequence changes color for 1 second, and the examinee has 10 seconds, plus an additional 4 seconds per square, to respond.
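A sketch of the trial mechanics is below. It assumes that the displayed number grid contains the digits 1 through 9 and that the initial blue square counts as the first element of the sequence; both details are assumptions for illustration.

```python
import random

def visual_reversal_trial(n_changes, seed=None):
    """Simulate one hypothetical Visual Reversal trial.

    A 3 x 3 grid has one blue square; each second, the blue square turns
    white and a randomly chosen white square turns blue. Afterwards, a
    random grid of digits is shown, and the correct response is the
    digits of the visited squares in reverse order.
    """
    rng = random.Random(seed)
    cells = list(range(9))
    blue = rng.choice(cells)
    sequence = [blue]                      # assumed to include the start
    for _ in range(n_changes):
        blue = rng.choice([c for c in cells if c != blue])
        sequence.append(blue)

    digit_grid = rng.sample(range(1, 10), 9)   # one digit per cell (assumed)
    correct = [digit_grid[cell] for cell in reversed(sequence)]
    time_limit = 10 + 4 * len(sequence)        # per the timing rule above
    return sequence, digit_grid, correct, time_limit
```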
The Visual Reversal subtest can be thought of as a combination of the Corsi block tapping test and the backward digit span task. Like the Corsi test, the Visual Reversal subtest asks examinees to remember a sequence of visual-spatial stimuli (i.e., blocks that are tapped in the Corsi test, or squares that change color on the Visual Reversal task). However, the reporting is much more akin to the backward digit span task in that the examinee must produce a series of numbers in reverse order, which increases the g loading (Jensen, 1980). The spatial aspect of the task and the reversal requirement for responses are not found in the other Working Memory subtests, and the RIOT team believed that having a subtest with both components was a major asset for the RIOT.
The Symbol Search subtest is a task designed to measure processing speed. In this subtest, examinees are shown a set of two symbols and a set of five symbols. The examinee must state whether either of the first two symbols appears in the set of five symbols. An example of this item appears below.
Do either of these pictures . . .
. . . match any of the pictures below?
To respond, the examinee selects "Yes" or "No." Symbol Search is a speeded subtest, meaning that examinees score better by correctly completing as many items as possible in 2 minutes.
After the visual stimuli of this subtest were created by a graphic designer, the items were created through random selection of the symbols. Within each set (i.e., the first pair and the set of five symbols), the symbols were selected without replacement. The RIOT has a pool of 20 symbols, which means that there are 190 possible pairs of symbols and 15,504 possible sets of 5 symbols. Together, there are 2,945,760 unique combinations of sets of 2 and 5 symbols that could be created. Among these combinations, 1,627,920 (55.3%) would have no shared symbols between sets, 1,162,800 (39.5%) would have one symbol shared between them, and 155,040 (5.3%) would have both symbols from the first set appearing in the second set of 5 symbols.
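These counts follow from elementary combinatorics and can be verified directly:

```python
from math import comb

n = 20                                    # pool of distinct symbols
pairs = comb(n, 2)                        # possible 2-symbol sets: 190
fives = comb(n, 5)                        # possible 5-symbol sets: 15,504
total = pairs * fives                     # 2,945,760 unique combinations

none_shared = pairs * comb(n - 2, 5)      # all 5 drawn from the other 18
one_shared  = pairs * 2 * comb(n - 2, 4)  # either pair member is shared
both_shared = pairs * comb(n - 2, 3)      # both pair members are shared

print(total)                              # 2945760
print(none_shared, round(none_shared / total, 3))  # 1627920 0.553
print(one_shared,  round(one_shared / total, 3))   # 1162800 0.395
print(both_shared, round(both_shared / total, 3))  # 155040 0.053
```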
The Symbol Search task often has a weak g loading, which is common in processing speed tasks. Still, it is a better measure of intelligence than other measures of processing speed (Lichtenberger & Kaufman, 2013). Symbol Search also had the advantages of having items that were easy to create and being a very short subtest: just 2 minutes.
Abstract Matching is another test of processing speed, described by Hale (1990). In this task, a target stimulus is presented along with two option images. The examinee must identify which of the two options more closely resembles the target stimulus. In the RIOT, the target stimulus is a collection of colored geometric shapes that have four characteristics: (1) shape, (2) color, (3) number, and (4) orientation. An example of an Abstract Matching item for the RIOT is displayed below:
On the RIOT, Abstract Matching items are organized into four groups. In the first group, the correct option and the target image are identical (as in the image above). In the second group, they differ on one characteristic. In the third group, they differ on two characteristics. In the fourth group, they differ on three characteristics. Incorrect options always differ from the target on all four characteristics. Examinees have one minute to complete as many items in a group as they can. They automatically advance to the next group of items when they (1) complete all 20 items in the group or (2) reach the end of the time limit.
The Abstract Matching items were created by randomly selecting combinations of shape, color, number, and orientation and creating the target image. There are 420 possible target images that can be created by randomly selecting the 4 characteristics. For a given target image, there are 144 possible distractors that differ on all 4 characteristics, meaning that there are 60,480 possible pairs of target images and distractors. Table XXXX shows how many possible sets of three images (target, correct option, and incorrect option) each item group of the Abstract Matching subtest can have. Across all four groups of items, the Abstract Matching subtest can have 44,240,256 possible items.

Abstract Matching was selected as a subtest because it was easy to create items and, unlike Symbol Search, to vary item difficulty by increasing the number of characteristics that differ between the target stimulus and the correct option. Abstract Matching also has the advantage of being a more cognitive task than most measures of processing speed.
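For illustration, one assignment of levels to the four characteristics that reproduces the counts above (420 target images and 144 maximally different distractors per target) is 7 shapes x 5 colors x 4 numbers x 3 orientations. This particular assignment is an assumption for the sketch below, not a documented feature of the RIOT.

```python
from math import prod

# Hypothetical numbers of levels per characteristic; this assignment is
# assumed for illustration because it reproduces the counts in the text.
levels = {"shape": 7, "color": 5, "number": 4, "orientation": 3}

targets = prod(levels.values())                  # 420 possible target images
all_diff = prod(n - 1 for n in levels.values())  # 144 distractors differing
                                                 # on all four characteristics
print(targets, all_diff, targets * all_diff)     # 420 144 60480
```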
The final subtest on the RIOT is Reaction Time. This subtest has two portions: simple reaction time and choice reaction time. In the simple reaction time portion, examinees are instructed to press a key on their keyboard as soon as they see a blue square. This task is presented 20 times (including practice items). In the choice reaction time portion, examinees are told to press the "1" key when they see a blue square and to press the "2" key when they see a yellow plus. This task is also presented 20 times (including practice items). Between each item, there is a preparation screen that lasts between 3 and 10 seconds. This time interval is chosen randomly. The stimuli for this task are shown below.
In the simple reaction time portion, all stimuli are the blue square. In the choice reaction time portion, the blue square and the yellow plus are presented at random. It is important to recognize that the 20 items per task include 5 practice items. Both the number of practice items and the total number of items per task are much lower than is recommended for measures of reaction time (e.g., Jensen, 2006). However, this was necessary to minimize examinee boredom. Data from the pilot administrations indicated that the RIOT's adult examinee population adapted to the tasks quickly and that their times typically leveled off after 3-4 trials (see Figure XXXX).
Reaction Time was selected as a subtest for the RIOT because it is easy to create items, and the task has a long history of functioning as a measure of processing speed (e.g., Peak & Boring, 1926). Additionally, there is strong evidence that reaction time tasks are an important, though indirect, manifestation of the basic neural processing speed of the brain (Schubert et al., 2015). However, reaction time tasks tend to have very weak g loadings (Jensen, 1980). Including choice reaction time improves this g loading (Jensen, 1980, 1998).
As can be ascertained from the descriptions and the example items, the 15 subtests on the full RIOT are quite varied in their format, content, and the cognitive processes that they elicit. This diversity in formats allows the RIOT to have the property of "systematic homogeneity." When a test has systematic homogeneity, the meaning of the test's scores is not dependent on any particular task or item format. This increases confidence that the test measures something meaningful because, "If a psychological construct is powerful, it should travel through multiple assessment vehicles—in this context different item and scale types" (Lubinski, 2005, p. 5, emphasis in original).
In addition to the 15 subtests described above, 21 other subtests were considered for inclusion on the RIOT 1.0. The only one that made it to draft form was a Logical Reasoning subtest, which was intended to be one of the verbal subtests on the RIOT. This subtest consisted of items based on the Propositional Reasoning items from the ICAR test (Gühne et al., 2021). A draft subtest was administered to a convenience sample in Stage 1 (see explanation below), and it was discovered that these items took far longer to solve than anticipated (about 2-3 minutes per item). This subtest was replaced by the Analogies subtest.