▸ 1884: Francis Galton attempted to measure intelligence at the Anthropometric Laboratory in London.
▸ 1904: Charles Spearman discovered the general mental factor, which he named "g".
▸ 1905: Alfred Binet and Théodore Simon developed the first successful intelligence scale.
▸ 1912: William Stern introduced the original IQ formula.
▸ 1916: Lewis Terman published the Stanford-Binet Intelligence Scale, an expanded and Americanized version of Binet's tests.
▸ 1917: The U.S. Army created the Army Alpha and Beta Tests during WWI and tested over 1.7 million men in the world's first large-scale intelligence testing program.
▸ 1939: David Wechsler created the Wechsler-Bellevue Intelligence Scale.
▸ 1955: The first Wechsler Adult Intelligence Scale (WAIS), a revised version of the Wechsler-Bellevue, was published.
▸ 1968: Researchers began investigating potential bias in IQ testing.
▸ 1984: James Flynn publicized the gradual increase in IQ scores over time, a phenomenon later named the Flynn effect.
▸ 1988: Richard J. Haier published the first study linking IQ scores to brain imaging.
▸ 1993: John B. Carroll published Human Cognitive Abilities, which helped the Cattell-Horn-Carroll (CHC) theory become a leading model of cognitive abilities.
▸ 2000: The International Society for Intelligence Research (ISIR) was founded.
▸ 2025: The RIOT Launch – Built on over a century of IQ research, the RIOT test combines modern computerized testing with scientifically validated intelligence assessment.
Key Points
◉ Intelligence testing evolved from physical measurements to complex cognitive assessments
◉ Environmental factors significantly influence cognitive performance
◉ Cultural context and individual differences are crucial in understanding intelligence
◉ Modern tests aim to be more comprehensive, unbiased, and scientifically rigorous
Intelligence testing has transformed dramatically since its inception in the late 19th century. This timeline traces more than a century of efforts to measure human cognitive abilities.
For over 100 years, scientists, psychologists, and researchers have grappled with how to measure human intelligence, developing increasingly sophisticated methods to capture its intricate nature.
Intelligence testing represents more than just a scientific endeavor—it is a profound exploration of human potential. This journey is not simply about creating tests, but about unraveling the mysteries of human cognition. It is a story of persistent curiosity, technological innovation, and a deep commitment to understanding the remarkable capabilities of the human mind.
Let's start at the beginning...
The first attempt at measuring human intelligence was carried out by Francis Galton, who established the Anthropometric Laboratory in London as a setting to conduct his tests. Convinced that the source of intelligence was in the brain, Galton hypothesized that larger, healthier, and better-functioning brains would also be smarter brains. Galton proposed that individuals with larger heads were likely more intelligent. Additionally, he suggested that better visual acuity and reaction time might reflect underlying cognitive strengths. Below is an illustration of his measuring devices:
His approach focused on physical measurements like head size, strength, and sensory acuity, believing these could indicate mental capabilities by reflecting a well-functioning nervous system. Influenced by the publication of his half-cousin Charles Darwin's On the Origin of Species, Galton also believed that intelligence was a trait passed down through genetic inheritance. Galton's own data, however, did not support his theories. While primitive by today's standards, Galton's work laid the groundwork for future intelligence research.
In 1904, Charles Spearman, a former British military officer, administered a series of academic and sensory tests to a sample of school children. Spearman discovered that performance on all of the tests was positively correlated, meaning that children who performed well on one test tended to perform well on the others. Spearman hypothesized that this pattern of test performance had a common cause, which he believed was intelligence. To test his theory, Spearman invented a new statistical method called factor analysis, and with it he discovered that performance on all of the tests could indeed be attributed to a common cause, which he called “g” (an abbreviation for the “general factor” or “general intelligence”). Even today, modern versions of factor analysis are used in intelligence research, and g is central to mainstream theories of intelligence.
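Spearman's logic can be illustrated with a toy simulation. The sketch below uses synthetic data and a principal-component shortcut standing in for modern factor analysis — none of it comes from Spearman's own study. It generates four test scores that share one latent factor, verifies that every pair of tests correlates positively (the pattern Spearman observed), and shows that a single component captures the common cause:

```python
import numpy as np

# Illustrative sketch only (synthetic data, not Spearman's): simulate four
# tests that all share one latent "g" factor, then recover its signature.
rng = np.random.default_rng(0)
n = 1000
g = rng.normal(size=n)                     # latent general ability
loadings = np.array([0.8, 0.7, 0.6, 0.5])  # how strongly each test taps g
# observed score = loading * g + test-specific noise (unit total variance)
scores = g[:, None] * loadings + rng.normal(size=(n, 4)) * np.sqrt(1 - loadings**2)

R = np.corrcoef(scores, rowvar=False)      # correlation matrix of the tests
# Spearman's "positive manifold": every pair of tests correlates positively
assert (R[np.triu_indices(4, k=1)] > 0).all()

# The first principal component of R plays the role of a crude general
# factor; its loadings roughly track the true ones (slightly inflated,
# as principal-component loadings tend to be for small test batteries).
eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
first = np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])
print(np.round(first, 2))
```

A proper factor analysis would separate each test's unique variance from the shared variance, but the principal-component approximation is enough to show how one common factor can explain an entire matrix of positive correlations.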
Inspired by Galton's work, Alfred Binet and Théodore Simon developed the first successful intelligence scale in France. Rather than measuring academic knowledge, their series of questions targeted complex mental processes such as memory, attention, language comprehension, and problem-solving.
Above is an example mental test from the first Binet-Simon Scale. The resulting evaluation became known as the Binet-Simon Intelligence Scale. Their novel test aimed to identify students who might need additional educational support. This test became the first successful intelligence test and was revised twice before Binet’s death in 1911.
William Stern introduced the original Intelligence Quotient (IQ) formula, providing a mathematical method to express cognitive abilities. Stern was intrigued by the challenge of measuring mental development. His fundamental question—how to systematically track intellectual growth—became the driving force behind his groundbreaking work in developing what was then called the “mental quotient” (MQ). Stern transformed Binet's original approach by changing how intelligence was calculated. Instead of simply subtracting mental age from chronological age, Stern proposed dividing mental age by chronological age.
MQ = Mental Age / Chronological Age
This formula provided a standardized measure that remained consistent across different age groups, making it a more reliable indicator of cognitive abilities. This allowed for standardized comparisons of mental performance across different individuals, transforming intelligence from an abstract concept to a measurable metric.
Lewis Terman published the Stanford-Binet Intelligence Scale (SBIS), refining and standardizing Binet's original work. By revising the Binet-Simon scales for American populations, Terman created the Stanford-Binet test, which became the most widely used IQ assessment. Terman developed the mental tests to identify gifted students and those with cognitive challenges. His assessments measured creativity, mathematical ability, memory, motor skills, logic, and language proficiency. This version became the standard intelligence test in the United States, offering a more comprehensive and scientifically rigorous approach to measuring cognitive abilities, with a broader set of cognitive tasks that were less tied to formal education.
Terman built on Stern's mental quotient (MQ) by multiplying the result by 100, which ensured whole-number scores.
IQ = (Mental Age / Chronological Age) x 100
This approach transformed the measurement into what became known as the IQ, providing a standardized method of assessing cognitive abilities.
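As a concrete illustration of the two formulas above (the ages used are hypothetical examples, not taken from the original tests):

```python
def mental_quotient(mental_age: float, chronological_age: float) -> float:
    """Stern's mental quotient: mental age divided by chronological age."""
    return mental_age / chronological_age

def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Terman's ratio IQ: Stern's quotient scaled by 100."""
    return mental_quotient(mental_age, chronological_age) * 100

# A 10-year-old performing at the level of a typical 12-year-old:
print(ratio_iq(12, 10))   # 120.0
# A child performing exactly at age level scores 100 by construction:
print(ratio_iq(8, 8))     # 100.0
```

Note how the scaling by 100 turns Stern's fractional quotient into the familiar whole-number IQ scale centered on 100.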
The Stanford-Binet Intelligence Scale was the first individually administered test in the United States for children and adolescents. It grouped test items by age levels, featuring tasks that became progressively more challenging. Here is a picture of the test administration kit for the earliest version of the SBIS:
The test included specific cognitive challenges like comparing two lines at age 4, repeating a 10-syllable sentence at age 5, and comparing two objects from memory at age 8. Items were assigned to age levels according to the percentage of children at each age who answered them correctly, so that they grew progressively harder. Terman aimed to create a comprehensive assessment that measured a broader range of intellectual skills, moving beyond traditional academic knowledge.
During World War I, the U.S. Army implemented the Army Alpha and Beta Tests for large-scale intelligence testing. The image above, taken in October 1917, shows a group examination in progress. The Stanford-Binet test was adapted into text-based (Army Alpha) and picture-based (Army Beta) versions. Below is an example of Army Beta questions, specifically Pictorial Completion (Test 6) and Geometrical Construction (Test 7):
Testing continued until early 1919, by which time over 1.7 million soldiers had taken these tests. Based on their scores, men were given a label ranging from “A” (highest scoring) through “E” (lowest scoring). These group tests screened and classified military recruits, demonstrating the potential of standardized testing for organizational decision-making and highlighting intelligence assessment's practical applications.
David Wechsler created the Wechsler-Bellevue Intelligence Scale (WBIS), introducing a more nuanced view of intelligence. His approach recognized that cognitive abilities are complex and multifaceted, moving beyond a single IQ score. Wechsler was dissatisfied with what he believed were the limitations of the Stanford-Binet intelligence test. Among his chief complaints about that test were the single score that it produced, its emphasis on timed tasks, and the fact that it had been designed specifically for children and was poorly suited for adults. Wechsler devised a new test during the 1930s, known as the Wechsler-Bellevue Intelligence Scale. One of the significant modifications was the splitting of the Full-Scale Intelligence Quotient (FSIQ) to yield a Verbal (VIQ) and a Performance (PIQ) intelligence quotient. Below is a picture of the WBIS testing kit:
The original battery was geared specifically to the measurement of adult intelligence for clinical use. Wechsler rejected the idea that there is an ideal mental age against which individual performance can be measured, and he defined normal intelligence as the mean test score for all members of an age group; that mean could then be represented by 100 on a standard scale. Instead of giving a single overall score, the test provides a profile of several scores that represent the test-taker's strengths and weaknesses.
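Wechsler's deviation approach — placing a raw score on a scale where the test-taker's own age group averages 100 — can be sketched as follows. The standard deviation of 15 used here is the modern convention and an assumption of this sketch, not a detail given above:

```python
def deviation_iq(raw_score: float, age_group_mean: float, age_group_sd: float,
                 scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Wechsler-style deviation IQ: express a raw score relative to the
    test-taker's age group, whose mean is defined as 100 on the scale."""
    z = (raw_score - age_group_mean) / age_group_sd  # standard score within age group
    return scale_mean + scale_sd * z

# Scoring exactly at the age-group mean yields 100 by definition:
print(deviation_iq(50, age_group_mean=50, age_group_sd=10))  # 100.0
# One standard deviation above the age-group mean:
print(deviation_iq(60, age_group_mean=50, age_group_sd=10))  # 115.0
```

Unlike the ratio IQ, this definition works equally well for adults, since it never requires a "mental age" — only a comparison with one's peers.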
The test was later revised and became known as the Wechsler Adult Intelligence Scale, or WAIS. The Wechsler-Bellevue test quickly became the most widely used adult intelligence test in the United States.
Researchers began critically examining intelligence testing for the possibility of cultural and racial bias. This pivotal year marked a significant turning point in understanding how standardized tests might inadvertently disadvantage certain population groups. Psychologist T. Anne Cleary (1968) published the first study on test bias (in the SAT), and others soon followed. Most studies showed little bias, and the most common type of bias (predictive bias) actually favored examinees in most minority groups. By 1980, there was enough evidence for psychologist Arthur Jensen to publish Bias in Mental Testing, a massive book that gathered all the data on test bias in cognitive tests and showed that the bias was small or non-existent in professionally developed tests. Today, it is standard practice and a professional mandate to screen tests for bias and eliminate it when it is found.
James Flynn published an article in 1984 showing that Americans’ average scores on intelligence tests had been gradually rising for decades. He followed this up with an article in 1987 showing that this was an international phenomenon and that, surprisingly, it was stronger for non-verbal tests than verbal tests. This remarkable phenomenon was later named the “Flynn effect” in his honor, but Flynn did not discover it. Test creators had been aware of this increase in test performance for decades. Flynn popularized the phenomenon and showed that it was more pervasive and stronger than the experts had believed. The Flynn effect would spawn a generation of research into the meaning of IQ scores and the influences on them. In time, that research would show that the increase is due to improvements in the specific abilities that contribute to IQ scores, not the core of intelligence itself.
Richard J. Haier published the first study that found a relationship between IQ and features of brain scans. This study re-introduced neurological perspectives (which had received little attention since Galton’s era) to intelligence research. Haier’s work began to explore the physiological underpinnings of cognitive performance. Haier challenged traditional views of intelligence by demonstrating that cognitive performance relates more to brain efficiency than brain size. Using positron emission tomography (PET) scans, Haier discovered that high-performing individuals use less neural energy when solving complex cognitive tasks, a finding that surprised scientists at the time.
The image above shows the correlations between regional gray matter and the two most g-loaded subtests, vocabulary and block design, from his study “Distributed brain sites for the g-factor of intelligence.” Haier's research suggests that intelligence is not about how hard the brain works, but how effectively it processes information.
The Cattell-Horn-Carroll (CHC) Theory emerged in 1993 as a leading model of cognitive abilities, with the publication of John B. Carroll’s Human Cognitive Abilities. The CHC theory recognized multiple cognitive domains and described their interconnections. Synthesizing the work of Raymond Cattell, John Horn, and John Carroll, this theory has gained widespread acceptance due to its robust research foundation and its ability to reconcile previously contradictory viewpoints, such as the global vs. multi-ability views of intelligence. The CHC theory has become the primary model for selecting, organizing, and interpreting intelligence and cognitive ability assessments. The illustration below visualizes the CHC as a three-stratum theory, with specific, narrow abilities (e.g., word recall) at the lowest stratum, broad abilities (e.g., spatial reasoning) in the middle stratum, and general intelligence at the highest stratum as the only global ability:
The CHC theory conceptualizes intelligence as a complex, interconnected system with multiple dimensions. These cognitive abilities are organized hierarchically, meaning that some have a broader scope than others. The CHC theory identifies broad cognitive abilities like Gf (fluid intelligence) and Gc (crystallized intelligence), which encompass approximately 70 more specialized, narrow abilities. These narrow abilities represent highly specific cognitive skills that reflect individual learning experiences, strategies, and performance nuances. CHC adapts and updates based on emerging research findings. The most recent refinement of the model encompasses 16 broad cognitive abilities, further refined by more than 80 specific, narrow abilities.
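As a rough illustration of the three-stratum structure described above, the hierarchy can be sketched as a small tree. The broad and narrow abilities shown are a small illustrative subset chosen for this sketch, not the full CHC catalog:

```python
# Simplified, partial sketch of the CHC three-stratum hierarchy.
# Stratum III: general intelligence (g), the only global ability.
# Stratum II: broad abilities (16 in the current model; 3 shown here).
# Stratum I: narrow abilities (80+ in the current model; a few shown).
chc = {
    "g (general intelligence)": {
        "Gf (fluid reasoning)": ["induction", "sequential reasoning"],
        "Gc (crystallized knowledge)": ["vocabulary", "general information"],
        "Gv (visual processing)": ["spatial reasoning", "visualization"],
    }
}

# Walk the hierarchy from the most general to the most specific level:
for general, broads in chc.items():
    print(general)
    for broad, narrows in broads.items():
        print("  " + broad + ": " + ", ".join(narrows))
```

The nesting mirrors the theory's key claim: narrow skills cluster under broad abilities, and the broad abilities in turn share variance attributable to a single general factor.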
The International Society for Intelligence Research (ISIR) was founded, providing a dedicated platform for scholars to collaborate, share findings, and advance the scientific understanding of intelligence. ISIR focuses primarily on human intelligence, while also exploring cognitive abilities across species. The society welcomes rigorous studies and theoretical articles from diverse perspectives, including psychometrics, genetics, individual differences, evolutionary theory, and neuroscience.
Today, ISIR is the premier organization for intelligence researchers to share ideas and announce new breakthroughs. The society publishes the journal Intelligence.
The Reasoning and Intelligence Online Test (RIOT) launched in 2025, representing the latest development in intelligence assessment. Built on the innovations of over a century of research, it combines computerized testing with scientifically validated methodologies, reflecting our most sophisticated understanding of human cognitive potential. Riot IQ's mission is to provide validated intelligence testing that gives individuals a deeper understanding of their cognitive abilities.
Dr. Russell T. Warne and his research team developed the RIOT to address a critical gap in intelligence testing: the lack of a professionally administered, rigorously developed online IQ test. The RIOT team aims to create a test that maintains the scientific validity of traditional methods while overcoming significant barriers.
The RIOT's innovative approach eliminates the cost, logistical challenges, and outdated limitations that have historically restricted access to intelligence testing. The result is a scientifically robust assessment designed for contemporary needs—accessible, affordable, and aligned with modern technological capabilities.
The history of intelligence testing reveals a profound scientific journey of understanding human cognition. Throughout the past century, intelligence testing has shifted from simplistic, narrow measurements to increasingly nuanced, comprehensive assessments. Critical milestones include:
◉ The transition from physical measurements to cognitive assessments
◉ Recognition of intelligence as a multifaceted construct
◉ Increasing awareness of cultural and environmental influences on cognitive performance
◉ Technological integration and neurological insights
Perhaps most importantly, the field has become increasingly self-reflective. From the studies of cultural bias to modern comprehensive testing approaches, intelligence research has consistently worked to become more inclusive and accurate, continually improving its methods and scientific standards.
As technology and scientific understanding advance, intelligence testing will undoubtedly continue to change. In the future, psychologists will likely discover more sophisticated insights into the fascinating complexity of human thought. The RIOT test represents the latest chapter in this ongoing exploration—a testament to our continuing quest to understand human cognitive abilities.
Key Citations:
🔗 Francis Galton: Narrative of an explorer in the human sciences
🔗 William Stern’s IQ Formula: The Birth of Intelligence Quotient Measurement
🔗 How Does the Wechsler Adult Intelligence Scale Measure IQ?
🔗 Test Bias: Prediction of Grades of Negro and White Students in Integrated Colleges
🔗 Social Class, Race, and Genetics: Implications for Education