Quantifying Biological Evolution: How Phylogenetic Trees Are Constructed
- 演化之聲

- Mar 9

A phylogenetic tree, also called an evolutionary tree, is a graphical model of the evolutionary relationships among organisms. Its branching structure shows how lineages diverge from common ancestors over evolutionary time. Each branching point (node) represents an evolutionary split, the root represents the common ancestor of all taxa included in the analysis, and branching points closer to the tips correspond to more recent divergences.
Historically, phylogenetic trees were constructed primarily by comparing anatomical and morphological differences among organisms. With the rapid development of molecular biology, modern studies frequently use DNA sequence data as the primary source of information for reconstructing evolutionary relationships. Fossils, however, rarely preserve genetic material, so paleontological phylogenies still rely largely on morphological evidence. In many cases, paleontologists analyze these features using maximum parsimony, a method that seeks the evolutionary tree requiring the fewest changes in traits.
How DNA Sequences Are Used to Construct Phylogenetic Trees
When researchers attempt to reconstruct evolutionary relationships among species, they must identify characteristics shared among those species. These characteristics typically display slight variations among different organisms. By measuring the degree of difference, scientists can infer how closely related the species are.
Consider a hypothetical gene A with the following DNA sequences in three species:
Species 1: CAGTCGATGTCGTAGTGCTA
Species 2: CAGGCGTTGTCGTAGTGGTA
Species 3: CAGGCGTTGTCATAGTGGTA
We can compare these sequences pairwise and count the number of differing nucleotide positions:
Species 1 vs. Species 2: Differences at positions 4, 7, and 18 → 3 differences
Species 1 vs. Species 3: Differences at positions 4, 7, 12, and 18 → 4 differences
Species 2 vs. Species 3: Difference at position 12 → 1 difference
From this comparison, a distance matrix can be constructed.
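The pairwise comparison above can be sketched in a few lines of Python. This is only an illustration of the counting step: the function name `differing_positions` and the nested-dictionary layout of the matrix are choices made here, not part of any particular phylogenetics package, and the sequences are assumed to be already aligned.

```python
from itertools import combinations

# Sequences of the hypothetical gene A, copied from the example above.
sequences = {
    "Species 1": "CAGTCGATGTCGTAGTGCTA",
    "Species 2": "CAGGCGTTGTCGTAGTGGTA",
    "Species 3": "CAGGCGTTGTCATAGTGGTA",
}

def differing_positions(a, b):
    """Return the 1-based positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

names = list(sequences)

# Symmetric distance matrix: number of differing positions for each pair.
distance = {n: {m: 0 for m in names} for n in names}
for n, m in combinations(names, 2):
    d = len(differing_positions(sequences[n], sequences[m]))
    distance[n][m] = distance[m][n] = d

# Print the matrix as a small plain-text table.
print(" " * 10 + "".join(f"{n:>11}" for n in names))
for n in names:
    print(f"{n:<10}" + "".join(f"{distance[n][m]:>11}" for m in names))
```

The resulting matrix has zeros on the diagonal and the values 3, 4, and 1 for the pairs 1 vs. 2, 1 vs. 3, and 2 vs. 3. Real analyses usually correct raw difference counts for multiple substitutions at the same site, which this sketch does not attempt.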

The matrix shows that species 2 and species 3 are most closely related because they differ at only one position. They also carry the same nucleotides at the three positions where both differ from species 1 (positions 4, 7, and 18). At position 12, species 3 differs from both species 1 and species 2, which suggests that this change arose in the lineage leading to species 3 after it split from species 2. These relationships can then be translated into a phylogenetic tree.
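One way to turn such a distance matrix into a tree is UPGMA, a simple clustering method that repeatedly joins the two closest groups. The article does not say which method was used for its figure, so the sketch below is an assumption chosen for its simplicity. The function `upgma` and the underscore-style taxon names are illustrative only, and the method itself assumes that all lineages change at roughly the same rate.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa with UPGMA and return the topology as a Newick string.

    dist maps a pair of names (a, b) to the distance between them.
    Branch lengths are omitted to keep the sketch short.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the distances from its two parts.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    d[frozenset((a, c))] * sizes[a]
                    + d[frozenset((b, c))] * sizes[b]
                ) / (sizes[a] + sizes[b])
        merged_size = sizes.pop(a) + sizes.pop(b)
        sizes[merged] = merged_size
    return next(iter(sizes)) + ";"

# Pairwise difference counts from the gene A example.
names = ["Species_1", "Species_2", "Species_3"]
pairwise = {
    ("Species_1", "Species_2"): 3,
    ("Species_1", "Species_3"): 4,
    ("Species_2", "Species_3"): 1,
}

tree = upgma(names, pairwise)
print(tree)  # (Species_1,(Species_2,Species_3));
```

Species 2 and 3 are joined first because their distance of 1 is the smallest entry, and species 1 then attaches to that pair. Methods such as neighbor joining relax the equal-rate assumption and are generally preferred for real data.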

Real genes behave similarly but are typically much longer and contain more complex patterns of variation. For example, the gene encoding actin occurs in organisms as diverse as yeast, plants, and animals. Because this gene already existed in their common ancestor, its sequence is shared across these groups, although small differences accumulate through evolutionary time. In general, the actin sequence of yeast differs far more from those of birds or mammals than bird and mammal sequences differ from one another.
Another example is the insulin gene. Unlike actin, insulin evolved after animals appeared. Yeast and plants therefore lack this gene entirely, while different animal species possess slightly different insulin sequences.
If we compare insulin gene sequences from four species—the house sparrow (Passer domesticus), humans (Homo sapiens), domestic dogs (Canis lupus familiaris), and domestic cats (Felis catus)—we can observe how sequence variation reflects evolutionary relationships.

Because real DNA sequences are much longer than the simple example above and contain numerous mutations, calculating relationships manually would be extremely inefficient. Scientists therefore rely on computational algorithms that analyze sequence differences and generate phylogenetic trees automatically.

In such an analysis, the insulin gene tree closely matches known biological relationships: dogs and cats are most closely related because both belong to the order Carnivora. Humans are also mammals and therefore branch near them, whereas the house sparrow, being a bird, lies farther from the mammalian group.
However, phylogenetic inference rarely relies on a single gene, because a single gene can produce a misleading evolutionary signal. For example, two distantly related species might carry similar versions of a particular gene, while more closely related species carry more divergent ones.
One cause of this pattern is a process known as incomplete lineage sorting. Imagine a hypothetical gene H with two variants, H0 and H1, present in an ancestral population. When species C diverges first, only the H1 variant remains in that lineage. Meanwhile, the ancestor of species A and B retains both variants. Later, when A and B diverge, species A retains H0 while species B retains H1. Researchers who compared only the H gene would mistakenly conclude that species B and C are closest relatives because both carry H1. In reality, A and B share the more recent common ancestor.
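How often a gene tells the wrong story can be estimated with the standard three-species result from the multispecies coalescent model. The article does not discuss this model, so treat the following as a supplementary sketch. It assumes a species tree ((A,B),C), no gene flow, and an internal branch length `t` measured in coalescent units (generations divided by twice the effective population size). The function name is illustrative.

```python
import math

def gene_tree_probabilities(t):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t is the length of the branch leading to the ancestor of A and B,
    in coalescent units. If the A and B lineages fail to coalesce on
    that branch (probability exp(-t)), all three topologies become
    equally likely in the deeper ancestral population.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((B,C),A)": discordant,              # the misleading H-gene pattern
        "((A,C),B)": discordant,
    }

for t in (0.1, 1.0, 3.0):
    probs = gene_tree_probabilities(t)
    print(t, {topology: round(p, 3) for topology, p in probs.items()})
```

The shorter the time between the two speciation events (or the larger the ancestral population), the closer all three topologies come to a probability of one third each, so misleading genes become common.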

A real-world example illustrates this problem. Humans and chimpanzees are widely recognized as each other's closest living relatives, yet across roughly 30% of the genome, gorillas are closer to humans or to chimpanzees than humans and chimpanzees are to each other. If researchers analyzed only the regions in which humans and gorillas group together, they might incorrectly infer that humans and gorillas share the most recent ancestry.
Because of such complications, modern phylogenetic studies typically analyze many genes, or even entire genomes, simultaneously. Bioinformaticians have developed specialized methods that account for incomplete lineage sorting; well-known examples include ASTRAL, StarBEAST2, and MP-EST. These approaches, combined with large-scale data such as whole-genome sequencing and whole-exome sequencing, greatly improve the reliability of phylogenetic reconstruction.
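The intuition behind using many genes can be shown with a toy tally. This is not how ASTRAL, StarBEAST2, or MP-EST work internally. It only illustrates that, for three species, the grouping supported by the largest share of genes is expected to match the species tree when discordance comes from incomplete lineage sorting alone. The input list is made up for illustration.

```python
from collections import Counter

# Hypothetical toy input: for each of ten genes, the pair of species that
# the gene's own tree groups together. These values are invented.
gene_tree_sister_pairs = [
    ("A", "B"), ("A", "B"), ("B", "C"), ("A", "B"), ("A", "C"),
    ("A", "B"), ("B", "C"), ("A", "B"), ("A", "B"), ("A", "C"),
]

def most_supported_pair(pairs):
    """Return the most frequent sister pair and the fraction of genes supporting it."""
    counts = Counter(frozenset(p) for p in pairs)
    pair, n = counts.most_common(1)[0]
    return tuple(sorted(pair)), n / len(pairs)

pair, support = most_supported_pair(gene_tree_sister_pairs)
print(pair, support)  # ('A', 'B') 0.6
```

Here four of the ten genes are misleading on their own, yet the tally still recovers A and B as closest relatives. With more than three species a simple majority vote is no longer reliable, which is one reason dedicated methods exist.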
How Paleontologists Construct Phylogenetic Trees
In paleontology, researchers reconstruct evolutionary relationships by examining morphological evidence preserved in fossils. Clues include overall body form, skeletal structures, soft-tissue impressions, and sometimes chemical signatures of preserved molecules.
To compare fossil species, paleontologists identify anatomical characters across many parts of the body and convert them into numerical data. For instance:
Presence of a feature may be coded as 1, absence as 0.
Alternatively, different shapes or states of a structure may be assigned multiple numerical codes.
These values are then compiled into a character matrix and analyzed computationally to generate a phylogenetic tree.
A simplified hypothetical example might look like this (1 = feature present, 0 = feature absent):
Character 1: Nasal bone expands laterally across the dorsal margin of the antorbital fenestra
Dinosaur A: 0, Dinosaur B: 1, Dinosaur C: 1
Character 2: Posterior projection of the parietal bone extends beyond the supraoccipital
Dinosaur A: 1, Dinosaur B: 1, Dinosaur C: 1
Character 3: Scapula positioned close to the neck in lateral view
Dinosaur A: 1, Dinosaur B: 0, Dinosaur C: 0

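From these characters, dinosaurs B and C appear to share the closest relationship: they have identical states for all three characters, whereas dinosaur A differs from both of them in characters 1 and 3. Character 2 is the same in all three taxa and therefore provides no information for grouping them.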
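The same comparison can be written as a small character matrix in Python. This is a simple similarity count, not a full parsimony analysis, which would also need an outgroup or some other way to decide which state is ancestral. The variable and function names are illustrative.

```python
from itertools import combinations

# Character matrix from the example above:
# rows are taxa, columns are characters 1 to 3 (1 = present, 0 = absent).
matrix = {
    "Dinosaur A": [0, 1, 1],
    "Dinosaur B": [1, 1, 0],
    "Dinosaur C": [1, 1, 0],
}

def mismatches(a, b):
    """Count the characters in which two taxa have different states."""
    return sum(x != y for x, y in zip(a, b))

# Number of differing characters for every pair of taxa.
pair_differences = {
    (s, t): mismatches(matrix[s], matrix[t]) for s, t in combinations(matrix, 2)
}

# Characters whose state varies among the taxa; a character that is
# identical in every taxon cannot help to group them.
n_characters = len(next(iter(matrix.values())))
variable_characters = [
    i + 1 for i in range(n_characters)
    if len({row[i] for row in matrix.values()}) > 1
]

for (s, t), n in pair_differences.items():
    print(f"{s} vs {t}: {n} differing characters")
print("Variable characters:", variable_characters)
```

Dinosaurs B and C differ in none of the three characters, while dinosaur A differs from each of them in two, and only characters 1 and 3 vary at all.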

In real studies, paleontologists often examine dozens or even hundreds of characters. Specialized software then evaluates the complete dataset and identifies the most plausible phylogenetic trees.
Morphological comparisons can also be complicated by convergent evolution. For example, the tails of sharks and ichthyosaurs appear superficially similar, yet sharks belong to cartilaginous fishes while ichthyosaurs were marine reptiles. Their resemblance reflects similar adaptations to swimming rather than shared ancestry. Incorporating many independent characters helps reduce the risk of such misinterpretations.
Phylogenetic trees are among the most powerful tools for understanding the relationships among organisms on Earth. They reveal how species have diversified through time and allow researchers to trace the origins of modern biodiversity back to common ancestors. As molecular biology and computational methods continue to advance, phylogenetic reconstruction becomes increasingly precise, providing deeper insight into the history of life.
Author: Shui-Ye You



