Glossary

A

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing): A technique used to map regions of open (accessible) chromatin across the genome. In MorPhiC, ATAC-seq is used to assess how gene knockouts affect chromatin accessibility, providing insight into regulatory changes that follow loss of gene function.


B

Bulk RNA-seq: A sequencing method that measures the average gene expression across all cells in a sample. MorPhiC uses bulk RNA-seq alongside single-cell methods to profile transcriptional changes resulting from null alleles at population scale.


C

Catalog (MorPhiC Catalog): The central deliverable of the MorPhiC program — a publicly available, standardised collection of molecular and cellular phenotypes associated with null alleles for every human protein-coding gene. The catalog is hosted at morphic.bio and is freely accessible to the biomedical community.

Cell Village / KO Village: An experimental strategy in which cells carrying different gene knockouts are pooled and cultured together, then individually identified by sequencing. This multiplexed approach enables high-throughput phenotyping of many gene knockouts simultaneously.
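As a rough illustration of how pooled cells might be traced back to their knockout, the Python sketch below assumes that each knockout line carries a sequenceable identifier (for example, its guide RNA) that is captured alongside each cell's barcode. The function name `assign_knockouts`, its thresholds, and the knockout labels are hypothetical. They do not describe MorPhiC's actual demultiplexing method.

```python
from collections import Counter


# Hypothetical thresholds chosen for illustration only.
def assign_knockouts(cell_guides, min_reads=3, min_fraction=0.8):
    """Assign each pooled cell to the knockout whose identifier dominates its reads.

    cell_guides maps a cell barcode to a list of knockout identifiers,
    one entry per supporting read (an assumed input format).
    """
    assignments = {}
    for cell, guides in cell_guides.items():
        counts = Counter(guides)
        total = sum(counts.values())
        if total == 0 or total < min_reads:
            # Too little evidence to make a call.
            assignments[cell] = "unassigned"
            continue
        top_id, top_count = counts.most_common(1)[0]
        if top_count / total >= min_fraction:
            assignments[cell] = top_id
        else:
            # Mixed signal, e.g. two cells captured together.
            assignments[cell] = "multiplet"
    return assignments
```

A cell whose reads all point to one knockout is assigned to it. A cell split evenly between two knockouts is flagged as a possible multiplet, and a cell with too few reads stays unassigned. Real village designs may instead identify cells by donor genotype or other barcodes, and typically use statistical models rather than fixed cut-offs.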

Community Resource Project: A designation from the National Human Genome Research Institute (NHGRI) indicating that MorPhiC data must be shared openly and promptly with the research community, without restriction, once verified and deposited into public databases.

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats): A genome-editing technology, most commonly CRISPR-Cas9, used throughout MorPhiC to create null alleles by disrupting specific target genes. Guide RNAs (gRNAs) direct the Cas9 enzyme to cut at precise genomic locations. Error-prone repair of the resulting double-strand breaks introduces small insertions or deletions that can disrupt the gene and cause loss of function.
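To make the CRISPR targeting step concrete, the sketch below scans a DNA sequence for candidate target sites of the widely used Streptococcus pyogenes Cas9 (SpCas9). SpCas9 requires a 20-nucleotide protospacer immediately followed by an NGG protospacer-adjacent motif (PAM). The helper `find_spcas9_sites` is hypothetical. It checks both strands but ignores off-target scoring, efficiency prediction, and the other criteria real gRNA design tools apply, and it is not MorPhiC's guide design procedure.

```python
import re

# Complement table for uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq, protospacer_len=20):
    """List candidate SpCas9 sites as (strand, start, protospacer, pam).

    A site is protospacer_len bases immediately followed by an NGG PAM.
    Start positions are 0-based on the strand that was scanned.
    """
    seq = seq.upper()
    # The lookahead makes overlapping sites visible to finditer.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Each returned protospacer is, in principle, the sequence a gRNA would carry to direct SpCas9 to that site.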


D

DAV (Data Analysis and Validation Center): One of several MorPhiC consortium centres responsible for independently analysing data generated by the DPCs to assess their quality, reproducibility, and utility for diverse biological questions.

DPC (Data Production Research and Development Center): One of several MorPhiC consortium centres responsible for generating experimental data at scale. DPCs develop and apply diverse cellular systems and assays to produce phenotypic data for gene knockouts.

DRACC (Data Resource and Administrative Coordinating Center): The central hub of the MorPhiC consortium responsible for acquiring, standardising, validating, and storing multi-omics data and metadata. The DRACC also maintains the morphic.bio web portal and ensures data are accessible for download, analysis, and visualisation by the broader research community.


F

Functional Genomics: The field of biology focused on understanding the roles of genes and regulatory elements, especially at a genome-wide scale. MorPhiC is a functional genomics initiative aimed at characterising the function of every human protein-coding gene.


G

Gene Knockout (KO): The deliberate inactivation of a gene, typically using CRISPR, to create a null allele. MorPhiC systematically creates knockouts across all human protein-coding genes to observe the resulting molecular and cellular phenotypes.

gRNA (Guide RNA): A short RNA molecule that guides the CRISPR-Cas9 complex to a specific DNA sequence for editing. MorPhiC uses gRNAs to direct knockout of target genes. The gRNA-Enrichment pipeline in MorPhiC tracks and analyses the representation of guide RNAs across experimental samples.
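The idea behind tracking guide representation can be sketched as counting each guide in two samples, normalising to counts per million, and computing a log2 fold change. This generic approach is shown as an assumption. The function names and pseudocount are illustrative, and the sketch does not reproduce the MorPhiC gRNA-Enrichment pipeline.

```python
import math


def counts_per_million(counts):
    """Normalise raw guide read counts so samples of different depth compare."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no guide reads")
    return {guide: n * 1e6 / total for guide, n in counts.items()}


def log2_enrichment(reference, sample, pseudocount=1.0):
    """log2 fold change of each guide's CPM in `sample` relative to `reference`.

    Guides missing from one sample are treated as having zero reads there;
    the pseudocount (an illustrative choice) avoids division by zero.
    """
    ref_cpm = counts_per_million(reference)
    sample_cpm = counts_per_million(sample)
    guides = sorted(set(ref_cpm) | set(sample_cpm))
    return {
        guide: math.log2(
            (sample_cpm.get(guide, 0.0) + pseudocount)
            / (ref_cpm.get(guide, 0.0) + pseudocount)
        )
        for guide in guides
    }
```

A positive value means a guide became more represented in the sample than in the reference, and a negative value means it was depleted. The pseudocount keeps guides with zero reads from producing infinite values.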


H

hPSC (Human Pluripotent Stem Cell): A stem cell type — including both embryonic stem cells (hESCs) and induced pluripotent stem cells (iPSCs) — capable of differentiating into virtually any cell type in the body. MorPhiC uses hPSCs as a standardised, renewable in vitro system for generating null alleles and measuring phenotypes across diverse cell lineages.


I

In Vitro Multicellular System: A laboratory model in which cells are grown, and sometimes differentiated, outside a living organism. MorPhiC relies on in vitro multicellular systems, including stem cell-derived organoids and co-cultures, to model gene function in physiologically relevant contexts while enabling systematic, scalable experimentation.

iPSC (Induced Pluripotent Stem Cell): A pluripotent stem cell generated by reprogramming an adult somatic cell (such as a skin or blood cell) back into a pluripotent state. iPSCs are widely used in MorPhiC experiments because they are renewable, genetically tractable, and can be differentiated into many cell types for phenotypic assays.


M

Metadata Schema: A standardised set of fields and controlled vocabularies used by MorPhiC to describe experimental conditions, cell lines, protocols, and data files. The schema ensures that data produced by different DPCs are interoperable and reproducible.
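A minimal sketch of schema validation, assuming a record is a flat dictionary. It checks that required fields are present and that one field uses a controlled vocabulary. The field names and allowed values are invented for illustration and are not the actual MorPhiC metadata schema.

```python
# Hypothetical schema for illustration; not MorPhiC's actual metadata fields.
REQUIRED_FIELDS = {"sample_id", "cell_line", "target_gene", "assay"}
ALLOWED_ASSAYS = {"ATAC-seq", "bulk RNA-seq", "scRNA-seq"}  # controlled vocabulary


def validate_record(record):
    """Return a list of problems found in one metadata record (empty if valid)."""
    errors = []
    # Report missing fields in a stable, sorted order.
    for field in sorted(REQUIRED_FIELDS - set(record)):
        errors.append(f"missing required field: {field}")
    assay = record.get("assay")
    if assay is not None and assay not in ALLOWED_ASSAYS:
        errors.append(f"assay {assay!r} is not in the controlled vocabulary")
    return errors
```

Returning a list of problems rather than stopping at the first one lets a submitter fix every issue in a single pass.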

MorPhiC (Molecular Phenotypes of Null Alleles in Cells): An NHGRI-funded consortium program launched in 2022 with the goal of cataloguing the molecular and cellular phenotypes of null alleles for every human protein-coding gene using in vitro multicellular systems. The program aims to address a major gap in understanding: the function of the majority of human genes remains unknown or poorly characterised.

Multi-omics: The integration of data from multiple "omics" disciplines — such as genomics, transcriptomics, epigenomics, and proteomics — to build a comprehensive picture of how gene knockouts affect cellular biology. MorPhiC generates and integrates multi-omics datasets for each knocked-out gene.


N

NHGRI (National Human Genome Research Institute): The U.S. federal institute that funds and oversees the MorPhiC program. NHGRI's 2020 Strategic Vision called for the biological function of every human gene to be known by 2030 — a goal that MorPhiC directly supports.

Null Allele: A version of a gene that produces no functional protein, typically due to a mutation that disrupts gene expression or creates a non-functional product. Creating null alleles is the primary experimental intervention in MorPhiC, enabling study of what happens to a cell when the function of a given gene is completely lost.


O

Organoid: A three-dimensional, self-organising multicellular structure grown in vitro from stem cells, such as hPSCs, that recapitulates aspects of the architecture and cell-type composition of an organ. Organoids are among the in vitro multicellular systems MorPhiC uses to study the effects of gene knockouts in physiologically relevant contexts.


P

Phenotype (Molecular/Cellular): The observable characteristics of a cell or organism resulting from its genetic makeup and environment. In MorPhiC, phenotypes are measured at the molecular level (e.g., changes in gene expression, chromatin state) and the cellular level (e.g., changes in cell morphology, proliferation, or differentiation) following gene knockout.

Protein-coding gene: A gene whose DNA sequence is transcribed into RNA and then translated into a protein. The human genome contains approximately 19,000–20,000 protein-coding genes. Systematically characterising all of them is the core mission of MorPhiC.


S

scRNA-seq (Single-Cell RNA Sequencing): A sequencing method that measures gene expression in individual cells rather than across a bulk population. MorPhiC uses scRNA-seq to capture cell-to-cell variability in transcriptional responses to gene knockouts, enabling discovery of phenotypes that might be masked in bulk assays.
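The masking effect can be shown with a toy calculation. The sketch below treats bulk signal as the average across all cells, as described under Bulk RNA-seq, and compares it with averages computed separately for each cell type. The expression values and cell-type labels are made up.

```python
from statistics import mean


def bulk_and_group_means(expression, labels):
    """Return (bulk mean over all cells, {label: mean within that label}).

    expression holds one gene's value per cell; labels gives each cell's
    type in the same order. Both inputs are toy data in this sketch.
    """
    groups = {}
    for value, label in zip(expression, labels):
        groups.setdefault(label, []).append(value)
    group_means = {label: mean(values) for label, values in groups.items()}
    return mean(expression), group_means
```

With 90 cells of type A and 10 of type B, all expressing a gene at 10 units, a knockout that abolishes expression only in type B lowers the bulk mean from 10 to 9. That is a modest 10% change, while the per-type view shows a complete loss in type B.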

STAR-suite: A bioinformatics software suite used within MorPhiC for aligning RNA sequencing reads to the human genome. It is a core tool in the consortium's computational pipelines for processing both bulk and single-cell RNA-seq data.


U

UMI (Unique Molecular Identifier): A short, random DNA barcode attached to individual molecules before PCR amplification. Because PCR copies of the same original molecule carry the same UMI, MorPhiC researchers can use UMIs to detect and remove PCR duplicates, improving the accuracy of quantification in single-cell and bulk sequencing experiments.
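A simplified sketch of UMI-based deduplication: reads that share the same cell barcode, UMI, and gene are assumed to be PCR copies of one original molecule and are counted once. The tuple input format and the function name `count_unique_molecules` are assumptions for illustration; production pipelines work from aligned reads.

```python
def count_unique_molecules(reads):
    """Collapse reads sharing (cell barcode, UMI, gene) into one molecule.

    reads: iterable of (cell_barcode, umi, gene) tuples, one per read.
    Returns {(cell_barcode, gene): molecule_count}.
    """
    # PCR duplicates share all three keys, so a set keeps one per molecule.
    molecules = set(reads)
    counts = {}
    for cell, _umi, gene in molecules:
        counts[(cell, gene)] = counts.get((cell, gene), 0) + 1
    return counts
```

This exact-match approach ignores sequencing errors within UMIs; dedicated tools typically also merge UMIs that differ by a single base. For bulk data, a sample identifier can stand in for the cell barcode.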


This glossary covers key terms used across the MorPhiC program and its associated data, methods, and infrastructure. For full details, refer to the consortium's Nature Perspective (Adli et al., 2025).