I. Introduction: Beyond the Paperless Corporation

The legal profession has, by most measures, gone paperless. Law firms store documents in the cloud, courts accept electronic filings, and corporate transactions close without a single sheet of paper changing hands. Yet despite this surface-level digitization, corporate law remains conceptually tethered to paper. The artifacts that define corporate existence (articles of incorporation, bylaws, shareholder resolutions, minute books) are still conceived, drafted, and treated as documents. They happen to be stored as PDFs rather than in filing cabinets, but their fundamental nature has not changed. They emulate sheets of paper rather than represent information in a natively digital form.

This paper argues that the persistence of what we call the “paper paradigm” is no longer adequate. It obscures the true nature of the corporation in a digital environment and forecloses possibilities for automation, interoperability, and legal certainty that structured data representations would enable.

The Paper Paradigm

In contemporary legal practice, corporate reality is represented through a familiar set of artifacts: articles of incorporation, bylaws, board and shareholder resolutions, share certificates, minute books, contracts, and regulatory filings. These documents are treated as authoritative, static records. When lawyers speak of “the corporate record,” they mean a collection of such documents, typically organized chronologically and stored, whether physically or electronically, as the definitive account of the corporation’s existence and actions.

Digitization, as it has occurred in legal practice, has largely meant converting paper into electronic documents. A PDF is not a rethinking of the document; it is a digital replica of a printed page. Word processors produce files optimized for eventual printing or PDF conversion. Even “born digital” documents, those never intended to exist on paper, retain the formatting conventions, pagination, and visual structure of their paper ancestors. They are designed for human reading, not for computation.

The consequences of this approach are significant. Documents become the source of truth rather than views over underlying information. The relationships between corporate elements (shareholders and shares, directors and resolutions, subsidiaries and parents) remain implicit in prose rather than explicit in structure. Extracting information requires reading, interpreting, and often re-keying data that could have been structured from the start.

Why This Matters Now

The scale of the problem is considerable. Industry estimates suggest that approximately ninety percent of enterprise-generated data is unstructured, comprising emails, documents, images, and similar artifacts that lack a predefined schema.¹ Legal data, including contracts, corporate records, and regulatory filings, constitutes a substantial portion of this unstructured mass.

A note on terminology: this paper distinguishes between unstructured data (free-form text, images, scanned documents requiring interpretation to extract meaning), structured data (fixed schemas, relational databases, rigidly defined fields), and semi-structured data (flexible schemas, self-describing formats like JSON and XML). When we propose representing corporations as “structured data,” we use the term broadly to encompass both structured and semi-structured formats. What unites these categories, and distinguishes them from unstructured data, is explicit organization amenable to programmatic processing. A JSON document describing corporate ownership is semi-structured in the technical sense, but it is structured in the sense that matters: its contents can be queried, validated, and transformed without human interpretation.

Every corporate transaction generates documents that, despite containing highly structured information (parties, dates, amounts, conditions), encode that information in prose paragraphs rather than data fields.

This creates a growing mismatch between what organizations need and what their legal infrastructure provides. Modern enterprises depend on data pipelines, APIs, and automated workflows. Yet their legal foundations remain locked in formats that resist integration. Compliance checks require manual review. Due diligence means reading hundreds of documents. Corporate restructurings demand painstaking reconciliation of inconsistent records across jurisdictions.

Increasingly, organizations turn to artificial intelligence to bridge this gap. Large language models can extract information from contracts, summarize board minutes, and flag potential issues in corporate records. But this approach, using sophisticated AI to interpret documents that were never designed to be machine-readable, is fundamentally compensatory. It deploys advanced tools to work around poor underlying data structures rather than addressing the structure itself.

The Core Thesis

At a conceptual level, a corporation is not a stack of documents. It is an abstract legal construct defined by a set of attributes (jurisdiction of incorporation, legal form, capital structure), relationships (shareholders, directors, officers, subsidiaries), rules (governance constraints, voting thresholds, transfer restrictions), and a history of state changes over time (incorporations, amendments, issuances, transfers, dissolutions). These characteristics are far more naturally represented as structured data than as prose documents.

Software architects will recognize a familiar pattern here: the Model-View-Controller (MVC) paradigm. In this frame, the corporation’s underlying data (ownership, governance, history, metadata) constitutes the Model, the authoritative source of truth. Documents, certificates, and reports are Views, rendered representations formatted for specific audiences and purposes. The processes that update corporate state (board approvals, share transfers, regulatory filings) are Controllers, mediating between human decisions and state changes. The paper paradigm conflates these layers, treating the View (the document) as if it were the Model itself. Corporation-as-code restores the separation: the Model is structured state, Views are generated on demand, and Controllers implement permitted state transitions with an append-only event log and a clear, unidirectional flow from decision to record to representation.

The central claim of this paper is that we should reconceive the corporation as what we term “corporation as code”: a model in which the legal entity is natively represented as a structured, version-controlled data object rather than as a collection of documents. Under this approach, corporate attributes and relationships are expressed as standardized data structures (JSON schemas, for example), with traditional documents functioning as human-readable views generated from an authoritative computational source. Electronic signatures attach to data states or commits, not merely to rendered documents.

This is a conceptual shift more than a technological one. The tools to implement such a model largely exist. Entity management platforms already store corporate information as structured data behind their interfaces. Government registries maintain corporate records in databases. The missing element is recognition that the document is not the corporation, and that treating it as such imposes unnecessary constraints on what corporate law can achieve.

Scope and Structure

This paper develops the corporation-as-code thesis across eleven sections. Following this introduction, Section II examines the transition from paper artifacts to data structures, drawing on examples from corporate registries that have begun exposing information through APIs. Section III surveys existing approaches in legal technology and entity management, arguing that current tools replicate paper-based assumptions behind proprietary interfaces. Section IV presents the corporation-as-code model in detail, drawing analogies from software engineering practices such as version control and continuous integration. Section V distinguishes this proposal from decentralized autonomous organizations (DAOs), which execute organizational logic as code rather than merely representing it.

The paper then turns to implications. Section VI considers how structured corporate representations enable new organizational forms, particularly the emergence of highly capitalized companies with minimal staff. Section VII proposes integrating legal architecture as a layer within enterprise architecture frameworks. Section VIII examines why using AI to interpret unstructured legal documents represents an inefficient detour, arguing that structure should precede intelligence. Section IX examines the role of government registries and public APIs in enabling interoperability.

Section X addresses risks, objections, and governance concerns, including questions of access, complexity, and the relationship between human judgment and automated systems. Section XI concludes with reflections on the path toward computable corporate law.

The paper sources its examples from, and focuses on, the corporate law tradition in common law jurisdictions (the United States, United Kingdom, Canada, and Australia) while recognizing that the underlying arguments apply more broadly. The goal is conceptual: to articulate a shift in mental models from the corporation as document to the corporation as code. This paper does not propose detailed legal analysis of whether “pure” corporation-as-code artifacts would be accepted as regulatory filings or address evidentiary questions in litigation. Rather, it explores how structured data can transform corporate practice within existing legal frameworks.

A note on scope: while this paper focuses on corporations for clarity and concision, the analysis extends naturally to other organizational forms. Partnerships, limited liability companies, non-profit organizations, trusts, and investment funds all face similar challenges: governance structures encoded in documents, membership and ownership tracked in prose, compliance obligations scattered across contracts and filings. Private investment funds, in particular, may prove early adopters given their repetitive fund formation processes, standardized terms, and dense webs of investor obligations. The principles developed here apply wherever organizational complexity meets document-centric infrastructure.

IBM, “Structured vs. Unstructured Data: What’s the Difference?” https://www.ibm.com/think/topics/structured-vs-unstructured-data ↩