Table of Contents
Introduction
I. The First Amendment and the Principle of Speech Certainty
  A. Text: Founding-Era Definitions of "Speech"
  B. History and the Original Public Meaning of "Speech"
    1. Prior restraint
    2. Subsequent punishment
  C. Speech Certainty and the Purpose(s) of the First Amendment
    1. Speech certainty and the leading theories of the First Amendment
    2. Speech certainty and the purpose of the First Amendment's categorical exceptions
II. Speech Certainty and First Amendment Jurisprudence
  A. Editorial Discretion
    1. Exercises judgment about the contents of the compilation
    2. Publishes the compilation
  B. Expressive Conduct
    1. Intent to convey a particularized message through the conduct
    2. Great likelihood that the message will be understood by a reasonable observer
III. Understanding Algorithmic Output
  A. What We Mean By "Machine Learning"
  B. Code: The Shift from Traditional Programming to Machine Learning
    1. Predictions with traditional programming
    2. Predictions with machine learning
      a. Machine learning: Key terms
      b. Training the model: Labeled examples & parameters
      c. Writing the rules: Gradient descent
        i. Step 1: Calculate initial loss
        ii. Step 2: Gradient descent
        iii. Gradient descent with complex models
      d. Inference: Making predictions
      e. Explainability: The "black box" problem
      f. The inevitability of errors
  C. Platforms' Purported Machine Learning "Speech"
    1. Ranking
    2. Recommendation
    3. Removal
IV. Because Machine Learning "Speech" Lacks Speech Certainty, It Is Not Protected By the First Amendment
  A. Speech Certainty Is a Threshold Question to the Protection Analysis
  B. Assessing the Speech Certainty and Protection of Algorithmic Output
    1. Traditional code
      a. The output of traditional code is characterized by speech certainty
      b. The output of traditional code is protected editorial discretion
      c. The output of traditional code is protected expressive conduct
    2. Probabilistic traditional code
      a. The output of probabilistic traditional code is characterized by speech certainty
      b. The output of probabilistic traditional code is protected editorial discretion
      c. Probabilistic traditional code may qualify as expressive conduct
    3. Machine learning
      a. Machine learning output is not characterized by speech certainty
      b. Machine learning output is not protected editorial discretion because it lacks speech certainty
      c. Machine-learning output also doesn't qualify as expressive conduct because it lacks speech certainty
V. Regulatory Implications
Conclusion
Introduction
When does something that isn't yet speech--an intuition, a spark of imagination, an embryonic thought--become speech, and, as a result of that becoming, earn the protection of the First Amendment? This is the rare case where inquiry confirms intuition. An idea becomes speech when it's spoken by a speaker: words actually written, a speech actually given, brushstrokes actually painted. The concept is so plain that it has hardly merited any scrutiny. In this Article, however, we argue that this concept--which we call the principle of "speech certainty"--defines the limits of what constitutes speech under the First Amendment, with important implications for our online public discourse.
The idea is simple: Speech is characterized by speech certainty when the speech can be identified with certainty by the speaker at the moment it is spoken. An audience might misunderstand what the speaker meant, or the speaker may have chosen her words poorly, but those words--the ones that left the speaker's mouth--constitute her speech because the speaker knew for certain what she said when she said it. And because until recently no other character of speech has existed, the First Amendment has only ever protected such speech. The principle of speech certainty is so inherent to the concept of speech that articulating its existence was never necessary.
But that has now changed with the emergence of machine learning algorithms, the outputs of which are claimed as the speech of their creators. (1) Unlike traditional algorithms, in which a programmer dictates rules for the algorithm to follow in perfectly predictable ways, machine learning algorithms write their own rules. (2) And these rules, without exception, calculate probabilities to make predictions. (3) Based on the combination of words in a post published by a particular user, for example, what is the likelihood that the post includes hate speech? Or, based on the way the pixels in an image are arranged, that the image includes nudity? The nature of these algorithms, however, is that their predictions always leave room for doubt. As a result, they can never be 100% accurate in their output. (4) Neither, then, can their programmers be certain of the contents of that output. (5)
Thus, when online platforms rely on machine learning algorithms to rank, recommend, and remove content, they can never be certain that the content published by their algorithms will align with what they intended to publish. (6) In fact, because the algorithm will always be wrong at least some of the time, it is guaranteed that the algorithm will publish precisely what the platforms intended not to publish. (7) A platform that prohibits hate speech and enforces that prohibition with a machine learning algorithm will inevitably publish hate speech. (8) But because it has outsourced enforcement to an algorithm that writes its own rules, the platform cannot know when, where, or even why that hate speech will appear on its platform. (9) Speech uncertainty is as inherent to the use of machine learning algorithms as speech certainty is to traditional speech.
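The contrast drawn above, between a rule dictated by a programmer and a prediction computed from learned parameters, can be illustrated with a minimal sketch. Everything in it is hypothetical: the blocked term, the weights, the bias, and the 0.5 threshold are invented for illustration and do not describe any platform's actual system. In a real model, the weights would be produced by training rather than typed in by hand.

```python
import math

# Hypothetical rule written by a programmer: flag a post if it contains a listed term.
BLOCKED_TERMS = {"badword"}


def rule_based_flag(post: str) -> bool:
    """Traditional code: the programmer dictates the rule, so the outcome is known in advance."""
    return any(word in BLOCKED_TERMS for word in post.lower().split())


# Hypothetical learned parameters. In practice these come out of training,
# not from a programmer; they are hard-coded here only to keep the sketch small.
WEIGHTS = {"badword": 2.0, "hate": 1.5, "love": -1.0}
BIAS = -1.0


def predicted_probability(post: str) -> float:
    """Machine-learning style: return the estimated likelihood that the post violates the policy."""
    score = BIAS + sum(WEIGHTS.get(word, 0.0) for word in post.lower().split())
    return 1.0 / (1.0 + math.exp(-score))  # logistic function: always strictly between 0 and 1


def model_flag(post: str, threshold: float = 0.5) -> bool:
    """Act on the prediction by comparing it with a chosen (hypothetical) threshold."""
    return predicted_probability(post) >= threshold


if __name__ == "__main__":
    for post in ["i hate mondays", "this contains badword", "hello world"]:
        print(post, rule_based_flag(post), round(predicted_probability(post), 3), model_flag(post))
```

In this toy setup the benign post "i hate mondays" is flagged by the model but not by the rule, because the model acts on a likelihood rather than on a condition its author specified. The probability it returns is never exactly 0 or 1, which is the room for doubt the text describes.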
Over the last decade, these machine learning algorithms have come to mediate public discourse--from search engines to social media platforms to, more recently, artificial intelligence companies. (10) These algorithms shape what we see (and don't see) across wide swathes of the internet, giving rise to a palpable and well-chronicled anxiety across the political spectrum. (11) Of course, anxiety about undue control over the public discourse is something of an American tradition--be it targeted at newspapers, broadcast companies, or, most recently, internet platforms. (12) This Article posits that today's concerns are different. Whether articulated in terms of "surveillance capitalism" or "the tyranny of Big Tech," today's concerns are not merely about the outsized influence of a speaker, (13) nor are they a moral panic concerning a new medium for speech. They arise instead from a new breed of "speech" altogether--the output of machine learning algorithms. Through their operation on social media and search platforms, these algorithms provide each of us with our own window into the world. And these windows are, as a technical matter, opaque. (14) Nobody--not even the companies that create the algorithms--fully understands why they make the decisions they do. (15)
In response, legislatures the world over have lurched to exert some measure of control over this new force in our public discourse. (16) Such efforts in the United States, however, raise controversial questions of whether and how the First Amendment applies to the output of these algorithms. (17)
The Supreme Court's decision in Moody v. NetChoice assures that these questions will soon have their answers. (18) Where the Court will land, however, is far from a foregone conclusion. While it ventured that "the editorial judgments influencing the content of [platforms'] feeds are ... protected expressive activity," (19) it put a stark asterisk on that conclusion: It was based only on the existing, undeveloped record. (20) The immediate contribution of this Article is to clear a path for that record's development in the lower courts. If followed, it will show in fact what this Article articulates in theory--that the output of online platforms' machine learning models does not qualify for First Amendment protection under the jurisprudence articulated by the Moody Court.
Thus, with the First Amendment status of online speech in flux, this Article also raises an alarm. If we fail to recognize the paradigm shift ushered in by machine learning, an outdated understanding of algorithmic speech threatens to lead us into a radical departure from centuries of First Amendment jurisprudence. By protecting the output of machine learning, the Constitution would, for the first time in its history, protect speech that a speaker does not know she has said. Perhaps that departure from First Amendment jurisprudence is the path we should take. Or, perhaps, as we and others argue, it is not. But whatever decision we make, it cannot be justified by simply extending the logic of First Amendment protection for the output of traditional code to the output of machine learning algorithms. (21)
In this Article, we make two arguments. First, that the principle of speech certainty defines the limits of the First Amendment. And second, because machine learning algorithms run afoul of this principle, their output is not speech within the meaning of the First Amendment and thus falls outside its protection.
We reach these conclusions by applying the most widely accepted modes of constitutional interpretation--text, history, precedent, and purpose--to show that they compel the adoption of the principle of speech certainty. (22) In Part I, we establish that the text, history, and purpose of the First Amendment all support the principle of speech certainty. In Part II, we demonstrate that speech certainty has always been a first-order assumption underlying First Amendment jurisprudence, including the modern doctrines of editorial discretion and expressive conduct. Setting the First Amendment aside, Part III explains how machine learning algorithms work, distinguishing them from traditional algorithms and illustrating the uncertainty intrinsic to the technology. And finally, Part IV applies the technical...