Platform Liability, AI Training Data, and the Limits of Safe Harbour Protection
Abstract
The rapid expansion of artificial intelligence (AI) systems has transformed how digital platforms collect, process, and use vast amounts of data. A significant portion of this data, including copyrighted materials, personal information, and user-generated content, is used to train the models behind recommendation systems, generative AI, and predictive analytics. Traditional legal frameworks governing platform liability, particularly safe harbour protections, were designed primarily for passive intermediaries that host or transmit content, not for platforms that actively analyze and transform it for machine learning purposes. This shift raises complex legal questions about whether platforms that collect and use content for AI training should continue to benefit from immunity provisions. This paper examines the intersection of platform liability, AI training data practices, and the limits of safe harbour doctrines. It explores how the growing role of platforms in curating and processing large datasets challenges the foundational assumptions behind intermediary liability protections. It then analyzes emerging legal debates over responsibility for copyright infringement, privacy violations, and algorithmic accountability. By evaluating regulatory developments and judicial interpretations, the paper highlights the growing tension between innovation incentives and the need to protect creators, users, and data subjects. Ultimately, it argues that existing safe harbour frameworks may require significant reform to address the realities of AI-driven platforms.