Copyright Infringement Risks and Regulatory Pathways of Data Training in Generative Artificial Intelligence
-
Abstract
Generative artificial intelligence (GAI) has demonstrated significant application potential, and there is broad consensus that data training with mature models can substantially advance industrial transformation. However, while GAI data training accelerates technological progress, the associated risks of copyright infringement have become increasingly prominent. This paper first analyzes the technical principles and core processes of GAI data training, and then clarifies the criteria for identifying infringement risk along dimensions including the legality of data sources and the determination of infringing data use. It systematically examines how copyright infringement risks manifest at three stages: data input, model training, and output generation. On this basis, and drawing on regulatory experience from foreign jurisdictions, the paper proposes targeted regulatory pathways: constructing a fair use framework adapted to data training; improving licensing and information disclosure mechanisms for data collection at the input stage; strengthening regulation at the output stage and optimizing infringement review and determination systems; and reinforcing technical protection measures such as blockchain and digital watermarking. Finally, it explores mechanisms for balancing the interests of GAI developers, data users, and copyright holders, with a view to easing the tension between rapid technological development and a lagging legal framework, and to promoting the healthy and orderly development of GAI data training.
-