Copyright Infringement Risks and Regulatory Pathways of Data Training in Generative Artificial Intelligence

聂洪涛; 侯景译

doi:10.12189/j.issn.1672-8505.2026.01.002

Abstract

Generative artificial intelligence (GAI) has demonstrated significant application potential, and there is broad consensus that leveraging mature models for data training can substantially drive industrial transformation. However, while GAI data training advances technological progress, the associated copyright infringement risks have become increasingly prominent. This paper first analyzes the technical principles and core processes of GAI data training. It then clarifies the criteria for identifying infringement risk along such dimensions as the legality of data sources and the determination of whether a given use of data is infringing, and systematically examines the forms that infringement risk takes at the input, training, and output stages. On this basis, drawing on regulatory experience from foreign jurisdictions, the paper proposes targeted regulatory pathways: constructing a fair use regime adapted to data training; improving licensing and information disclosure mechanisms for data collection at the input stage; strengthening output-stage supervision and optimizing the system for reviewing and determining infringement; reinforcing technical protection measures such as blockchain and digital watermarking; and exploring mechanisms for balancing the interests of GAI developers, data users, and copyright holders. The aim is to ease the tension between technological development and lagging law, and to promote the healthy and orderly development of GAI data training.
