The media industry and other sectors are pushing back against the unauthorized use of creative works for AI training. Experts stress the need for a fair distribution of profits.

Recently, videos of New Jeans' “Hype Boy” sung in the voice of world-renowned singer Bruno Mars and Yoon Jong-Shin's “Uphill” sung in the voice of Freddie Mercury have become hot topics on video platforms such as YouTube. These videos use AI technology to imitate the singers' voices so convincingly that viewers are amazed by their realism.

This significant advance in AI learning technology has been accompanied by the mass collection of existing copyrighted materials, because an AI can generate creative works only after it has been trained on vast amounts of such data. This has sparked a debate between those who believe copyrighted material used for AI learning should be protected and those who prioritize the advancement of AI technology.

A picture created by "DALL·E 2," an image-generating AI made by OpenAI, which was asked to draw "AI Taking Copyright." [Created by DALL·E 2]

■Is It Illegal for AI to Use Copyrighted Data?

AI industry professionals generally argue that using copyrighted material for AI training qualifies as “fair use” because it is closely tied to research and technological progress. In most countries, including South Korea, using a copyrighted work is not a copyright violation if it falls under fair use as defined by copyright law. However, the boundaries of fair use remain the subject of global debate because the scope of the doctrine is not clearly defined.

Experts suggest that if AI-generated creations could reduce the economic value of original works, using those works for training should not be considered fair use. Kang Myung-Soo (Prof. of Law School, PNU) argues, “If AI's commercial potential remains, and AI-generated works could undermine the market value of existing works, it should be considered copyright infringement.” In South Korea, using another person’s voice to create songs or videos can also violate the Unfair Competition Prevention Act. A revision of the law in March added a clause covering “the case of infringing on the economic interests of others by using the voices of others with widely recognized and economic value in Korea,” which allows the owner of a voice used in such a video to claim damages.

However, regulating the growing AI market is a complex global issue. The large datasets used to build large language models like ChatGPT are difficult to identify without the developers' cooperation. In response, the US and EU are preparing laws to monitor AI activities. In May, the EU added a provision to its “AI Act” that makes it mandatory to disclose copyrighted material in datasets used for AI training. South Korea is also working on AI legislation, but nothing has been finalized so far. In the 21st National Assembly, 13 AI-related bills have been proposed, but they are still under review in the relevant committee.

■The Press Opposes the Reckless Use of AI

The extensive use of information by AI is a global concern not only within the AI industry but also in the media sector, because press data of many kinds, including news stored on the web, is heavily used for AI training. On September 6th, the World Association of News Publishers (WAN) issued its “Global AI Principles,” calling on AI developers to respect intellectual property rights in existing content and to ensure transparency in AI systems. The Korea Newspaper Association (KNA) also expressed concerns about “Hyper ClovaX,” a generative AI model that Naver released on August 24th, because Naver had used news data to train the previous model, “Hyperclova.”

On August 22nd, the KNA presented its “Five Demands for Preventing Copyright Infringement by Generative AI in News” to major domestic and international IT companies developing generative AI. The demands were: discussions with news copyright holders on the criteria for AI usage; adherence to the WAN's Global AI Principles; transparency about the sources, contents, and collection routes of AI training data; a clear statement of how news content is used in AI training; and the establishment of a fair compensation system for news creators.

Media outlets are increasingly adopting “crawler prevention” terms to protect their copyrights. These terms aim to prevent crawling, the automated scraping of data from the web, for AI learning purposes. According to a KNA report from September 6th, major domestic newspapers such as Hankook Ilbo, Chosun Ilbo, and JoongAng Ilbo have been blocking access by “GPTBot,” OpenAI's web crawler, since August. They have also implemented “AI and mass crawling prevention terms” to protect their content. Numerous overseas outlets, such as The Guardian in the UK and The New York Times and the Chicago Tribune in the US, are likewise responding to the use of newspaper materials in AI learning through similar terms and conditions.
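The report does not say how the newspapers block the crawler. One widely used mechanism is a robots.txt file that names the crawler's user agent, and the sketch below assumes that approach. It uses Python's standard urllib.robotparser module to check what such a file would permit. The robots.txt contents, the helper name is_allowed, and the news.example.com address are illustrative assumptions, not any newspaper's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse OpenAI's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://news.example.com/article/1"  # placeholder URL
    for agent in ("GPTBot", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, article))
```

A robots.txt file is only a request that well-behaved crawlers choose to honor, which may be one reason outlets also write crawling restrictions into their terms and conditions.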

■Technology Advancement and Rights Protection Need to Be Harmonized

Experts emphasize the need for a balance between AI’s rapid development and copyright protection in the current AI market. Excessive regulation may hinder societal progress, given the high value of the AI market. According to a report released in June of this year by Bloomberg Intelligence, a US-based financial information network, the generative AI market is predicted to grow more than 30 times, from $39.8 billion in 2022 to $1.3 trillion in 2032. Given this substantial growth, countries around the world are weighing both the protection of copyrights in AI training data and the promotion of the AI industry.
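The scale of the projected growth can be checked with simple arithmetic. The sketch below uses only the two market figures quoted from the report and assumes a ten-year span (2022 to 2032). The variable names are illustrative, and the implied annual growth rate is derived here, not quoted from the report.

```python
# Market size figures quoted above, in billions of US dollars.
market_2022 = 39.8
market_2032 = 1300.0  # $1.3 trillion
years = 2032 - 2022

# How many times larger the 2032 market is than the 2022 market.
growth_multiple = market_2032 / market_2022

# Constant yearly growth rate that would produce that multiple (assumption:
# smooth compounding, which real markets do not follow exactly).
implied_annual_rate = growth_multiple ** (1 / years) - 1

print(f"Growth multiple: {growth_multiple:.1f}x")
print(f"Implied compound annual growth rate: {implied_annual_rate:.1%}")
```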

To maintain a balance between these two goals, appropriately distributing the revenue generated by AI is considered crucial. This means fair distribution not only to AI developers and to creators who use AI to generate new works but also to the copyright holders of the data used for AI training. Prof. Kang said, “The law should be changed in a way that AI developers use data first and distribute the profits generated from it to copyright holders. We need to think about how to allow more use of works for technological development but protect the status of rights holders as well.”

The development of open-source data accessible to everyone should also continue. It is a global principle that information accumulated by humanity should not be monopolized by specific companies but shared to promote technological development. In this regard, the Korea Copyright Commission has been running the “Osori Project” since April, collaborating with companies like Kakao, Samsung Electronics, and LG Electronics to establish an open-source software database. Kwon Hyuk-Chul (Prof. of Information and Computer Engineering, PNU) of the AI lab said, “In a system where AI developers initially use data and share the profits with copyright holders, we should acknowledge that vast data is the common asset of humanity and strive for transparency in sharing data used in AI learning.”

Reporter You Seung-Hyun

Translated by Ha Chae-Won

Keywords

#AI #copyright #PNU
Copyright © Channel PNU (채널PNU). Unauthorized reproduction and redistribution prohibited.