Webinsides · Internet
Language Models Like ChatGPT Could Be Plagiarising in More Ways Than Just ‘Copy-Paste’, Say Researchers

February 26, 2023 · Updated: March 2, 2023 · 5 Mins Read

A Penn State research team tested OpenAI’s GPT-2 for plagiarism.

Photo Credit: Unsplash

The use of language models such as ChatGPT to generate responses to user prompts has raised concerns about plagiarism: the models may inadvertently reuse material from their training data without crediting the original source.

Before relying on chatbots to complete their assignments, students should be aware of the findings of a study by a research team led by Penn State. The study examined exactly this question and found that language models commit multiple forms of plagiarism when generating text in response to user prompts.

Dongwon Lee, professor of information sciences and technology at Penn State, explains that “plagiarism can take different forms”. The study set out to determine whether language models merely copy and paste, or also, unwittingly, engage in more sophisticated forms of plagiarism.

The researchers focused on identifying three types of plagiarism: verbatim plagiarism, directly copying and pasting content; paraphrase plagiarism, rewording and restructuring content without citing the original source; and idea plagiarism, using the central idea of a text without attribution. To do this, they built an automated plagiarism-detection pipeline and evaluated it against OpenAI’s GPT-2. They chose that model because its training data is available online, which let them compare generated texts against the 8 million documents used to pre-train it.
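Of the three categories, verbatim plagiarism is the most mechanical to check for. As a minimal illustrative sketch (not the team’s actual detector), overlap in long word n-grams between a generated text and a candidate source is a common proxy for copy-paste reuse; the 5-gram window and the scoring function here are assumptions for illustration only:

```python
# Minimal sketch: flag verbatim reuse via shared word n-grams.
# The window size (n=5) and the overlap score are illustrative choices,
# not the parameters used in the Penn State study.

def ngrams(tokens, n=5):
    """Return the set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated, source, n=5):
    """Fraction of the generated text's n-grams that also occur in the source."""
    gen = ngrams(generated.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    if not gen:
        return 0.0
    return len(gen & src) / len(gen)
```

A score near 1.0 suggests a passage was lifted nearly word for word; paraphrase and idea plagiarism would score low here, which is why they require the more semantic comparison the study describes.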

To identify instances of plagiarism both in pre-trained language models and in models fine-tuned to specialize in specific topics, the researchers analyzed 210,000 generated texts. Specifically, they fine-tuned three language models on scientific documents, scholarly articles related to COVID-19, and patent claims. They used an open-source search engine to retrieve the 10 training documents most similar to each generated text, and modified a text alignment algorithm to more accurately detect instances of verbatim, paraphrased, and idea plagiarism.
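The retrieval step described above, ranking training documents by similarity to each generated text, can be approximated with a plain TF-IDF cosine ranking. This is a simplified stand-in for the open-source search engine the researchers used; the function names and the IDF smoothing are assumptions made for illustration:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a smoothed TF-IDF vector (term -> weight dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency of each term
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, corpus, k=10):
    """Indices of the k corpus documents most similar to the query text."""
    vecs = tfidf_vectors([query] + corpus)
    q, docs = vecs[0], vecs[1:]
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, docs[i]), reverse=True)
    return ranked[:k]
```

In the study's framing, each generated text would play the role of the query, and the retrieved documents become the candidates passed to the alignment step that classifies verbatim, paraphrase, or idea reuse.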

The team found that the language models committed all three types of plagiarism, and that plagiarism became more frequent as the training dataset and the model’s parameter count grew. They also observed that fine-tuned models produced less verbatim plagiarism but more paraphrase and idea plagiarism, and that individuals’ private information could be exposed through all three forms. The team is scheduled to present its findings at the ACM Web Conference in Austin, Texas, from April 30 to May 4, 2023.

According to Jooyoung Lee, a doctoral student at Penn State’s College of Information Sciences and Technology, people pursue ever-larger language models because their generation capabilities improve with size. At the same time, that pursuit jeopardizes the originality and creativity of the content in the training corpus, a crucial finding of the study.

According to the researchers, the study underscores the need for further research on text generators and the ethical and philosophical questions they raise.

Thai Le, an assistant professor of computer and information science at the University of Mississippi, cautioned that even though language models produce attractive output and are useful for certain tasks, that does not make them ready for practical deployment. Le, who began the project as a doctoral candidate at Penn State, stressed that the ethical and copyright concerns must be addressed before text generators can be put to practical use.

The study’s findings apply only to GPT-2; nonetheless, the automatic plagiarism-detection pipeline the researchers built can be applied to newer language models such as ChatGPT to assess how often they reproduce training content. Such testing, the researchers noted, depends on developers making their training data openly available.

According to the scientists, the present research can aid AI researchers in developing more resilient, dependable, and ethical language models in the future. However, for the time being, they caution individuals to be careful when utilizing text generators.

According to Jinghui Chen, an assistant professor of information sciences and technology at Penn State, researchers and scientists are working to make language models more efficient and robust, while many people already use them daily for productivity tasks. While Chen considers it acceptable to use language models as a search engine or to debug code, he warns against relying on them for other purposes, since they can generate plagiarized content with unfavorable consequences for the user.

According to Dongwon Lee, the result of the plagiarism is not surprising.

“We have trained language models to imitate human writing without instructing them on proper plagiarism avoidance techniques, as stochastic parrots,” he explained. “Now, our objective is to educate them on writing more accurately, but we have a significant journey ahead of us.”

Tags: AI, Artificial Intelligence, ChatGPT, Copy-Paste, Ethics, Innovation, Language Models, Machine Learning, Natural Language Processing, NLP, Plagiarism, Research, Technology