TechDigits

Tech news
Wednesday, May 08, 2024

Google’s SummAE AI generates abstract summaries of paragraphs

Google’s SummAE AI generates abstract summaries of paragraphs

Google researchers propose a novel AI summarization model - SummAE- capable of generating abstract summaries of paragraphs.
Machines have a tougher time summarizing text than you’d think, at least where the summarization is abstractive rather than extractive. While the extraction requires merely concatenating sentences, abstraction involves the task of paraphrasing using novel sentences. Progress has been made in the news domain recently, perhaps owing to the abundance of corpora on which algorithmic systems can be trained. But robust summarization of most other writing forms remains an unsolved problem.

Motivated by this, a team at Google Brain investigated an abstractive summarization system dubbed SummAE that’s largely unsupervised, meaning it’s able to generalize from a small amount of training data to unseen textual examples. While it couldn’t summarize beyond single five-sentence paragraphs, the researchers claim it “significantly” improves upon the baseline and represents a “major” step in the direction of human-level performance.


Machines have a tougher time summarizing text than you’d think, at least where the summarization is abstractive rather than extractive. While the extraction requires merely concatenating sentences, abstraction involves the task of paraphrasing using novel sentences. Progress has been made in the news domain recently, perhaps owing to the abundance of corpora on which algorithmic systems can be trained. But robust summarization of most other writing forms remains an unsolved problem.

Motivated by this, a team at Google Brain investigated an abstractive summarization system dubbed SummAE that’s largely unsupervised, meaning it’s able to generalize from a small amount of training data to unseen textual examples. While it couldn’t summarize beyond single five-sentence paragraphs, the researchers claim it “significantly” improves upon the baseline and represents a “major” step in the direction of human-level performance.

Recommended videosPowered by AnyClip
Go Eat A McRib
Play

Unmute
Duration
0:59
/
Current Time
0:17

Fullscreen
Up Next

NOW PLAYINGGo Eat A McRib
Scientists Discover What Makes 'Water Bears' Virtually Indestructible
Doctor diagnoses his own cancer with an app
There's A Bigger Danger To Pedestrians Than Walking While Distracted
Prince Harry to edit National Geographic's Instagram
The Secret Culprit Of America's Student Debt Crisis
5 Quotes About The Power of Books

The data set and code are freely available on GitHub, along with the configuration settings for the best model.

“As one of the very first works approaching single-document [abstract summarization], we propose a novel neural model — SummAE,” wrote the coauthors. “[We believe it] is therefore desirable to have models capable of automatically summarizing documents abstractively with little to no supervision.”

SummAE contains a denoising autoencoder that encodes (that is, generates numerical representations of) sentences and paragraphs of the target text in a shared space. Guided by a decoder whose input is prepended with a token signaling whether to decode a sentence or a paragraph, the system generates summaries by decoding each sentence from the encoded paragraphs.

The researchers discovered that most traditional approaches to training the auto-encoder resulted in long, multi-sentence summaries. To encourage it to learn higher-level concepts disentangled from their original expression, the team employed two denoising approaches — randomly masking tokens and permuting the order of sentences within paragraphs — that increased the number of training examples substantially. They also experimented with an adversarial critic component that could distinguish between sentences and paragraphs, in addition to two pretraining tasks that encouraged the encoder to learn how sentences narratively followed within a paragraph.

The researchers trained three different variations of SummAE on the ROCStories, a corpus of self-contained, diverse, non-technical, and concise prose. They split the original 98,159 training stories into three separate collections — a training set, a validation set, and a test set — and collected three human summaries each for 500 validation examples and 500 test examples.

After 100,000 training steps with pretraining, the team reports that the best model significantly outperformed a baseline extractive sentence generator on the Recall-Oriented Understudy for Gisting Evaluation (ROUGE), a set of metrics devised to evaluate automatic summarization. Moreover, they say that in a qualitative study involving evaluators recruited through Amazon’s Mechanical Turk, volunteers rated one of the three SummAE models’ summaries “fluent” and “information-relevant” 80% of the time.

“The paragraph reconstructions show some coherence, although with some disfluencies and factual inaccuracies that are common with neural generative models,” wrote the coauthors. “Since the summaries are decoded from the same latent vector as the reconstructions, improving them could lead to more accurate summaries.”
Newsletter

Related Articles

TechDigits
0:00
0:00
Close
FTX's Bankman-Fried headed for jail after judge revokes bail
America's First New Nuclear Reactor in Nearly Seven Years Begins Operations
Southeast Asia moves closer to economic unity with new regional payments system
Today Hunter Biden’s best friend and business associate, Devon Archer, testified that Joe Biden met in Georgetown with Russian Moscow Mayor's Wife Yelena Baturina who later paid Hunter Biden $3.5 million in so called “consulting fees”
Google testing journalism AI. We are doing it already 2 years, and without Google biased propoganda and manipulated censorship
Musk announces Twitter name and logo change to X.com
The future of sports
TikTok Takes On Spotify And Apple, Launches Own Music Service
Hacktivist Collective Anonymous Launches 'Project Disclosure' to Unearth Information on UFOs and ETIs
Typo sends millions of US military emails to Russian ally Mali
Server Arrested For Theft After Refusing To Pay A Table's $100 Restaurant Bill When They Dined & Dashed
Democracy not: EU's Digital Commissioner Considers Shutting Down Social Media Platforms Amid Social Unrest
Sarah Silverman and Renowned Authors Lodge Copyright Infringement Case Against OpenAI and Meta
Why Do Tech Executives Support Kennedy Jr.?
The New York Times Announces Closure of its Sports Section in Favor of The Athletic
Florida Attorney General requests Meta CEO's testimony on company's platforms' alleged facilitation of illicit activities
The Poor Man With Money, Mark Zuckerberg, Unveils Twitter Replica with Heavy-Handed Censorship: A New Low in Innovation?
The Double-Edged Sword of AI: AI is linked to layoffs in industry that created it
US Sanctions on China's Chip Industry Backfire, Prompting Self-Inflicted Blowback
Meta Copy Twitter with New App, Threads
BlackRock Bitcoin ETF Application Refiled, Naming Coinbase as ‘Surveillance-Sharing’ Partner
UK Crypto and Stablecoin Regulations Become Law as Royal Assent is Granted
A Delaware city wants to let businesses vote in its elections
Alef Aeronautics Achieves Historic Milestone with Flight Certification for World's First Flying Car
Google Blocked Access to Canadian News in Response to New Legislation
French Politicians Advocate for Pan-European Regulation on Social Media Influencers
Melinda French Gates Advocates for Increased Female Representation in AI to Prevent Bias
Snapchat+ gains 4 million paying subscribers in its first year
Apple Makes History as the First Public Company Valued at $3 Trillion
Elon Musk Implements Twitter Limits to Tackle Data Scraping, but Faces Criticism for Technical Misunderstanding
EU and UK's Slow Electric Vehicle Adoption Raises Questions About the Transition to Green Mobility
Top Companies Express Concerns Over Europe's Proposed AI Law, Citing Competitiveness and Investment Risks
Meta Unveils Insights on AI Usage in Facebook and Instagram, Amid Growing Calls for Transparency
Crypto Scams Against Seniors Soar by 78% in 2022, Experts Urge Vigilance
The End of an Era: National Geographic Dismisses Last of Its Staff Writers
Shield Your Wallet: The Perils of Wireless Credit Card Theft
Harvard Scientist Who Studies Honesty Accused Of Data Fraud, Put On Leave
Putting an End to the Subscription Snare: The Battle Against Unwitting Commitments
The Legal Perils of AI: Lawyer Faces Sanctions for Relying on Fictional Cases Generated by Chatbot
ChatGPT’s "Grandma Exploit": Ingenious Hack Exposes Loophole in AI, Generates Free Software Codes
The Disney Downturn: A Near Billion-Dollar Box Office Blow for the House of Mouse
A Digital Showdown: Canada Challenges Tech Giants with The Online News Act, Meta Strikes Back
Distress in the Depths: Submersible and Passengers Missing in Titanic Wreckage Expedition
Mark Zuckerberg stealing another idea: Twitter
European Union's AI Regulations Risk Self-Sabotage, Cautions smart and brave Venture Capitalist Joe Lonsdale
Nvidia GPUs are so hard to get that rich venture capitalists are buying them for the startups they invest in
Chinese car exports surge
Reddit Blackout: Thousands of Communities Protest "Ludicrous" Pricing Changes
Nvidia Joins Tech Giants as First Chipmaker to Reach $1 Trillion Valuation
AI ‘extinction’ should be same priority as nuclear war – experts
×