A 4-Step Guide on How to Build an AI-based Text to Audio using LLMs with RAGs

How to build an AI-based Text-Speech News Summary using LLMs with RAGs

Mukundan Sankar

Sep 26, 2024

Introduction: Why Should You Care?

Understanding the Rise of AI
Why Convert News Articles to Audio with RAG?

2. Who Is This Guide For?

Beginners in Data Science and AI
RAG and Its Simplicity

3. What is Retrieval-Augmented Generation (RAG)?

Breaking Down RAG: The Retriever and Generator
Why RAG is Relevant to AI and Data Science Enthusiasts

4. Why News? A Practical Application for RAG

My Experience with News Data
Leveraging RAG for News Summarization and Audio Conversion

5. The 4-Step Guide to Convert News to Audio

Step 0: Importing Libraries
Step 1: Fetching the News
Step 2: Extracting Key Information
Step 3: Converting Text to Audio
Step 4: Implementing Retrieval-Augmented Generation

6. The Benefits of Using RAG

Accuracy and Relevance in AI-Generated Content
Time-Saving Automation
Audio Accessibility for Learning on the Go

7. Getting Started with RAG for Your Projects

Setting Up Your Retriever and LLM
Automating Your Workflow

8. Conclusion: Why RAG is the Future of AI Content Generation

The Power of Combining Search and Intelligence
Resources for Further Learning

Introduction: Why Should You Care?

Understanding the Rise of AI

We live in the age of AI. Over the last 2 years, since the advent of Open AI, thanks to Sam Altman and his team, the world has moved at a rapid pace in the field of Artificial Intelligence. Amazon’s Alexa and Google Home Assistant devices had been there prior to that but they didn’t give us visibility into how they made a little circular or square box talk. OpenAI solved this problem by helping us understand how to create everything and anything. ChatGPT has become the new Google Search, Google Assistant, Alexa or Siri. With all of this super technology advancement available to us, why should you care to convert text news to audio using an advanced technique?

Why Convert News Articles to Audio with RAG?

Well, the idea is to teach you to Do It Yourself (#DIY). Why don’t you want to learn how to do something yourself and perhaps use your creative imagination to take it to the next level? You can think of other use cases with RAGs, and other problems to solve once you learn how to convert text to audio. With that said, let’s go into who this is for and how we convert Text news to audio using RAGs.

Who is it for?

Beginners in Data Science and AI

Whether you’re new to Data Science and AI or have been dabbling for a while, you have come to the right place. Join me as we dive into something super cool: Retrieval-Augmented Generation (RAG).

I know hearing terms like RAG, AI, or Large Language Models (LLMs) can sound intimidating, but trust me: You’ve got this! I’m learning this stuff alongside you, and I will help break it all down together. We’ll go step-by-step and have some fun with it.

RAG and Its Simplicity

RAG is a powerful tool that will make your data science journey easier, and the best part? It’s way more beginner-friendly than you think!

The ever-evolving landscape of Data Science and AI is making significant strides in automating processes, extracting valuable insights, and offering efficient solutions to complex problems. One of the latest trends in this space is Retrieval-Augmented Generation (RAG). This approach is gaining traction due to its potential to merge the strengths of large language models (LLMs) with document retrieval systems.

In this post, we’ll explore how RAG can be applied to transform news articles into valuable insights and even convert them into audio formats, making it a fantastic tool for Data Science enthusiasts and AI beginners alike.

What is Retrieval-Augmented Generation (RAG)?

Before I dive deeper into its application, let’s first understand what RaG is and why it’s essential for anyone getting started with Data Science or AI.

Breaking Down RAG: The Retriever and Generator

According to Amazon Web Services, RAG is an optimization process for a Large Language model. RAG is a hybrid system that combines:

A Retriever: A tool to fetch relevant information from a large corpus or a set of documents based on a query.
A Generator: A large language model (like GPT-4) that synthesizes the retrieved information into a coherent response, summary, or insight.

In essence, RAG ensures that AI-generated content is relevant and grounded in factual data, making it a highly accurate method for producing insights, summaries, or even converting text to audio.

Why RAG is Relevant to AI and Data Science Enthusiasts

If you are a Data Science or AI beginner, you might wonder: Why should I care about RAG?

Here’s why:

Accurate Information: RAG ensures that content is not only generated by AI but also supported by relevant data sources, reducing hallucinations (AI-generated inaccuracies).
Data-Driven Insights: You can leverage RAG to extract specific insights from vast amounts of data. For example, instead of manually reading through news articles, RAG fetches and generates summaries based on specific queries.
Time-Saving: RAG automates tedious tasks like summarizing long documents or generating reports, freeing you up to focus on analysis and decision-making.
Accessibility: Through audio conversion, RAG makes complex data more accessible, allowing you to listen to important insights on the go.

Why News? A Practical Application for RAG

My Experience with News Data

If you know about me, you may know I write a lot about extracting news data and you can see some of my previous work here:

How to get the Daily news using Python
Getting the news using NewsAPI in Python and generating word clouds to understand the author’s opinion about a topic.towardsdatascience.com

In this step-by-step guide, I show you how to use Python and the NewsAPI to fetch daily news articles, analyze them, and visualize trends using word clouds, helping you build your news aggregator and stay informed efficiently. By leveraging libraries like pandas, BeautifulSoup, and wordcloud, I walk you through automating news retrieval, parsing article content, and even setting up notifications with Twilio or SendGrid.

How to Extract Daily News Headlines and Convert to Audio Using Python
towardsdev.com

In this blog post, I walk you through a step-by-step guide on how to extract daily news headlines from CNN’s website and convert them to audio using Python. Using web scraping tools like requests and BeautifulSoup, I show how to extract article details with newspaper3k and then convert the news text into audio using gTTS.

Leveraging LLMs for Efficient News Summarization: A Hugging Face and Python Guide
Efficient News Summarization Using LLMs, Hugging Face, and Python: Discover how to efficiently summarize news articles…blog.gopenai.com

In this post, I explain how to use Large Language Models (LLMs) from Hugging Face to efficiently summarize news articles using Python. By leveraging models like BART, I guide you through setting up a summarization pipeline that extracts key information from news content, helping you stay informed without reading lengthy articles.

Leveraging RAG for News Summarization and Audio Conversion

Returning to the problem we are trying to solve here, let’s walk through the 4-step process of how I used RAG to turn a set of news articles from CNN into meaningful insights — this will help demonstrate how you can apply RAG in your data science projects.

The 4-Step Guide to Convert News to Audio

Step 0: Import relevant Libraries

This has to be a step 0, kind of a pre-requisite for any Data Science or AI related project:

import requests
from bs4 import BeautifulSoup
from newspaper import Article
from gtts import gTTS
import os
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import openai
import nltk
nltk.download('punkt')

Step 1: Fetching the News

First, I built a simple Python script to scrape news articles from the news website’s homepage. Using tools like BeautifulSoup and requests, I retrieved the latest articles. This step is crucial because the retriever in RAG needs access to a corpus of documents to function.

While I have used CNN here, you can just about use another website supported by the newspaper3k library.

def get_cnn_homepage():
    url = "https://www.cnn.com"
    response = requests.get(url)
    return response.text

Step 2: Extracting Relevant Information

After retrieving the articles, I used the newspaper3k library to extract key information like the title, link, author, publication date, and the full text of each article. The goal was to create a dataset of recent news articles that could be used for further analysis.

def extract_article_links(html):
    soup = BeautifulSoup(html, 'html.parser')
    article_links = []
    for link in soup.find_all('a', href=True):
        href = link['href']
        if href.startswith('/') and '/2024/' in href:  # Filter for recent articles
            full_url = f"https://www.cnn.com{href}"
            if full_url not in article_links:
                article_links.append(full_url)
    return article_links[:5]  # Limit to 5 articles

# Step 2: Extract Article Details
def extract_article_details(url):
    article = Article(url)
    article.download()
    article.parse()
    return {
        'title': article.title,
        'text': article.text
    }

Step 3: Converting Text to Audio

One of the standout features of this approach is the ability to convert the generated summaries into audio. Using the gTTS (Google Text-to-Speech) library, I transformed the summaries into audio files, making it easier for users to consume the insights while on the go.

This makes the RAG-generated content not only informative but also highly accessible — perfect for data enthusiasts who prefer learning while multitasking.

# Step 3: Text to Speech Conversion
def text_to_speech(text, filename):
    tts = gTTS(text=text, lang='en')
    tts.save(filename)
    return filename

Step 4: Retrieval-Augmented Generation

Next, I implemented the retrieval part of RAG using FAISS, a tool developed by Facebook AI for efficient similarity search. FAISS helped me index the articles and retrieve the most relevant ones based on user queries, such as “latest trends in AI” or “how automation affects business.”

I then used GPT-4 to generate summaries of the retrieved articles, ensuring that the content was grounded in factual news sources.

def index_articles(articles):
    model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
    texts = [article['text'] for article in articles]
    embeddings = model.encode(texts, convert_to_tensor=False)
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(np.array(embeddings))
    return index, texts

def retrieve_articles(query, index, texts):
    model = SentenceTransformer('paraphrase-MiniLM-L6-v2')
    query_embedding = model.encode([query], convert_to_tensor=False)
    D, I = index.search(np.array(query_embedding), 5)
    retrieved_articles = [texts[i] for i in I[0]]
    return retrieved_articles

def generate_summary(retrieved_articles):
    openai.api_key = 'your_openai_api_key'  # Replace with your OpenAI API key
    prompt = "Summarize the following news articles:\n"
    for article in retrieved_articles:
        prompt += f"\nArticle: {article}\n"
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=500
    )
    return response['choices'][0]['text']

And finally, the Main Function to Run the Program

def main():
    print("Fetching CNN homepage...")
    homepage_html = get_cnn_homepage()
    print("Extracting article links...")
    article_links = extract_article_links(homepage_html)

    print(f"Found {len(article_links)} articles. Extracting details...")
    articles = []
    for link in article_links:
        article_details = extract_article_details(link)
        articles.append(article_details)
        print(f"Extracted: {article_details['title']}")

    print("Indexing articles...")
    index, texts = index_articles(articles)

    query = input("Enter a topic or keyword for news retrieval: ")
    retrieved_articles = retrieve_articles(query, index, texts)
    print(f"Found {len(retrieved_articles)} relevant articles.")

    print("Generating summary...")
    summary = generate_summary(retrieved_articles)
    print("Summary:")
    print(summary)

    # Convert summary to audio
    filename = "news_summary.mp3"
    text_to_speech(summary, filename)
    print(f"Summary converted to audio: {filename}")

if __name__ == "__main__":
    main()

The Benefits of RAG for Data Science and AI Beginners

1. Combining Search with Intelligence

RAG allows you to ask specific questions — just like you would on a search engine — but instead of simply retrieving documents, it synthesizes the information into a cohesive and insightful answer. This is incredibly useful for those just starting in Data Science and AI, where learning requires understanding complex concepts.

2. Real-Time Insights

Unlike traditional content generation methods that rely solely on pre-existing models, RAG updates itself based on the latest data it retrieves. This means your insights are always fresh and relevant, which is crucial in fast-evolving fields like Data Science and AI.

3. Audio Accessibility

Converting content into audio using RAG makes it accessible to a broader audience. Whether you’re commuting, working out, or just need a break from the screen, audio summaries allow you to continue learning effortlessly.

To see the complete code, you can visit my Github.

How You Can Implement RAG in Your Projects!

Whether you’re working on a personal data science project or looking to incorporate AI into a larger business process, RAG can add immense value. Here’s a simple guide to get started:

Choose Your Retriever: Use tools like FAISS or Elasticsearch to index and search your document corpus.
Leveraging an LLM: GPT-4 or any other large language model will help you generate insightful summaries or content from the retrieved data.
Automate Your Pipeline: Integrate RAG into a pipeline that can fetch new data, retrieve relevant documents, and generate summaries on demand.

Conclusion: Why RAG is the Future

Retrieval-Augmented Generation is more than just a buzzword; it’s a transformative technology that makes AI-generated content more reliable, relevant, and useful. This opens up a world of possibilities for Data Science and AI beginners. From news extraction to audio conversion, RAG can help you streamline your workflow, gain insights faster, and stay informed — all while ensuring that your data-driven decisions are grounded in the latest information.

If you’re just getting started in Data Science and AI, RAG is a powerful tool you don’t want to miss! Here are some other useful links you can use to learn more about RAGs:

freeCodeCamp offers a detailed tutorial on building RAG from scratch. This includes indexing, retrieval, and generation techniques. The course covers advanced topics such as RAG fusion, query translation, and adaptive RAG, making it ideal for those looking to apply RAG in real-world projects :
NVIDIA’s Technical Blog provides an insightful guide on how to build and deploy RAG pipelines. It explains key concepts such as document ingestion, embedding generation, and real-time query processing. NVIDIA also offers examples and public resources to help fast-track your RAG development.
Nexla explores practical use cases for RAG in industries like healthcare, customer service, and financial analysis. The guide compares RAG to model fine-tuning and highlights its benefits for handling real-time data and improving the contextual relevance of AI-generated responses.

Bonus!

Do you also want to know about how you can apply Machine Learning to your Business? Here is a Free 5-day Email Course that teaches how Machine Learning Can Transform Your Business. Discover practical ways to leverage machine learning to grow your business, improve efficiency, and drive results.

Free 5-Day Email Course: How Machine Learning Can Transform Your Business!

Data & AI with Mukundan

Discussion about this post