How Workflows and the Gemini Model Can Summarise Anything


Generative AI is currently a hot topic among developers and business stakeholders, and large language model (LLM) use cases can be automated and orchestrated by serverless execution engines like Google Cloud's Workflows. Google recently discussed orchestrating Vertex AI's PaLM and Gemini APIs with Workflows, and in their blog post they present a long-document summarisation use case that has wide applicability.


Large Language Model Gemini


Open-source LLM orchestration frameworks like LangChain (for Python and TypeScript developers) and LangChain4j (for Java developers) connect components such as LLMs, document loaders, and vector databases to perform complex tasks like document summarisation. For this use case, though, a significant investment in an LLM orchestration framework isn't necessary: Workflows itself can handle the orchestration.

Methods for Summarising


Summarising a short document is as easy as passing its entire content to the LLM as part of the prompt. Prompts to large language models, however, are usually limited in token count. Longer documents therefore require an alternative approach. There are two common approaches:

Map/reduce


The long document is split into smaller sections that fit within the context window. Each section is summarised, and in a final step, a summary of all the summaries is produced.

Iterative refinement


As with the map/reduce method, the document is split into sections, and each section is evaluated individually. The first section is summarised, and the LLM then iteratively refines that summary with information from the following section, and so on, until the end of the text.
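
For illustration, here is a minimal sketch of what such a refinement loop could look like in Workflows' YAML syntax. The ask_gemini_to_refine subworkflow and the chunks and summary variables are hypothetical, and this is not the approach the post's workflow actually uses (it uses map/reduce):

    # Hypothetical sequential refinement loop (not the approach used in the post)
    - refine_loop:
        for:
          value: chunk
          in: ${chunks}  # the document's sections, in reading order
          steps:
            - refine:
                call: ask_gemini_to_refine  # hypothetical subworkflow: merges the running summary with new text
                args:
                  current_summary: ${summary}
                  new_text: ${chunk}
                result: summary  # each iteration overwrites the running summary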


Both strategies yield very good results. The map/reduce approach has one advantage over the refinement method, though: refinement is a strictly sequential process, because each step summarises the next section of the document using the previously refined summary.

As seen in the image below, with map/reduce you can create the summary of each section in parallel (the "map" operation) and produce the final summary in a last step (the "reduce" operation). This is faster than the sequential approach.

Gemini Models on Google Cloud


A while back, they showcased one of Workflows' key features, parallel step execution, while demonstrating how to use it to call PaLM and Gemini models. Thanks to this feature, the summaries of the long document's sections can be generated concurrently. The workflow is structured as follows:

The workflow is triggered when a new text document is uploaded to a Cloud Storage bucket.
The text file is split into "chunks" that are summarised in parallel.
The final summarisation step collects all the smaller summaries and merges them into a single summary.
All calls to the Gemini 1.0 Pro model are handled by a subworkflow.


Fetching the text file and summarising its sections in parallel (the "map" part)


In the assign_file_vars step, a few constants and data structures are prepared. Here, they set the chunk size to 64,000 characters so that the text fits into the LLM's context window while complying with Workflows' memory limits. There are also variables for the list of chunk summaries and one for the final summary.
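
As a rough sketch, the beginning of the workflow definition might look like the following YAML. The variable names and the event fields (such as input.data.size, provided by the Cloud Storage trigger) are assumptions based on the description above, not the post's exact code:

    main:
      params: [input]  # the Cloud Storage event that triggered the workflow
      steps:
        - assign_file_vars:
            assign:
              - file_size: ${int(input.data.size)}
              - chunk_size: 64000  # stays within Workflows' memory limits and the model's context window
              - n_chunks: ${int(file_size / chunk_size) + 1}
              - summaries: []             # filled in by the "map" phase
              - concatenated_summaries: ""
              - final_summary: ""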


The loop_over_chunks step then iterates over the text chunks in parallel. For each chunk, the dump_file_content sub-step loads that piece of the document from Cloud Storage, create_chunk_summary calls the subworkflow that asks the Gemini model to summarise that section, and the resulting summary is stored in the summaries array.
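
Sketched in YAML, under the same assumptions as above, the parallel "map" phase could look like this. Fetching one slice of the object with a Range header, and the subworkflow name ask_gemini_for_summary, are illustrative choices rather than the post's exact code:

    - loop_over_chunks:
        parallel:
          shared: [summaries]  # written to from parallel branches
          for:
            value: chunk_idx
            range: ${[0, n_chunks - 1]}
            steps:
              - dump_file_content:
                  call: http.get
                  args:
                    url: ${"https://storage.googleapis.com/" + input.data.bucket + "/" + input.data.name}
                    auth:
                      type: OAuth2
                    headers:
                      # Download only this chunk's byte range of the object
                      Range: ${"bytes=" + string(chunk_idx * chunk_size) + "-" + string((chunk_idx + 1) * chunk_size - 1)}
                  result: file_content
              - create_chunk_summary:
                  call: ask_gemini_for_summary  # the Gemini subworkflow sketched further below
                  args:
                    text_to_summarize: ${file_content.body}
                  result: chunk_summary
              - store_chunk_summary:
                  assign:
                    # Note: parallel iterations may append out of order; a real
                    # implementation could index the list by chunk number instead
                    - summaries: ${list.concat(summaries, chunk_summary)}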


Creating the summary of summaries (the "reduce" part)


Now that they have all the chunk summaries, they can combine the smaller summaries into a final summary of summaries, also known as the aggregate summary. In the concat_summaries step, they concatenate all the chunk summaries into a single string. In the reduce_summary step, they make a final call to the Gemini summarisation subworkflow to obtain the final summary. Finally, they return the results, including the chunk summaries and the final summary.
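
Continuing the sketch, and again assuming the hypothetical ask_gemini_for_summary subworkflow, the "reduce" phase might look like this:

    - concat_summaries:
        for:
          value: summary
          in: ${summaries}
          steps:
            - concat:
                assign:
                  - concatenated_summaries: ${concatenated_summaries + summary + "\n"}
    - reduce_summary:
        call: ask_gemini_for_summary  # one last call, over the concatenated chunk summaries
        args:
          text_to_summarize: ${concatenated_summaries}
        result: final_summary
    - return_result:
        return:
          summaries: ${summaries}
          final_summary: ${final_summary}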


Requesting summaries from Gemini


Both the "map" and "reduce" parts call a subworkflow that encapsulates the requests to the Gemini model. Let's take a closer look at this final part of the workflow. In the init step, they configure a few variables for the desired LLM setup (Gemini Pro in this case).

In the call_gemini step, they send an HTTP POST request to the model's REST API. Notice that they can authenticate to this API declaratively, simply by specifying the OAuth2 authentication method. In the request body, they pass the prompt asking for a summary, some model parameters (such as the temperature), and the maximum length of the summary to generate.
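
Here is a sketch of what that subworkflow could look like, calling the Vertex AI generateContent endpoint for Gemini 1.0 Pro. The region, prompt wording, and parameter values are placeholder assumptions, not the post's exact code:

    ask_gemini_for_summary:
      params: [text_to_summarize]
      steps:
        - init:
            assign:
              - project: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
              - location: "us-central1"   # placeholder region
              - model: "gemini-1.0-pro"
        - call_gemini:
            call: http.post
            args:
              url: ${"https://" + location + "-aiplatform.googleapis.com/v1/projects/" + project + "/locations/" + location + "/publishers/google/models/" + model + ":generateContent"}
              auth:
                type: OAuth2  # declarative authentication: no manual token handling
              body:
                contents:
                  - role: "user"
                    parts:
                      - text: ${"Give a concise summary of the following text:\n\n" + text_to_summarize}
                generationConfig:
                  temperature: 0.2       # model parameter mentioned in the post
                  maxOutputTokens: 2048  # caps the length of the generated summary
            result: llm_response
        - return_summary:
            return: ${llm_response.body.candidates[0].content.parts[0].text}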


The resulting summary


The process starts by saving the text of Jane Austen's "Pride and Prejudice" into a Cloud Storage bucket, which triggers the workflow and produces the intermediate chunk summaries as well as the final summary.

Going Further


For the purposes of this post, they kept the workflow simple, although it could be improved in several ways. For example, they hard-coded the total character count for each section, but it could be derived from the model's context-window limit, or even passed as a workflow parameter.
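
As a small hypothetical tweak, if the workflow were invoked directly with parameters, the assign_file_vars step could read the chunk size from the input, falling back to a default:

    - assign_file_vars:
        assign:
          # Use a caller-supplied chunk_size if present, otherwise default to 64,000 characters
          - chunk_size: ${default(map.get(input, "chunk_size"), 64000)}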

Workflows itself has a memory limit for the variables and data it keeps in memory, so the workflow could also be extended to handle cases where an extraordinarily long list of section summaries wouldn't fit. Also not to be missed is Gemini 1.5, the most recent large language model from Google Cloud, which can take in up to one million tokens and can therefore summarise a long document in a single pass. Of course, you can also use an LLM orchestration framework, but as this example demonstrates, Workflows itself can handle some interesting LLM orchestration use cases.


In Short


In this post, they explored another use case for orchestrating LLMs with Workflows and built a long-document summarisation exercise without a dedicated LLM framework. By using Workflows' parallel step feature to produce the section summaries concurrently, they reduced the time needed to generate the overall summary.

News source: Gemini Model
