Streaming OpenAI Responses in Laravel with Server-Sent Events (SSE)

When working with OpenAI in your Laravel project, streaming the response can noticeably improve the user experience. An LLM generates text one token at a time, predicting each next token from the prompt and the tokens produced so far, and every step requires intensive computation by the GPT model.

That's why it takes some time for GPT to finish generating the text. Instead of showing a blank screen while the user waits for the full answer, we can display the text as it is generated. Here's an example of how it looks:

What is Server-Sent Events (SSE)?

When working with real-time applications in Laravel, there are three options:

1. Long polling

Long polling is a technique used in real-time applications to emulate server push over ordinary HTTP requests. Instead of the client continuously sending requests to the server, it sends a request and the server holds the response until new data is available or a timeout occurs. The client then immediately sends a new request. This allows the server to deliver updates to the client as soon as they are available, creating a near-real-time experience.

2. Web Sockets

Web Sockets are a communication protocol used in real-time applications to establish a bidirectional, persistent connection between a client and a server. Unlike traditional HTTP requests, which are stateless, Web Sockets enable ongoing, low-latency communication by keeping the connection open. This allows the server to send data to the client whenever updates occur, and vice versa.

3. Server-sent events

Server-Sent Events (SSE) is a technology used in real-time applications to enable the server to push updates to the client over a single, long-lived HTTP connection. With SSE, the client establishes a connection with the server and receives a continuous stream of events. The server can send data updates as individual events, which the client can handle in real-time. SSE provides a simple and lightweight solution for real-time communication, allowing for server-initiated updates without the need for continuous polling. It is particularly useful for applications that require real-time data updates, such as chat applications, live feeds, or real-time monitoring systems.
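
To make the idea of "individual events" concrete, here is a minimal sketch of what one event looks like on the wire. The helper name formatSseEvent is hypothetical and exists only for this illustration; the field names (event, data) and the blank line that terminates an event come from the SSE format itself.

```javascript
// Format one SSE event as it appears on the wire.
// `formatSseEvent` is a hypothetical helper used only for this sketch.
function formatSseEvent(eventName, data) {
  // Each line of the payload gets its own "data:" field;
  // the client joins the values back together with "\n".
  const dataLines = String(data)
    .split("\n")
    .map((line) => "data: " + line)
    .join("\n");
  // A blank line marks the end of the event.
  return "event: " + eventName + "\n" + dataLines + "\n\n";
}

// Show the raw frame, with line breaks made visible.
console.log(JSON.stringify(formatSseEvent("update", "Hello")));
```

The stream is plain text over HTTP, which is a large part of why SSE is simpler to set up than Web Sockets.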

Which one do we need to choose?

Long polling is not a good fit here because it requires too much logic on both the client and the server: every token would need its own request and response cycle. Web Sockets are not quite suitable either, because we do not need two-way communication. We only listen for the tokens generated by GPT. Server-sent events are the best match because they provide exactly what we need: a one-way stream from the server to the client over a single HTTP request.

Setting Up Laravel Project with OpenAI

Let's create a new Laravel project with this command:

laravel new laravel-openai-streaming

Add the OpenAI package for Laravel:

composer require openai-php/laravel --with-all-dependencies

Now publish the configuration file:

php artisan vendor:publish --provider="OpenAI\Laravel\ServiceProvider"

Don't forget to add your API key to the .env file:

OPENAI_API_KEY=sk-...

Now add this HTML to welcome.blade.php to display the text received from our SSE endpoint.

<section class="...">
  <div class="...">
    <div class="...">
      <div class="...">
        <div class="...">
          <div class="...">
            <div>
              <p class="...">Laravel Streaming OpenAI</p>
              <p class="...">
                Streaming OpenAI Responses in Laravel with Server-Sent Events
                (SSE).
                <a class="..." href="...">Read tutorial here</a>
              </p>
              <p id="question" class="..."></p>
              <p id="result" class="..."></p>
            </div>
            <form id="form-question" class="...">
              <input
                required
                type="text"
                name="input"
                placeholder="Type your question here!"
                class="..."
              />
              <button type="submit" class="...">
                Submit
                <span aria-hidden="true"> → </span>
              </button>
            </form>
          </div>
        </div>
      </div>
    </div>
  </div>
</section>

Listening for Server-Sent Events with JavaScript

Now let's handle the form submission and then listen for the server-sent events.

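<script>
  const form = document.querySelector("form");
  const question = document.getElementById("question");
  const result = document.getElementById("result");
  let activeSource = null;

  form.addEventListener("submit", (event) => {
    event.preventDefault();
    const input = event.target.input.value.trim();
    if (input === "") return;
    question.innerText = input;
    event.target.input.value = "";

    // Start clean: stop any stream still running and clear the previous answer.
    if (activeSource) activeSource.close();
    result.innerText = "";

    const queryQuestion = encodeURIComponent(input);
    const source = new EventSource("/ask?question=" + queryQuestion);
    activeSource = source;

    source.addEventListener("update", function (event) {
      // The server sends this marker as its last event.
      if (event.data === "<END_STREAMING_SSE>") {
        source.close();
        return;
      }
      result.innerText += event.data;
    });

    // EventSource reconnects automatically after an error, which would
    // resend the question, so close the connection instead.
    source.addEventListener("error", () => source.close());
  });
</script>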

Opening an SSE connection is easy. We can use the EventSource class as follows: const source = new EventSource("/ask?question=" + queryQuestion);. EventSource only issues GET requests and cannot send a request body, so we pass the prompt to the backend in the question query parameter.
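
As a small variation on the string concatenation above, the URL can also be built with the standard URLSearchParams class, which handles the encoding for us. This is only a sketch; the helper name buildAskUrl is hypothetical, while the /ask path and the question parameter are the ones used in this tutorial.

```javascript
// Build the SSE endpoint URL for a given prompt.
// `buildAskUrl` is a hypothetical helper used only for this sketch.
function buildAskUrl(question) {
  // URLSearchParams form-encodes the value (spaces become "+",
  // reserved characters such as "&" and "?" are percent-encoded).
  const params = new URLSearchParams({ question });
  return "/ask?" + params.toString();
}

console.log(buildAskUrl("What is SSE?"));
```

In the browser this would be used as new EventSource(buildAskUrl(input)). PHP decodes "+" in a query string as a space, so the controller receives the original prompt.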

To listen for real-time updates, we can use source.addEventListener("update", (e) => {}). We can name the event anything, but it must match the event name sent from the server. In this case we use update.
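
To show where the event name comes from, here is a simplified sketch of the parsing that EventSource does for us. The function parseSseEvents is a hypothetical name, and this version deliberately ignores parts of the format (the id and retry fields, CRLF line endings, lines without a colon), so treat it as an illustration rather than a replacement for EventSource.

```javascript
// Simplified parser for a complete SSE response body.
// `parseSseEvents` is a hypothetical helper used only for this sketch.
function parseSseEvents(raw) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of raw.split("\n\n")) {
    let name = "message"; // default event type when no "event:" field is sent
    const dataLines = [];
    for (const line of block.split("\n")) {
      const colon = line.indexOf(":");
      // Comment lines (leading ":") and lines without a colon are skipped here.
      if (colon <= 0) continue;
      const field = line.slice(0, colon);
      let value = line.slice(colon + 1);
      // A single space after the colon is not part of the value.
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") name = value;
      if (field === "data") dataLines.push(value);
    }
    // Blocks without any data field do not produce an event.
    if (dataLines.length === 0) continue;
    events.push({ event: name, data: dataLines.join("\n") });
  }
  return events;
}

console.log(parseSseEvents("event: update\ndata: Hello\n\nevent: update\ndata:  world\n\n"));
```

Note that only one space after "data:" is stripped, so a token that starts with a space, which is common in model output, keeps it.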

One more important thing we need to handle is closing the connection to the SSE endpoint. In this case we simply close the connection when the server sends the "<END_STREAMING_SSE>" string to the client.
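
The closing logic can be isolated into a small function, which also makes it testable without a browser. This is a sketch under assumptions: collectAnswer and onText are hypothetical names, and the function only expects an object with addEventListener() and close(), which EventSource provides. Closing explicitly matters because EventSource tries to reconnect when the server ends the response.

```javascript
// Marker the server sends as its last event (same string as in this tutorial).
const END_MARKER = "<END_STREAMING_SSE>";

// Accumulate streamed text until the end marker arrives, then close the source.
// `collectAnswer` and `onText` are hypothetical names used only for this sketch.
function collectAnswer(source, onText) {
  let answer = "";
  source.addEventListener("update", (event) => {
    if (event.data === END_MARKER) {
      // Without this call the browser would reconnect and repeat the request.
      source.close();
      return;
    }
    answer += event.data;
    onText(answer); // report the text received so far
  });
}
```

In the page this could be called as collectAnswer(new EventSource(url), (text) => { result.innerText = text; }).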

Creating the Server-Sent Events Endpoint in Laravel

Let's create a new controller called AskController using this command:

php artisan make:controller AskController

Then register the controller in routes/web.php:

<?php

use App\Http\Controllers\AskController;
use Illuminate\Support\Facades\Route;

Route::get('/', function () {
    return view('welcome');
});

Route::get("/ask", AskController::class);

Since the route doesn't specify a method name, Laravel calls the controller's __invoke method to handle incoming requests.

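<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use OpenAI\Laravel\Facades\OpenAI;

class AskController extends Controller
{
    public function __invoke(Request $request)
    {
        $question = $request->query('question');

        return response()->stream(function () use ($question) {
            // Write one "update" event and push it to the client immediately.
            $send = function (string $data): void {
                echo "event: update\n";
                // Each line of the payload needs its own "data:" prefix;
                // the browser joins the lines back together with "\n".
                echo 'data: ' . str_replace("\n", "\ndata: ", $data);
                echo "\n\n";
                // ob_flush() raises a notice when no output buffer is active.
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            };

            $stream = OpenAI::chat()->createStreamed([
                'model' => 'gpt-3.5-turbo',
                'temperature' => 0.8,
                'messages' => [
                    [
                        'role' => 'user',
                        'content' => $question
                    ]
                ],
                'max_tokens' => 1024,
            ]);

            foreach ($stream as $response) {
                // Stop reading from OpenAI once the client has disconnected.
                if (connection_aborted()) {
                    break;
                }

                // Some chunks (for example the first and the last) carry no text.
                $text = $response->choices[0]->delta->content ?? '';
                if ($text === '') {
                    continue;
                }

                $send($text);
            }

            // Tell the client that the stream is complete.
            $send('<END_STREAMING_SSE>');
        }, 200, [
            'Cache-Control' => 'no-cache',
            'X-Accel-Buffering' => 'no',
            'Content-Type' => 'text/event-stream',
        ]);
    }
}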
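
Tokens can contain characters, such as newlines, that interact with the line-based SSE format. One common alternative, which this tutorial does not use, is to JSON-encode each payload so that it always fits on a single data line. The sketch below shows the round trip in JavaScript for illustration only; encodeFrame and decodeData are hypothetical names, and in the Laravel controller json_encode() would play the role of JSON.stringify(). Adopting this would mean changing both the controller and the browser listener together.

```javascript
// Server side of the idea: wrap the text in JSON so the payload is one line.
// `encodeFrame` is a hypothetical helper used only for this sketch.
function encodeFrame(text) {
  // JSON.stringify escapes "\n" and "\r", so the frame has exactly one data line.
  return "event: update\ndata: " + JSON.stringify({ text }) + "\n\n";
}

// Client side: what an "update" listener would do with event.data.
// `decodeData` is a hypothetical helper used only for this sketch.
function decodeData(data) {
  return JSON.parse(data).text;
}

const frame = encodeFrame("line 1\nline 2");
const dataLine = frame.split("\n")[1]; // the single "data: ..." line
console.log(decodeData(dataLine.slice("data: ".length)) === "line 1\nline 2");
```

A JSON payload also leaves room for extra fields later, for example a flag that marks the end of the stream instead of a magic string.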

Nginx Config

When deploying your service behind Nginx, it is crucial to configure the streaming endpoint correctly. Specifically, you should unset the Connection header and set proxy_http_version to 1.1.

Here's an example config for this project.

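# "=" matches the exact URI /ask ("^~" is a prefix match, so "/ask$" would be taken literally).
location = /ask {
    proxy_http_version 1.1;
    add_header Connection '';

    fastcgi_pass unix:/var/run/php/php8.1-fpm.sock;
    fastcgi_index index.php;
    # Laravel serves every route through its front controller, index.php.
    fastcgi_param SCRIPT_FILENAME $realpath_root/index.php;
    include fastcgi_params;
}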

Adjusting PHP Configuration for Herd on macOS

When you use Herd as your PHP installation on macOS, make sure your php.ini is set up so that buffer flushing works. In some cases the configuration defaults to output_buffering=0. With output buffering disabled, no output buffer is active, and a bare ob_flush() call raises a notice instead of flushing anything.

To mitigate this, consider adjusting your php.ini file as follows:

output_buffering = 4096

Source: github/herd-community#137

Conclusion

Implementing server-sent events (SSE) in Laravel can significantly enhance the user experience when working with OpenAI models. By using SSE, we can stream the generated text from the OpenAI GPT model to the user in real-time, eliminating the need for the user to wait for the entire response to be generated before displaying any content. This improves the perceived speed and interactivity of the application.

In this article, we compared long polling, Web Sockets, and server-sent events, then set up a Laravel project with OpenAI integration. We handled form submissions and listened for server-sent events with JavaScript, and we created a Laravel controller that requests the GPT response and streams it to the client using SSE.

By following these steps, you should be able to implement streaming OpenAI responses in Laravel using server-sent events and give users a seamless, interactive experience with the AI-powered features in your application.