Building a PHP Extension with Rust for Fun and Profit

Rust is a great language to learn no matter what background you come from, and one of the things I love about it is that you can use it to write plugins or extensions for other languages. In this post, we are going to create a PHP extension with Rust.

What Is a PHP Extension?

A PHP extension is a library or plug-in that adds functions you can call from your PHP application. You may already be using extensions without realizing it. For example, a Laravel application uses the pdo_mysql extension to connect to a MySQL database.

The good news is that you can write your own PHP extension. Before Rust, most people wrote PHP extensions in C or C++, but now we can write them in Rust as well.

PHPER

In this post we will write a PHP extension that reads PDF files. To do that we will use a Rust crate called phper. PHPer (PHP Enjoy Rust) is a framework that allows developers to write PHP extensions using Rust.

There are several Rust crates we can use to build a PHP extension. For example, the ext-php-rs crate also provides a lightweight framework for creating PHP extensions. It can build extensions for macOS, Linux, and Windows, so consider it if you need to support all of those platforms.

For now we will use phper and support only macOS and Linux.

Building PHP Extension

Let's create a new Rust library project using Cargo.

cargo new php-pdf --lib

Then add the lopdf and phper crates to our Cargo.toml file, and set the crate type so that Cargo produces a dynamic library that PHP can load:

[lib]
crate-type = ["lib", "cdylib"]

[dependencies]
lopdf = { version = "0.31.0", features = ["pom", "pom_parser"] }
phper = "0.12.0"

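Then create a build.rs file and set up the linker configuration for macOS. These flags tell the macOS linker to leave the PHP symbols undefined at build time, so that they are resolved when PHP loads the extension.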

fn main() {
    #[cfg(target_os = "macos")]
    {
        println!("cargo:rustc-link-arg=-undefined");
        println!("cargo:rustc-link-arg=dynamic_lookup");
    }
}
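
One detail worth knowing about the build.rs above: inside a build script, #[cfg(target_os = "macos")] describes the machine the script is compiled for, which is the host. That is fine when you build on the same platform you deploy to. If you ever cross-compile, a possible variant is to read the CARGO_CFG_TARGET_OS environment variable that Cargo sets for build scripts. The helper names below are my own and are not part of phper, so treat this as a sketch rather than the recommended setup.

```rust
use std::env;

/// Linker directives for the given target OS (empty for anything but macOS).
/// `link_directives` is a hypothetical helper name used only in this sketch.
fn link_directives(target_os: &str) -> Vec<&'static str> {
    if target_os == "macos" {
        vec![
            "cargo:rustc-link-arg=-undefined",
            "cargo:rustc-link-arg=dynamic_lookup",
        ]
    } else {
        Vec::new()
    }
}

/// Intended to be called from `fn main()` in build.rs.
fn emit_link_directives() {
    // Cargo sets CARGO_CFG_TARGET_OS when it runs a build script.
    // Outside of Cargo the variable is missing, so nothing is printed.
    let target_os = env::var("CARGO_CFG_TARGET_OS").unwrap_or_default();
    for directive in link_directives(&target_os) {
        println!("{directive}");
    }
}
```

With this in place, the body of main in build.rs would shrink to a single call to emit_link_directives().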

To expose a function to PHP, we register it on a module like this. Here some_function_extension is a placeholder for a Rust function that we would define ourselves.

#[php_get_module]
pub fn get_module() -> Module {
    let mut module = Module::new(
        env!("CARGO_CRATE_NAME"),
        env!("CARGO_PKG_VERSION"),
        env!("CARGO_PKG_AUTHORS"),
    );

    module
        .add_function("some_function_extension", some_function_extension)
        .argument(Argument::by_val("name"));

    module
}

The PHP PDF Extension

The extension itself does something simple. It provides three functions:

  • php_pdf_get_page_size: Returns the total number of pages.
  • php_pdf_read_page: Reads the text of the given page number.
  • php_pdf_read_all: Reads the text of every page and returns it as an array.
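
Before wiring anything into PHP, it can help to pin down how these three functions should behave. Here is a minimal sketch in plain Rust with no PDF or PHP involved. PageTexts and its methods are hypothetical names that exist only for this illustration, and the sketch assumes pages are numbered from 1.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a loaded PDF: one string per page.
struct PageTexts {
    pages: Vec<String>,
}

impl PageTexts {
    /// Mirrors php_pdf_get_page_size: the total number of pages.
    fn page_count(&self) -> i64 {
        self.pages.len() as i64
    }

    /// Mirrors php_pdf_read_page: pages are numbered from 1.
    fn read_page(&self, page: i64) -> Result<&str, String> {
        if page < 1 || page > self.page_count() {
            return Err(format!("invalid page number: {page}"));
        }
        Ok(&self.pages[(page - 1) as usize])
    }

    /// Mirrors php_pdf_read_all: maps each page number (from 1) to its text.
    fn read_all(&self) -> BTreeMap<u32, &str> {
        self.pages
            .iter()
            .enumerate()
            .map(|(i, text)| ((i + 1) as u32, text.as_str()))
            .collect()
    }
}
```

The point of the sketch is the contract: the last page is a valid argument, page 0 is not, and the array returned by the read-all function starts at key 1 rather than 0.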

If you haven't followed my blog before, you can read this blog post about using the lopdf library to extract text from a PDF document. This time, we will simply copy and paste the code from there.

use lopdf::Document;
use phper::{
    arrays::{InsertKey, ZArray},
    functions::Argument,
    modules::Module,
    php_get_module,
    values::ZVal,
};

fn php_pdf_read_all(arguments: &mut [ZVal]) -> phper::Result<ZArray> {
    let path = arguments[0].expect_z_str()?.to_str()?;
    let doc: Result<Document, lopdf::Error> = Document::load(path);
    match doc {
        Ok(document) => {
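            // get_pages() maps each page number (starting at 1) to its object ID.
            let pages = document.get_pages();
            let mut texts = ZArray::new();

            for &page_number in pages.keys() {
                let text = document.extract_text(&[page_number]);
                // The PHP array is keyed by page number. A page whose text
                // cannot be extracted is stored as an empty string.
                texts.insert(
                    InsertKey::Index(page_number as u64),
                    text.unwrap_or_default(),
                );
            }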
            Ok(texts)
        }
        Err(err) => Err(phper::Error::Boxed(err.into())),
    }
}

fn php_pdf_read_page(arguments: &mut [ZVal]) -> phper::Result<String> {
    let path = arguments[0].expect_z_str()?.to_str()?;
    let page = arguments[1].expect_long()?;
    let doc = Document::load(path);
    match doc {
        Ok(document) => {
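            let pages = document.get_pages();
            // Page numbers start at 1, so the valid range is 1..=pages.len().
            if page < 1 || page > pages.len() as i64 {
                return Err(phper::Error::Boxed("invalid page number".into()));
            }
            // Report extraction failures to PHP instead of panicking.
            document
                .extract_text(&[page as u32])
                .map_err(|err| phper::Error::Boxed(err.into()))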
        }
        Err(err) => Err(phper::Error::Boxed(err.into())),
    }
}

fn php_pdf_get_page_size(arguments: &mut [ZVal]) -> phper::Result<i64> {
    let path = arguments[0].expect_z_str()?.to_str()?;
    let doc = Document::load(path);
    match doc {
        Ok(document) => {
            let pages = document.get_pages();
            Ok(pages.len().try_into().unwrap())
        }
        Err(err) => Err(phper::Error::Boxed(err.into())),
    }
}

#[php_get_module]
pub fn get_module() -> Module {
    let mut module = Module::new(
        env!("CARGO_CRATE_NAME"),
        env!("CARGO_PKG_VERSION"),
        env!("CARGO_PKG_AUTHORS"),
    );

    module
        .add_function("php_pdf_read_all", php_pdf_read_all)
        .argument(Argument::by_val("path"));

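    // php_pdf_read_page takes two arguments: the file path and the page number.
    module
        .add_function("php_pdf_read_page", php_pdf_read_page)
        .argument(Argument::by_val("path"))
        .argument(Argument::by_val("page"));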

    module
        .add_function("php_pdf_get_page_size", php_pdf_get_page_size)
        .argument(Argument::by_val("path"));

    module
}
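
Each function above matches on the result of Document::load and wraps the error by hand. The same flow can be written more compactly with map_err and the ? operator. The sketch below shows the pattern using only the standard library. ExtError, LoadError and load are hypothetical stand-ins for phper::Error, lopdf::Error and Document::load, so this illustrates the shape of the code and is not a drop-in replacement.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical stand-in for phper::Error, reduced to the one variant used here.
#[derive(Debug)]
enum ExtError {
    Boxed(Box<dyn Error>),
}

impl fmt::Display for ExtError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExtError::Boxed(inner) => write!(f, "{inner}"),
        }
    }
}

/// Hypothetical stand-in for lopdf::Error.
#[derive(Debug)]
struct LoadError(String);

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for LoadError {}

/// Hypothetical stand-in for Document::load: returns two fake pages for any
/// path ending in ".pdf" and fails for everything else.
fn load(path: &str) -> Result<Vec<String>, LoadError> {
    if path.ends_with(".pdf") {
        Ok(vec!["page one".to_string(), "page two".to_string()])
    } else {
        Err(LoadError(format!("not a PDF: {path}")))
    }
}

/// Same shape as php_pdf_get_page_size, without the explicit match.
fn page_count(path: &str) -> Result<i64, ExtError> {
    // `?` returns early on failure, so the Ok/Err match is not needed.
    let pages = load(path).map_err(|err| ExtError::Boxed(err.into()))?;
    // try_from reports an out-of-range length as an error instead of panicking.
    i64::try_from(pages.len()).map_err(|err| ExtError::Boxed(err.into()))
}
```

Whether this reads better than the explicit match is a matter of taste. The main benefit is that every failure path ends up as an error returned to the caller instead of an unwrap that can panic.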

Building the Library

To build in debug mode, run this command.

cargo build

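On macOS the library is built with a .dylib extension, so copy it to a .so file for PHP to load. On Linux the build already produces libphp_pdf.so, so this step can be skipped.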

cp target/debug/libphp_pdf.dylib target/debug/libphp_pdf.so

Then you can load the debug library like this, where test.php is a script that calls the extension functions.

php -d "extension=target/debug/libphp_pdf.so" test.php
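
If the file names in the copy step look arbitrary, they follow from how Cargo names a cdylib on each platform. Here is a small sketch of that naming rule. The function name is my own, and it only covers the three OS families mentioned in this post.

```rust
/// Returns the file name Cargo gives a `cdylib` for the given crate and OS.
/// `cdylib_file_name` is a hypothetical helper written for this post.
fn cdylib_file_name(crate_name: &str, target_os: &str) -> String {
    // Cargo replaces hyphens with underscores in library file names.
    let stem = crate_name.replace('-', "_");
    match target_os {
        "macos" => format!("lib{stem}.dylib"),
        "windows" => format!("{stem}.dll"),
        // Linux and other Unix-like targets.
        _ => format!("lib{stem}.so"),
    }
}
```

Calling cdylib_file_name("php-pdf", std::env::consts::OS) gives the file to look for under target/debug or target/release on the machine you are building on.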

Building the Release Version

To build the production release, follow these steps:

1. Execute the following command in your terminal:

cargo build --release

This command compiles the code with optimizations for a release build.

2. After the build completes, on macOS copy the generated library to a .so file:

cp target/release/libphp_pdf.dylib target/release/libphp_pdf.so

This gives the library the .so name that the remaining steps refer to. On Linux the build already produces libphp_pdf.so, so the copy is not needed.

3. Next, copy the library into your PHP extension directory (running php-config --extension-dir should print its location). For example, if your extension directory is /opt/homebrew/lib/php/pecl/20210902, run the following command:

cp target/release/libphp_pdf.so /opt/homebrew/lib/php/pecl/20210902

This command copies the extension library to the specified directory.

4. Finally, enable the extension configuration in your php.ini file. Open the file and add the following line:

extension=/opt/homebrew/lib/php/pecl/20210902/libphp_pdf.so

This line instructs PHP to load the libphp_pdf extension.

Usage

After that we are ready to go. Here is an example of how to use our PHP PDF extension.

<?php

$size = php_pdf_get_page_size('2303.12712.pdf');
echo $size;

$text = php_pdf_read_page('2303.12712.pdf', 100);
echo $text;

$texts = php_pdf_read_all('2303.12712.pdf');
foreach ($texts as $text) {
    echo $text;
}