Tutorial: Dynamic search with htmx, hyperscript and ProcessWire

March 17, 2022.

Using ProcessWire, you can easily create a dynamic search with very little code. This search can’t compete with engines such as Elasticsearch or Solr, of course. However, it is suitable for most “showcase” sites. Here’s how we did it on Spiria’s site using the small htmx library and its companion, hyperscript.

The goal

You can try out the search for yourself just above this article.

The recipe

  1. Including htmx and hyperscript libraries (the latter is optional).
  2. A textarea-type field integrated with the page templates that we want to index.
  3. Code in the ready.php file that indexes the existing content.
  4. A search controller, which we named api.php. This controller will also become a page with the api template.
  5. A form placed in the pages that require the search.

Content indexing

Before we can program, we need to index the content to which we want to apply our search. In my proof of concept, I developed two strategies. This is probably overkill, since I am not sure it makes the search any faster.

  1. Indexing for a single term search.
  2. Indexing for a multiple-term search.

To do this, we need to add two fields to each template we want to index.

  1. The search_text field, which will contain only one occurrence of each word on a page.
  2. The search_text_long field, which will preserve all sentences without HTML tags.

Here is how we place a hook in the ready.php file:

<?php namespace ProcessWire;

// Rebuild the search fields each time a page is about to be saved.
pages()->addHookAfter("saveReady", function (HookEvent $event) {
    $p = $event->arguments[0];
    switch ($p->template->name) {
        case "blog_article":
            $french = languages()->get('fr');
            $english = languages()->get('default');
            // Body and summary of the article, per language.
            $txt_en = $p->page_content->getLanguageValue($english) . ' ' . $p->blog_summary->getLanguageValue($english);
            $txt_fr = $p->page_content->getLanguageValue($french) . ' ' . $p->blog_summary->getLanguageValue($french);
            $title_en = $p->title->getLanguageValue($english);
            $title_fr = $p->title->getLanguageValue($french);
            // stripText() returns [unique words, cleaned full text].
            $resultEn = stripText($txt_en, $title_en);
            $resultFr = stripText($txt_fr, $title_fr);
            $p->setLanguageValue($english, "search_text", $resultEn[0]);
            $p->setLanguageValue($english, "search_text_long", $resultEn[1]);
            $p->setLanguageValue($french, "search_text", $resultFr[0]);
            $p->setLanguageValue($french, "search_text_long", $resultFr[1]);
            break;
    }
});

And:

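function stripText($t, $s)
{
    $resultText = [];
    // Remove the HTML tags, then append the title so that it is indexed too.
    $t = strip_tags($t);
    $t .= " " . $s;
    // Turn line breaks and non-breaking spaces into spaces so that
    // neighbouring words are not glued together.
    $t = str_replace(["\n", "&nbsp;"], " ", $t);
    // Drop punctuation and the elided French articles.
    $t = str_replace([",", "“", "”", "'", "?", "!", ":", "«", "»", ".", "l’", "d’"], "", $t);
    //$t = preg_replace('/\?|\[\[.*\]\]|“|”|«|»|\.|!|\&nbsp;|l’|d’|s’/','',$t);
    $arrayText = explode(" ", $t);
    foreach ($arrayText as $item) {
        $item = trim($item);
        // Keep each word longer than three characters, once only.
        if (strlen($item) > 3 && !in_array($item, $resultText)) {
            $resultText[] = $item;
        }
    }
    // [0] = unique words (search_text), [1] = cleaned full text (search_text_long).
    return [implode(" ", $resultText), $t];
}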

If you have the ListerPro module, it’s easy to batch-save all the pages to be indexed; then any new page you create will be indexed.

The stripText() function scrubs the text to our specifications. Note that, in my example, I make the distinction between French and English. This little algorithm can certainly be improved! I have noted a shorter way to clean up the text, though at the expense of readability.

$t = preg_replace('/\?|\[\[.*\]\]|“|”|«|»|\.|!|\&nbsp;|l’|d’|s’/','',$t);

As I mentioned before, it’s probably unnecessary to create two search fields. The most important thing would be to optimize the text as much as possible, since so many short words serve no purpose. The current code keeps only words longer than three characters, which is tricky on a technical site such as ours, where terms like C#, C++ and PHP sit alongside “the”, “for”, “not”, etc. That said, this optimization may be superfluous when the search covers a single type of content with a limited number of pages.
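One possible way to reconcile the length filter with short technical terms is an allow-list. The following is only a sketch and not the code used on Spiria’s site: the function names and the contents of the list are assumptions made for illustration. It keeps the "longer than three characters" rule from stripText() but lets a few known short terms through, and it compares words case-insensitively so that "Search" and "search" are stored once.

```php
<?php
// Hypothetical allow-list of short terms worth indexing (an assumption,
// to be adapted to the vocabulary of the site).
const SEARCH_KEEP = ['c#', 'c++', 'php', 'css', 'sql'];

// True when a word should go into the search_text field.
function isIndexable(string $word): bool
{
    $w = strtolower(trim($word));
    if ($w === '') {
        return false;
    }
    if (in_array($w, SEARCH_KEEP, true)) {
        return true;
    }
    // Same rule as stripText(): longer than three characters (counted in bytes).
    return strlen($w) > 3;
}

// Returns each indexable word once, in order of first appearance.
function uniqueIndexableWords(string $text): string
{
    $seen = [];
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $key = strtolower($word);
        if (isIndexable($word) && !isset($seen[$key])) {
            $seen[$key] = $word;
        }
    }
    return implode(' ', $seen);
}

echo uniqueIndexableWords("The PHP and C# code for the search search");
```

Like the original, this counts bytes rather than characters, so accented French words are measured slightly longer than they look; switching to mb_strlen() would address that if the mbstring extension is available.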

So now let’s look at the process and the search code.

The structure

This graphic is a classic and needs no explanation. The htmx library makes a simple Ajax call.

The form

  1. The form has a get method that sends us back to a conventional search page when the user presses the enter key.
  2. A hidden field with a secret key generated on the fly enhances security.
  3. The third field is the input field involved in the dynamic search. It has an htmx syntax. The first command, hx-post, indicates how data is sent to the API – a post in this case. htmx handles events on any DOM element. So, for example, we could have several calls on different elements of a form.
  4. The second line indicates where the API response will be sent, in this case div#searchResult below the form.
  5. The hx-trigger command specifies when the request is sent to the API: when the user releases a key, with a delay of 200 ms between each occurrence.
  6. The hx-indicator command is optional. It signals to the user that something is underway. In our example, the #indexsearch image (point 9) is displayed. htmx automatically handles this.
  7. The _=on command comes from the hyperscript syntax. It adds a class to the #screenWindow div.
  8. We can add other parameters to the search using the hx-vals command. In our simplified example, we send the search language.
  9. This is an optional image. htmx controls its appearance.
  10. The last command is hyperscript again. It removes the contents of the search when we click outside this area.
  11. Finally, this is coupled with the #screenWindow div’s behavior. Note how simple the syntax is.

This example shows that no custom JavaScript is needed beyond the htmx and hyperscript libraries themselves. It is worth visiting the two libraries’ websites to understand their approach and potential.
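Since the form itself was shown as an image, here is a hedged reconstruction based only on the numbered description above. The field names CSRFTokenBlog, search and lang and the ids #searchResult, #indexsearch, #screenWindow and #found come from the article; everything else (the renderSearchForm() helper, the URLs, the .active class, the spinner path, the changed modifier in hx-trigger and the exact placement of the hyperscript lines) is an assumption.

```php
<?php
// Hypothetical helper that returns the search form markup as a string.
function renderSearchForm(string $token, string $lang, string $apiUrl, string $searchUrl): string
{
    $token     = htmlspecialchars($token, ENT_QUOTES);
    $apiUrl    = htmlspecialchars($apiUrl, ENT_QUOTES);
    $searchUrl = htmlspecialchars($searchUrl, ENT_QUOTES);
    // hx-vals expects JSON; it is entity-encoded to sit safely inside the attribute.
    $vals = htmlspecialchars(json_encode(['lang' => $lang]), ENT_QUOTES);

    return <<<HTML
<form method="get" action="{$searchUrl}">
  <input type="hidden" name="CSRFTokenBlog" value="{$token}">
  <input type="search" name="search" autocomplete="off"
         hx-post="{$apiUrl}"
         hx-target="#searchResult"
         hx-trigger="keyup changed delay:200ms"
         hx-indicator="#indexsearch"
         _="on keyup add .active to #screenWindow"
         hx-vals="{$vals}">
  <img id="indexsearch" class="htmx-indicator" src="/site/templates/img/spinner.svg" alt="">
</form>
<div id="searchResult"
     _="on click from elsewhere remove #found then remove .active from #screenWindow"></div>
<div id="screenWindow"></div>
HTML;
}

echo renderSearchForm('example-token', 'en', '/api/', '/search/');
```

Because the input sits inside the form and the request is a POST, htmx also sends the other fields of the enclosing form, which is how the hidden token reaches the API. Pressing Enter still submits the form with the get method to the conventional search page.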

The Search API

The API resides in a normal ProcessWire page. Although it is published, it remains "hidden" from CMS searches. This type of page gathers several kinds of requests to the CMS in one place: it receives each request and calls the appropriate function.

<?php namespace ProcessWire;

// Token stored in the session when the search form was rendered.
$secretsearch = (string) session()->get('secretToken');
$request = input()->post();
$lang = sanitizer()->text($request["lang"]);

if (isset($request['CSRFTokenBlog'])) {
    // hash_equals() compares in constant time and requires two strings.
    if ($secretsearch !== '' && hash_equals($secretsearch, (string) $request['CSRFTokenBlog'])) {
        if (!empty($request["search"])) {
            echo page()->querySite(sanitizer()->text($request["search"]), $lang);
        }
    } else {
        echo __("A problem occurred. We are sorry for the inconvenience.");
    }
}
exit;

In this case:

  1. We extract the secret token for the session, which will be created in the search-form page.
  2. We then process everything that is in the post query. Remember that this is a simplified example.
  3. We compare the token with the one received in the query. If all goes well, we run the search query. Our example uses a class residing in site/classes/ApiPage.php; it can therefore be directly called with page(). Any other strategy is valid.
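The article does not show how the search-form page creates the token mentioned in step 1. A minimal sketch of one way to do it follows. To stay runnable outside ProcessWire, a plain array stands in for the session; in a template, session()->set('secretToken', $token) and session()->get('secretToken') would presumably play that role. Both function names are hypothetical.

```php
<?php
// Creates a random token and stores it under the key used by api.php.
// $sessionData is a stand-in for the real session (an assumption).
function createSearchToken(array &$sessionData): string
{
    $token = bin2hex(random_bytes(32)); // 64 hexadecimal characters
    $sessionData['secretToken'] = $token;
    return $token;
}

// Checks a submitted token against the stored one in constant time.
function isValidSearchToken(array $sessionData, $submitted): bool
{
    $expected = $sessionData['secretToken'] ?? '';
    return is_string($submitted)
        && $expected !== ''
        && hash_equals($expected, $submitted);
}

$session = [];
$token = createSearchToken($session);
var_dump(isValidSearchToken($session, $token));   // matching token
var_dump(isValidSearchToken($session, 'forged')); // wrong token
```

The returned token is what would be written into the hidden CSRFTokenBlog field of the form.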

The following code represents the core of the process:

<?php namespace ProcessWire;

// Excerpt from site/classes/ApiPage.php
class ApiPage extends Page
{
    public function querySite($q, $l)
    {
        $this->search = "";
        $this->lang = $l == 'en' ? 'default' : 'fr';
        user()->setLanguage($this->lang);
        $whatQuery = explode(" ", $q);
        $this->count = count($whatQuery);
        // Escape the query so that it cannot alter the selector.
        $q = sanitizer()->selectorValue($q);
        // has_parent!=1099 excludes the pages located under page 1099.
        if ($this->count > 1) {
            // Several words: match any partial word in the full text.
            $this->search = 'template=blog_article,has_parent!=1099,search_text_long~|*=' . $q . ',sort=-created';
        } elseif (strlen($q) > 1) {
            // One word: match it against the list of unique words.
            $this->search = 'template=blog_article,has_parent!=1099,search_text*=' . $q . ',sort=-created';
        }
        if ($this->search !== "") {
            $this->result = pages()->find($this->search);
            return $this->formatResult();
        }
        return "";
    }

    protected function formatResult()
    {
        $html = '<ul id="found">';
        if (count($this->result) > 0) {
            foreach ($this->result as $result) {
                $html .= '<li><a href="' . $result->url . '">' . $result->title . '</a></li>';
            }
        } else {
            $html .= '<li>' . __('Nothing found') . '</li>';
        }
        $html .= '</ul>';
        return $html;
    }
}

The formatResult() function is simple to understand. This is where the ul#found list is created, which the hyperscript line in the form later removes.

_="on click from elsewhere remove #found"

No need to add CSS to display the result in the current code. It is invisible at first because it is placed in an empty #searchResult div. But when the search result fills it, everything becomes accessible as the CSS targets the ul#found list and not its parent.
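To make the branching in querySite() easier to see, the selector construction can be isolated in a pure function and run outside ProcessWire. This is only an illustrative sketch: buildSearchSelector() and crudeSelectorValue() are hypothetical names, and the second is a rough stand-in for ProcessWire’s own selector sanitizing, not a replacement for it.

```php
<?php
// Rough stand-in for a selector sanitizer (an assumption): replaces the
// characters that have a meaning in a selector and collapses whitespace.
function crudeSelectorValue(string $q): string
{
    $q = str_replace([',', '"', "'", '=', '|', '!', '<', '>'], ' ', $q);
    return trim(preg_replace('/\s+/', ' ', $q));
}

// Returns the selector string for a query, or '' when the query is too short.
function buildSearchSelector(string $q): string
{
    $q = crudeSelectorValue($q);
    $base = 'template=blog_article,has_parent!=1099,';
    $words = $q === '' ? [] : explode(' ', $q);
    if (count($words) > 1) {
        // Several words: any partial word in the full-text field.
        return $base . 'search_text_long~|*=' . $q . ',sort=-created';
    }
    if (strlen($q) > 1) {
        // One word: substring match in the unique-word field.
        return $base . 'search_text*=' . $q . ',sort=-created';
    }
    return '';
}

echo buildSearchSelector('htmx'), "\n";
echo buildSearchSelector('dynamic search'), "\n";
```

A single word goes to search_text, two or more words go to search_text_long, and a one-character query produces no selector at all, so no database query is made while the user has typed only one letter.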

Conclusion

The purpose of this article was to experiment with htmx and hyperscript. I have only scratched the surface of these libraries, which are still under development. The search as described can be improved and sometimes shows its limitations. There are so many ways to combine strategies that advanced search options should eventually be offered. This could be the subject of another article.

The haiku placed at the end of the introduction page of htmx is very fitting:

javascript fatigue:
longing for a hypertext
already in hand

Finally, SearchEngine is an excellent search module for ProcessWire, which coexists very well with the code described here.