Recently I wanted to add a dynamic thesaurus in a medical videos information system. Videos are hosted at Vimeo, sometimes with access rights. End user must go through the system to view all the videos.

We would that end users to be able to navigate from one video to another according to suggestions from common keywords, visibles or not. In order to increase the average session duration on the website.

Administrators are medical professionals, they add scientific content with references potentially very detailed, and impossible to guess exhaustively.

So to organize the videos it was necessary to allow administrators to enter free keywords, in a simple text field. Nothing incredible so far:

This is enough to enrich user searches. But to gracefully link the videos between them, the keywords entered must feed a formatted thesaurus, which can be used with conventional SQL. Otherwise you will face with limitations of use, or need to multiply requests to display few links, with error risks.

To be happy I had to develop this process myself. With a little organization we get something very standard and customizable. The concept can be reproduced under any SQL infrastructures, and is suitable for large volumes of data.

That's why I allow myself the following experience feedback. You can see the live website, with its suggested contents here:

https://wosiam.org (Home or Videos tab, in any video page)

The input

Create a text field to receive keywords in the table storing records to reference.

Your script will handle the usual grumpy users mistakes, but just for their education add some rules as:

  • Not empty
  • 5 keywords maximum
  • Each word separated with a comma
  • Special characters, quote and semicolon prohibited

The values entered will not change in the corresponding table. But they will undergo two treatments outside. A first in the background, upstream in a parallel table - then a second, directly in the page display queries.

Upstream data processing

The 1st keywords entered in each record can be retrieved with a SQL substring sequence:

SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 1), ',', -1) AS unique_keyword
FROM my_table

Get also the attributes that interest you, which will be used to create links later.

The 2nd keyword can be retrieved as follows:

SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 2), ',', -1) AS unique_keyword
FROM my_table

And so on.

There is more to create a table cumulating all unique keywords (we said 5 maximum) for each records:

DROP TABLE IF EXISTS my_unique_keywords ;
 
CREATE TABLE my_unique_keywords AS
 
SELECT DISTINCT id_related, title, category, date_publish, unique_keyword FROM (
 
SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 1), ',', -1) AS unique_keyword
FROM my_table
 
UNION SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 2), ',', -1)
FROM my_table
 
UNION SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 3), ',', -1)
FROM my_table
 
UNION SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 4), ',', -1)
FROM my_table
 
UNION SELECT id, title, category, date_publish,
SUBSTRING_INDEX(SUBSTRING_INDEX(my_keywords, ',', 5), ',', -1)
FROM my_table
 
) AS my_unique_keywords
ORDER BY id DESC ;

And a task to run it regularly. Take the opportunity to correct some typing errors (punctuation, space, case ...):

UPDATE my_unique_keywords SET unique_keyword = REPLACE(unique_keyword, '/', ' ') ;
UPDATE my_unique_keywords SET unique_keyword= REPLACE(unique_keyword, '/', ' ') ;
UPDATE my_unique_keywords SET unique_keyword = REPLACE(unique_keyword, '(', ' ') ;
UPDATE my_unique_keywords SET unique_keyword = REPLACE(unique_keyword, ')', ' ') ;
UPDATE my_unique_keywords SET unique_keyword = REPLACE(unique_keyword, ';', ',') ;
UPDATE my_unique_keywords SET unique_keyword = REPLACE(unique_keyword, '!', ' ') ;
UPDATE my_unique_keywords SET unique_keyword = REPLACE(unique_keyword, '?', ' ') ;
 
UPDATE my_unique_keywords SET unique_keyword = REPLACE(tag_unique, ' ', ' ') ;
 
UPDATE my_unique_keywords SET unique_keyword = SUBSTR(tag_unique, 2) WHERE tag_unique LIKE ' %' ;
UPDATE my_unique_keywords  SET unique_keyword = SUBSTR(unique_keyword, 1, CHAR_LENGTH(unique_keyword) - 1) WHERE unique_keyword LIKE '% ' ;

And add some indexes, to improve searches:

CREATE INDEX index_id ON my_unique_keywords (id_related) ;
CREATE INDEX index_tag ONmy_unique_keywords (unique_keyword(100)) ;

Good! We get our thesaurus, for example here in a table my_unique_keywords:

id_related title category date_publish  unique_keyword
1 Title 1 Cat1 2019-05-07 Genetics 
1 Title 1 Cat1 2019-05-07 Southeast Asia
1 Title 1 Cat1 2019-05-07 Oncology
1 Title 1 Cat1 2019-05-07 Classic errors
1 Title 1 Cat1 2019-05-07 Nurse
2 Title 2 Cat3 2019-05-02 Biology
... ... ... ... ...

All that remains is to query it on our pages.

Downstream data displaying

Now that we have a background thesaurus, we are free to request it on the pages of the recordings. An IN SQL code will be perfect, searching all keywords from a record in the thesaurus.

OK but remember our my_heywords field. Its cells contain something like this:

Genetics, Southeast Asia, oncology, classic errors, nurse

Unusable as it with SQL. To be able to use it in a query we have to enclose each words correctly. Example with some simplified PHP:

// Call your field
$MyKeywords = '{my_table___my_keywords}' ;
 
// Enclose each words
$MyKeywords = str_replace(' "', '"', '"'.str_replace('" ', '"', str_replace(',', '","', $MyKeywords)).'"') ;
 
//echo $MyKeywords ;

We juste replace the comma with an enclosed comma, and add quote at the beginning and the end. Now we get a beautiful:

'Genetics','Southeast Asia','oncology','classic errors','nurse'

Okay. More than build the query. Here again in simplified PHP, but translate the code into your favorite language.

$myQuery = $myDb->getQuery(true);
 
$myQuery = "
SELECT DISTINCT id_related, title,
CONCAT('published on ', DATE_FORMAT(date_publish, '%d/%m/%Y')) as my_date_publish
FROM my_unique_keywords
WHERE unique_keyword IN ($MyKeywords)
AND id_related <> $MyId
ORDER BY date_publish DESC
LIMIT 0,20
" ;
 
// echo $myQuery;
 
$myDb->setQuery($myQuery);
$MyResults = $myDb->loadObjectList();$list = array();

Here we exclude the current id, to display suggestions related, but call and use the necessary fields as you want (to manage ACLs, categories ...).

Now try the query. It became a simply:

SELECT DISTINCT id_related, title ...
FROM my_unique_keywords
WHERE unique_keyword IN ('Genetics','Southeast Asia','oncology','classic errors','nurse') ...

You retrieve an array of results, rather than looping it to display it in HTML, with links ... For example :

foreach ($MyResults as $row)
{
$list[] = "<div style='margin-bottom: 10px;'><a href='http://mywebsite.com/". $row->id ."' style='font-size: 18px; font-weight: bold'>" . $row->title . "</a><font style='font-size: 14px; font-style: italic;'> - " . $row->my_date_publish . "</font></div>";
}
if (!empty($MyResults)) {return "<div style='font-size: 22px;margin-bottom: 15px; margin-top: 30px;'>Others records related</div>
<div>" . implode($list) . "</div>";
}else {
return '';
}
;

Hop! Just adapt the little code above in the desired pages to display links towards other records related.

Many websites are information systems, where the user should access data through intuitive ways. Frameworks, CMS and libraries provide numerous plugins for this purpose (global search, advanced, filters, tags ...). Well used all this is very convenient, and very UX. Good!

But sometimes these plugins are insufficient, or at least do not allow you to manipulate data as you wish, and serve them where you want. To display customized suggestions for example. The method above makes it easier to manage a thesaurus without compelling the input.