Accentuation Query

Hello,

I'm facing a problem when trying to search for a specific string. The problem is, in the database I have both strings, for example: João and Joao.

When I type Joao in the search field, it should return both strings: João and Joao. The same should happen if I typed João.

What I'm trying to do is ignoring the accentuation.

This is executed with QueryBuilder by running a LIKE "%$term%".

I've already tried to use LIKE "%$term%" collate utf8_general_ci but it throws me an error.

By the way, the table collation is already set as utf8_general_ci

Thank you in any advance.

edited 10d ago

Make sure to pass in utf8 as the charset when constructing the db adapter:

$connection = new \Phalcon\Db\Adapter\Pdo\Mysql(
    [
        'host'     => 'localhost',
        'username' => 'root',
        'password' => 'sigma',
        'dbname'   => 'test_db',
        'charset'  => 'utf8',
    ]
);

Make sure to pass in utf8 as the charset when constructing the db adapter:

$connection = new \Phalcon\Db\Adapter\Pdo\Mysql(
   [
       'host'     => 'localhost',
       'username' => 'root',
       'password' => 'sigma',
       'dbname'   => 'test_db',
      'charset'  => 'utf8',
   ]
);

Yep, I've already did that. Even so, the problem persists.

Are you using raw SQL? Can you show us the full query?

After some cursory searching, it seems changing the collation on the fly disables indexes, so your query could really slow down. One common suggestion is to make a separate column in the table just for searching. When saving, both Joao and João get converted to Joao and stored in that special "search" column. This allows indexing to still work, greatly speeding up the query.

edited 10d ago

Are you using raw SQL? Can you show us the full query?

After some cursory searching, it seems changing the collation on the fly disables indexes, so your query could really slow down. One common suggestion is to make a separate column in the table just for searching. When saving, both Joao and João get converted to Joao and stored in that special "search" column. This allows indexing to still work, greatly speeding up the query.

Hi, Dylan.

No, I'm not using raw SQL, I'm using the Phalcon Query Builder object (\Phalcon\Mvc\Model\Query\Builder).

This is how I'm trying to mount the where clause:

    private function getWhereClause(string $searchTerm): string
    {
        $subtaskTitle = htmlentities($searchTerm, ENT_QUOTES, "UTF-8", false);
        $subtaskTitleNoAccent = $this->removeAccent($subtaskTitle);

        $whereClause = [
            'DWTasks.id LIKE "%%%1$s%%"',
            'DWTasks.taskTitle LIKE "%%%1$s%%"',
            'DWTasks.taskTitle LIKE "%%%3$s%%"',
            'DWTasks.subtaskTitle LIKE "%%%2$s%%"',
            'DWTasks.subtaskTitle LIKE "%%%3$s%%"',
            'DWProject.clientName LIKE "%%%1$s%%"',
            'DWProject.projectTitle LIKE "%%%1$s%%"',
            'Status.name LIKE "%%%1$s%%"',
            'DWTasks.estimated LIKE "%%%1$s%%"',
            'DWTasks.worked LIKE "%%%1$s%%"',
            'DWProjectDocs.involveds LIKE "%%%1$s%%"',
            'DWTasks.item LIKE "%%%1$s%%"',
            'DATE_FORMAT(DWTasks.deadline, "%%d/%%m/%%Y %%H:%%i:%%s") LIKE "%%%1$s%%"',
        ];

        return sprintf(
            implode(' OR ', $whereClause),
            $searchTerm,
            $subtaskTitle,
            $subtaskTitleNoAccent
        );
    }

That part of code is going to be used on the $queryBuilder->where();.

As you can see, I've had to implement a removeAccent() method because the query isn't working. The problem here is that this works when I search for João, because in this case I just need to remove the accents, but when I search for Joao, of course I would have to test plenty of possibilities of accents once I haven't got a clue if the word has accents or not.

Yes, I know, I'm not proud of that, but it was a partial workaround to the problem.

But, when I run a simple query directly in the database it works.

Well, I've changed to PHQL and it's working now.

Thank you Dylan and Lajos.

edited 10d ago

removeAccent -> are you sure you've covered every possible case out there? Systems which support such search are way complex than MySQL/MariaDB.

Also, with that many wildcarded LIKE I believe performance is just bad as hell with many rows. If possible you should use full text search syntax as DB engine will use indexes for sure then.

edited 10d ago

Provide "simple query" which works for this scenario. removeAccent -> are you sure you've covered every possible case out there? Systems which support such search are way complex than MySQL/MariaDB.

Hi, Jonathan.

Yes, the query is way simple now.

And yes, I've replaced that removeAccent for a more comlete package which does that.