Full text searches

I'm trying to use the query builder to build a full text search.

This is my code:

    $builder = self::getQueryBuilder()
        ->from('table')
        ->where('MATCH(column) AGAINST (:query: IN BOOLEAN MODE)');

    $result = $builder->getQuery()
        ->execute(array('query' => $query));

But all I get is this error:

    Syntax error, unexpected token IDENTIFIER(AGAINST), near to ' (:query: IN BOOLEAN MODE)'

How can I perform a full text search?




MATCH/AGAINST is a MySQL extension that cannot be translated to the other supported database systems, such as PostgreSQL or SQLite.

You can use raw SQL to execute database-specific extensions, as described here:

http://docs.phalconphp.com/en/latest/reference/phql.html#using-raw-sql
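For reference, here is a minimal sketch of that approach, loosely based on the linked docs. It assumes a hypothetical `Articles` model and an `articles` table with a FULLTEXT index on a `body` column; all names are illustrative:

```php
<?php

use Phalcon\Mvc\Model;
use Phalcon\Mvc\Model\Resultset\Simple as Resultset;

class Articles extends Model
{
    // Hypothetical helper: bypasses PHQL entirely, so the MySQL-only
    // MATCH ... AGAINST syntax reaches the server untouched.
    public static function fullTextSearch($terms)
    {
        $article = new Articles();

        $sql = 'SELECT * FROM articles '
             . 'WHERE MATCH(body) AGAINST (? IN BOOLEAN MODE)';

        // The raw statement runs on the model's read connection;
        // the Simple resultset hydrates the rows back into Articles objects.
        return new Resultset(
            null,
            $article,
            $article->getReadConnection()->query($sql, array($terms))
        );
    }
}
```

Calling something like `Articles::fullTextSearch('+phalcon +search')` then returns a normal resultset, while the PHQL parser never sees the MySQL-specific syntax.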

Is the information at the above link new, or was I just blind?

Thank you for your answer! :)

Have you looked at the Sphinx search project? It works much better than MySQL or PostgreSQL full-text search.

Full-text search is a feature that will slow your site down as more data is entered into your tables. In addition, your results and performance with FULLTEXT indexes depend on the minimum length of the words you allow to be indexed.

As @roman-kulish mentioned, Sphinx is definitely a good and extremely fast alternative. Solr is another one.

For a project that should scale easily, you can use the interface of one of those tools (Solr, Sphinx): write code that builds the indexed documents for your records, then maintain the index by adding, editing, or deleting documents as needed. Searching then becomes a matter of querying Solr/Sphinx instead of MySQL. The Solr/Sphinx server can live on a separate box and scale as your project grows.

Yes, I'm aware of solutions like Sphinx and Solr. I'm using them on several projects, and I intend to use Solr on this particular project, where I'm also using Phalcon.

As for the MySQL full-text search, I'm only using it for a specific query in the website's back office until I have Solr up and running! :)

Anyway... thank you for your advice! :)



edited Aug '14

I can heartily recommend Lucene. It offers a language stemmer that vastly improves the quality of the results. Here are a few functions that I've written. They evolved over time and are quite ugly at the moment (possibly my ugliest code in the project by far; no love for Java here). However, I think they could save you a lot of time. It took me quite a while to assemble all the code bits to make this work properly, and with Java there are a million ways to wrap everything.

So basically, this code contains some fields that you can ignore; they will give you an example of the different field types if you want one.

Insertion creates a new segment, so two insert operations can run at once without problems. Merging segments is a blocking operation. The main advantage of using something like Solr over raw Java Lucene is that it is more efficient (and more complicated), since it is a long-running server. In my program I need a bunch of disjoint databases, so I'm using this approach.

    private static void search(IndexReader reader, String q, String pathTxtGeneric, String idName, String lcptName, String lidName, String lelemName, int segNum, int segSize) throws Exception
    {
        IndexSearcher searcher = new IndexSearcher(reader);

        Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_35);

        QueryParser parser = new QueryParser(Version.LUCENE_35, "__content", analyzer);


        Query query = parser.parse(q);
        query = query.rewrite(reader); //required to expand search terms

        // maximum number of documents to search through.
        int maxNumDocs = 100000;

        TopDocs topDocs;
//      if (searchType.equals("complete"))
//      {
            TermQuery filterQuery = new TermQuery(new Term(lcptName, "file"));
            QueryWrapperFilter filter = new QueryWrapperFilter(filterQuery);
            topDocs = searcher.search(query, filter, maxNumDocs);
//      }
//      else if (searchType.equals("pages"))
//      {
//          TermQuery filterQuery = new TermQuery(new Term("__indextype", "page"));
//          QueryWrapperFilter filter = new QueryWrapperFilter(filterQuery);
//          topDocs = searcher.search(query, filter, maxNumDocs);
//      }

        Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<span class=\"luceneMatchWord\">", "</span>"), new QueryScorer(query));



        int recordCount = topDocs.totalHits;
        int recordStart, recordEnd;


        if (recordCount == 0)
        {
            recordStart = 0;
            recordEnd = -1;
        }
        else
        {
            recordStart = segNum * segSize;
            if (recordStart + segSize > recordCount)
                recordEnd = recordCount - 1;
            else
                recordEnd = recordStart + segSize - 1;
        }


        System.out.print(Integer.toString(recordCount));


        String lcptVal;
        String lidVal;
        String lelemVal;

        String txtExtractPath;
        String contents;
        TokenStream tokenStream;
        String hitFrag;


        for (int i = recordStart; i <= recordEnd; i++)
        {
            int docID = topDocs.scoreDocs[i].doc;
            Document doc = searcher.doc(docID);

            lcptVal = doc.get(lcptName);
            lidVal = doc.get(lidName);
            lelemVal = doc.get(lelemName);

            // Context 
            txtExtractPath = pathTxtGeneric.replaceAll("##id##", lidVal);
            contents = readTextFile(txtExtractPath);

            // Get the 3 best fragments, separated by "..."
            tokenStream = analyzer.tokenStream("__content", new StringReader(contents));
            hitFrag = highlighter.getBestFragments(tokenStream, contents, 3, "...<br>");



            System.out.print("\n");
            printFileHit(i + 1, lcptVal, lidVal, lelemVal, hitFrag);
        }
    }

    private static void startAdd(Directory indexDir, String[] args) throws Exception
    {
        String pathTxtGeneric = "";

        String lcptName = "lcpt";
        String lidName = "lid";
        String lelemName = "lelem";
        String lvalName = "lval";

        String lvalVal = "";
        String lcptVal = "";
        String lidVal = "";
        String lelemVal = "";

        boolean optimize = false;

        int i = 2;
        while (i < args.length)
        {
            String option, value;
            option = args[i];
            if (i + 1 < args.length)
                value = args[i + 1];
            else
                break;

            ///////////////////////////////////////////
            if (option.equals("-pathTxtGeneric"))
                pathTxtGeneric = value;
            ///////////////////////////////////////////
            else if (option.equals("-optimize") && value.equals("true"))
                optimize = true;
            ///////////////////////////////////////////
            else if (option.equals("-lvalVal"))
                lvalVal = value.toLowerCase();
            ///////////////////////////////////////////
            else if (option.equals("-lcptVal"))
                lcptVal = value;
            ///////////////////////////////////////////
            else if (option.equals("-lidVal"))
                lidVal = value;
            ///////////////////////////////////////////
            else if (option.equals("-lelemVal"))
                lelemVal = value;
            ///////////////////////////////////////////

            i += 2;
        }

//      if (addMode.equals("complete"))
//      {
            Document doc = new Document();

            String txtContentFile = pathTxtGeneric.replaceAll("##id##", lidVal);
            doc.add(new Field("__content", new FileReader(txtContentFile)));

            doc.add(new Field(lcptName, lcptVal, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field(lidName, lidVal, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field(lelemName, lelemVal, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field(lvalName, lvalVal, Field.Store.YES, Field.Index.NOT_ANALYZED));

            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_35, new EnglishAnalyzer(Version.LUCENE_35));
            IndexWriter writer = new IndexWriter(indexDir, conf);

//          writer.setUseCompoundFile(false);
            writer.addDocument(doc);

            if (optimize)
                writer.optimize();

            writer.close();
//      }
//      else if(addMode.equals("pages"))
//          addPages(indexDir, pathTxtGeneric, Integer.parseInt(elemCountVal), idName, idVal, elemCountName, elemCountVal);

    }


edited Aug '14

> How about https://mariadb.com/kb/en/fulltext-index-overview?

Performance is very slow... I have tested a lot of FTS solutions for our projects. One of our projects uses Apache Solr with a custom GIS extension; other projects use Elasticsearch. I think Elasticsearch is the best choice at the moment.