Solved thread

This post is marked as solved. If you think the information contained on this thread must be part of the official documentation, please contribute submitting a pull request to its repository.

findFirst super slow inside large loop

I'm using Phalcon 3.0.4. I made a foreach on each file inside my folder. Currently I have just 4000 files. I did a findFirstto check if the filename already exist in MySQL (I have 100 000 rows in my table). But when I use findFirst, the response is super slow (I have to wait 20 minutes to get a response). Here is my code :

     $dir = new FilesystemIterator("files/path/to/my/files/");
     foreach ($dir as $file) {
         if ($file->getExtension() == 'json') {
             $filename = $file->getFilename();
             $explode_filename = explode("_", $filename);
             $date = $explode_filename[0];

             $unformatted_date = DateTime::createFromFormat("Ymd-His", $date);
             $date_server = $unformatted_date->format("Y-m-d H:i:s");

             $timestamp_app = $explode_filename[2];
             $date_app = date("Y-m-d H:i:s", $timestamp_app/1000);
             echo $date_server;
             $json_data = json_decode(file_get_contents($file), true);

             $scan = Scans::findFirst(array(
                 "name = :name:",
                 "bind" => array("name" => $filename)
             ));

             if  (!$scan) {
                 ...
             }
         }
     }

If I remove the findFirst the response is ~30ms but with the findFirst it will takes ~20 minutes... How can I do to increase the performance of the search in my table ?



30.0k
Accepted
answer

so you're doing 4 000 times a findFirst in a 100 000 database table ?

phql is more efficient then ORM in that case



12.6k

So I have to use QueryBuilder for that ?



30.0k
edited Mar '17

use something like:

$modelsManager = $this->modelsManager;

     $dir = new FilesystemIterator("files/path/to/my/files/");
     foreach ($dir as $file) {
         if ($file->getExtension() == 'json') {
             $filename = $file->getFilename();
             $explode_filename = explode("_", $filename);
             $date = $explode_filename[0];

             $unformatted_date = DateTime::createFromFormat("Ymd-His", $date);
             $date_server = $unformatted_date->format("Y-m-d H:i:s");

             $timestamp_app = $explode_filename[2];
             $date_app = date("Y-m-d H:i:s", $timestamp_app/1000);
             echo $date_server;
             $json_data = json_decode(file_get_contents($file), true);

             $query = $modelsManager->createQuery("SELECT * FROM Scans WHERE name = :name:");
             $cars  = $query->execute(
               [
                    "name" => $filename,
               ]
           );


             if  (!$scan) {
                 ...
             }
         }
     }


12.6k
edited Mar '17

Wow this is really fast thanks you. PHQL give a good performance !! I did something like this :

$scan = $this->modelsManager->createBuilder()
                                ->from("Scans")
                                ->where("name = :name:", ["name" => $filename])
                                ->limit(1)
                                ->getQuery()
                                ->execute();

There is a difference of performance between createQuery and queryBuilder ?



30.0k

I'm not expert, so I can't tell you.

https://docs.phalconphp.com/en/3.0.1/reference/models-cache.html#caching-of-phql-planning

Maybe quertBuilder offer the same approach ...



12.6k

Okay. Thanks for you help :)



30.0k

There is a difference of performance between createQuery and queryBuilder

https://docs.phalconphp.com/en/latest/reference/phql.html#creating-queries-using-the-query-builder

in fact, query builder is just anther way to build phql query, so finally, I guess so



12.6k
edited Mar '17

Hum, in fact I got the same performance :/. I made a mistake previously when I used queryBuilder and it was ignored. That is why I had "good perf". But now it's super slow again :(



30.0k

look at the example that is shown in the doc I've mentionned.

What is important, is the way you call the query builder (or phql query) in your foreach loop.

See that post also: https://forum.phalconphp.com/discussion/15027/allowed-memory-size-of-134217728-bytes-exhausted-after-many-mode

edited Mar '17

I wrote you solution on stackocerflow, instead of doing one find 4000 times find all scans or 1000 or more depending on memory limit at once and filter resultset to check if your filename what you want exists. I will try to post solution for you soon.



30.0k

I think the use case from @Johngtrs is close from the example given in the doc: https://docs.phalconphp.com/en/latest/reference/models-cache.html#caching-of-phql-planning

<?php

$phql = "SELECT * FROM Store\Robots WHERE id = ?0";

$query = $this->modelsManager->createQuery($phql);

for ($i = 1; $i <= 10; $i++) {
    $robots = $query->execute(
        $phql,
        [
            $i,
        ]
    );

    // ...
}
edited Mar '17
$dir = new FilesystemIterator("files/path/to/my/files/");
$fileNames = [];
foreach ($dir as $file) {
    if ($file->getExtension() == 'json') {
        $filesNames[] = $file->getFilename();
    }
}

$existingFileNames = Scans::find(
    [
        'columns'    => 'name',
        'conditions' => 'name IN ({names:array})',
        'bind'       => [
            'names' => $fileNames,
        ],
    ]
)->toArray();
// maybe you could even remove this condition or use name IS NOT NULL, your choice

array_walk(
    $existingFileNames,
    function (&$v) {
        $v = $v['name'];
    }
);

foreach ($fileNames as $fileName) {
    if (!in_array($fileName, $existingFileNames)) {
        $explode_filename = explode("_", $fileName);
        $date = $explode_filename[0];

        $unformatted_date = DateTime::createFromFormat("Ymd-His", $date);
        $date_server = $unformatted_date->format("Y-m-d H:i:s");

        $timestamp_app = $explode_filename[2];
        $date_app = date("Y-m-d H:i:s", $timestamp_app / 1000);
        echo $date_server;
        $json_data = json_decode(file_get_contents("files/path/to/my/files/".$fileName), true);
        // do rest of stuff
    }
}

How about something like this? Keep in mind to add index on name column.

edited Mar '17

If you are expecting that Scans::find will return big resultset then you could use code like this:

$dir = new FilesystemIterator("files/path/to/my/files/");
$fileNames = [];
while($dir->valid()) {
    $i = 0;
    $file = $dir->current();
    $filesNames[] = $file->getFilename();
    $i++;
    if($i == 1000) { // change this number to your needs
        $existingFileNames = Scans::find(
            [
                'columns'    => 'name',
                'conditions' => 'name IN ({names:array})',
                'bind'       => [
                    'names' => $fileNames,
                ],
            ]
        )->toArray();
// maybe you could even remove this condition or use name IS NOT NULL, your choice

        array_walk(
            $existingFileNames,
            function (&$v) {
                $v = $v['name'];
            }
        );

        foreach ($fileNames as $fileName) {
            if (!in_array($fileName, $existingFileNames)) {
                $explode_filename = explode("_", $fileName);
                $date = $explode_filename[0];

                $unformatted_date = DateTime::createFromFormat("Ymd-His", $date);
                $date_server = $unformatted_date->format("Y-m-d H:i:s");

                $timestamp_app = $explode_filename[2];
                $date_app = date("Y-m-d H:i:s", $timestamp_app / 1000);
                echo $date_server;
                $json_data = json_decode(file_get_contents("files/path/to/my/files/".$fileName), true);
                // do rest of stuff
            }
        }

        $i = 0;
        $fileNames = [];
    }
    $dir->next();
}