Work with an API and manage RAM and time

Question

EhsanJamshidi Mar '19

Created Mar '19	Last Reply Apr '19	Replies 2	Views 267	Votes 0

EhsanJamshidi 8.0k

Mar '19

Sorry to ask this question here, as it is not completely related to Phalcon. In my project I need to get JSON from an API that splits its data into pages. I use code like this to fetch it:

$LastPage = 750;
$Type1 = [];
$Type2 = [];

for ($i = 0; $i < $LastPage; $i++) {
    $UrlJson = "https://example.com/page?" . $i;
    $contents = @file_get_contents($UrlJson);
    if ($contents !== false) {
        // file_get_contents() returns a string, so decode it before iterating
        $data = json_decode($contents, true);
        foreach ($data['entity'] ?? [] as $entity) {
            $Type1[] = $entity['typeOne'];
            $Type2[] = $entity['typeTwo'];
        }
    }
}

Each JSON response is around 400 KB, and the whole process takes 20 minutes. Any idea how I can fetch the data from the API faster, for example by reading a few pages at the same time?

Lajos Bencz
77.7k

Accepted
answer

edited Apr '19
Apr '19

Well, there's no free lunch.

You can request the API one page at a time, which saves memory and keeps the code simple, but costs execution time.

You can fetch pages in parallel, which saves execution time and is still fairly simple, but uses more memory.

You can also refactor the whole task into parallelized, streaming requests, which saves both memory and time, but makes the code more complex.

"I used your suggestion, but it took more than 4 GB of memory on my server?"

If the size of one response is around 400 MB, then it's working as intended. You can reduce the memory use by streaming the response and parsing it in chunks, but that would mean you have to parse the JSON yourself as well.
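A decoded PHP array usually takes more memory than the JSON text it came from, so holding on to every raw body or every fully decoded page adds up quickly. If each page is small enough to decode on its own (the question mentions roughly 400 KB), a minimal sketch like the one below may already help. It assumes the page structure from the question (`entity`, `typeOne`, `typeTwo`); the helper name `collectTypes` is made up for illustration.

```php
<?php
// Hypothetical helper: pull only the two needed fields out of one page of JSON.
// Assumes a page decodes to ['entity' => [['typeOne' => ..., 'typeTwo' => ...], ...]],
// matching the structure used in the question.
function collectTypes(string $json, array &$type1, array &$type2): bool
{
    $page = json_decode($json, true);
    if (!is_array($page) || !isset($page['entity']) || !is_array($page['entity'])) {
        return false; // invalid JSON or unexpected shape
    }
    foreach ($page['entity'] as $entity) {
        $type1[] = $entity['typeOne'] ?? null;
        $type2[] = $entity['typeTwo'] ?? null;
    }
    // $page goes out of scope on return, so the full decoded page can be freed.
    return true;
}

$type1 = [];
$type2 = [];
$sample = '{"entity":[{"typeOne":"a","typeTwo":1,"other":"ignored"},{"typeOne":"b","typeTwo":2}]}';
$ok = collectTypes($sample, $type1, $type2);
```

Calling this once per response and then discarding the raw body keeps only the extracted values alive between requests.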

EDIT:

If you want to go with the third option (memory-friendly, parallel fetching), I'd suggest you save the response JSON files to disk, then parse them with a third-party library.

An example of a single streaming cURL request saved to disk: https://gist.github.com/lmammino/4353399

A library for streamed parsing of JSON files: https://github.com/salsify/jsonstreamingparser
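As a rough sketch of the save-to-disk idea, assuming the cURL extension is available: `CURLOPT_FILE` makes cURL write the body to a file handle as it arrives, and the saved files can then be processed one at a time. The helper names `downloadToFile` and `pagesFromDir` are hypothetical, and the second one uses plain `json_decode()` per file rather than the streaming parser linked above, which is only reasonable while a single file fits comfortably in memory.

```php
<?php
// Hypothetical helper: stream one response body straight into a file.
// CURLOPT_FILE tells cURL to write the body to the given stream as it arrives,
// so the whole response never has to sit in a PHP string.
function downloadToFile(string $url, string $path): bool
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fh);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    fclose($fh);
    return $ok;
}

// Hypothetical helper: yield the decoded pages one file at a time, so only a
// single page is held in memory while the caller processes it.
// Files that do not contain valid JSON are skipped.
function pagesFromDir(string $dir): Generator
{
    $files = glob($dir . '/*.json') ?: [];
    sort($files);
    foreach ($files as $file) {
        $page = json_decode(file_get_contents($file), true);
        if (is_array($page)) {
            yield basename($file) => $page;
        }
    }
}

// Usage sketch (not executed here, needs network access):
// downloadToFile('https://example.com/page?0', '/tmp/pages/page_0.json');
// foreach (pagesFromDir('/tmp/pages') as $name => $page) { /* extract fields */ }
```

For files too large to decode in one go, the per-file `json_decode()` call is the part that a streaming parser would replace.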

Lajos Bencz · Answer 1 · 2019-04-01T03:47:35-07:00

If pagination info is only available after a successful request, you can't really parallelize the requests (each request relies on the previous state).

But in theory, you could refactor your HTTP requests to use curl_multi_exec:

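$nodes = array($url1, $url2, $url3);
$node_count = count($nodes);

$curl_arr = array();
$results = array();
$master = curl_multi_init();

// Create one handle per URL and attach it to the multi handle
for ($i = 0; $i < $node_count; $i++)
{
    $url = $nodes[$i];
    $curl_arr[$i] = curl_init($url);
    curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($master, $curl_arr[$i]);
}

// Run all transfers; curl_multi_select() waits for activity instead of busy-looping
do {
    curl_multi_exec($master, $running);
    if ($running > 0) {
        curl_multi_select($master);
    }
} while ($running > 0);

// Collect the response bodies and release the handles
for ($i = 0; $i < $node_count; $i++)
{
    $results[] = curl_multi_getcontent($curl_arr[$i]);
    curl_multi_remove_handle($master, $curl_arr[$i]);
    curl_close($curl_arr[$i]);
}
curl_multi_close($master);
print_r($results);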

https://stackoverflow.com/questions/9308779/php-parallel-curl-requests
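When the number of pages is known up front, as with the fixed page count in the question, the URLs can be built ahead of time and fetched in small parallel batches, so that only one batch of response bodies is in memory at once. The following is a sketch under those assumptions; `pageUrlBatches` and `fetchBatch` are hypothetical names, and the URL pattern is the placeholder from the question.

```php
<?php
// Build the page URLs for pages 0..$lastPage-1 and split them into batches
// of at most $batchSize. The URL pattern is the placeholder from the question.
function pageUrlBatches(int $lastPage, int $batchSize): array
{
    $urls = [];
    for ($i = 0; $i < $lastPage; $i++) {
        $urls[] = 'https://example.com/page?' . $i;
    }
    return array_chunk($urls, $batchSize);
}

// Fetch one batch in parallel and return the raw bodies, keyed like $urls.
function fetchBatch(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $handles[$key] = curl_init($url);
        curl_setopt($handles[$key], CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($multi, $handles[$key]);
    }

    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0) {
            curl_multi_select($multi, 1.0); // wait for activity, up to one second
        }
    } while ($running > 0 && $status === CURLM_OK);

    $bodies = [];
    foreach ($handles as $key => $handle) {
        $bodies[$key] = curl_multi_getcontent($handle);
        curl_multi_remove_handle($multi, $handle);
        curl_close($handle);
    }
    curl_multi_close($multi);
    return $bodies;
}

// Usage sketch (not executed here, needs network access):
// foreach (pageUrlBatches(750, 10) as $batch) {
//     foreach (fetchBatch($batch) as $body) {
//         // decode $body, keep only the needed fields, then let $body go
//     }
// }
```

The batch size is the knob that trades memory for speed: a larger batch means more bodies held at once, a smaller one means more round trips.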

EhsanJamshidi · Answer 2 · 2019-04-01T05:57:37-07:00

I used your suggestion, but it took more than 4 GB of memory on my server. Any idea what I am doing wrong? I used exactly your code and just extracted the information from the results. By the way, I send 10 requests at once, then 10 more, in a loop.
