How to crawl a website

Hello there,

Is it possible to crawl a website using the Phalcon library?



85.5k
Accepted answer
edited Aug '17

You don't need an MVC framework for that. Use CasperJS / PhantomJS (old school), or https://github.com/facebook/php-webdriver + Selenium.

It does take a somewhat different set of skills to get it set up and configured, but once you are used to it, it is quite simple.

Have fun and don't get caught :-)



2.2k

I decided to use fabpot/goutte, but thanks for your comment.



85.5k

Keep in mind that JS doesn't run in those crawlers, so things like AJAX-loaded data won't be crawlable.
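
To make that limitation concrete, here is a rough sketch in Python (standard library only, not Goutte itself) of what a static crawler sees. The `LinkExtractor` class, the `extract_links` helper, and the sample page are all made up for illustration. The page has one link in its markup and one that a script would add in a browser; since nothing executes the script, only the first is found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href targets from <a> tags present in the raw HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse the markup as-is; no JavaScript is executed."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: /about is in the markup, /item/1 only exists
# after a browser runs the script.
SAMPLE_PAGE = """
<html><body>
  <a href="/about">About</a>
  <script>
    var a = document.createElement("a");
    a.href = "/item/1";
    document.body.appendChild(a);
  </script>
</body></html>
"""

if __name__ == "__main__":
    # prints ['https://example.com/about']
    print(extract_links(SAMPLE_PAGE, "https://example.com/"))
```

If the data you need shows up only after scripts run, that is the case where a real browser driven through something like php-webdriver + Selenium earns its setup cost.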

Don't forget wget. It has a mirror option (--mirror).



9.7k
edited Aug '17

HTTrack records a whole Website to disk so you can crawl it offline. Good for before/after snapshots. You could then use something like Meld to see the differences.

The follow-on question is: why do you want to crawl the website? That will determine the level of detail you will need.
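
If you only need a quick list of what changed between a before and an after snapshot, rather than a visual diff in Meld, something along these lines might do. It is a sketch using Python's standard `filecmp` module; the `compare_snapshots` name is hypothetical, and it assumes both mirrors are already on disk.

```python
import filecmp
import os


def compare_snapshots(before_dir, after_dir):
    """Return (added, removed, changed) relative paths between two mirrors."""
    added, removed, changed = [], [], []

    def walk(cmp, prefix=""):
        # Files or directories present on one side only.
        removed.extend(os.path.join(prefix, n) for n in cmp.left_only)
        added.extend(os.path.join(prefix, n) for n in cmp.right_only)
        # Files present on both sides that compare as different.
        # Note: dircmp compares shallowly by default, so files whose
        # type, size, and mtime all match are treated as equal
        # without reading their contents.
        changed.extend(os.path.join(prefix, n) for n in cmp.diff_files)
        # Recurse into directories that exist on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(before_dir, after_dir))
    return sorted(added), sorted(removed), sorted(changed)
```

You would call it as `compare_snapshots("site-before", "site-after")`, where both directory names are placeholders for wherever your mirrors live. A directory that exists in only one snapshot is reported once as a single entry, not file by file.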