
How to crawl a web site

Hello there,

Is it possible to crawl a web site with the Phalcon library?

edited Aug '17

You don't need an MVC framework for that. Use CasperJS / PhantomJS (old school), and/or Selenium.

It takes a slightly different set of skills to set up and configure, but once you're used to it, it's quite simple.

Have fun and don't get caught :-)


I decided to use fabpot/goutte, but thanks for your comment.


Keep in mind that JS doesn't run in those crawlers, so things like AJAX-loaded data won't be crawlable.
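
To illustrate the point, here is a minimal sketch in Python (standard library only, chosen just for illustration; it is not Goutte's API). The sample page, the class name LinkExtractor and the function extract_links are all hypothetical. A crawler that only parses the HTML it receives sees the link in the markup, but not the one a script would add in a browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags found in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Parse html and return the absolute URLs of all <a href> links."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one link is in the markup, the other is only
# created when a browser executes the script.
SAMPLE_HTML = """
<html><body>
  <a href="/about">About</a>
  <div id="items"></div>
  <script>
    document.getElementById("items").innerHTML = '<a href="/item/1">Item</a>';
  </script>
</body></html>
"""

links = extract_links(SAMPLE_HTML, "http://example.com/")
print(links)  # only the /about link; the script is never executed
```

If the data you need only shows up after scripts run, that is where the headless browser tools mentioned earlier in the thread come in.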

Don't forget wget. It has a mirror option (--mirror).

edited Aug '17

HTTrack records a whole Website to disk so you can crawl it offline. Good for before/after snapshots. You could then use something like Meld to see the differences.
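
If you want the before/after comparison as a script rather than in a GUI diff tool, something along these lines might work. This is a rough sketch in Python, assuming two snapshots have already been saved to two local directories; the function names list_files and compare_snapshots are made up for this example, and it only reports which files differ, not the line-level differences a tool like Meld shows.

```python
import filecmp
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def compare_snapshots(before_dir, after_dir):
    """Return (added, removed, changed) as sorted lists of relative paths."""
    before = list_files(before_dir)
    after = list_files(after_dir)
    added = sorted(after - before)
    removed = sorted(before - after)
    # shallow=False forces a comparison of the actual file contents
    # instead of trusting size and modification time.
    changed = sorted(
        path
        for path in before & after
        if not filecmp.cmp(
            os.path.join(before_dir, path),
            os.path.join(after_dir, path),
            shallow=False,
        )
    )
    return added, removed, changed
```

Call it as compare_snapshots("snapshot-before", "snapshot-after"), where both directory names are placeholders for wherever the two mirrors were saved.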

The follow-on question is: why do you want to crawl the web site? That will determine the level of detail you will need.