Website spider program
Project ID: 1231236970
Project Details
- Status: Closed (Cancelled)
- Posted: 1/6/2009 at 5:16 EST
- Cancelled: 1/8/2009 at 3:45 EST
- Project Creator:
- Budget: N/A
- Description: I need a program that will take a given domain name and then spider the website. The number of pages that the program spiders will need to be configurable.
The content of each page, including the title, meta tags and body text, needs to be saved in a MySQL database for every page on the target site. The spider must follow the usual spider rules, such as obeying the robots.txt file and not following NOFOLLOW links. It should also check for a sitemap file to determine the relative importance of pages on the site.
The program needs to run on a Linux platform.
- Tags:
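The crawl rules in the description could be sketched roughly as follows. This is only an illustrative outline using the Python standard library, not a finished deliverable. The names `PageParser`, `sitemap_priorities`, `crawl`, the `ExampleSpider` user agent and the `fetch` callback are all hypothetical. The MySQL step is left out: each returned dict holds the fields that would go into one database row.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET


class PageParser(HTMLParser):
    """Collects the title, meta tags, visible text and followable links."""

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.text, self.links = "", {}, [], []
        self.page_nofollow = False  # set by <meta name="robots" content="nofollow">
        self._in_title = False
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and a.get("name"):
            name, content = a["name"].lower(), a.get("content") or ""
            self.meta[name] = content
            if name == "robots" and "nofollow" in content.lower():
                self.page_nofollow = True
        elif tag == "a" and a.get("href"):
            # Skip links marked rel="nofollow".
            if "nofollow" not in (a.get("rel") or "").lower().split():
                self.links.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text.append(data.strip())


def sitemap_priorities(xml_text):
    """Map each sitemap <loc> to its <priority> (0.5 when none is given)."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    result = {}
    for url in ET.fromstring(xml_text).findall("sm:url", ns):
        loc = url.findtext("sm:loc", namespaces=ns)
        pri = url.findtext("sm:priority", default="0.5", namespaces=ns)
        if loc:
            result[loc.strip()] = float(pri)
    return result


def crawl(start_url, fetch, robots_txt="", max_pages=10, agent="ExampleSpider"):
    """Breadth-first crawl of one host, capped at max_pages stored pages.

    fetch(url) is an assumed callback returning HTML text, or None on failure.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    host = urlparse(start_url).netloc
    queue, seen, pages = deque([start_url]), {start_url}, []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # disallowed by robots.txt
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        pages.append({"url": url, "title": parser.title.strip(),
                      "meta": parser.meta, "body": " ".join(parser.text)})
        if parser.page_nofollow:
            continue  # page-level nofollow: store the page, ignore its links
        for href in parser.links:
            nxt = urljoin(url, href).split("#")[0]
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return pages
```

In this sketch the page limit is the `max_pages` argument, and `fetch` is passed in so that the download method (and any politeness delay) stays configurable. The priorities returned by `sitemap_priorities` could be used to order the queue, which the simple breadth-first loop here does not attempt.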