website scraping
Project ID: 1328312638
Project Details
  • Status:
    Closed (Chosen Programmer: raul27868; Paid; Rated 10 out of 10)
  • Posted:
    2/3/2012 at 18:43 EST
  • Closed:
    2/10/2012 at 11:48 EST
  • Project Creator:
    Rated 10 out of 10 for this project.
  • Budget:
    N/A
  • Description:
    Hello coders.

    Background:
    The project is part of a website intended to serve consumers better when they compare prices and look for better deals online.

    Project description:
    The scraped data is meant to be the cornerstone of the site.
    Scraping the same data from 10 sites.
    Writing the scraped data to CSV format.
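Because all 10 sites are scraped for the same data, one way to keep the CSV consistent is to map every site's records onto one fixed set of columns before writing. Below is a minimal sketch using only Python's standard `csv` module. The column names (`site`, `product`, `price`, `url`) and the sample records are hypothetical; the real fields are defined in the attached requirements document. Scrapy can also produce CSV itself through its feed exports, so this only illustrates the shared-schema idea.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Hypothetical column set; the real fields come from the requirements document.
FIELDNAMES = ["site", "product", "price", "url"]


def write_rows(records: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write records from any site to one CSV using a shared column order.

    DictWriter raises ValueError for keys outside FIELDNAMES (its default
    extrasaction="raise"), so a site-specific field cannot slip into the file.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    n = write_rows(
        [
            {"site": "site-a", "product": "Widget", "price": "9.99", "url": "https://a.example/w"},
            {"site": "site-b", "product": "Widget", "price": "10.49", "url": "https://b.example/w"},
        ],
        buf,
    )
    print(n)
    print(buf.getvalue())
```

Failing loudly on unexpected keys is a deliberate choice here: given the accuracy requirement, a mismatched field is better surfaced than silently dropped.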

    The scraping scripts are meant to run in a loop so that the data is always up to date.

    In addition, the data should be obtained without logging in to the sites.

    Looping the scripts is not part of the requirements for this project.

    Preference:
    1. Python + Scrapy for scraping the sites.
    2. Optimization for both speed and space.
    The scripts should consume minimal memory and run as fast as possible.
    3. The programmer should be smart, think outside the box, make decisions, and make everything work as expected.
    4. I will be available and expect to be consulted when needed.
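On the memory preference in item 2: one common way to keep memory flat is to stream results, producing one parsed record at a time and writing it immediately, instead of collecting every page's records in a list first. Scrapy's item pipeline already processes items one by one; the standard-library sketch below only illustrates the idea. `parse_listing` and its `name|price` line format are hypothetical stand-ins for real page parsing.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO, Tuple


def parse_listing(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Hypothetical parser: lazily turns 'name|price' lines into (name, price) pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, price = line.split("|", 1)
        yield name, price


def stream_to_csv(rows: Iterable[Tuple[str, str]], out: TextIO) -> int:
    """Write each row as soon as it is produced; no list of all rows is kept."""
    writer = csv.writer(out)
    writer.writerow(["product", "price"])
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # io.StringIO stands in for a downloaded page; iterating it yields one line at a time.
    page = io.StringIO("Widget|9.99\n\nGadget|19.50\n")
    out = io.StringIO()
    print(stream_to_csv(parse_listing(page), out))
```

Because `parse_listing` is a generator, peak memory depends on the size of one record rather than on the number of records per site.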

    Legal:


    1) Employer will receive exclusive and complete copyrights to all work paid for. All paid for deliverables will be considered 'work made for hire' under U.S. Copyright law.
    1b) No part of the deliverable may contain any copyright restricted 3rd party components (including GPL, GNU, Copyleft, etc.) unless all copyright ramifications are explained AND AGREED TO by the employer on the site per the worker's 'Worker Legal Agreement'.

    Thanks
    Additional Info (Added 2/4/2012 at 10:20 EST)...
    I have it almost completely working on a very slow and inefficient infrastructure, and I am looking to move it to Python/Scrapy or any other stack that would be extremely fast, robust, and error-free.

    Additional Info (Added 2/6/2012 at 16:55 EST)...
    All scripts/code delivered by the programmer should run on InMotion Hosting's "Launch" plan (http://www.inmotionhosting.com/hostingplans.html).

    All code will be handed to Employer and will be solely owned by him.
    Additional Info (Added 2/6/2012 at 17:13 EST)...
    The scraping should be 100% accurate and no errors will be accepted.

    Accuracy is extremely important in this case and is a condition for releasing the full payment to the worker.
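Since payment depends on 100% accuracy, one option is to validate each scraped record before it reaches the CSV and report failures instead of writing questionable values. A minimal sketch, assuming each record has `product` and `price` fields and that prices are decimal strings such as `"19.99"`, optionally with a leading `$` or `,` thousands separators. These field names and format rules are assumptions; the real rules belong in the attached requirements document.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

REQUIRED = ("product", "price")  # hypothetical required fields


def validate(record: dict) -> Optional[str]:
    """Return an error message for a bad record, or None if it passes."""
    for field in REQUIRED:
        value = record.get(field)
        if value is None or not str(value).strip():
            return f"missing {field}"
    # Assumed price format: optional leading '$' and ',' thousands separators.
    raw = str(record["price"]).strip().lstrip("$").replace(",", "")
    try:
        price = Decimal(raw)  # Decimal avoids binary float rounding of prices
    except InvalidOperation:
        return f"unparseable price {record['price']!r}"
    if not price.is_finite() or price < 0:
        return f"invalid price {record['price']!r}"
    return None
```

Records that fail can be logged together with the returned message and re-fetched, rather than written to the CSV.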
    Additional Info (Added 2/7/2012 at 4:42 EST)...
    Some sites might require logging in.
    The script should scrape all data regardless of whether logging in is required.

    Where logging in is required, the script should do so as a preliminary step.
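When a site does require a login, the usual pattern is to submit the login form once, keep the session cookie, and reuse it for every later request. In Scrapy this is typically done with `FormRequest.from_response` in the spider's first callback. The standard-library sketch below shows the same idea with `urllib.request` and `http.cookiejar`. The login URL and the `username`/`password` form field names are hypothetical and will differ per site. The function only prepares the opener and the request; nothing is sent until `opener.open(request)` is called.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from typing import Tuple


def build_login(
    login_url: str, username: str, password: str
) -> Tuple[urllib.request.OpenerDirector, urllib.request.Request]:
    """Prepare a cookie-aware opener and the login POST request.

    Calling opener.open(request) performs the login; cookies the site sets
    are stored in the jar and sent automatically on later opener.open(...)
    calls, so subsequent page fetches stay logged in.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Hypothetical form field names; inspect each site's login form for the real ones.
    body = urllib.parse.urlencode({"username": username, "password": password}).encode("utf-8")
    request = urllib.request.Request(login_url, data=body, method="POST")
    return opener, request


if __name__ == "__main__":
    opener, login_req = build_login("https://example.com/login", "demo_user", "demo_pass")
    # opener.open(login_req)  # performs the login (requires network access)
    # page = opener.open("https://example.com/prices").read()
    print(login_req.get_method(), login_req.full_url)
```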
    Additional Info (Added 2/9/2012 at 10:19 EST)...
    If one of the scraped sites requires a login, the worker will open a fictitious account for that purpose.

    The scraped data should be 100% complete and accurate.

    Any deviation from the above will be considered "project code not supplied".

    Additional:

    1) I require complete and fully-functional working program(s) in executable form as well as complete source code of all work done (so that I may modify it in the future).
    2) Deliverables must be in ready-to-run condition as follows (depending on the nature of the deliverables):
    2b) Any website server-side deliverables must be installed in ready-to-run condition in the employer's environment (unless overridden by the employer elsewhere in this contract).
    2c) If there are any server-side deliverables (intended to only exist in one place in the employer's environment) then they must be installed by the winning worker in ready-to-run condition (unless specified elsewhere by the employer).
    2d) All other software (including but not limited to any desktop software or software the employer intends to distribute) must include a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this project (unless specified elsewhere by the employer).

    Thanks
    Additional Info (Added 2/9/2012 at 10:24 EST)...
    Attached file: Requirements for scraping data.rar
    File info: Specific requirements for scraping the data
    Additional Info (Added 2/10/2012 at 7:17 EST)...
    Attached file: Requirements for scraping data.rar
    File info: Detailed requirements document - password needed
Project Bids



(2 bids have been placed. sdanpo has chosen to keep all bids for this project hidden.)