ColdFusion Custom Tags - Scrape
Project ID: 1265034458
Project Details
  • Status:
    Closed (Cancelled)
  • Posted:
    2/1/2010 at 9:27 EST
  • Cancelled:
    2/23/2010 at 0:19 EST
  • Project Creator:
  • Budget:
    N/A
  • Description:
    Tags:

    <CF_Google searchterm="my search phrase" domain="www.acme.com" depth="100">

    <CF_Yahoo searchterm="my search phrase" domain="www.acme.com" depth="100">

    <CF_Bing searchterm="my search phrase" domain="www.acme.com" depth="100">

    <CF_Ask searchterm="my search phrase" domain="www.acme.com" depth="100">

    When called, each tag queries the search engine indicated by its name.

    - Tag name indicates which search engine to use
    - SearchTerm indicates which search phrase to query
    - Domain indicates what domain name to match
    - Depth indicates how many ranks deep to check

    Each search engine accepts a parameter in the URL of the GET request that sets the number of results to return. If depth = 100, you will do one CFHTTP GET operation with that parameter in the URL, asking for a single result page of 100 records. Do not run 10 queries of 10 results each.
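
    As a rough illustration of the single-request approach, here is a minimal sketch for one engine. The endpoint and the "q" and "num" parameter names are assumptions and must be checked against each engine; searchResponse and rawHtml are just illustrative variable names.

    ```cfm
    <!--- Sketch only. The endpoint and the "q" and "num" parameter names are
          assumptions; confirm them against each engine before relying on them. --->
    <cfset searchterm = "my search phrase">
    <cfset depth = 100>

    <!--- One GET request that asks for the whole depth in a single page. --->
    <cfhttp url="http://www.google.com/search" method="get" result="searchResponse" timeout="30">
        <!--- type="url" parameters are URL-encoded and added to the query string. --->
        <cfhttpparam type="url" name="q" value="#searchterm#">
        <cfhttpparam type="url" name="num" value="#depth#">
    </cfhttp>

    <!--- The raw HTML of the result page, ready to be parsed into rows. --->
    <cfset rawHtml = searchResponse.fileContent>
    ```

    Inside the actual custom tag, searchterm and depth would come from the attributes scope (attributes.searchterm, attributes.depth) rather than being hard-coded.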

    When you get the results, they should be parsed and placed into an array.

    IMPORTANT: We ONLY want natural search results. Ignore all Pay-Per-Click / Sponsored Search, as well as any Local Search (i.e. the maps). Do not include them when calculating ranks.

    Take just the natural results, loop through, and build four arrays:
    title[x] = the title (i.e. hyperlink text)
    description[x] = the description shown in the results
    domain[x] = the clean domain name of the listing
    theurl[x] = the URL of the listing

    NOTE: Do not create an array named url[x]. "URL" is the name of a built-in scope in ColdFusion, so a variable called "url" will only cause you problems. Please name it "theurl".

    So the custom tag will first create those four one-dimensional arrays, do the CFHTTP GET, and then loop over the results, populating the arrays. Those results will then be available as the four arrays.
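
    To show the lockstep population, here is a small sketch. It assumes the rows have already been cut out of the page; parsedRows and its "title|description|url" layout are hypothetical stand-ins for whatever the real parsing step produces, and "clean domain" is taken to mean the host name only.

    ```cfm
    <!--- Hypothetical stand-in for rows already extracted from the result page.
          Each entry is "title|description|url"; real rows come from parsing the HTML. --->
    <cfset parsedRows = ArrayNew(1)>
    <cfset ArrayAppend(parsedRows, "Acme Widgets|Widgets for every need.|http://www.acme.com/widgets")>
    <cfset ArrayAppend(parsedRows, "Widget Reviews|Independent reviews.|https://reviews.example.org/widgets?page=1")>

    <!--- The four one-dimensional result arrays. --->
    <cfset title = ArrayNew(1)>
    <cfset description = ArrayNew(1)>
    <cfset domain = ArrayNew(1)>
    <cfset theurl = ArrayNew(1)>

    <cfloop from="1" to="#ArrayLen(parsedRows)#" index="x">
        <cfset row = parsedRows[x]>
        <cfset rowUrl = ListGetAt(row, 3, "|")>
        <!--- Drop the scheme, then keep everything before the first "/", "?" or ":". --->
        <cfset host = ListFirst(REReplaceNoCase(rowUrl, "^[a-z]+://", ""), "/?:")>
        <!--- All four appends happen in the same iteration, so index x
              always refers to the same listing in every array. --->
        <cfset ArrayAppend(title, ListGetAt(row, 1, "|"))>
        <cfset ArrayAppend(description, ListGetAt(row, 2, "|"))>
        <cfset ArrayAppend(domain, host)>
        <cfset ArrayAppend(theurl, rowUrl)>
    </cfloop>
    ```

    Appending to all four arrays inside one loop pass is what keeps a title from listing 7 from ending up next to the domain of listing 8. Inside the custom tag, the finished arrays would presumably be copied to the caller scope so the calling page can read them.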

    You will also create a variable called "ranksfound", initially set to an empty string ("").

    If you find the domain name while going down the page, you will APPEND that rank to the RanksFound variable. So if it is found at ranks 7 and 12 and 23, then RanksFound = 7,12,23
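
    A minimal sketch of that bookkeeping, using hypothetical sample data in place of the parsed domain array. It assumes an exact, case-insensitive match on the full host name; whether "acme.com" should also match "www.acme.com" is left open.

    ```cfm
    <!--- Hypothetical sample data standing in for the parsed domain array. --->
    <cfset domain = ListToArray("www.example.org,www.acme.com,news.example.net,www.acme.com")>
    <cfset targetDomain = "www.acme.com">
    <cfset ranksfound = "">

    <cfloop from="1" to="#ArrayLen(domain)#" index="rank">
        <!--- CompareNoCase returns 0 when the two strings match, ignoring case. --->
        <cfif CompareNoCase(domain[rank], targetDomain) EQ 0>
            <!--- ListAppend adds the rank to the comma-delimited list. --->
            <cfset ranksfound = ListAppend(ranksfound, rank)>
        </cfif>
    </cfloop>

    <!--- With the sample data above this outputs 2,4 --->
    <cfoutput>#ranksfound#</cfoutput>
    ```

    Because sponsored and local listings are never added to the arrays, the array index doubles as the natural rank.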

    This is a simple task - I've built it before. I just don't have the time right now. It just takes a good eye to parse the rows and figure out what the "delimiter phrases" are that start and end each "row" of data.

    To be paid, this must work accurately on all four engines. Please do NOT send me something broken. Make sure you run it and compare the results to on-screen results. If your titles from listing 7 are shown with the domain name from listing 8, that is unacceptable!

    I hope to award this quickly.

    Thank you
  • Tags:
Project Bids



(3 bids have been placed. drgdrg has chosen to keep all bids for this project hidden.)