Gogol (parody)

Description of the Rage Prank algorithm

While most search engine developers treat their algorithms as highly confidential, here is how GogolSpider, our search engine's indexing bot, works:

  1. Download a non-blacklisted web page, if that site's robots.txt file allows it.
  2. Extract all HTTP URLs from it. We don't want to use other protocols like FTP or HTTPS.
  3. Save all non-blacklisted URLs into our database.
  4. Pick a not-yet-visited URL at random from the database.
  5. Go to step 1.
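
The loop above can be sketched in a few lines of Python. This is only an illustration, not GogolSpider's actual code: the names `LinkParser`, `extract_http_urls`, `is_blacklisted` and `crawl` are made up for the example, the "database" is a plain in-memory set, and `fetch` is an assumed callable that returns a page's HTML, or None when the download fails or robots.txt forbids it.

```python
import random
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_http_urls(html, base_url):
    """Step 2: keep plain http:// links only (no FTP, no HTTPS)."""
    parser = LinkParser()
    parser.feed(html)
    absolute = (urljoin(base_url, href) for href in parser.hrefs)
    return {url for url in absolute if urlparse(url).scheme == "http"}


def is_blacklisted(url, blacklist):
    """Assumption: the blacklist is a set of host names."""
    return urlparse(url).hostname in blacklist


def crawl(start_url, fetch, blacklist, max_pages=10):
    """Steps 1 to 5, bounded by max_pages so the sketch terminates."""
    known, visited = {start_url}, set()
    url = start_url
    for _ in range(max_pages):
        visited.add(url)
        # Step 1: download, unless the page is blacklisted.
        html = None if is_blacklisted(url, blacklist) else fetch(url)
        if html is not None:
            # Steps 2 and 3: extract HTTP URLs, save the non-blacklisted ones.
            found = extract_http_urls(html, url)
            known |= {u for u in found if not is_blacklisted(u, blacklist)}
        # Step 4: pick a not-yet-visited URL at random.
        unvisited = sorted(known - visited)
        if not unvisited:
            break
        url = random.choice(unvisited)  # Step 5: go to step 1.
    return known
```

In a real bot, `fetch` could wrap `urllib.request.urlopen` together with a `urllib.robotparser.RobotFileParser` check; here it is left injectable so the loop can be tried without touching the network.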

Whenever an end user searches for something in Gogol, we save the search terms so we can present them in the Top and Latest pages. These two pages try to work around some 15-year-old assholes who try to fill them with spam. Then we pick a random URL from our database and redirect the user's web browser to it, using an HTTP 303 status code.
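
As a rough sketch of that search flow, assuming a CGI-style script: the function name `handle_search` and the `latest` / `top` containers are hypothetical, the URL list stands in for the database, and the anti-spam workarounds are left out. A CGI script sets the response status by emitting a `Status:` header, which is what the returned string does.

```python
import random
from collections import Counter


def handle_search(terms, urls, latest, top, rng=random):
    """Record the search terms, then answer with a 303 redirect to a random URL."""
    latest.append(terms)      # feeds the Latest page
    top[terms] += 1           # feeds the Top page
    target = rng.choice(urls)  # the search terms play no part in the result
    # CGI response headers: status line, redirect target, blank line.
    return f"Status: 303 See Other\r\nLocation: {target}\r\n\r\n"


latest, top = [], Counter()
print(handle_search("cats", ["http://a.example/", "http://b.example/"], latest, top))
```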

Finally, some cleanup routines keep the content as interesting and varied as possible, using the following trade secret:

  1. Each day, a cron job deletes all URLs belonging to any domain that holds more than 1% of our database content.
  2. We don't care about anything else.
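
The trade secret might look something like the following. It is a sketch under assumptions: the function name `purge_dominant_domains` is invented, the URLs live in a Python list rather than in PostgreSQL, and "domain" is taken to mean the URL's host name.

```python
from collections import Counter
from urllib.parse import urlparse


def purge_dominant_domains(urls, max_share=0.01):
    """Drop every URL whose host holds more than max_share of all stored URLs."""
    counts = Counter(urlparse(url).hostname for url in urls)
    limit = max_share * len(urls)
    return [url for url in urls if counts[urlparse(url).hostname] <= limit]


# 197 hosts with one URL each, plus one host with three URLs (1.5% of 200).
urls = [f"http://site{i}.example/" for i in range(197)]
urls += [f"http://big.example/page{i}" for i in range(3)]
cleaned = purge_dominant_domains(urls)
```

With 200 stored URLs the limit is 2 URLs per host, so the three `big.example` pages are deleted and everything else stays.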

Technical information

Gogol is hosted on a Debian Etch box (with OpenSSL !!!!!!) running the Apache 2 web server, the Python programming language, and the PostgreSQL RDBMS. This makes a really great combo. Gogol is currently made of Python CGI scripts, but this may change in the future (still Python, of course) if traffic grows too much...
Our (beloved) hosting provider is Gandi.

©2002-2008 LibreLogiciel.com - This site is a parody of Google.