- Starting URL
-
SuperBot will initiate the copying procedure at this location.
The URL must be in the standard format:
http://[username:password@]server[:port]/path
A username and password are only necessary if the material you are copying requires
authentication (e.g. members-only web pages). Here are some examples of good URLs:
- http://members.myserver.com/secure/document.html
- http://cia.com:81/ghetto/crack/plans.doc
- http://topsecret.co.jp
- http://EliteSys:IsGreat@207.136.80.38/~elitesys/secure/
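The URL format above can be broken into its parts with Python's standard library. The helper below is hypothetical (it is not part of SuperBot) and only illustrates how the optional username, password, and port fields decompose:

```python
from urllib.parse import urlsplit

# Hypothetical helper showing how a URL in the
# http://[username:password@]server[:port]/path format breaks down.
def describe_url(url):
    parts = urlsplit(url)
    return {
        "username": parts.username,  # None when no credentials are embedded
        "password": parts.password,
        "server": parts.hostname,
        "port": parts.port,          # None when the default port is implied
        "path": parts.path,
    }

info = describe_url("http://EliteSys:IsGreat@207.136.80.38/~elitesys/secure/")
print(info["username"], info["server"], info["path"])
```

Note that the username:password pair and the port are both optional, which is why the simpler example URLs above are equally valid.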
- Save Directory
- SuperBot will save all retrieved files under this directory. For
example, if you choose c:\Program Files\SuperBot as the directory, and http://www.website.com as the starting URL, the retrieved files will be saved in the c:\Program Files\SuperBot\www.website.com directory.
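The save-path rule described above can be sketched as follows. This is a hypothetical illustration of the rule, not SuperBot's actual implementation, and the function name is invented:

```python
import os
from urllib.parse import urlsplit

# Hypothetical sketch: retrieved files land under
# <save directory>\<server name>\<path on that server>
def save_path(save_dir, url):
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    return os.path.join(save_dir, parts.hostname, rel)

print(save_path(r"c:\Program Files\SuperBot",
                "http://www.website.com/index.html"))
```

Because each server gets its own subdirectory, copies of several sites can share one save directory without their files colliding.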
- Location Restrictions
-
- The Stay in or below this directory option prevents SuperBot from retrieving any files from
above the starting URL's directory. For example, if your starting URL is
http://www.website.com/members/index.html, this option will prevent SuperBot from following a link to http://www.website.com/main.html, because main.html is "above" the members directory.
- The Stay at this server option prevents SuperBot from following links that reference other servers. For example, if your starting URL is
http://www.website.com/members/index.html, this option will prevent SuperBot from following a link to http://www.othersite.com/top.htm, because top.htm is not hosted at www.website.com.
- If you choose no location restrictions, SuperBot will follow all links, regardless of their location.
Location restrictions apply only to HTML files, not their inline images and sounds. This ensures that copied pages look and act like their online counterparts.
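The two restriction modes can be sketched as simple link checks. SuperBot's internal logic is not published, so the functions below are hypothetical illustrations of the rules described above:

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical sketch: follow the link only if it is hosted at the
# same server as the starting URL.
def same_server(start_url, link):
    return urlsplit(start_url).hostname == urlsplit(link).hostname

# Hypothetical sketch: follow the link only if it is at the same server
# AND in or below the starting URL's directory.
def in_or_below(start_url, link):
    start = urlsplit(start_url)
    base_dir = posixpath.dirname(start.path)   # e.g. /members
    target = urlsplit(link)
    return (target.hostname == start.hostname
            and target.path.startswith(base_dir + "/"))

start = "http://www.website.com/members/index.html"
print(in_or_below(start, "http://www.website.com/main.html"))   # rejected
print(same_server(start, "http://www.othersite.com/top.htm"))   # rejected
```

Both examples from the text are rejected: main.html fails the directory check, and top.htm fails the server check.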
- Restrict copying depth
-
This restriction provides an alternate way to limit SuperBot's copy procedure. If this
box is checked, the number of links SuperBot will follow in succession (the copying depth) will be limited to the number you specify.
For example, let's assume that you start at http://www.siteone.com with the maximum copying depth set to 2. http://www.siteone.com contains a link to
http://www.sitetwo.com, which contains a link to http://www.sitethree.com. SuperBot will follow the links at www.siteone.com and at www.sitetwo.com, but will not follow the links at www.sitethree.com, because those links are three levels "deep".
If this box is left unchecked, the maximum copying depth will automatically be set to 30.
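The depth limit amounts to a breadth-first walk that stops following links past the chosen level. The sketch below is hypothetical, using a toy link table in place of real pages, and mirrors the siteone/sitetwo/sitethree example above:

```python
from collections import deque

# Toy link graph standing in for the example above (hypothetical data).
LINKS = {
    "http://www.siteone.com": ["http://www.sitetwo.com"],
    "http://www.sitetwo.com": ["http://www.sitethree.com"],
    "http://www.sitethree.com": ["http://www.sitefour.com"],
}

def crawl(start, max_depth):
    # Breadth-first walk; links on pages at max_depth are not followed.
    visited, queue = [], deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth < max_depth:
            for link in LINKS.get(url, []):
                queue.append((link, depth + 1))
    return visited

print(crawl("http://www.siteone.com", 2))
```

With a depth of 2, the walk reaches www.sitethree.com but never follows its links, so www.sitefour.com is three levels "deep" and is skipped.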
- Restrict number of files
- If this restriction is enabled, SuperBot will stop once the specified number of files has
been copied, even if some links have not yet been followed.
- Only download files with these extensions
Do not download files with these extensions
-
For even more control, you can have SuperBot ignore certain types of files. SuperBot determines a file's type by examining the filename extension. For example, if you want to skip downloading any sound files, you might type mid wav ra au in this box. If you want to ignore links to videos, you could type avi mov mpg rm in this box.
Up to eight extensions may be entered in this box, each up to eight characters long. Each extension should be separated by a single space, and periods, commas, semicolons, etc. should NOT be entered.
Under some circumstances, extension restrictions will be overridden:
- To avoid misuse of server resources, SuperBot will not follow any links to files with these
extensions: exe com cgi pl asp. Any URLs containing embedded arguments
(indicated by a ?, ; or =) will also be ignored.
- SuperBot ignores this restriction when downloading web pages (htm html phtml shtml).
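Putting the rules above together, the extension filter can be sketched as a single check per link. This is a hypothetical illustration of the stated rules, not SuperBot's actual code:

```python
import posixpath
from urllib.parse import urlsplit

# Rules quoted from the text above.
ALWAYS_SKIPPED = {"exe", "com", "cgi", "pl", "asp"}      # server programs
ALWAYS_ALLOWED = {"htm", "html", "phtml", "shtml"}       # web pages

def should_skip(url, skip_extensions):
    # URLs with embedded arguments are always ignored.
    if any(c in url for c in "?;="):
        return True
    ext = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
    if ext in ALWAYS_ALLOWED:      # web pages override the restriction
        return False
    if ext in ALWAYS_SKIPPED:      # server resources are never fetched
        return True
    return ext in skip_extensions  # the user's own list

skips = {"mid", "wav", "ra", "au"}
print(should_skip("http://x.com/song.wav", skips))   # skipped
print(should_skip("http://x.com/page.html", skips))  # fetched
```

Note how page.html would be fetched even if html were typed into the box, matching the override described above.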
- Download inline pictures and sounds
- If this option is enabled, SuperBot will download page background graphics, and other embedded pictures, sounds, and videos.
- Update older files
- If this option is enabled, SuperBot will update any files that have been modified since the last time they were downloaded.
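One common way to decide whether a saved file is stale is to compare its timestamp with the server's Last-Modified header. The sketch below is hypothetical; SuperBot's own update check is not documented:

```python
import email.utils
import os
import tempfile

# Hypothetical sketch: re-download when the server's Last-Modified
# time is newer than the saved copy.
def needs_update(local_path, last_modified_header):
    if not os.path.exists(local_path):
        return True  # never downloaded
    remote = email.utils.parsedate_to_datetime(last_modified_header).timestamp()
    return remote > os.path.getmtime(local_path)

# Demo: a file saved just now versus a page last modified in 1999.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
print(needs_update(path, "Sat, 01 May 1999 12:00:00 GMT"))  # prints False
os.remove(path)
```

In practice a crawler would send an If-Modified-Since request and let the server answer 304 Not Modified, but the comparison logic is the same.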
- Allow authentication
- If you are copying a password-protected site, this option must be enabled. (You must also enter your username and password in the Starting URL.)
This option allows SuperBot to follow links with embedded usernames and passwords; you may want to enable the Stay at this server restriction to keep SuperBot out of restricted areas.
- Take naps between downloads
- To avoid monopolizing the resources of a web server, SuperBot can pause for 3 seconds between each download. If this option is disabled, SuperBot will download files as fast as your network resources allow.
- Ignore robot META tags
- If this option is enabled, SuperBot will ignore the NOINDEX and NOFOLLOW directives that web page authors place in ROBOTS META tags. Use this option with caution.
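For reference, the tag this option overrides looks like <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> in a page's header. The parser below is a hypothetical sketch of how such a tag can be read, using Python's standard HTML parser:

```python
from html.parser import HTMLParser

# Hypothetical sketch of reading the ROBOTS META tag this option ignores.
class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            content = a.get("content", "").lower()
            self.noindex = "noindex" in content
            self.nofollow = "nofollow" in content

page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
p = RobotsMetaParser()
p.feed(page)
print(p.noindex, p.nofollow)  # prints True True
```

A well-behaved robot skips pages marked NOINDEX and does not follow links on pages marked NOFOLLOW, which is why overriding these tags should be done with caution.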