Including Travel Bug pages on your PDA
GPX files contain a wealth of information about geocaches. Everything you could possibly want to
know about a cache is in the GPX file. Travel bugs, on the other hand, are barely mentioned. All
that is in the GPX file is the name and ID# of the travel bugs.
Valuable information, such as the bug's goal, is left out.
GPX Spinner has a solution.
When this feature is enabled, GPX Spinner will spider the geocaching.com website and include
detailed pages for all the travel bugs referenced in the GPX file(s). This will give you all the
details, goals, pictures, and logs for all the bugs to help you in your quest.
Note: This feature is provided for educational purposes only. I do not recommend it for everyday use. If you
do use it, you need to be aware of the risks. These risks include having your access to
geocaching.com restricted. Please read this page carefully.
Geocaching.com has a Terms of Use
agreement. Paragraph 5 of that agreement reads, in part:
You agree that you will not use any robot, spider, scraper or other automated means to access the
Site for any purpose without our express written permission.
I am not a lawyer, but I don't think it takes a law degree to figure out that using this feature in GPX Spinner violates that agreement. Let's
read the next sentence of that agreement:
Additionally, you agree that you will not: (a) take any action that imposes, or may impose in our
sole discretion an unreasonable or disproportionately large load on our infrastructure;
Based on that, we can understand why they prohibit spiders: so that you don't
impose an unreasonable load on their servers. Spinner tries to minimize that load by pausing
between page grabs.
Will geocaching.com know if we violate those terms, and what will happen if they do?
This message in the Groundspeak forums answers that question, in a roundabout way. Item #5 in that post tells us that
something referred to as "automated throttling" will kick in after somewhere between 300 and 400 requests within a limited
time. Fortunately, the typical 500-cache pocket query contains fewer than 80 travel bugs. You'd have to
spin 4 or more GPX files in quick succession to hit that limit.
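The arithmetic behind that estimate can be sketched in a few lines. The figures are the ones quoted above (throttling at roughly 300 to 400 requests, and a little under 80 travel bugs per 500-cache pocket query). The function name is hypothetical, and the sketch assumes one request per travel bug with nothing served from Spinner's cache.

```python
import math


def files_until_throttle(request_limit: int, bugs_per_file: int) -> int:
    """Smallest number of back-to-back spins whose total requests reach the limit.

    Assumes one request per travel bug and no cached pages.
    """
    return math.ceil(request_limit / bugs_per_file)


# Lower and upper ends of the throttling range quoted in the forum post.
low = files_until_throttle(300, 80)
high = files_until_throttle(400, 80)
print(low, high)
```

With 80 bugs per file, the lower bound works out to 4 files, which matches the estimate above. Overlapping pocket queries would need more files, since repeated bugs come from the cache.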
If you choose to use this feature, you do so at your own risk. Lil Devil takes no responsibility if
your access to geocaching.com is temporarily or permanently blocked.
So let's look at the options in Spinner's INI file that pertain to this feature.
| Name | Description |
|---|---|
| Include_TB_pages | Set this option to true (1) to enable spidering of travel bug pages. Default is off. |
| Include_TB_pic | Set this option to true to include the travel bug's picture on the detail page. You should only enable this if your PDA has lots of memory for images. |
| Hours_to_cache_TBs | Spinner caches the travel bug pages for a limited time. This helps in cases where you have multiple overlapping pocket queries, or you re-spin a pocket query for some reason. Since it's unlikely that the page has changed, instead of spidering the page again, Spinner just grabs it from its own cache. This option configures how long that cache lasts, in hours. If all you care about is the travel bug's goal, and you don't care about the recent logs, then you should set this to a very high value, like 200 or more. This will further limit the load placed on geocaching.com by spidering a given travel bug at most about once a week. |
| HTTP_delay_seconds | This is the delay between spidering pages. The minimum value is 3 seconds. Note that this is in addition to the 1 or 2 seconds it takes to actually spider the page. |
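To make the interaction between Hours_to_cache_TBs and HTTP_delay_seconds concrete, here is a minimal sketch of a time-limited cache with a pause before each real fetch. It is not Spinner's actual code: the class name, the injected `fetch`, `clock`, and `sleep` parameters, and the choice to pause before (rather than after) each fetch are all assumptions made for illustration. The fetch function is a stand-in that touches no website.

```python
import time
from typing import Callable, Dict, Tuple


class ThrottledPageCache:
    """Hypothetical sketch: reuse pages younger than a time-to-live,
    and pause before every fetch that actually goes out."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        hours_to_cache: float,
        delay_seconds: float,
        clock: Callable[[], float] = time.time,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.fetch = fetch
        self.ttl_seconds = hours_to_cache * 3600
        # The option table gives 3 seconds as the minimum delay.
        self.delay_seconds = max(3.0, delay_seconds)
        self.clock = clock
        self.sleep = sleep
        self.pages: Dict[str, Tuple[float, str]] = {}  # id -> (fetched_at, page)

    def get(self, bug_id: str) -> str:
        cached = self.pages.get(bug_id)
        if cached is not None and self.clock() - cached[0] < self.ttl_seconds:
            return cached[1]  # still fresh: no request, no delay
        self.sleep(self.delay_seconds)  # assumption: pause before fetching
        page = self.fetch(bug_id)
        self.pages[bug_id] = (self.clock(), page)
        return page


# Demo with a stand-in fetch function and a no-op sleep.
calls = []


def fake_fetch(bug_id: str) -> str:
    calls.append(bug_id)
    return f"page for {bug_id}"


cache = ThrottledPageCache(fake_fetch, hours_to_cache=200, delay_seconds=3,
                           sleep=lambda seconds: None)
cache.get("TB1")
cache.get("TB1")  # second lookup is served from the cache
print(len(calls))
```

The second lookup never reaches the fetch function, which is why a long cache lifetime reduces load when pocket queries overlap.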
Before anyone criticizes me for creating this feature, it should be noted that GSAK contains a crude
version of the same thing. Cache pages written by GSAK, in its default configuration, include links directly to the
travel bug pages on
geocaching.com. While GSAK does not spider the geocaching.com pages directly, it does instruct
Plucker or iSilo to spider them. Not only do these pages look like crap when rendered on a PDA, but Plucker or iSilo
spiders those pages as fast as it can, putting a disproportionately large load on the servers.
In contrast, Spinner runs in a controlled manner with a configurable delay between pages, formats the pages specifically to fit on a small PDA screen,
and leaves this feature turned off by default.