0.2. Structure of This Book

The book is divided into 12 chapters and 7 appendixes, as follows:

Chapter 1, "Introduction to Web Automation" covers in general terms what LWP does, the alternatives to using LWP, and when you shouldn't use LWP.

Chapter 2, "Web Basics" explains how the Web works and some easy-to-use yet limited functions for accessing it.

Chapter 3, "The LWP Class Model" covers LWP's more powerful, object-oriented interface to the Web.

Chapter 4, "URLs" shows how to parse URLs with the URI class, and how to convert between relative and absolute URLs.

Chapter 5, "Forms" describes how to submit GET and POST forms.

Chapter 6, "Simple HTML Processing with Regular Expressions" shows how to extract information from HTML using regular expressions.

Chapter 7, "HTML Processing with Tokens" provides an alternative approach to extracting data from HTML using the HTML::TokeParser module.

Chapter 8, "Tokenizing Walkthrough" is a case study of data extraction using tokens.

Chapter 9, "HTML Processing with Trees" shows how to extract data from HTML using the HTML::TreeBuilder module.

Chapter 10, "Modifying HTML with Trees" covers the use of HTML::TreeBuilder to modify HTML files.

Chapter 11, "Cookies, Authentication, and Advanced Requests" deals with the tougher parts of requests.

Chapter 12, "Spiders" explores the technological issues involved in automating the download of more than one page from a site.

Appendix A, "LWP Modules" is a complete list of the LWP modules.

Appendix B, "HTTP Status Codes" is a list of HTTP status codes, what they mean, and whether LWP considers each one an error or a success.

Appendix C, "Common MIME Types" contains the most common MIME types and what they mean.

Appendix D, "Language Tags" lists the most common language tags and their meanings (e.g., "zh-cn" means Mainland Chinese, while "sv" is Swedish).

Appendix E, "Common Content Encodings" is a list of the most common character encodings (character sets) and the tags that identify them.

Appendix F, "ASCII Table" is a table to help you make sense of the most common Unicode characters. It shows each character, its numeric code (in decimal, octal, and hex), and any HTML escapes available for it.

Appendix G, "User's View of Object-Oriented Modules" is an introduction to the use of Perl's object-oriented programming features.