5.7. File Uploads

So far we've discussed users entering text data that they type (or paste) into forms. But there's another way to submit data: with a type=file form element, which allows users to select a file on their local systems to upload when the form is submitted.

Currently, three things have to happen for a user to upload a file via a form. First, the program that will be processing the form has to be expecting a file to be uploaded (you can't just alter the HTML for any form and stick a type=file field into it). Second, the form has to have an <input type=file name=whatever> element. And third, the form element has to have its attributes set like so:

<form method=post enctype="multipart/form-data" action="url">

This is necessary because file-upload fields can't be conveyed by the normal form-data encoding system, but instead have to use the "multipart/form-data" encoding system (which, incidentally, can be conveyed only across POST requests, not across GET requests).

Suppose, for example, that you were automating interaction with an HTML form that looked like this:

<form enctype="multipart/form-data" method=post
  action="http://pastel.int/feedback.pl">
Subject:               <input name="subject" type="text">
<br>File to process -- <input name="saywhat" type="file">
<br>Your Name --       <input name="user"    type="text">
<input type="submit" value="Send!"></form>

Modeling the first and third fields is as we've seen before -- a simple matter of $browser->post($url, ['subject'=>..., 'user'=>...]). But the file-upload part takes some doing. First, you have to add a header line of 'Content_Type' => 'form-data' to say that yes, you really mean this to be a "multipart/form-data" POST. Second, where you would have a string in 'saywhat'=>text, you instead have an array reference whose first item is the path to the file you want to upload. So it ends up looking like this:

my $response = $browser->post(
  'http://pastel.int/feedback.pl',
  [ 'subject' => 'Demand for pie.',
    'saywhat' => ["./today/earth_pies1.dml"],
    'user'    => 'Adm. Kang',
  ],
  'Content_Type' => 'form-data',
  # ...any other header lines...
);

Assume that ./today/earth_pies1.dml looks like this:

<?xml version="1.0" encoding='iso-8859-1' standalone="yes"?>
<Demand xml:lang="i-klingon">
  DaH chabmeyraj tunob!
</Demand>

The body of the request that the above program sends will look like this (xYzZY is the boundary string that separates the parts):

--xYzZY
Content-Disposition: form-data; name="subject"
 
Demand for pie.
--xYzZY
Content-Disposition: form-data; name="saywhat"; filename="earth_pies1.dml"
Content-Length: 131
Content-Type: text/plain
 
<?xml version="1.0" encoding='iso-8859-1' standalone="yes"?>
<Demand xml:lang="i-klingon">
  DaH chabmeyraj tunob!
</Demand>
 
--xYzZY
Content-Disposition: form-data; name="user"
 
Adm. Kang
--xYzZY--
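
One way to inspect such a request without sending anything over the network is to build it with the POST function from HTTP::Request::Common and print it. The following is only a minimal sketch: the temporary file and its one-line content are stand-ins for ./today/earth_pies1.dml, and the exact headers printed may vary with the version of the library installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request::Common qw(POST);

# Write a small stand-in file to upload (hypothetical content).
my ($fh, $path) = tempfile(SUFFIX => '.dml', UNLINK => 1);
print $fh "DaH chabmeyraj tunob!\n";
close $fh;

# POST() only builds an HTTP::Request object; nothing is sent.
my $request = POST(
  'http://pastel.int/feedback.pl',
  Content_Type => 'form-data',
  Content      => [
    subject => 'Demand for pie.',
    saywhat => [$path],          # array reference marks a file upload
    user    => 'Adm. Kang',
  ],
);

# Show the request line, the headers, and the multipart body.
print $request->as_string;
```

To actually send a request built this way, you could hand it to an LWP::UserAgent object with $browser->request($request).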

Note that each form-field is like a little HTTP message of its own, with its own set of headers and its own body. For the "normal" fields (the first and third fields), the header basically expresses that this is ordinary data for a particular field name, and the body expresses the form data. But for the type=file field, we get the file's content as the body. Take a look at the header again:

Content-Disposition: form-data; name="saywhat"; filename="earth_pies1.dml"
Content-Length: 131
Content-Type: text/plain

The name="saywhat" expresses what the name="..." attribute was on the <input type=file ...> element to which this corresponds, which we coded into our program in the saywhat=>[...] line. But note that, by default, LWP also tells the remote host the basename of the file we're uploading (i.e., the filename minus directory names) as well as its best guess at the MIME type for that file. Because LWP (specifically, the LWP::MediaTypes module) has never heard of the .dml extension, it falls back on text/plain. (If this file had clearly been a binary file, LWP would call it application/octet-stream, the MIME type for general binary files.)

If you want to change the name that LWP presents to the remote server, you can provide that name as a second item in the arrayref:

fieldname => [local_filespec => as_what_name],

Suppose you change the saywhat line in the above program to this:

'saywhat' => ["./today/earth_pies1.dml" => "allyourpie.xml"],

Then the resulting headers on its part of the POST request would look like this:

Content-Disposition: form-data; name="saywhat"; filename="allyourpie.xml"
Content-Length: 131
Content-Type: text/plain

Although most applications that take file uploads across the Web pay no attention to the MIME types (because so many browsers get them wrong), if you want to specify a MIME type for a particular file upload, you can do so by following the second item in the array reference with a Content_Type header pair:

fieldname => [local_filespec => as_what_name, 'Content_Type' => MIME_type],

Like so:

'saywhat' => ["./today/earth_pies1.dml" => "allyourpie.xml",
               'Content_Type' => "application/angry-ultimatum"],

Then the resulting headers on its part of the POST request would look like this:

Content-Disposition: form-data; name="saywhat"; filename="allyourpie.xml"
Content-Length: 131
Content-Type: application/angry-ultimatum
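
To compare LWP's own guess with an explicit override, you can call guess_media_type from LWP::MediaTypes directly and then look at the headers of the uploaded part. This is a rough sketch under a few assumptions: the temporary file is a stand-in for ./today/earth_pies1.dml, and the MIME type is passed as a Content_Type header pair after the filename, which is the form that HTTP::Request::Common documents.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request::Common qw(POST);
use LWP::MediaTypes qw(guess_media_type);

# Stand-in text file with an extension LWP is unlikely to know.
my ($fh, $path) = tempfile(SUFFIX => '.dml', UNLINK => 1);
print $fh "<Demand>DaH chabmeyraj tunob!</Demand>\n";
close $fh;

# What LWP would guess for this file if we said nothing.
my $guessed = guess_media_type($path);
print "Guessed type: $guessed\n";

# Override both the reported filename and the MIME type.
my $request = POST(
  'http://pastel.int/feedback.pl',
  Content_Type => 'form-data',
  Content      => [
    saywhat => [ $path => 'allyourpie.xml',
                 Content_Type => 'application/angry-ultimatum' ],
  ],
);

# Each form field is available as its own message part.
my ($part) = $request->parts;
print $part->header('Content-Disposition'), "\n";
print $part->header('Content-Type'), "\n";
```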

All these file-upload options work just as well for binary files (such as JPEGs) as for text files. Note, however, that when LWP constructs and sends the request, it currently has to read into memory all the files you're sending in that request. If you're sending a 20-megabyte MP3 file, this might be a problem! You can tell LWP not to read the files into memory by setting $HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1 (HTTP::Request::Common being the library that LWP uses for creating these file-upload requests). Unfortunately, at the time of this writing, many servers and CGIs do not understand the resulting HTTP POST request.
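
As a sketch of what that setting changes, the following builds a request with DYNAMIC_FILE_UPLOAD turned on and shows that the request's content becomes a code reference that hands back the body in pieces, instead of one large string. The temporary file here is a hypothetical stand-in for a large upload, and the field name bigfile is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTTP::Request::Common qw(POST);

# Stand-in for a large binary file (hypothetical content).
my ($fh, $path) = tempfile(SUFFIX => '.mp3', UNLINK => 1);
binmode $fh;
print $fh "\0" x 10_000;
close $fh;

my $request;
{
  # Limit the setting to this block so other requests are unaffected.
  local $HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;
  $request = POST(
    'http://pastel.int/feedback.pl',
    Content_Type => 'form-data',
    Content      => [ bigfile => [$path] ],
  );
}

# The content is now a subroutine that yields the body chunk by chunk.
my $next_chunk = $request->content;
my $total = 0;
while (1) {
  my $chunk = $next_chunk->();
  last unless defined $chunk && length $chunk;
  $total += length $chunk;
}
print "Content is a ", ref($next_chunk), " reference; read $total bytes in chunks\n";
```

Using local keeps the change from leaking into unrelated requests built elsewhere in the same program.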

One especially neat trick is that you don't even need to have a file to upload to send a "file upload" request. To send content from a string in memory instead of from a file on disk, use this syntax:

fieldname => [
    undef,   # yes, undef!
    as_what_name, 
    'Content_Type' => MIME_type,
    'Content' => data_to_send
],

For example, we could change our saywhat line in the above program to read:

'saywhat' => [
    undef,
    'allyourpie.xml', 
    'Content_Type' => 'application/angry-ultimatum',
    'Content' => "All your pies are belong to me!\nGNAR!"
],

The resulting request will contain this chunk of data for the saywhat field:

Content-Disposition: form-data; name="saywhat"; filename="allyourpie.xml"
Content-Type: application/angry-ultimatum
 
All your pies are belong to me!
GNAR!
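
Put together as a complete program, the in-memory variant might look like the following minimal sketch. It builds the request with POST from HTTP::Request::Common and prints only the body, so nothing is sent over the network; no file on disk is involved at all.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build (but do not send) a form-data request whose "file"
# comes from a string in memory.
my $request = POST(
  'http://pastel.int/feedback.pl',
  Content_Type => 'form-data',
  Content      => [
    subject => 'Demand for pie.',
    saywhat => [
      undef,                    # no local file to read
      'allyourpie.xml',         # filename reported to the server
      Content_Type => 'application/angry-ultimatum',
      Content      => "All your pies are belong to me!\nGNAR!",
    ],
    user    => 'Adm. Kang',
  ],
);

# Print the multipart body for inspection.
print $request->content;
```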