1. multipart/form-data serializing
A multipart/form-data boundary is a byte sequence such that:
- its length is greater than or equal to 27 and less than or equal to 70, and
- it is composed of bytes in the ranges 0x30 to 0x39, 0x41 to 0x5A, or 0x61 to 0x7A, inclusive (ASCII alphanumeric), or which are 0x27 ('), 0x2D (-) or 0x5F (_).
To generate a multipart/form-data boundary, return an implementation-defined byte sequence which fulfills the conditions for boundaries, such that part of it is randomly generated, with a minimum entropy of 95 bits.
Previous definitions of multipart/form-data required that the boundary associated with a multipart/form-data payload not be present anywhere in the payload other than as a delimiter, although they allowed for generating the boundary probabilistically. Since this generation algorithm is separate from a payload, however, it has to specify a minimum entropy instead. [RFC7578] [RFC2046]
If a user agent generates multipart/form-data boundaries with a length of 27 and an entropy of 95 bits, given a payload made specifically to generate collisions with that user agent’s boundaries, the expected length of the payload before a collision is found is well over a yottabyte.
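The yottabyte figure follows from a back-of-the-envelope calculation. It rests on a simplifying assumption: every byte offset in the crafted payload is an independent trial that matches the 95-bit random part of the boundary with probability $2^{-95}$.

Under that assumption, the expected number of trials before the first match is the mean of a geometric distribution. Each trial consumes at least one byte of payload:

```latex
\mathbb{E}[\text{trials}] = \frac{1}{2^{-95}} = 2^{95} \approx 3.96 \times 10^{28}
\quad\Longrightarrow\quad
\frac{3.96 \times 10^{28}\ \text{bytes}}{10^{24}\ \text{bytes per yottabyte}} \approx 4 \times 10^{4}\ \text{yottabytes}
```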
To escape a multipart/form-data name with a string name, an optional encoding encoding (default UTF-8) and an optional boolean isFilename (default false):
- If isFilename is true:
  - Set name to the result of converting name into a scalar value string.
- Otherwise:
  - Assert: name is a scalar value string.
  - Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), in name, by a string consisting of U+000D (CR) and U+000A (LF).
- Let encoded be the result of encoding name with encoding.
- Replace every 0x0A (LF) byte in encoded with the byte sequence `%0A`, every 0x0D (CR) byte with `%0D`, and every 0x22 (") byte with `%22`.
- Return encoded.
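A minimal JavaScript sketch of the escaping steps above follows. It assumes the encoding is UTF-8, the only encoding `TextEncoder` supports, and uses the hypothetical name `escapeName`. `TextEncoder` replaces lone surrogates with U+FFFD while encoding, which has the same effect as converting name into a scalar value string.

```js
// Sketch of "escape a multipart/form-data name", UTF-8 only.
function escapeName(name, isFilename = false) {
  if (!isFilename) {
    // Normalize lone CR and lone LF to CR LF.
    name = name.replace(/\r(?!\n)|(?<!\r)\n/g, "\r\n");
  }
  // Encoding also turns lone surrogates into U+FFFD (scalar value string).
  const encoded = new TextEncoder().encode(name);
  const out = [];
  for (const byte of encoded) {
    if (byte === 0x0a) out.push(0x25, 0x30, 0x41); // `%0A`
    else if (byte === 0x0d) out.push(0x25, 0x30, 0x44); // `%0D`
    else if (byte === 0x22) out.push(0x25, 0x32, 0x32); // `%22`
    else out.push(byte);
  }
  return Uint8Array.from(out);
}
```

Because filenames skip the newline normalization, a lone LF in a filename becomes `%0A` rather than `%0D%0A`.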
The multipart/form-data chunk serializer takes an entry list entries and an optional encoding encoding (default UTF-8), and returns a tuple of a multipart/form-data boundary and a list of chunks, each of which can be either a byte sequence or a File:
- Set encoding to the result of getting an output encoding from encoding.
- Let boundary be the result of generating a multipart/form-data boundary.
- Let output chunks be an empty list.
- For each entry in entries:
  - Let chunk be a byte sequence containing `--`, followed by boundary, followed by 0x0D 0x0A (CR LF).
  - Append `Content-Disposition: form-data; name="`, followed by the result of escaping a multipart/form-data name given entry’s name and encoding, followed by 0x22 ("), to chunk.
  - Let value be entry’s value.
  - If value is a string:
    - Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to chunk.
    - Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), in value, by a string consisting of U+000D (CR) and U+000A (LF).
    - Append the result of encoding value with encoding to chunk.
    - Append 0x0D 0x0A (CR LF) to chunk.
    - Append chunk to output chunks.
  - Otherwise:
    - Append `; filename="`, followed by the result of escaping a multipart/form-data name given value’s name with encoding and isFilename set to true, followed by 0x22 0x0D 0x0A (" CR LF), to chunk.
    - Let type be value’s type, if it is not the empty string, or "application/octet-stream" otherwise.
    - Append `Content-Type: `, followed by the result of isomorphic encoding type, to chunk.
    - Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to chunk.
    - Append chunk, followed by value, followed by the byte sequence 0x0D 0x0A (CR LF), to output chunks.
- Append the byte sequence containing `--`, followed by boundary, followed by `--`, followed by 0x0D 0x0A (CR LF), to output chunks.
- Return the tuple boundary / output chunks.
This algorithm now matches the behavior of all major browsers.
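A compact JavaScript sketch of the serializer follows. It rests on several assumptions:
- The encoding is UTF-8.
- Entries are `[name, value]` pairs.
- A non-string value is any object with `name` and `type` properties, such as a `File`, and it is passed through unread.
- The boundary is passed in rather than generated, so the output is deterministic.

The function names `serializeChunks`, `escapeName` and `concatBytes` are hypothetical.

```js
const encoder = new TextEncoder();

// Compact UTF-8-only version of the name escaping steps. UTF-8 never
// reuses the bytes 0x0A, 0x0D or 0x22 inside multi-byte sequences, so
// escaping these characters before encoding gives the same bytes.
function escapeName(name, isFilename = false) {
  if (!isFilename) name = name.replace(/\r(?!\n)|(?<!\r)\n/g, "\r\n");
  return name.replace(/\n/g, "%0A").replace(/\r/g, "%0D").replace(/"/g, "%22");
}

// Concatenates strings (UTF-8 encoded) and Uint8Arrays into one Uint8Array.
function concatBytes(parts) {
  const arrays = parts.map((p) => (typeof p === "string" ? encoder.encode(p) : p));
  const out = new Uint8Array(arrays.reduce((n, a) => n + a.length, 0));
  let offset = 0;
  for (const a of arrays) {
    out.set(a, offset);
    offset += a.length;
  }
  return out;
}

function serializeChunks(entries, boundary) {
  const chunks = [];
  for (const [name, value] of entries) {
    const head = [
      `--${boundary}\r\n`,
      `Content-Disposition: form-data; name="${escapeName(name)}"`,
    ];
    if (typeof value === "string") {
      const normalized = value.replace(/\r(?!\n)|(?<!\r)\n/g, "\r\n");
      chunks.push(concatBytes([...head, "\r\n\r\n", normalized, "\r\n"]));
    } else {
      const type = value.type !== "" ? value.type : "application/octet-stream";
      head.push(
        `; filename="${escapeName(value.name, true)}"\r\n`,
        "Content-Type: ",
        // Isomorphic encode: one byte per code point (File types are ASCII).
        Uint8Array.from(type, (c) => c.charCodeAt(0)),
        "\r\n\r\n",
      );
      // Header bytes, then the file object itself, then the trailing CR LF.
      chunks.push(concatBytes(head), value, encoder.encode("\r\n"));
    }
  }
  chunks.push(encoder.encode(`--${boundary}--\r\n`));
  return [boundary, chunks];
}
```

Keeping file values as separate list items means large files are never copied into the header buffers.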
The length of a multipart/form-data payload, given a list of chunks chunks which can be either byte sequences or Files, is the sum of the length of each byte sequence in chunks and the size of each File in chunks.
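This sketch takes the payload length to be the total of the byte-sequence lengths and File sizes in chunks. It also assumes:
- byte sequences are `Uint8Array`s;
- any other chunk is a `Blob`-like object, such as a `File`, that exposes a `size` property.

`payloadLength` is a hypothetical name.

```js
// Sums byte-sequence lengths and File sizes without reading any file data.
function payloadLength(chunks) {
  let length = 0;
  for (const chunk of chunks) {
    length += chunk instanceof Uint8Array ? chunk.byteLength : chunk.size;
  }
  return length;
}
```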
To create a multipart/form-data readable stream from a list of chunks chunks which can be either byte sequences or Files, run the following steps:
- Let file stream be null.
- Let stream be a new ReadableStream.
- Let pull algorithm be an algorithm that runs the following steps:
  - if file stream is null and chunks is not empty
    - If chunks[0] is a byte sequence, enqueue a Uint8Array object wrapping an ArrayBuffer containing chunks[0] into stream.
    - Otherwise:
    - Remove the first item from chunks.
  - if file stream is null and chunks is empty
    - Close stream.
  - if file stream is not null
    - Let read request be a new read request with the following items:
      - chunk steps, given chunk
        - If chunk is not a Uint8Array object, error stream with a TypeError and abort these steps.
        - Enqueue chunk into stream.
      - close steps
        - Set file stream to null.
        - Run pull algorithm.
      - error steps, given e
        - Error stream with e.
    - Let reader be the result of getting a reader for file stream.
    - Read a chunk from reader with read request.
- Let cancel algorithm be an algorithm that runs the following steps, given reason:
  - If file stream is not null, cancel file stream with reason.
- Set up stream with pullAlgorithm set to pull algorithm and cancelAlgorithm set to cancel algorithm.
- Return stream.
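The following JavaScript sketch approximates the stream algorithm using the public `ReadableStream` API. Its assumptions are:
- Byte-sequence chunks are `Uint8Array`s.
- Every other chunk is a `Blob` or `File`, opened through `Blob.prototype.stream()`. How a File chunk is opened is this sketch's own choice.
- A single reader is kept per file, rather than one being acquired on every pull.

`createPayloadStream` is a hypothetical name. A rejected `pull()` errors the whole stream, which stands in for "error stream".

```js
function createPayloadStream(chunks) {
  const pending = [...chunks]; // copy so the caller's list is untouched
  let fileReader = null; // reader over the current File's stream, if any

  return new ReadableStream({
    async pull(controller) {
      // Loop until this pull enqueues a chunk or closes the stream.
      while (true) {
        if (fileReader === null) {
          if (pending.length === 0) {
            controller.close();
            return;
          }
          const chunk = pending.shift();
          if (chunk instanceof Uint8Array) {
            controller.enqueue(chunk);
            return;
          }
          fileReader = chunk.stream().getReader(); // assumes a Blob/File
          continue;
        }
        // A rejected read() rejects pull(), erroring stream with that error.
        const { done, value } = await fileReader.read();
        if (done) {
          fileReader = null; // file finished; move on to the next chunk
          continue;
        }
        if (!(value instanceof Uint8Array)) {
          throw new TypeError("file stream produced a non-Uint8Array chunk");
        }
        controller.enqueue(value);
        return;
      }
    },
    cancel(reason) {
      // Propagate cancellation to the file currently being read, if any.
      if (fileReader !== null) return fileReader.cancel(reason);
    },
  });
}
```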
2. multipart/form-data parsing
These algorithms are a first attempt at defining a multipart/form-data parser for use in Body's formData() method. The current algorithms don’t yet match any browser, because browsers disagree with one another at various points.
Note that Gecko and Chromium also implement a Web Extensions API that parses multipart/form-data independently from the parser in Body (see Gecko bug 1697292):
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    // details.requestBody.formData is an object mapping names to an array of
    // values, each represented by either the string value or the file’s filename.
    console.log(details.requestBody.formData);
  },
  { urls: ["<all_urls>"] },
  ["requestBody"]
);
The multipart/form-data parser takes a byte sequence input and a MIME type mimeType, and returns either an entry list or failure:
- If mimeType’s parameters["boundary"] does not exist, return failure. Otherwise, let boundary be the result of UTF-8 encoding mimeType’s parameters["boundary"].
  The definition of MIME type in [MIMESNIFF] has the parameter values being ASCII strings, but the parse a MIME type algorithm can create MIME type records containing non-ASCII parameter values. See whatwg/mimesniff issue #141. Gecko and WebKit accept non-ASCII boundary strings and then expect them UTF-8 encoded in the request body; Chromium rejects them instead.
- Let entry list be an empty entry list.
- Let position be a pointer to a byte in input, initially pointing at the first byte.
- While true:
  - If position points to a sequence of bytes starting with 0x2D 0x2D (`--`) followed by boundary, advance position by 2 + the length of boundary. Otherwise, return failure.
  - If position points to the sequence of bytes 0x2D 0x2D 0x0D 0x0A (`--` followed by CR LF) followed by the end of input, return entry list.
  - If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure.
  - Advance position by 2. (This skips past the newline.)
  - Let name, filename and contentType be the result of parsing multipart/form-data headers on input and position, if the result is not failure. Otherwise, return failure.
  - Advance position by 2. (This skips past the empty line that marks the end of the headers.)
  - Let body be the empty byte sequence.
  - Body loop: While position is not past the end of input:
    - Append the byte at position to body, and advance position by 1.
    - If body ends with boundary:
      - Remove the last 4 + (length of boundary) bytes from body.
      - Decrease position by 4 + (length of boundary).
      - Break out of body loop.
  - If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure. Otherwise, advance position by 2.
  - If filename is not null:
    - If contentType is null, set contentType to "text/plain".
    - If contentType is not an ASCII string, set contentType to the empty string.
    - Let value be a new File object with name filename, type contentType, and body body.
  - Otherwise:
    - Let value be the UTF-8 decoding without BOM of body.
  - Assert: name is a scalar value string and value is either a scalar value string or a File object.
  - Create an entry with name and value, and append it to entry list.
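The position arithmetic in the delimiter and body-loop steps is the subtlest part of the parser. The JavaScript sketch below covers only those steps, leaving out the header parsing, the MIME type handling and entry creation. It assumes:
- input and boundary are `Uint8Array`s;
- position advances by one byte per body-loop iteration;
- `null` stands for failure.

Rejecting a body too short to contain CR LF `--` boundary is this sketch's own choice. `bytesAt`, `readDelimiter` and `readBody` are hypothetical helper names.

```js
// True if `bytes` contains `needle` starting at index `pos`.
function bytesAt(bytes, pos, needle) {
  if (pos < 0 || pos + needle.length > bytes.length) return false;
  for (let i = 0; i < needle.length; i++) {
    if (bytes[pos + i] !== needle[i]) return false;
  }
  return true;
}

const CRLF = [0x0d, 0x0a];
const DASHES = [0x2d, 0x2d];

// Delimiter steps: `--` boundary, then either `--` CR LF at the very end of
// input ("end") or CR LF (returns the position after it). null = failure.
function readDelimiter(input, pos, boundary) {
  if (!bytesAt(input, pos, DASHES) || !bytesAt(input, pos + 2, boundary)) return null;
  pos += 2 + boundary.length;
  if (pos + 4 === input.length && bytesAt(input, pos, [...DASHES, ...CRLF])) return "end";
  if (!bytesAt(input, pos, CRLF)) return null;
  return pos + 2;
}

// Body loop plus the CR LF check after it. Returns { body, position } with
// position just past that CR LF, or null on failure.
function readBody(input, pos, boundary) {
  const body = [];
  while (pos < input.length) {
    body.push(input[pos]);
    pos += 1;
    if (bytesAt(body, body.length - boundary.length, boundary)) {
      if (body.length < 4 + boundary.length) return null; // sketch's choice
      body.length -= 4 + boundary.length; // drop CR LF `--` boundary
      pos -= 4 + boundary.length; // rewind to the CR before `--`
      break;
    }
  }
  if (!bytesAt(input, pos, CRLF)) return null;
  return { body: Uint8Array.from(body), position: pos + 2 };
}
```

For the input `--XYZ` CR LF `hello` CR LF `--XYZ--` CR LF with boundary `XYZ`, the parse proceeds in three steps:
1. `readDelimiter` returns position 7.
2. `readBody` returns the body `hello` and position 14.
3. `readDelimiter` at 14 reports the final delimiter.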
To parse multipart/form-data headers, given a byte sequence input and a pointer into it position, run the following steps:
- Let name, filename and contentType be null.
- While true:
  - If position points to a sequence of bytes starting with 0x0D 0x0A (CR LF):
    - If name is null, return failure.
    - Return name, filename and contentType.
  - Let header name be the result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR) or 0x3A (:), given position.
  - Remove any HTTP tab or space bytes from the start or end of header name.
  - If header name does not match the field-name token production, return failure.
  - If the byte at position is not 0x3A (:), return failure.
  - Advance position by 1.
  - Collect a sequence of bytes that are HTTP tab or space bytes, given position. (Do nothing with those bytes.)
  - Byte-lowercase header name and switch on the result:
    - `content-disposition`
      - Set name and filename to null.
      - If position does not point to a sequence of bytes starting with `form-data; name="`, return failure.
      - Advance position so it points at the byte after the next 0x22 (") byte (the one in the sequence of bytes matched above).
      - Set name to the result of parsing a multipart/form-data name given input and position, if the result is not failure. Otherwise, return failure.
      - If position points to a sequence of bytes starting with `; filename="`:
        - Advance position so it points at the byte after the next 0x22 (") byte (the one in the sequence of bytes matched above).
        - Set filename to the result of parsing a multipart/form-data name given input and position, if the result is not failure. Otherwise, return failure.
    - `content-type`
      - Let header value be the result of collecting a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given position.
      - Remove any HTTP tab or space bytes from the end of header value.
      - Set contentType to the isomorphic decoding of header value.
    - Otherwise
      - Collect a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given position. (Do nothing with those bytes.)
  - If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure. Otherwise, advance position by 2 (past the newline).
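The JavaScript sketch below follows the header parser step by step. Its conventions are:
- input is a `Uint8Array`.
- The position pointer is a mutable `{ pos }` object. This convention and all function names are hypothetical.
- Failure is `null`.

It carries a compact copy of the name parser so that it runs on its own. It checks field names against the HTTP `token` grammar, which is how RFC 9110 defines field-name.

```js
// RFC 9110 token characters, used for the field-name production.
const TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;
// Isomorphic decode: each byte becomes the code point with the same value.
const isoDecode = (bytes) => String.fromCharCode(...bytes);

// Compact copy of "parse a multipart/form-data name"; null means failure.
function parseName(input, p) {
  const start = p.pos;
  while (p.pos < input.length && ![0x0a, 0x0d, 0x22].includes(input[p.pos])) p.pos++;
  if (input[p.pos] !== 0x22) return null;
  const raw = isoDecode(input.subarray(start, p.pos));
  p.pos++; // skip the closing quote
  const unescaped = raw.replace(/%0A|%0D|%22/g, (m) =>
    m === "%0A" ? "\n" : m === "%0D" ? "\r" : '"');
  const bytes = Uint8Array.from(unescaped, (c) => c.charCodeAt(0));
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

function parseHeaders(input, p) {
  let name = null, filename = null, contentType = null;
  const startsWith = (s) => isoDecode(input.subarray(p.pos, p.pos + s.length)) === s;
  const collectUntil = (stops) => {
    const start = p.pos;
    while (p.pos < input.length && !stops.includes(input[p.pos])) p.pos++;
    return input.subarray(start, p.pos);
  };
  while (true) {
    // An empty line ends the headers; Content-Disposition is mandatory.
    if (startsWith("\r\n")) return name === null ? null : { name, filename, contentType };
    const headerName = isoDecode(collectUntil([0x0a, 0x0d, 0x3a])).replace(/^[\t ]+|[\t ]+$/g, "");
    if (!TOKEN.test(headerName) || input[p.pos] !== 0x3a) return null;
    p.pos++;
    while (input[p.pos] === 0x09 || input[p.pos] === 0x20) p.pos++;
    switch (headerName.toLowerCase()) {
      case "content-disposition": {
        name = filename = null;
        const prefix = 'form-data; name="';
        if (!startsWith(prefix)) return null;
        p.pos += prefix.length;
        if ((name = parseName(input, p)) === null) return null;
        if (startsWith('; filename="')) {
          p.pos += '; filename="'.length;
          if ((filename = parseName(input, p)) === null) return null;
        }
        break;
      }
      case "content-type":
        contentType = isoDecode(collectUntil([0x0a, 0x0d])).replace(/[\t ]+$/, "");
        break;
      default:
        collectUntil([0x0a, 0x0d]); // other headers are ignored
    }
    if (!startsWith("\r\n")) return null;
    p.pos += 2;
  }
}
```

On success, `p.pos` is left at the CR LF of the empty line, so the caller still advances by 2 before the body.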
To parse a multipart/form-data name, given a byte sequence input and a pointer into it position, run the following steps:
- Assert: The byte at (position - 1) is 0x22 (").
- Let name be the result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR) or 0x22 ("), given position.
- If the byte at position is not 0x22 ("), return failure. Otherwise, advance position by 1.
- Replace any occurrence of the following subsequences in name with the given byte:
  - `%0A`: 0x0A (LF)
  - `%0D`: 0x0D (CR)
  - `%22`: 0x22 (")
- Return the UTF-8 decoding without BOM of name.
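A standalone JavaScript sketch of the name parser follows. It assumes input is a `Uint8Array` and position is a mutable `{ pos }` object, a hypothetical stand-in for the pointer; `null` stands for failure.

The replacements are case-sensitive, so `%0a` is left untouched. `TextDecoder` with `ignoreBOM: true` keeps a leading U+FEFF instead of stripping it, which matches UTF-8 decode without BOM.

```js
// The three escape sequences and the bytes they stand for.
const ESCAPES = new Map([["%0A", 0x0a], ["%0D", 0x0d], ["%22", 0x22]]);

// position.pos must point just past an opening 0x22 (").
function parseName(input, position) {
  const start = position.pos;
  while (position.pos < input.length && ![0x0a, 0x0d, 0x22].includes(input[position.pos])) {
    position.pos++;
  }
  if (input[position.pos] !== 0x22) return null; // failure
  const raw = input.subarray(start, position.pos);
  position.pos++; // skip the closing quote

  const bytes = [];
  for (let i = 0; i < raw.length; i++) {
    const key = i + 2 < raw.length ? String.fromCharCode(raw[i], raw[i + 1], raw[i + 2]) : "";
    if (ESCAPES.has(key)) {
      bytes.push(ESCAPES.get(key));
      i += 2; // skip the two hex characters
    } else {
      bytes.push(raw[i]);
    }
  }
  // Invalid UTF-8 becomes U+FFFD; a leading BOM is kept as U+FEFF.
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(Uint8Array.from(bytes));
}
```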
This is the way parsing of files and filenames should ideally work. It is not how it currently works in browsers. See issue #1 for more details.
Intellectual property rights
Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License. To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.