1. multipart/form-data serializing
A multipart/form-data boundary is a byte sequence such that:
-
its length is greater than or equal to 27 and less than or equal to 70, and
-
it is composed of bytes in the ranges 0x30 to 0x39, 0x41 to 0x5A, or 0x61 to 0x7A, inclusive (ASCII alphanumeric), or which are 0x27 ('), 0x2D (-) or 0x5F (_).
To generate a multipart/form-data boundary, return an implementation-defined byte sequence which fulfills the conditions for boundaries, such that
part of it is randomly generated, with a minimum entropy of 95 bits.
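As an illustration only, the two boundary conditions and the entropy requirement could be sketched in TypeScript as follows. The prefix `formdata-boundary-`, the 64-character alphabet and the function names are assumptions made for this example, not part of the definition; an ASCII string stands in for the byte sequence, and a global Web Crypto `crypto` object is assumed to be available.

```typescript
// Hypothetical fixed prefix; any sequence of allowed bytes would do.
const BOUNDARY_PREFIX = "formdata-boundary-";

// 64 of the 65 allowed characters, so each random character carries exactly 6 bits.
const BOUNDARY_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Checks both conditions: length between 27 and 70, and only allowed bytes.
function isValidBoundary(boundary: string): boolean {
  return /^[0-9A-Za-z'_-]{27,70}$/.test(boundary);
}

// Generates a boundary whose suffix holds 16 random characters,
// i.e. 16 * 6 = 96 bits of entropy, which is at least the required 95.
function generateBoundary(): string {
  const random = crypto.getRandomValues(new Uint8Array(16));
  let suffix = "";
  for (let i = 0; i < random.length; i++) {
    // The low 6 bits of a uniform byte index the alphabet uniformly.
    suffix += BOUNDARY_ALPHABET[random[i] & 63];
  }
  return BOUNDARY_PREFIX + suffix;
}
```

With these choices every generated boundary is 34 bytes long, which falls inside the permitted range.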
Previous definitions of multipart/form-data required that the boundary associated with a multipart/form-data payload not be present anywhere in the payload other than as a
delimiter, although they allowed the boundary to be generated probabilistically. Since this generation algorithm is separate from any payload, however, it has to
specify a minimum entropy instead. [RFC7578] [RFC2046]
If a user agent generates multipart/form-data boundaries with a
length of 27 and an entropy of 95 bits, given a payload made specifically to generate collisions
with that user agent’s boundaries, the expected length of the payload before a collision is found is
well over a yottabyte.
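One way to arrive at an estimate of that order is sketched below. It rests on assumptions that are not stated above: each payload byte starts at most one new candidate boundary, the attacker never repeats a candidate, and a yottabyte is taken as 10^24 bytes.

```typescript
const ENTROPY_BITS = 95;
const YOTTABYTE = 1e24; // bytes, SI definition (assumption)

// With 2^95 equally likely boundaries and no repeated guesses, the hit is
// expected after covering about half of the space, i.e. 2^94 candidates.
// At one new candidate per payload byte, that is also the payload length.
const expectedBytes = Math.pow(2, ENTROPY_BITS - 1);
const expectedYottabytes = expectedBytes / YOTTABYTE;

console.log(expectedYottabytes); // on the order of 10^4 yottabytes
```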
To escape a multipart/form-data name with a string name, an optional encoding encoding (default UTF-8) and an optional boolean isFilename (default false):
-
If isFilename is true:
-
Set name to the result of converting name into a scalar value string.
-
-
Otherwise:
-
Assert: name is a scalar value string.
-
Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), in name, by a string consisting of U+000D (CR) and U+000A (LF).
-
-
Let encoded be the result of encoding name with encoding.
-
Replace every 0x0A (LF) byte in encoded with the byte sequence `%0A`, every 0x0D (CR) byte with `%0D`, and every 0x22 (") byte with `%22`.
-
Return encoded.
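The escaping steps above can be sketched in TypeScript roughly as follows. This is a minimal sketch that assumes the encoding is always UTF-8 (via `TextEncoder`); the helper `toScalarValueString` and the function name `escapeName` are names chosen for this example.

```typescript
// Replaces lone surrogates with U+FFFD, which converts a string into a
// scalar value string. Valid surrogate pairs are left untouched.
function toScalarValueString(s: string): string {
  return s.replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g, (m) =>
    m.length === 2 ? m : "\uFFFD",
  );
}

function escapeName(name: string, isFilename = false): Uint8Array {
  if (isFilename) {
    name = toScalarValueString(name);
  } else {
    // Normalize lone CR and lone LF to CR LF; existing CR LF pairs are kept.
    name = name.replace(/\r\n|\r|\n/g, "\r\n");
  }
  const encoded = new TextEncoder().encode(name); // UTF-8 only (assumption)
  const out: number[] = [];
  for (let i = 0; i < encoded.length; i++) {
    const byte = encoded[i];
    if (byte === 0x0a) out.push(0x25, 0x30, 0x41); // LF -> %0A
    else if (byte === 0x0d) out.push(0x25, 0x30, 0x44); // CR -> %0D
    else if (byte === 0x22) out.push(0x25, 0x32, 0x32); // " -> %22
    else out.push(byte);
  }
  return Uint8Array.from(out);
}
```

Note that newline normalization applies only to field names: with isFilename set to true, a lone LF is escaped as `%0A` without a CR being inserted first.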
The multipart/form-data chunk serializer takes an entry list entries and an optional encoding encoding (default UTF-8), and returns a tuple of a multipart/form-data boundary and a list
of chunks, each of which can be either a byte sequence or a File:
-
Set encoding to the result of getting an output encoding from encoding.
-
Let boundary be the result of generating a multipart/form-data boundary.
-
Let output chunks be an empty list.
-
For each entry in entries:
-
Let chunk be a byte sequence containing `--`, followed by boundary, followed by 0x0D 0x0A (CR LF).
-
Append `Content-Disposition: form-data; name="`, followed by the result of escaping a multipart/form-data name given entry’s name and encoding, followed by 0x22 ("), to chunk.
-
Let value be entry’s value.
-
If value is a string:
-
Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to chunk.
-
Replace every occurrence of U+000D (CR) not followed by U+000A (LF), and every occurrence of U+000A (LF) not preceded by U+000D (CR), in value, by a string consisting of U+000D (CR) and U+000A (LF).
-
Append the result of encoding value with encoding to chunk.
-
Append 0x0D 0x0A (CR LF) to chunk.
-
Append chunk to output chunks.
-
-
Otherwise:
-
Append `; filename="`, followed by the result of escaping a multipart/form-data name given value’s name with encoding and isFilename set to true, followed by 0x22 0x0D 0x0A (" CR LF), to chunk.
-
Let type be value’s type, if it is not the empty string, or "application/octet-stream" otherwise.
-
Append `Content-Type: ` (including the trailing space), followed by the result of isomorphic encoding type, to chunk.
-
Append 0x0D 0x0A 0x0D 0x0A (CR LF CR LF) to chunk.
-
Append chunk, followed by value, followed by the byte sequence 0x0D 0x0A (CR LF), to output chunks.
-
-
Append the byte sequence containing `--`, followed by boundary, followed by `--`, followed by 0x0D 0x0A (CR LF), to output chunks.
-
Return the tuple boundary / output chunks.
This algorithm now matches the behavior of all major browsers.
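A rough TypeScript sketch of the chunk serializer is given below, under several simplifying assumptions: the encoding is fixed to UTF-8, the boundary is passed in rather than generated, files are represented by a hypothetical `FileLike` object exposing only `name` and `type`, and escaping is done on strings before encoding (equivalent for UTF-8, where the bytes 0x0A, 0x0D and 0x22 only arise from the corresponding ASCII characters). A space is assumed after `Content-Type:`.

```typescript
// Hypothetical stand-in for a File: only the fields the algorithm reads.
interface FileLike { name: string; type: string; }
type Entry = [string, string | FileLike];
type Chunk = Uint8Array | FileLike;

const utf8 = new TextEncoder();

// Compact stand-in for "escaping a multipart/form-data name".
function escapeName(name: string, isFilename: boolean): string {
  if (!isFilename) name = name.replace(/\r\n|\r|\n/g, "\r\n");
  return name.replace(/\n/g, "%0A").replace(/\r/g, "%0D").replace(/"/g, "%22");
}

function serializeChunks(entries: Entry[], boundary: string): Chunk[] {
  const output: Chunk[] = [];
  for (const [name, value] of entries) {
    let head = `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="${escapeName(name, false)}"`;
    if (typeof value === "string") {
      // String values get their newlines normalized to CR LF.
      const body = value.replace(/\r\n|\r|\n/g, "\r\n");
      output.push(utf8.encode(`${head}\r\n\r\n${body}\r\n`));
    } else {
      const type = value.type !== "" ? value.type : "application/octet-stream";
      head += `; filename="${escapeName(value.name, true)}"\r\n`;
      // The type is assumed to be ASCII, so UTF-8 and isomorphic encoding agree.
      head += `Content-Type: ${type}\r\n\r\n`;
      // The file itself is kept as a separate chunk instead of being copied.
      output.push(utf8.encode(head), value, utf8.encode("\r\n"));
    }
  }
  output.push(utf8.encode(`--${boundary}--\r\n`));
  return output;
}
```

Keeping each file as its own chunk means the payload can later be streamed without reading file contents into memory up front.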
The length of a multipart/form-data payload, given a list of chunks chunks which can be either byte sequences or Files, is the
result of running the following steps:
To create a multipart/form-data readable stream from a list of chunks chunks which can be either byte sequences or Files, run the following steps:
-
Let file stream be null.
-
Let stream be a new ReadableStream.
-
Let pull algorithm be an algorithm that runs the following steps:
- if file stream is null and chunks is not empty
-
-
If chunks[0] is a byte sequence, enqueue a Uint8Array object wrapping an ArrayBuffer containing chunks[0] into stream.
-
Otherwise:
-
Remove the first item from chunks.
-
- if file stream is null and chunks is empty
-
-
Close stream.
-
- if file stream is not null
-
-
Let read request be a new read request with the following items:
- chunk steps, given chunk
-
-
If chunk is not a Uint8Array object, error stream with a TypeError and abort these steps.
-
Enqueue chunk into stream.
-
- close steps
-
-
Set file stream to null.
-
Run pull algorithm.
-
- error steps, given e
-
-
Error stream with e.
-
-
Let reader be the result of getting a reader for file stream.
-
Read a chunk from reader with read request.
-
-
Let cancel algorithm be an algorithm that runs the following steps, given reason:
-
If file stream is not null, cancel file stream with reason.
-
-
Set up stream with pullAlgorithm set to pull algorithm and cancelAlgorithm set to cancel algorithm.
-
Return stream.
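The same idea can be approximated with the `ReadableStream` constructor available to scripts. The following is only a sketch: it uses a `Blob` in place of a File, keeps a reader rather than the file stream itself, and relies on the `pull` and `cancel` callbacks of an underlying source instead of the internal "set up" operation. The name `createMultipartStream` is hypothetical.

```typescript
// Hypothetical chunk type: raw bytes, or a Blob standing in for a File.
type StreamChunk = Uint8Array | Blob;

function createMultipartStream(chunks: StreamChunk[]): ReadableStream<Uint8Array> {
  // Reader for the file currently being streamed, or null between files.
  let fileReader: ReadableStreamDefaultReader<Uint8Array> | null = null;

  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      // Loop until one piece of data is enqueued or the stream is closed.
      while (true) {
        if (fileReader === null) {
          const next = chunks.shift();
          if (next === undefined) {
            controller.close(); // no file in progress and no chunks left
            return;
          }
          if (next instanceof Uint8Array) {
            controller.enqueue(next);
            return;
          }
          fileReader = next.stream().getReader();
        }
        // A rejected read() rejects pull(), which errors the outer stream.
        const result = await fileReader.read();
        if (!result.done) {
          if (!(result.value instanceof Uint8Array)) {
            controller.error(new TypeError("file chunk is not a Uint8Array"));
            return;
          }
          controller.enqueue(result.value);
          return;
        }
        fileReader = null; // file exhausted; continue with the next chunk
      }
    },
    async cancel(reason) {
      if (fileReader !== null) await fileReader.cancel(reason);
    },
  });
}
```

Because data is produced only inside `pull`, file contents are read on demand as the consumer drains the stream.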
2. multipart/form-data parsing
These algorithms are a first attempt at defining a multipart/form-data parser for use in Body's formData() method. The current algorithms don’t yet match
any browser because their behavior disagrees at various points.
Note that Gecko and Chromium also implement a Web Extensions API that parses multipart/form-data independently from the parser in Body (see Gecko bug 1697292):
chrome.webRequest.onBeforeRequest.addListener(
  (details) => {
    // Returns an object mapping names to an array of values represented by
    // either the string value or by the file’s filename.
    console.log(details.requestBody.formData);
  },
  { urls: ["<all_urls>"] },
  ["requestBody"]
);
The multipart/form-data parser takes a byte sequence input and a MIME type mimeType, and returns either an entry list or failure:
-
If mimeType’s parameters["boundary"] does not exist, return failure. Otherwise, let boundary be the result of UTF-8 decoding mimeType’s parameters["boundary"].
The definition of MIME type in [MIMESNIFF] has the parameter values being ASCII strings, but the parse a MIME type algorithm can create MIME type records containing non-ASCII parameter values. See whatwg/mimesniff issue #141. Gecko and WebKit accept non-ASCII boundary strings and then expect them to be UTF-8 encoded in the request body; Chromium rejects them instead.
-
Let entry list be an empty entry list.
-
Let position be a pointer to a byte in input, initially pointing at the first byte.
-
While true:
-
If position points to a sequence of bytes starting with 0x2D 0x2D (`--`) followed by boundary, advance position by 2 + the length of boundary. Otherwise, return failure.
-
If position points to the sequence of bytes 0x2D 0x2D 0x0D 0x0A (`--` followed by CR LF) followed by the end of input, return entry list.
-
If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure.
-
Advance position by 2. (This skips past the newline.)
-
Let name, filename and contentType be the result of parsing multipart/form-data headers on input and position, if the result is not failure. Otherwise, return failure.
-
Advance position by 2. (This skips past the empty line that marks the end of the headers.)
-
Let body be the empty byte sequence.
-
Body loop: While position is not past the end of input:
-
Append the byte at position to body.
-
If body ends with boundary:
-
Remove the last 4 + (length of boundary) bytes from body.
-
Decrease position by 4 + (length of boundary).
-
Break out of body loop.
-
-
-
If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure. Otherwise, advance position by 2.
-
If filename is not null:
-
If contentType is null, set contentType to "text/plain".
-
If contentType is not an ASCII string, set contentType to the empty string.
-
Let value be a new File object with name filename, type contentType, and body body.
-
-
Otherwise:
-
Let value be the UTF-8 decoding without BOM of body.
-
-
Assert: name is a scalar value string and value is either a scalar value string or a File object.
-
Create an entry with name and value, and append it to entry list.
-
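The body loop in the parser above can be illustrated with a simplified sketch. Rather than reproducing the pointer arithmetic of the steps, it searches directly for the full delimiter, CR LF followed by `--` and boundary, and reports where that delimiter starts. This is a deliberate simplification, the function names are hypothetical, and the boundary is assumed to be available as bytes.

```typescript
// Returns the index of the first occurrence of needle in haystack at or
// after from, or -1 if there is none.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array, from: number): number {
  outer: for (let i = from; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Reads one part body starting at position. On success, returns the body and
// the position of the CR LF that precedes the next "--boundary" delimiter.
// Returns null (failure) if no delimiter follows.
function readBody(
  input: Uint8Array,
  position: number,
  boundary: Uint8Array,
): { body: Uint8Array; position: number } | null {
  const delimiter = new Uint8Array(4 + boundary.length);
  delimiter.set([0x0d, 0x0a, 0x2d, 0x2d]); // CR LF "--"
  delimiter.set(boundary, 4);
  const end = indexOfBytes(input, delimiter, position);
  if (end === -1) return null;
  return { body: input.slice(position, end), position: end };
}
```

The caller would then skip the CR LF at the returned position and continue with the next `--` boundary check, as in the outer loop of the parser.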
To parse multipart/form-data headers, given a byte sequence input and a pointer into it position, run the following steps:
-
Let name, filename and contentType be null.
-
While true:
-
If position points to a sequence of bytes starting with 0x0D 0x0A (CR LF):
-
If name is null, return failure.
-
Return name, filename and contentType.
-
-
Let header name be the result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR) or 0x3A (:), given position.
-
Remove any HTTP tab or space bytes from the start or end of header name.
-
If header name does not match the field-name token production, return failure.
-
If the byte at position is not 0x3A (:), return failure.
-
Advance position by 1.
-
Collect a sequence of bytes that are HTTP tab or space bytes given position. (Do nothing with those bytes.)
-
Byte-lowercase header name and switch on the result:
- `content-disposition`
-
-
Set name and filename to null.
-
If position does not point to a sequence of bytes starting with `form-data; name="`, return failure.
-
Advance position so it points at the byte after the next 0x22 (") byte (the one in the sequence of bytes matched above).
-
Set name to the result of parsing a multipart/form-data name given input and position, if the result is not failure. Otherwise, return failure.
-
If position points to a sequence of bytes starting with `; filename="`:
-
Advance position so it points at the byte after the next 0x22 (") byte (the one in the sequence of bytes matched above).
-
Set filename to the result of parsing a multipart/form-data name given input and position, if the result is not failure. Otherwise, return failure.
-
-
- `content-type`
-
-
Let header value be the result of collecting a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given position.
-
Remove any HTTP tab or space bytes from the end of header value.
-
Set contentType to the isomorphic decoding of header value.
-
- Otherwise
-
Collect a sequence of bytes that are not 0x0A (LF) or 0x0D (CR), given position. (Do nothing with those bytes.)
-
If position does not point to a sequence of bytes starting with 0x0D 0x0A (CR LF), return failure. Otherwise, advance position by 2 (past the newline).
-
To parse a multipart/form-data name, given a byte sequence input and a pointer into it position, run the following steps:
-
Assert: The byte at (position - 1) is 0x22 (").
-
Let name be the result of collecting a sequence of bytes that are not 0x0A (LF), 0x0D (CR) or 0x22 ("), given position.
-
If the byte at position is not 0x22 ("), return failure. Otherwise, advance position by 1.
-
Replace any occurrence of the following subsequences in name with the given byte:
- `%0A`
-
0x0A (LF)
- `%0D`
-
0x0D (CR)
- `%22`
-
0x22 (")
-
Return the UTF-8 decoding without BOM of name.
This is the way parsing of files and filenames should ideally work. It is not how it currently works in browsers. See issue #1 for more details.
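The header and name parsing algorithms above could be sketched together in TypeScript along the following lines. The `Cursor` and `PartHeaders` types and all function names are hypothetical. Bytes are collected into an isomorphic (Latin-1) string for convenience, and `null` stands for failure.

```typescript
interface Cursor { input: Uint8Array; pos: number; }
interface PartHeaders { name: string; filename: string | null; contentType: string | null; }

const CR = 0x0d, LF = 0x0a, QUOTE = 0x22, COLON = 0x3a;

function startsWith(c: Cursor, text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    if (c.input[c.pos + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

// Collects bytes up to (not including) any stop byte, as an isomorphic string.
function collect(c: Cursor, stop: number[]): string {
  let out = "";
  while (c.pos < c.input.length && stop.indexOf(c.input[c.pos]) === -1) {
    out += String.fromCharCode(c.input[c.pos++]);
  }
  return out;
}

// Parses a quoted name; c.pos must point just after the opening quote.
function parseName(c: Cursor): string | null {
  const raw = collect(c, [LF, CR, QUOTE]);
  if (c.input[c.pos] !== QUOTE) return null;
  c.pos++;
  const unescaped = raw.replace(/%0A/g, "\n").replace(/%0D/g, "\r").replace(/%22/g, '"');
  const bytes = Uint8Array.from(unescaped, (ch) => ch.charCodeAt(0));
  // ignoreBOM: true keeps a leading BOM, matching "UTF-8 decoding without BOM".
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

function parseHeaders(c: Cursor): PartHeaders | null {
  let name: string | null = null;
  let filename: string | null = null;
  let contentType: string | null = null;
  while (true) {
    if (startsWith(c, "\r\n")) {
      return name === null ? null : { name, filename, contentType };
    }
    const headerName = collect(c, [LF, CR, COLON]).replace(/^[\t ]+|[\t ]+$/g, "");
    // field-name must be a non-empty HTTP token.
    if (!/^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/.test(headerName)) return null;
    if (c.input[c.pos] !== COLON) return null;
    c.pos++;
    while (c.input[c.pos] === 0x09 || c.input[c.pos] === 0x20) c.pos++;
    switch (headerName.toLowerCase()) {
      case "content-disposition": {
        name = null;
        filename = null;
        if (!startsWith(c, 'form-data; name="')) return null;
        c.pos += 'form-data; name="'.length;
        name = parseName(c);
        if (name === null) return null;
        if (startsWith(c, '; filename="')) {
          c.pos += '; filename="'.length;
          filename = parseName(c);
          if (filename === null) return null;
        }
        break;
      }
      case "content-type":
        contentType = collect(c, [LF, CR]).replace(/[\t ]+$/, "");
        break;
      default:
        collect(c, [LF, CR]); // unknown header: skip its value
    }
    if (!startsWith(c, "\r\n")) return null;
    c.pos += 2;
  }
}
```

On success the cursor is left at the empty line that ends the headers, which the main parser then skips by advancing two bytes.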
Intellectual property rights
Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License. To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.