The source text of an ECMAScript program is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly taking the longest possible sequence of characters as the next input element.
There are two goal symbols for the lexical grammar. The InputElementDiv symbol is used in those syntactic grammar contexts where a leading division (/) or division-assignment (/=) operator is permitted. The InputElementRegExp symbol is used in other syntactic grammar contexts.
NOTE There are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (see 7.9); in examples such as the following:
a = b
/hi/g.exec(c).map(d);
where the first non-whitespace, non-comment character after a LineTerminator is slash (/) and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator.
That is, the above example is interpreted in the same way as:
a = b / hi / g.exec(c).map(d);
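The interpretation described in this NOTE can be observed directly. The following is a minimal sketch: the values bound to b, hi, g, c and d are hypothetical stand-ins, chosen only so that the two-line source from the NOTE can run.

```javascript
// Hypothetical stand-ins so that the source from the NOTE can be evaluated.
var b = 100;
var hi = 5;
var c = "unused";
var d = function (x) { return x; };
var g = {
  exec: function () {
    // Returns an object whose map() yields a plain number.
    return { map: function () { return 2; } };
  }
};

var a;
a = b
/hi/g.exec(c).map(d);

// No semicolon is inserted after "a = b", so this is b / hi / g.exec(c).map(d).
console.log(a); // 100 / 5 / 2, that is 10
```

If a semicolon had been inserted at the LineTerminator, a would be 100 and the second line would be a separate statement that begins with a regular expression literal.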
Syntax
InputElementDiv ::
WhiteSpace
LineTerminator
Comment
Token
DivPunctuator
InputElementRegExp ::
WhiteSpace
LineTerminator
Comment
Token
RegularExpressionLiteral
The Unicode format-control characters (i.e., the characters in category “Cf” in the Unicode Character Database such as left-to-right mark or right-to-left mark) are control codes used to control the formatting of a range of text in the absence of higher-level protocols for this (such as mark-up languages).
It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals and regular expression literals.
<ZWNJ> and <ZWJ> are format-control characters that are used to make necessary distinctions when forming words or phrases in certain languages. In ECMAScript source text, <ZWNJ> and <ZWJ> may also be used in an identifier after the first character.
<BOM> is a format-control character used primarily at the start of a text to mark it as Unicode and to allow detection of the text's encoding and byte order. <BOM> characters intended for this purpose can sometimes also appear after the start of a text, for example as a result of concatenating files. <BOM> characters are treated as white space characters (see 7.2).
The special treatment of certain format-control characters outside of comments, string literals, and regular expression literals is summarized in Table 1.
Code Unit Value | Name | Formal Name | Usage
\u200C | Zero width non-joiner | <ZWNJ> | IdentifierPart
\u200D | Zero width joiner | <ZWJ> | IdentifierPart
\uFEFF | Byte Order Mark | <BOM> | Whitespace
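As an illustration of Table 1, the sketch below builds source text containing these characters and evaluates it with the Function constructor. The identifier names and values are hypothetical; the only assumption is an implementation that follows the rules above.

```javascript
// <ZWNJ> and <ZWJ> are IdentifierPart characters, so these are three distinct names.
var source =
  "var ab = 1;" +
  "var a\u200Cb = 2;" + // a <ZWNJ> b
  "var a\u200Db = 3;" + // a <ZWJ> b
  "return [ab, a\u200Cb, a\u200Db];";
var values = new Function(source)();

// <BOM> is treated as white space, both at the start of the text and between tokens.
var sum = new Function("\uFEFFreturn 1\uFEFF+\uFEFF2;")();

console.log(values, sum);
```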
White space characters are used to improve source text readability and to separate tokens (indivisible lexical units) from each other, but are otherwise insignificant. White space characters may occur between any two tokens and at the start or end of input. White space characters may also occur within a StringLiteral or a RegularExpressionLiteral (where they are considered significant characters forming part of the literal value) or within a Comment, but cannot appear within any other kind of token.
The ECMAScript white space characters are listed in Table 2.
Code Unit Value | Name | Formal Name
\u0009 | Tab | <TAB>
\u000B | Vertical Tab | <VT>
\u000C | Form Feed | <FF>
\u0020 | Space | <SP>
\u00A0 | No-break space | <NBSP>
\uFEFF | Byte Order Mark | <BOM>
Other category “Zs” | Any other Unicode “space separator” | <USP>
ECMAScript implementations must recognize all of the white space characters defined in Unicode 3.0. Later editions of the Unicode Standard may define other white space characters. ECMAScript implementations may recognize white space characters from later editions of the Unicode Standard.
Syntax
WhiteSpace ::
<TAB>
<VT>
<FF>
<SP>
<NBSP>
<BOM>
<USP>
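A quick way to check the WhiteSpace production is to place each character between tokens and confirm that the program still evaluates. This is a sketch only; the object name whiteSpace and the helper evaluatesToThree are hypothetical, and U+2003 (EM SPACE) is used as one arbitrary representative of <USP>.

```javascript
var whiteSpace = {
  TAB: "\u0009",
  VT: "\u000B",
  FF: "\u000C",
  SP: "\u0020",
  NBSP: "\u00A0",
  BOM: "\uFEFF",
  USP: "\u2003" // EM SPACE, one example from category Zs
};

// Builds "return<ws>1<ws>+<ws>2;" and evaluates it.
function evaluatesToThree(ws) {
  return new Function("return" + ws + "1" + ws + "+" + ws + "2;")() === 3;
}

var allSeparateTokens = Object.keys(whiteSpace).every(function (name) {
  return evaluatesToThree(whiteSpace[name]);
});

console.log(allSeparateTokens);
```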
Like white space characters, line terminator characters are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space characters, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar. Line terminators also affect the process of automatic semicolon insertion (7.9). A line terminator cannot occur within any token except a StringLiteral. Line terminators may only occur within a StringLiteral token as part of a LineContinuation.
A line terminator can occur within a MultiLineComment (7.4) but cannot occur within a SingleLineComment.
Line terminators are included in the set of white space characters that are matched by the \s class in regular expressions.
The ECMAScript line terminator characters are listed in Table 3.
Code Unit Value | Name | Formal Name
\u000A | Line Feed | <LF>
\u000D | Carriage Return | <CR>
\u2028 | Line separator | <LS>
\u2029 | Paragraph separator | <PS>
Only the characters in Table 3 are treated as line terminators. Other new line or line breaking characters are treated as white space but not as line terminators. The character sequence <CR><LF> is commonly used as a line terminator. It should be considered a single character for the purpose of reporting line numbers.
Syntax
LineTerminator ::
<LF>
<CR>
<LS>
<PS>
LineTerminatorSequence ::
<LF>
<CR> [lookahead ∉ <LF>]
<LS>
<PS>
<CR> <LF>
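The following sketch illustrates three consequences of this clause: <CR><LF> counted as one line break for line-number reporting, <LS> acting as a line terminator while <VT> does not, and \s matching line terminators. The helpers countLines and parses are hypothetical names introduced for this example.

```javascript
// Counts lines, treating <CR><LF> as a single LineTerminatorSequence.
function countLines(source) {
  return source.split(/\r\n|[\n\r\u2028\u2029]/).length;
}

// Reports whether the text is accepted as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

var lines = countLines("a\r\nb\nc\u2028d\re");

// <LS> is a line terminator, so a semicolon can be inserted after "var x = 1".
var lsSeparatesStatements = parses("var x = 1\u2028var y = 2");
// <VT> is only white space, so the two statements run together.
var vtSeparatesStatements = parses("var x = 1\u000Bvar y = 2");

var classMatchesLS = /\s/.test("\u2028");

console.log(lines, lsSeparatesStatements, vtSeparatesStatements, classMatchesLS);
```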
Comments can be either single or multi-line. Multi-line comments cannot nest.
Because a single-line comment can contain any character except a LineTerminator character, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all characters from the // marker to the end of the line. However, the LineTerminator at the end of the line is not considered to be part of the single-line comment; it is recognised separately by the lexical grammar and becomes part of the stream of input elements for the syntactic grammar. This point is very important, because it implies that the presence or absence of single-line comments does not affect the process of automatic semicolon insertion (see 7.9).
Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator character, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.
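The effect of a line terminator inside a MultiLineComment is easiest to see after a return token, where the syntactic grammar forbids a LineTerminator. The function names below are hypothetical and the sketch is illustrative only.

```javascript
function sameLine() {
  return /* no line terminator inside */ 42;
}

function acrossLines() {
  return /* this comment contains
            a line terminator */ 42;
}

function singleLine() {
  var x = 1 // the comment ends before the line terminator
  var y = 2
  return x + y;
}

console.log(sameLine(), acrossLines(), singleLine());
```

In acrossLines the comment counts as a LineTerminator, so a semicolon is inserted after return and the function yields undefined. In singleLine the line terminators after the comments remain visible to semicolon insertion, so both declarations are terminated as usual.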
Syntax
Comment ::
MultiLineComment
SingleLineComment
MultiLineComment ::
/* MultiLineCommentCharsopt */
MultiLineCommentChars ::
MultiLineNotAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
PostAsteriskCommentChars ::
MultiLineNotForwardSlashOrAsteriskChar MultiLineCommentCharsopt
* PostAsteriskCommentCharsopt
MultiLineNotAsteriskChar ::
SourceCharacter but not asterisk *
MultiLineNotForwardSlashOrAsteriskChar ::
SourceCharacter but not forward-slash / or asterisk *
SingleLineComment ::
// SingleLineCommentCharsopt
SingleLineCommentChars ::
SingleLineCommentChar SingleLineCommentCharsopt
SingleLineCommentChar ::
SourceCharacter but not LineTerminator
Syntax
Token ::
IdentifierName
Punctuator
NumericLiteral
StringLiteral
NOTE The DivPunctuator and RegularExpressionLiteral productions define tokens, but are not included in the Token production.
Identifier Names are tokens that are interpreted according to the grammar given in the “Identifiers” section of chapter 5 of the Unicode standard, with some small modifications. An Identifier is an IdentifierName that is not a ReservedWord (see 7.6.1). The Unicode identifier grammar is based on both normative and informative character categories specified by the Unicode Standard. The characters in the specified categories in version 3.0 of the Unicode standard must be treated as in those categories by all conforming ECMAScript implementations.
This standard specifies specific character additions: The dollar sign ($) and the underscore (_) are permitted anywhere in an IdentifierName.
Unicode escape sequences are also permitted in an IdentifierName, where they contribute a single character to the IdentifierName, as computed by the CV of the UnicodeEscapeSequence (see 7.8.4). The \ preceding the UnicodeEscapeSequence does not contribute a character to the IdentifierName. A UnicodeEscapeSequence cannot be used to put a character into an IdentifierName that would otherwise be illegal. In other words, if a \ UnicodeEscapeSequence sequence were replaced by its UnicodeEscapeSequence's CV, the result must still be a valid IdentifierName that has the exact same sequence of characters as the original IdentifierName. All interpretations of identifiers within this specification are based upon their actual characters regardless of whether or not an escape sequence was used to contribute any particular characters.
Two IdentifierNames that are canonically equivalent according to the Unicode standard are not equal unless they are represented by the exact same sequence of code units (in other words, conforming ECMAScript implementations are only required to do bitwise comparison on IdentifierName values). The intent is that the incoming source text has been converted to normalised form C before it reaches the compiler.
ECMAScript implementations may recognize identifier characters defined in later editions of the Unicode Standard. If portability is a concern, programmers should only employ identifier characters defined in Unicode 3.0.
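The identifier rules above can be exercised with a short sketch. The variable names are hypothetical, and parses is a helper introduced here that reports whether text is accepted as a function body.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// \u0061 contributes the character "a", so this declares the identifier abc.
var \u0061bc = 5;

// $ and _ are permitted anywhere in an IdentifierName.
var $total_1 = 1;

// \u0031 is the digit 1, which is not an IdentifierStart, so the escape does not help.
var escapedDigitStart = parses("var \\u0031x = 1;");

// U+00E9 and "e" followed by U+0301 are canonically equivalent but are different names.
var distinct = new Function(
  "var \u00E9 = 1; var e\u0301 = 2; return [\u00E9, e\u0301];"
)();

console.log(abc, $total_1, escapedDigitStart, distinct);
```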
Syntax
Identifier ::
IdentifierName but not ReservedWord
IdentifierName ::
IdentifierStart
IdentifierName IdentifierPart
IdentifierStart ::
UnicodeLetter
$
_
\ UnicodeEscapeSequence
IdentifierPart ::
IdentifierStart
UnicodeCombiningMark
UnicodeDigit
UnicodeConnectorPunctuation
<ZWNJ>
<ZWJ>
UnicodeLetter
any character in the Unicode categories “Uppercase letter (Lu)”, “Lowercase letter (Ll)”, “Titlecase letter (Lt)”, “Modifier letter (Lm)”, “Other letter (Lo)”, or “Letter number (Nl)”.
UnicodeCombiningMark
any character in the Unicode categories “Non-spacing mark (Mn)” or “Combining spacing mark (Mc)”
UnicodeDigit
any character in the Unicode category “Decimal number (Nd)”
UnicodeConnectorPunctuation
any character in the Unicode category “Connector punctuation (Pc)”
UnicodeEscapeSequence
see 7.8.4.
A reserved word is an IdentifierName that cannot be used as an Identifier.
Syntax
ReservedWord ::
Keyword
FutureReservedWord
NullLiteral
BooleanLiteral
The following tokens are ECMAScript keywords and may not be used as Identifiers in ECMAScript programs.
Syntax
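Keyword :: one of
break do instanceof typeof
case else new var
catch finally return void
continue for switch while
debugger function this with
default if throw
delete in try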
The following words are used as keywords in proposed extensions and are therefore reserved to allow for the possibility of future adoption of those extensions.
Syntax
FutureReservedWord :: one of
class enum extends super
const export import
The following tokens are also considered to be FutureReservedWords when they occur within strict mode code (see 10.1.1). The occurrence of any of these tokens within strict mode code in any context where the occurrence of a FutureReservedWord would produce an error must also produce an equivalent error:
implements let private public yield
interface package protected static
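The sketch below probes a few reserved words by attempting to declare a variable with each name. The helper parses and the result object are hypothetical; implements is used as the example of a word that is reserved only in strict mode code.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

var reserved = {
  keyword: parses("var while = 1;"),
  futureReservedWord: parses("var enum = 1;"),
  nullLiteral: parses("var null = 1;"),
  booleanLiteral: parses("var true = 1;"),
  // implements is an ordinary Identifier outside strict mode code.
  implementsNonStrict: parses("var implements = 1;"),
  implementsStrict: parses("'use strict'; var implements = 1;")
};

console.log(reserved);
```

Only implementsNonStrict is expected to be accepted; every other entry uses a ReservedWord where an Identifier is required.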
Syntax
Punctuator :: one of
{ } ( ) [ ]
. ; , < > <=
>= == != === !==
+ - * % ++ --
<< >> >>> & | ^
! ~ && || ? :
= += -= *= %= <<=
>>= >>>= &= |= ^=
DivPunctuator :: one of
/ /=
Syntax
Literal ::
NullLiteral
BooleanLiteral
NumericLiteral
StringLiteral
RegularExpressionLiteral
Syntax
NullLiteral ::
null
Semantics
The value of the null literal null is the sole value of the Null type, namely null.
Syntax
BooleanLiteral ::
true
false
Semantics
The value of the Boolean literal true is a value of the Boolean type, namely true.
The value of the Boolean literal false is a value of the Boolean type, namely false.
Syntax
NumericLiteral ::
DecimalLiteral
HexIntegerLiteral
DecimalLiteral ::
DecimalIntegerLiteral . DecimalDigitsopt ExponentPartopt
. DecimalDigits ExponentPartopt
DecimalIntegerLiteral ExponentPartopt
DecimalIntegerLiteral ::
0
NonZeroDigit DecimalDigitsopt
DecimalDigits ::
DecimalDigit
DecimalDigits DecimalDigit
DecimalDigit :: one of
0 1 2 3 4 5 6 7 8 9
NonZeroDigit :: one of
1 2 3 4 5 6 7 8 9
ExponentPart ::
ExponentIndicator SignedInteger
ExponentIndicator :: one of
e E
SignedInteger ::
DecimalDigits
+ DecimalDigits
- DecimalDigits
HexIntegerLiteral ::
0x HexDigit
0X HexDigit
HexIntegerLiteral HexDigit
HexDigit :: one of
0 1 2 3 4 5 6 7 8 9 a b c d e f A B C D E F
The source character immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.
NOTE For example: 3in is an error and not the two input elements 3 and in.
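To make the restriction concrete, the sketch below compares 3in with 3 in, and lists several literal forms admitted by the grammar above. The helper parses and the variable names are hypothetical.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// An IdentifierStart directly after a NumericLiteral is an error.
var withoutSpace = parses("return 3in [0, 0, 0, 0];");

// With white space, 3 and in are two separate input elements.
var withSpace = new Function("return 3 in [0, 0, 0, 0];")();

// Different spellings of the same value: DecimalLiteral forms and HexIntegerLiteral forms.
var forms = [42, 42., 42.0, 4.2e1, 420e-1, 0x2A, 0X2a];
var allEqual = forms.every(function (value) { return value === 42; });

console.log(withoutSpace, withSpace, allEqual);
```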
Semantics
A numeric literal stands for a value of the Number type. This value is determined in two steps: first, a mathematical value (MV) is derived from the literal; second, this mathematical value is rounded as described below.
The MV of NumericLiteral :: DecimalLiteral is the MV of DecimalLiteral.
The MV of NumericLiteral :: HexIntegerLiteral is the MV of HexIntegerLiteral.
The MV of DecimalLiteral :: DecimalIntegerLiteral . is the MV of DecimalIntegerLiteral.
The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits is the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits times $10^{-n}$), where n is the number of characters in DecimalDigits.
The MV of DecimalLiteral :: DecimalIntegerLiteral . ExponentPart is the MV of DecimalIntegerLiteral times $10^{e}$, where e is the MV of ExponentPart.
The MV of DecimalLiteral :: DecimalIntegerLiteral . DecimalDigits ExponentPart is (the MV of DecimalIntegerLiteral plus (the MV of DecimalDigits times $10^{-n}$)) times $10^{e}$, where n is the number of characters in DecimalDigits and e is the MV of ExponentPart.
The MV of DecimalLiteral :: . DecimalDigits is the MV of DecimalDigits times $10^{-n}$, where n is the number of characters in DecimalDigits.
The MV of DecimalLiteral :: . DecimalDigits ExponentPart is the MV of DecimalDigits times $10^{e-n}$, where n is the number of characters in DecimalDigits and e is the MV of ExponentPart.
The MV of DecimalLiteral :: DecimalIntegerLiteral is the MV of DecimalIntegerLiteral.
The MV of DecimalLiteral :: DecimalIntegerLiteral ExponentPart is the MV of DecimalIntegerLiteral times $10^{e}$, where e is the MV of ExponentPart.
The MV of DecimalIntegerLiteral :: 0 is 0.
The MV of DecimalIntegerLiteral :: NonZeroDigit DecimalDigits is (the MV of NonZeroDigit times $10^{n}$) plus the MV of DecimalDigits, where n is the number of characters in DecimalDigits.
The MV of DecimalDigits :: DecimalDigit is the MV of DecimalDigit.
The MV of DecimalDigits :: DecimalDigits DecimalDigit is (the MV of DecimalDigits times 10) plus the MV of DecimalDigit.
The MV of ExponentPart :: ExponentIndicator SignedInteger is the MV of SignedInteger.
The MV of SignedInteger :: DecimalDigits is the MV of DecimalDigits.
The MV of SignedInteger :: + DecimalDigits is the MV of DecimalDigits.
The MV of SignedInteger :: - DecimalDigits is the negative of the MV of DecimalDigits.
The MV of DecimalDigit :: 0 or of HexDigit :: 0 is 0.
The MV of DecimalDigit :: 1 or of NonZeroDigit :: 1 or of HexDigit :: 1 is 1.
The MV of DecimalDigit :: 2 or of NonZeroDigit :: 2 or of HexDigit :: 2 is 2.
The MV of DecimalDigit :: 3 or of NonZeroDigit :: 3 or of HexDigit :: 3 is 3.
The MV of DecimalDigit :: 4 or of NonZeroDigit :: 4 or of HexDigit :: 4 is 4.
The MV of DecimalDigit :: 5 or of NonZeroDigit :: 5 or of HexDigit :: 5 is 5.
The MV of DecimalDigit :: 6 or of NonZeroDigit :: 6 or of HexDigit :: 6 is 6.
The MV of DecimalDigit :: 7 or of NonZeroDigit :: 7 or of HexDigit :: 7 is 7.
The MV of DecimalDigit :: 8 or of NonZeroDigit :: 8 or of HexDigit :: 8 is 8.
The MV of DecimalDigit :: 9 or of NonZeroDigit :: 9 or of HexDigit :: 9 is 9.
The MV of HexDigit :: a or of HexDigit :: A is 10.
The MV of HexDigit :: b or of HexDigit :: B is 11.
The MV of HexDigit :: c or of HexDigit :: C is 12.
The MV of HexDigit :: d or of HexDigit :: D is 13.
The MV of HexDigit :: e or of HexDigit :: E is 14.
The MV of HexDigit :: f or of HexDigit :: F is 15.
The MV of HexIntegerLiteral :: 0x HexDigit is the MV of HexDigit.
The MV of HexIntegerLiteral :: 0X HexDigit is the MV of HexDigit.
The MV of HexIntegerLiteral :: HexIntegerLiteral HexDigit is (the MV of HexIntegerLiteral times 16) plus the MV of HexDigit.
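The MV rules above translate almost directly into code. The following is a rough sketch rather than a conforming implementation: the function names are hypothetical, the input is assumed to be a well-formed literal, and the arithmetic uses Number values, so it is exact only for small literals such as those shown.

```javascript
// MV of a single DecimalDigit or HexDigit: 0-9 map to 0-9, a-f and A-F map to 10-15.
function mvDigit(ch) {
  return "0123456789abcdef".indexOf(ch.toLowerCase());
}

// MV of DecimalDigits :: DecimalDigits DecimalDigit is
// (the MV of DecimalDigits times 10) plus the MV of DecimalDigit.
function mvDecimalDigits(digits) {
  var mv = 0;
  for (var i = 0; i < digits.length; i++) {
    mv = mv * 10 + mvDigit(digits.charAt(i));
  }
  return mv;
}

// MV of HexIntegerLiteral :: HexIntegerLiteral HexDigit is
// (the MV of HexIntegerLiteral times 16) plus the MV of HexDigit.
function mvHexIntegerLiteral(literal) {
  var mv = 0;
  for (var i = 2; i < literal.length; i++) { // skip the 0x or 0X prefix
    mv = mv * 16 + mvDigit(literal.charAt(i));
  }
  return mv;
}

// MV of a DecimalLiteral: (integer part plus fraction times 10^-n) times 10^e.
function mvDecimalLiteral(literal) {
  var parts = /^(\d*)(?:\.(\d*))?(?:[eE]([+-]?)(\d+))?$/.exec(literal);
  var integerPart = parts[1] || "";
  var fraction = parts[2] || "";
  var n = fraction.length;
  var e = parts[4] ? mvDecimalDigits(parts[4]) : 0;
  if (parts[3] === "-") { e = -e; }
  // Dividing by 10^n stands in for multiplying by 10^-n.
  var mantissa = mvDecimalDigits(integerPart) + mvDecimalDigits(fraction) / Math.pow(10, n);
  return mantissa * Math.pow(10, e);
}

console.log(mvHexIntegerLiteral("0xFF"), mvDecimalLiteral("12.5e2"));
```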
Once the exact MV for a numeric literal has been determined, it is then rounded to a value of the Number type. If the MV is 0, then the rounded value is +0; otherwise, the rounded value must be the Number value for the MV (as specified in 8.5), unless the literal is a DecimalLiteral and the literal has more than 20 significant digits, in which case the Number value may be either the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit or the Number value for the MV of a literal produced by replacing each significant digit after the 20th with a 0 digit and then incrementing the literal at the 20th significant digit position. A digit is significant if it is not part of an ExponentPart and
it is not 0; or
there is a nonzero digit to its left and there is a nonzero digit, not in the ExponentPart, to its right.
A conforming implementation, when processing strict mode code (see 10.1.1), must not extend the syntax of NumericLiteral to include OctalIntegerLiteral as described in B.1.1.
A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by an escape sequence. All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed. Any character may appear in the form of an escape sequence.
Syntax
StringLiteral ::
" DoubleStringCharactersopt "
' SingleStringCharactersopt '
DoubleStringCharacters ::
DoubleStringCharacter DoubleStringCharactersopt
SingleStringCharacters ::
SingleStringCharacter SingleStringCharactersopt
DoubleStringCharacter ::
SourceCharacter but not double-quote " or backslash \ or LineTerminator
\ EscapeSequence
LineContinuation
SingleStringCharacter ::
SourceCharacter but not single-quote ' or backslash \ or LineTerminator
\ EscapeSequence
LineContinuation
LineContinuation ::
\ LineTerminatorSequence
EscapeSequence ::
CharacterEscapeSequence
0 [lookahead ∉ DecimalDigit]
HexEscapeSequence
UnicodeEscapeSequence
CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
SingleEscapeCharacter :: one of
' " \ b f n r t v
NonEscapeCharacter ::
SourceCharacter but not EscapeCharacter or LineTerminator
EscapeCharacter ::
SingleEscapeCharacter
DecimalDigit
x
u
HexEscapeSequence ::
x HexDigit HexDigit
UnicodeEscapeSequence ::
u HexDigit HexDigit HexDigit HexDigit
The definition of the nonterminal HexDigit is given in 7.8.3. SourceCharacter is defined in clause 6.
Semantics
A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of character values (CV) contributed by the various parts of the string literal. As part of this process, some characters within the string literal are interpreted as having a mathematical value (MV), as described below or in 7.8.3.
The SV of StringLiteral :: "" is the empty character sequence.
The SV of StringLiteral :: '' is the empty character sequence.
The SV of StringLiteral :: " DoubleStringCharacters " is the SV of DoubleStringCharacters.
The SV of StringLiteral :: ' SingleStringCharacters ' is the SV of SingleStringCharacters.
The SV of DoubleStringCharacters :: DoubleStringCharacter is a sequence of one character, the CV of DoubleStringCharacter.
The SV of DoubleStringCharacters :: DoubleStringCharacter DoubleStringCharacters is a sequence of the CV of DoubleStringCharacter followed by all the characters in the SV of DoubleStringCharacters in order.
The SV of SingleStringCharacters :: SingleStringCharacter is a sequence of one character, the CV of SingleStringCharacter.
The SV of SingleStringCharacters :: SingleStringCharacter SingleStringCharacters is a sequence of the CV of SingleStringCharacter followed by all the characters in the SV of SingleStringCharacters in order.
The SV of LineContinuation :: \ LineTerminatorSequence is the empty character sequence.
The CV of DoubleStringCharacter :: SourceCharacter but not double-quote " or backslash \ or LineTerminator is the SourceCharacter character itself.
The CV of DoubleStringCharacter :: \ EscapeSequence is the CV of the EscapeSequence.
The CV of DoubleStringCharacter :: LineContinuation is the empty character sequence.
The CV of SingleStringCharacter :: SourceCharacter but not single-quote ' or backslash \ or LineTerminator is the SourceCharacter character itself.
The CV of SingleStringCharacter :: \ EscapeSequence is the CV of the EscapeSequence.
The CV of SingleStringCharacter :: LineContinuation is the empty character sequence.
The CV of EscapeSequence :: CharacterEscapeSequence is the CV of the CharacterEscapeSequence.
The CV of EscapeSequence :: 0 [lookahead ∉ DecimalDigit] is a <NUL> character (Unicode value 0000).
The CV of EscapeSequence :: HexEscapeSequence is the CV of the HexEscapeSequence.
The CV of EscapeSequence :: UnicodeEscapeSequence is the CV of the UnicodeEscapeSequence.
The CV of CharacterEscapeSequence :: SingleEscapeCharacter is the character whose code unit value is determined by the SingleEscapeCharacter according to Table 4:
Escape Sequence | Code Unit Value | Name | Symbol
\b | \u0008 | backspace | <BS>
\t | \u0009 | horizontal tab | <HT>
\n | \u000A | line feed (new line) | <LF>
\v | \u000B | vertical tab | <VT>
\f | \u000C | form feed | <FF>
\r | \u000D | carriage return | <CR>
\" | \u0022 | double quote | "
\' | \u0027 | single quote | '
\\ | \u005C | backslash | \
The CV of CharacterEscapeSequence :: NonEscapeCharacter is the CV of the NonEscapeCharacter.
The CV of NonEscapeCharacter :: SourceCharacter but not EscapeCharacter or LineTerminator is the SourceCharacter character itself.
The CV of HexEscapeSequence :: x HexDigit HexDigit is the character whose code unit value is (16 times the MV of the first HexDigit) plus the MV of the second HexDigit.
The CV of UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit is the character whose code unit value is (4096 times the MV of the first HexDigit) plus (256 times the MV of the second HexDigit) plus (16 times the MV of the third HexDigit) plus the MV of the fourth HexDigit.
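The two CV formulas for hexadecimal escapes can be written out as follows. This is an illustrative sketch with hypothetical function names; each function takes only the HexDigit characters, without the leading x or u.

```javascript
// MV of a HexDigit, as defined in 7.8.3.
function mvHexDigit(ch) {
  return parseInt(ch, 16);
}

// CV of HexEscapeSequence :: x HexDigit HexDigit
function cvHexEscape(digits) {
  var unit = 16 * mvHexDigit(digits.charAt(0)) + mvHexDigit(digits.charAt(1));
  return String.fromCharCode(unit);
}

// CV of UnicodeEscapeSequence :: u HexDigit HexDigit HexDigit HexDigit
function cvUnicodeEscape(digits) {
  var unit =
    4096 * mvHexDigit(digits.charAt(0)) +
    256 * mvHexDigit(digits.charAt(1)) +
    16 * mvHexDigit(digits.charAt(2)) +
    mvHexDigit(digits.charAt(3));
  return String.fromCharCode(unit);
}

console.log(cvHexEscape("41") === "\x41", cvUnicodeEscape("00e9") === "\u00e9");
```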
A conforming implementation, when processing strict mode code (see 10.1.1), may not extend the syntax of EscapeSequence to include OctalEscapeSequence as described in B.1.2.
NOTE A line terminator character cannot appear in a string literal, except as part of a LineContinuation to produce the empty character sequence. The correct way to cause a line terminator character to be part of the String value of a string literal is to use an escape sequence such as \n or \u000A.
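The string literal rules can be checked with a few small literals. The variable names are hypothetical, and parses is a helper introduced here to test whether text is accepted as a function body.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// The backslash is the last character on its line, so this is a LineContinuation.
var continued = "abc\
def";

var escapedLineFeed = "abc\ndef";  // the escape contributes one <LF> character
var nul = "\0";                    // 0 not followed by a DecimalDigit is <NUL>
var nonEscape = "\q";              // a NonEscapeCharacter stands for itself
var hexAndUnicode = "\x41\u0041";  // both escapes denote "A"

// A raw <LF> inside the quotes is not a valid DoubleStringCharacter.
var rawLineFeedAllowed = parses('return "abc\ndef";');

console.log(continued, escapedLineFeed.length, nonEscape, hexAndUnicode, rawLineFeedAllowed);
```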
A regular expression literal is an input element that is converted to a RegExp object (see 15.10) each time the literal is evaluated. Two regular expression literals in a program evaluate to regular expression objects that never compare as === to each other even if the two literals' contents are identical. A RegExp object may also be created at runtime by new RegExp (see 15.10.4) or calling the RegExp constructor as a function (15.10.3).
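A short sketch of the identity rule: the same literal evaluated twice yields two distinct objects with identical contents. The function and variable names are hypothetical.

```javascript
function makePattern() {
  return /ab+c/g; // a new RegExp object each time this literal is evaluated
}

var first = makePattern();
var second = makePattern();

var sameObject = first === second;
var sameContents = first.source === second.source && first.global === second.global;

// The two runtime alternatives mentioned above.
var viaNew = new RegExp("ab+c", "g");
var viaCall = RegExp("ab+c", "g");

console.log(sameObject, sameContents, viaNew.source, viaCall.source);
```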
The productions below describe the syntax for a regular expression literal and are used by the input element scanner to find the end of the regular expression literal. The Strings of characters comprising the RegularExpressionBody and the RegularExpressionFlags are passed uninterpreted to the regular expression constructor, which interprets them according to its own, more stringent grammar. An implementation may extend the regular expression constructor's grammar, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions or the productions used by these productions.
Syntax
RegularExpressionLiteral ::
/ RegularExpressionBody / RegularExpressionFlags
RegularExpressionBody ::
RegularExpressionFirstChar RegularExpressionChars
RegularExpressionChars ::
[empty]
RegularExpressionChars RegularExpressionChar
RegularExpressionFirstChar ::
RegularExpressionNonTerminator but not * or \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
RegularExpressionChar ::
RegularExpressionNonTerminator but not \ or / or [
RegularExpressionBackslashSequence
RegularExpressionClass
RegularExpressionBackslashSequence ::
\ RegularExpressionNonTerminator
RegularExpressionNonTerminator ::
SourceCharacter but not LineTerminator
RegularExpressionClass ::
[ RegularExpressionClassChars ]
RegularExpressionClassChars ::
[empty]
RegularExpressionClassChars RegularExpressionClassChar
RegularExpressionClassChar ::
RegularExpressionNonTerminator but not ] or \
RegularExpressionBackslashSequence
RegularExpressionFlags ::
[empty]
RegularExpressionFlags IdentifierPart
NOTE Regular expression literals may not be empty; instead of representing an empty regular expression literal, the characters // start a single-line comment. To specify an empty regular expression, use: /(?:)/.
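For illustration, the sketch below uses /(?:)/ as the empty regular expression and shows that // is scanned as a comment marker. The variable names are hypothetical.

```javascript
var emptyPattern = /(?:)/;
var matchesEmptyString = emptyPattern.test("");
var matchesAnyString = emptyPattern.test("abc");

// Here // begins a SingleLineComment, so the rest of the line is ignored
// and the statement is terminated by the semicolon on the next line.
var seven = 7 // .test("abc")
;

console.log(matchesEmptyString, matchesAnyString, seven);
```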
Semantics
A regular expression literal evaluates to a value of the Object type that is an instance of the standard built-in constructor RegExp. This value is determined in two steps: first, the characters comprising the regular expression's RegularExpressionBody and RegularExpressionFlags production expansions are collected uninterpreted into two Strings Pattern and Flags, respectively. Then each time the literal is evaluated, a new object is created as if by the expression new RegExp(Pattern, Flags) where RegExp is the standard built-in constructor with that name. The newly constructed object becomes the value of the RegularExpressionLiteral. If the call to new RegExp would generate an error as specified in 15.10.4.1, the error must be treated as an early error (Clause 16).
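The "as if by new RegExp(Pattern, Flags)" wording can be illustrated by comparing a literal with an explicit constructor call. This sketch uses hypothetical names and shows only the runtime side: the constructor throwing a SyntaxError for a malformed pattern is the error that, for a literal, is to be reported early.

```javascript
var literal = /a+b/gi;
var constructed = new RegExp("a+b", "gi"); // Pattern is "a+b", Flags is "gi"

var equivalent =
  literal.source === constructed.source &&
  literal.global === constructed.global &&
  literal.ignoreCase === constructed.ignoreCase &&
  literal.multiline === constructed.multiline;

// Reports whether the constructor rejects the given pattern with a SyntaxError.
function constructorRejects(pattern) {
  try {
    new RegExp(pattern);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

var unbalancedRejected = constructorRejects("(");

console.log(equivalent, unbalancedRejected);
```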
Certain ECMAScript statements (empty statement, variable statement, expression statement, do-while statement, continue statement, break statement, return statement, and throw statement) must be terminated with semicolons. Such semicolons may always appear explicitly in the source text. For convenience, however, such semicolons may be omitted from the source text in certain situations. These situations are described by saying that semicolons are automatically inserted into the source code token stream in those situations.
There are three basic rules of semicolon insertion:
When, as the program is parsed from left to right, a token (called the offending token) is encountered that is not allowed by any production of the grammar, then a semicolon is automatically inserted before the offending token if one or more of the following conditions is true:
The offending token is separated from the previous token by at least one LineTerminator.
The offending token is }.
When, as the program is parsed from left to right, the end of the input stream of tokens is encountered and the parser is unable to parse the input token stream as a single complete ECMAScript Program, then a semicolon is automatically inserted at the end of the input stream.
When, as the program is parsed from left to right, a token is encountered that is allowed by some production of the grammar, but the production is a restricted production and the token would be the first token for a terminal or nonterminal immediately following the annotation “[no LineTerminator here]” within the restricted production (and therefore such a token is called a restricted token), and the restricted token is separated from the previous token by at least one LineTerminator, then a semicolon is automatically inserted before the restricted token.
However, there is an additional overriding condition on the preceding rules: a semicolon is never inserted automatically if the semicolon would then be parsed as an empty statement or if that semicolon would become one of the two semicolons in the header of a for statement (see 12.6.3).
NOTE The following are the only restricted productions in the grammar:
PostfixExpression :
LeftHandSideExpression [no LineTerminator here] ++
LeftHandSideExpression [no LineTerminator here] --
ContinueStatement :
continue [no LineTerminator here] Identifier ;
BreakStatement :
break [no LineTerminator here] Identifier ;
ReturnStatement :
return [no LineTerminator here] Expression ;
ThrowStatement :
throw [no LineTerminator here] Expression ;
The practical effect of these restricted productions is as follows:
When a ++ or -- token is encountered where the parser would treat it as a postfix operator, and at least one LineTerminator occurred between the preceding token and the ++ or -- token, then a semicolon is automatically inserted before the ++ or -- token.
When a continue, break, return, or throw token is encountered and a LineTerminator is encountered before the next token, a semicolon is automatically inserted after the continue, break, return, or throw token.
The resulting practical advice to ECMAScript programmers is:
A postfix ++ or -- operator should appear on the same line as its operand.
An Expression in a return or throw statement should start on the same line as the return or throw token.
An Identifier in a break or continue statement should be on the same line as the break or continue token.
The source
{ 1 2 } 3
is not a valid sentence in the ECMAScript grammar, even with the automatic semicolon insertion rules. In contrast, the source
{ 1
2 } 3
is also not a valid ECMAScript sentence, but is transformed by automatic semicolon insertion into the following:
{ 1
;2 ;} 3;
which is a valid ECMAScript sentence.
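These two sources can be checked mechanically. In this sketch, parses is a hypothetical helper that reports whether text is accepted as a function body.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// No LineTerminator before the offending token 2, so nothing can be inserted there.
var singleLine = parses("{ 1 2 } 3");

// A LineTerminator precedes 2, a } follows it, and the input ends after 3.
var twoLines = parses("{ 1\n2 } 3");

console.log(singleLine, twoLines);
```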
The source
for (a; b
)
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion because the semicolon is needed for the header of a for statement. Automatic semicolon insertion never inserts one of the two semicolons in the header of a for statement.
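A sketch of the same check for the for header follows. An empty statement is appended as the loop body so that the header is the only point of interest; parses is a hypothetical helper.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// The second header semicolon is missing and is never supplied automatically.
var missingHeaderSemicolon = parses("for (a; b\n) ;");

// With both header semicolons written out, the line break is harmless.
var explicitHeaderSemicolon = parses("for (a; b;\n) ;");

console.log(missingHeaderSemicolon, explicitHeaderSemicolon);
```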
The source
return
a + b
is transformed by automatic semicolon insertion into the following:
return;
a + b;
NOTE The expression a + b is not treated as a value to be returned by the return statement, because a LineTerminator separates it from the token return.
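Wrapped in a function, the same source shows the consequence at run time. The function names are hypothetical.

```javascript
function sumAcrossLines(a, b) {
  return
  a + b
}

function sumSameLine(a, b) {
  return a + b
}

console.log(sumAcrossLines(1, 2), sumSameLine(1, 2));
```

sumAcrossLines returns undefined because the semicolon is inserted directly after return; the expression statement a + b is never reached.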
The source
a = b
++c
is transformed by automatic semicolon insertion into the following:
a = b;
++c;
NOTE The token ++ is not treated as a postfix operator applying to the variable b, because a LineTerminator occurs between b and ++.
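The sketch below runs this source with hypothetical starting values and reports the resulting state.

```javascript
function run() {
  var a;
  var b = 1;
  var c = 10;

  a = b
  ++c

  return { a: a, b: b, c: c };
}

var state = run();
console.log(state.a, state.b, state.c);
```

Here b keeps its value and c is incremented, which confirms that ++ was parsed as a prefix operator on c.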
The source
if (a > b)
else c = d
is not a valid ECMAScript sentence and is not altered by automatic semicolon insertion before the else token, even though no production of the grammar applies at that point, because an automatically inserted semicolon would then be parsed as an empty statement.
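This case can also be checked with a small sketch; parses is a hypothetical helper that reports whether text is accepted as a function body.

```javascript
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

// No semicolon is inserted before else, because it would become an empty statement.
var implicitEmptyStatement = parses("if (a > b)\nelse c = d");

// An explicit semicolon is an ordinary empty statement and is accepted.
var explicitEmptyStatement = parses("if (a > b);\nelse c = d");

console.log(implicitEmptyStatement, explicitEmptyStatement);
```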
The source
a = b + c
(d + e).print()
is not transformed by automatic semicolon insertion, because the parenthesised expression that begins the second line can be interpreted as an argument list for a function call:
a = b + c(d + e).print()
In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.
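The function-call interpretation can be observed with hypothetical definitions for b, c, d and e, chosen only so that the two-line source can run.

```javascript
function run() {
  var received = [];
  var b = 10;
  var d = 1;
  var e = 2;
  // c records its argument and returns an object with a print method.
  var c = function (argument) {
    received.push(argument);
    return { print: function () { return 5; } };
  };

  var a;
  a = b + c
  (d + e).print()

  return { a: a, received: received };
}

var outcome = run();
console.log(outcome.a, outcome.received);
```

c is called with d + e, and a becomes b plus the result of print(). Writing a = b + c; with an explicit semicolon would instead make the second line a separate statement.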