Commit 568b48ca authored by oliver@apple.com

JS Lexer and Parser should be more informative when they encounter errors

https://bugs.webkit.org/show_bug.cgi?id=114924

Reviewed by Filip Pizlo.

Source/JavaScriptCore:

Add new tokens to represent the various ways that parsing and lexing have failed.
This gives us the ability to produce better error messages in some cases,
and to indicate whether or not the failure was due to invalid source, or simply
early termination.
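
As a rough illustration of the distinction between invalid source and early termination, the sketch below tags error tokens with flag bits. Only the token names and ErrorTokenFlag come from this patch. The flag values, UnterminatedErrorTokenFlag, and the helpers isErrorToken, isUnterminated, and lexStringLiteral are assumptions made for the example and are not the contents of ParserTokens.h.

```cpp
#include <cstddef>
#include <string>

// Assumed value: the patch only shows that error tokens carry ErrorTokenFlag.
constexpr unsigned ErrorTokenFlag = 1u << 8;
// Hypothetical second bit marking errors caused by input ending too early.
constexpr unsigned UnterminatedErrorTokenFlag = 1u << 9;

// Reduced token set for the sketch; the numeric values are illustrative.
enum TokenType : unsigned {
    STRING = 1,
    ERRORTOK = 0 | ErrorTokenFlag,
    INVALID_STRING_LITERAL_ERRORTOK = 1 | ErrorTokenFlag,
    UNTERMINATED_STRING_LITERAL_ERRORTOK = 2 | ErrorTokenFlag | UnterminatedErrorTokenFlag,
};

inline bool isErrorToken(TokenType token) { return (token & ErrorTokenFlag) != 0; }
inline bool isUnterminated(TokenType token) { return (token & UnterminatedErrorTokenFlag) != 0; }

// Toy string lexer (hypothetical helper): input must start with a quote.
// Backslash escapes are ignored to keep the example short.
// A line break inside the literal is invalid source; running out of input
// before the closing quote is early termination.
TokenType lexStringLiteral(const std::string& input)
{
    if (input.empty())
        return ERRORTOK;
    const char quote = input[0];
    if (quote != '"' && quote != '\'')
        return ERRORTOK;
    for (std::size_t i = 1; i < input.size(); ++i) {
        if (input[i] == quote)
            return STRING;
        if (input[i] == '\n')
            return INVALID_STRING_LITERAL_ERRORTOK;
    }
    return UNTERMINATED_STRING_LITERAL_ERRORTOK;
}
```

A caller such as an interactive prompt could keep reading input while isUnterminated holds for the failing token, and report the error immediately otherwise.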

The jsc prompt now makes use of this so that you can write functions that
are more than one line long.
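
The multi-line prompt behaviour can be sketched roughly as follows. This is a minimal stand-alone model, not the jsc.cpp code: the ">>> " and "... " prompts match the patch, but SyntaxStatus, checkSyntaxSketch (a brace-counting stand-in for a real syntax check), and readStatement are hypothetical names introduced for the example, and standard streams replace readline.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical three-way result standing in for the parser's error classification.
enum class SyntaxStatus { Ok, Recoverable, Unrecoverable };

// Stand-in syntax check: an unclosed '{' means more input could fix the source,
// a stray '}' means no amount of further input can.
SyntaxStatus checkSyntaxSketch(const std::string& source)
{
    int depth = 0;
    for (char c : source) {
        if (c == '{') {
            ++depth;
        } else if (c == '}') {
            if (--depth < 0)
                return SyntaxStatus::Unrecoverable;
        }
    }
    return depth > 0 ? SyntaxStatus::Recoverable : SyntaxStatus::Ok;
}

// Accumulates lines into `source` until the check stops reporting a
// recoverable error. Returns false only when input ends before any line is read.
bool readStatement(std::istream& in, std::ostream& out, std::string& source, SyntaxStatus& status)
{
    source.clear();
    status = SyntaxStatus::Ok;
    std::string line;
    do {
        // First line gets the main prompt, continuation lines get "... ".
        out << (source.empty() ? ">>> " : "... ");
        if (!std::getline(in, line))
            return !source.empty();
        source += line;
        source += '\n';
        status = checkSyntaxSketch(source);
    } while (status == SyntaxStatus::Recoverable);
    return true;
}
```

Feeding it "function f() {", "return 1;", "}" on three lines yields one accumulated statement, with the continuation prompt shown for the second and third lines.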

* bytecompiler/BytecodeGenerator.cpp:
(JSC::BytecodeGenerator::generate):
* jsc.cpp:
(stringFromUTF):
(jscSource):
(runInteractive):
* parser/Lexer.cpp:
(JSC::::parseFourDigitUnicodeHex):
(JSC::::parseIdentifierSlowCase):
(JSC::::parseString):
(JSC::::parseStringSlowCase):
(JSC::::lex):
* parser/Lexer.h:
(UnicodeHexValue):
(JSC::Lexer::UnicodeHexValue::UnicodeHexValue):
(JSC::Lexer::UnicodeHexValue::valueType):
(JSC::Lexer::UnicodeHexValue::isValid):
(JSC::Lexer::UnicodeHexValue::value):
(Lexer):
* parser/Parser.h:
(JSC::Parser::getTokenName):
(JSC::Parser::updateErrorMessageSpecialCase):
(JSC::::parse):
* parser/ParserError.h:
(ParserError):
(JSC::ParserError::ParserError):
* parser/ParserTokens.h:
* runtime/Completion.cpp:
(JSC):
(JSC::checkSyntax):
* runtime/Completion.h:
(JSC):

LayoutTests:

Update test results to cover improved error messages.

* fast/js/kde/parse-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T1-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T2-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T3-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T4-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T5-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T1-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T2-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T3-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T4-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T1-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T10-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T2-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T3-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T4-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T5-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T6-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T7-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T8-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T9-expected.txt:
* sputnik/Conformance/13_Function_Definition/S13_A7_T3-expected.txt:

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@148849 268f45cc-cd09-0410-ab3c-d52691b4dbfc
parent 5ae8a47e
2013-04-21 Oliver Hunt <oliver@apple.com>
JS Lexer and Parser should be more informative when they encounter errors
https://bugs.webkit.org/show_bug.cgi?id=114924
Reviewed by Filip Pizlo.
Update test results to cover improved error messages.
* fast/js/kde/parse-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T1-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T2-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T3-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T4-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.2_White_Space/S7.2_A5_T5-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T1-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T2-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T3-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.3_Line_Terminators/S7.3_A6_T4-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T1-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T10-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T2-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T3-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T4-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T5-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T6-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T7-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T8-expected.txt:
* sputnik/Conformance/07_Lexical_Conventions/7.7_Punctuators/S7.7_A2_T9-expected.txt:
* sputnik/Conformance/13_Function_Definition/S13_A7_T3-expected.txt:
2013-04-21 Christophe Dumez <ch.dumez@sisa.samsung.com>
Regression(r148672): fast/events/touch/frame-hover-update.html fails
......@@ -20,9 +20,9 @@ PASS var f÷; threw exception SyntaxError: Invalid character '\u0247'.
PASS var \u0061 = 102; a is 102
PASS var f\u0030 = 103; f0 is 103
PASS var \u00E9\u0100\u02AF\u0388\u18A8 = 104; \u00E9\u0100\u02AF\u0388\u18A8; is 104
PASS var f\u00F7; threw exception SyntaxError: Unrecognized token 'f\u00F7'.
PASS var \u0030; threw exception SyntaxError: Unrecognized token '\u0030'.
PASS var test = { }; test.i= 0; test.i\u002b= 1; test.i; threw exception SyntaxError: Unrecognized token 'i\u002b'.
PASS var f\u00F7; threw exception SyntaxError: Invalid unicode escape in identifier: 'f\u00F7'.
PASS var \u0030; threw exception SyntaxError: Invalid unicode escape in identifier: '\u0030'.
PASS var test = { }; test.i= 0; test.i\u002b= 1; test.i; threw exception SyntaxError: Invalid unicode escape in identifier: 'i\u002b'.
PASS var test = { }; test.i= 0; test.i+= 1; test.i; is 1
PASS successfullyParsed is true
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u0009'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u0009'
S7.2_A5_T1
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u000B'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u000B'
S7.2_A5_T2
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u000C'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u000C'
S7.2_A5_T3
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u0020'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u0020'
S7.2_A5_T4
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u00A0'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u00A0'
S7.2_A5_T5
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u000A'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u000A'
S7.3_A6_T1
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u000D'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u000D'
S7.3_A6_T2
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u2028'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u2028'
S7.3_A6_T3
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u2029'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u2029'
S7.3_A6_T4
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u007B'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u007B'
S7.7_A2_T1
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u002F'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u002F'
S7.7_A2_T10
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u0028'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u0028'
S7.7_A2_T2
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u005B'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u005B'
S7.7_A2_T3
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u003B'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u003B'
S7.7_A2_T4
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 77: SyntaxError: Unrecognized token '\u002E'
CONSOLE MESSAGE: line 77: SyntaxError: Invalid unicode escape in identifier: '\u002E'
S7.7_A2_T5
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u002C'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u002C'
S7.7_A2_T6
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u002B'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u002B'
S7.7_A2_T7
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u002D'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u002D'
S7.7_A2_T8
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\u002A'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid unicode escape in identifier: '\u002A'
S7.7_A2_T9
PASS Expected parsing failure
......
CONSOLE MESSAGE: line 76: SyntaxError: Unrecognized token '\'
CONSOLE MESSAGE: line 76: SyntaxError: Invalid escape in identifier: '\'
S13_A7_T3
PASS Expected parsing failure
......
2013-04-21 Oliver Hunt <oliver@apple.com>
JS Lexer and Parser should be more informative when they encounter errors
https://bugs.webkit.org/show_bug.cgi?id=114924
Reviewed by Filip Pizlo.
Add new tokens to represent the various ways that parsing and lexing have failed.
This gives us the ability to produce better error messages in some cases,
and to indicate whether or not the failure was due to invalid source, or simply
early termination.
The jsc prompt now makes use of this so that you can write functions that
are more than one line long.
* bytecompiler/BytecodeGenerator.cpp:
(JSC::BytecodeGenerator::generate):
* jsc.cpp:
(stringFromUTF):
(jscSource):
(runInteractive):
* parser/Lexer.cpp:
(JSC::::parseFourDigitUnicodeHex):
(JSC::::parseIdentifierSlowCase):
(JSC::::parseString):
(JSC::::parseStringSlowCase):
(JSC::::lex):
* parser/Lexer.h:
(UnicodeHexValue):
(JSC::Lexer::UnicodeHexValue::UnicodeHexValue):
(JSC::Lexer::UnicodeHexValue::valueType):
(JSC::Lexer::UnicodeHexValue::isValid):
(JSC::Lexer::UnicodeHexValue::value):
(Lexer):
* parser/Parser.h:
(JSC::Parser::getTokenName):
(JSC::Parser::updateErrorMessageSpecialCase):
(JSC::::parse):
* parser/ParserError.h:
(ParserError):
(JSC::ParserError::ParserError):
* parser/ParserTokens.h:
* runtime/Completion.cpp:
(JSC):
(JSC::checkSyntax):
* runtime/Completion.h:
(JSC):
2013-04-21 Mark Lam <mark.lam@apple.com>
Refactor identical inline functions in JSVALUE64 and JSVALUE32_64 sections
......@@ -130,8 +130,8 @@ ParserError BytecodeGenerator::generate()
m_codeBlock->shrinkToFit();
if (m_expressionTooDeep)
return ParserError::OutOfMemory;
return ParserError::ErrorNone;
return ParserError(ParserError::OutOfMemory);
return ParserError(ParserError::ErrorNone);
}
bool BytecodeGenerator::addVar(const Identifier& ident, bool isConstant, RegisterID*& r0)
......
......@@ -146,7 +146,7 @@ public:
void parseArguments(int, char**);
};
static const char interactivePrompt[] = "> ";
static const char interactivePrompt[] = ">>> ";
class StopWatch {
public:
......@@ -268,23 +268,28 @@ GlobalObject::GlobalObject(VM& vm, Structure* structure)
{
}
static inline SourceCode jscSource(const char* utf8, const String& filename)
static inline String stringFromUTF(const char* utf8)
{
// Find the the first non-ascii character, or nul.
const char* pos = utf8;
while (*pos > 0)
pos++;
size_t asciiLength = pos - utf8;
// Fast case - string is all ascii.
if (!*pos)
return makeSource(String(utf8, asciiLength), filename);
return String(utf8, asciiLength);
// Slow case - contains non-ascii characters, use fromUTF8WithLatin1Fallback.
ASSERT(*pos < 0);
ASSERT(strlen(utf8) == asciiLength + strlen(pos));
String source = String::fromUTF8WithLatin1Fallback(utf8, asciiLength + strlen(pos));
return makeSource(source.impl(), filename);
return String::fromUTF8WithLatin1Fallback(utf8, asciiLength + strlen(pos));
}
static inline SourceCode jscSource(const char* utf8, const String& filename)
{
String str = stringFromUTF(utf8);
return makeSource(str, filename);
}
EncodedJSValue JSC_HOST_CALL functionPrint(ExecState* exec)
......@@ -607,17 +612,33 @@ static bool runWithScripts(GlobalObject* globalObject, const Vector<Script>& scr
static void runInteractive(GlobalObject* globalObject)
{
String interpreterName("Interpreter");
while (true) {
bool shouldQuit = false;
while (!shouldQuit) {
#if HAVE(READLINE) && !RUNNING_FROM_XCODE
char* line = readline(interactivePrompt);
if (!line)
break;
if (line[0])
add_history(line);
ParserError error;
String source;
do {
error = ParserError();
char* line = readline(source.isEmpty() ? interactivePrompt : "... ");
source = source + line;
source = source + '\n';
checkSyntax(globalObject->globalExec(), makeSource(source, interpreterName), error);
shouldQuit = !line;
if (!line || !line[0])
break;
if (line[0])
add_history(line);
} while (error.m_syntaxErrorType == ParserError::SyntaxErrorRecoverable);
if (error.m_type != ParserError::ErrorNone) {
printf("%s:%d\n", error.m_message.utf8().data(), error.m_line);
continue;
}
JSValue evaluationException;
JSValue returnValue = evaluate(globalObject->globalExec(), jscSource(line, interpreterName), JSValue(), &evaluationException);
free(line);
JSValue returnValue = evaluate(globalObject->globalExec(), makeSource(source, interpreterName), JSValue(), &evaluationException);
#else
printf("%s", interactivePrompt);
Vector<char, 256> line;
......
......@@ -596,21 +596,21 @@ ALWAYS_INLINE T Lexer<T>::peek(int offset) const
}
template <typename T>
int Lexer<T>::parseFourDigitUnicodeHex()
typename Lexer<T>::UnicodeHexValue Lexer<T>::parseFourDigitUnicodeHex()
{
T char1 = peek(1);
T char2 = peek(2);
T char3 = peek(3);
if (UNLIKELY(!isASCIIHexDigit(m_current) || !isASCIIHexDigit(char1) || !isASCIIHexDigit(char2) || !isASCIIHexDigit(char3)))
return -1;
return UnicodeHexValue((m_code + 4) >= m_codeEnd ? UnicodeHexValue::IncompleteHex : UnicodeHexValue::InvalidHex);
int result = convertUnicode(m_current, char1, char2, char3);
shift();
shift();
shift();
shift();
return result;
return UnicodeHexValue(result);
}
template <typename T>
......@@ -883,14 +883,14 @@ template <bool shouldCreateIdentifier> JSTokenType Lexer<T>::parseIdentifierSlow
m_buffer16.append(identifierStart, currentCharacter() - identifierStart);
shift();
if (UNLIKELY(m_current != 'u'))
return ERRORTOK;
return atEnd() ? UNTERMINATED_IDENTIFIER_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_ESCAPE_ERRORTOK;
shift();
int character = parseFourDigitUnicodeHex();
if (UNLIKELY(character == -1))
return ERRORTOK;
UChar ucharacter = static_cast<UChar>(character);
UnicodeHexValue character = parseFourDigitUnicodeHex();
if (UNLIKELY(!character.isValid()))
return character.valueType() == UnicodeHexValue::IncompleteHex ? UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK : INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
UChar ucharacter = static_cast<UChar>(character.value());
if (UNLIKELY(m_buffer16.size() ? !isIdentPart(ucharacter) : !isIdentStart(ucharacter)))
return ERRORTOK;
return INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK;
if (shouldCreateIdentifier)
record16(ucharacter);
identifierStart = currentCharacter();
......@@ -941,7 +941,7 @@ static ALWAYS_INLINE bool characterRequiresParseStringSlowCase(UChar character)
}
template <typename T>
template <bool shouldBuildStrings> ALWAYS_INLINE bool Lexer<T>::parseString(JSTokenData* tokenData, bool strictMode)
template <bool shouldBuildStrings> ALWAYS_INLINE typename Lexer<T>::StringParseResult Lexer<T>::parseString(JSTokenData* tokenData, bool strictMode)
{
int startingOffset = currentOffset();
int startingLineNumber = lineNumber();
......@@ -969,7 +969,7 @@ template <bool shouldBuildStrings> ALWAYS_INLINE bool Lexer<T>::parseString(JSTo
shift();
if (!isASCIIHexDigit(m_current) || !isASCIIHexDigit(peek(1))) {
m_lexErrorMessage = "\\x can only be followed by a hex character sequence";
return false;
return (atEnd() || (isASCIIHexDigit(m_current) && (m_code + 1 == m_codeEnd))) ? StringUnterminated : StringCannotBeParsed;
}
T prev = m_current;
shift();
......@@ -1004,11 +1004,11 @@ template <bool shouldBuildStrings> ALWAYS_INLINE bool Lexer<T>::parseString(JSTo
} else
tokenData->ident = 0;
return true;
return StringParsedSuccessfully;
}
template <typename T>
template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenData* tokenData, bool strictMode)
template <bool shouldBuildStrings> typename Lexer<T>::StringParseResult Lexer<T>::parseStringSlowCase(JSTokenData* tokenData, bool strictMode)
{
T stringQuoteCharacter = m_current;
shift();
......@@ -1034,7 +1034,7 @@ template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenDat
shift();
if (!isASCIIHexDigit(m_current) || !isASCIIHexDigit(peek(1))) {
m_lexErrorMessage = "\\x can only be followed by a hex character sequence";
return false;
return StringCannotBeParsed;
}
T prev = m_current;
shift();
......@@ -1043,16 +1043,16 @@ template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenDat
shift();
} else if (m_current == 'u') {
shift();
int character = parseFourDigitUnicodeHex();
if (character != -1) {
UnicodeHexValue character = parseFourDigitUnicodeHex();
if (character.isValid()) {
if (shouldBuildStrings)
record16(character);
record16(character.value());
} else if (m_current == stringQuoteCharacter) {
if (shouldBuildStrings)
record16('u');
} else {
m_lexErrorMessage = "\\u can only be followed by a Unicode character sequence";
return false;
return character.valueType() == UnicodeHexValue::IncompleteHex ? StringUnterminated : StringCannotBeParsed;
}
} else if (strictMode && isASCIIDigit(m_current)) {
// The only valid numeric escape in strict mode is '\0', and this must not be followed by a decimal digit.
......@@ -1060,7 +1060,7 @@ template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenDat
shift();
if (character1 != '0' || isASCIIDigit(m_current)) {
m_lexErrorMessage = "The only valid numeric escape in strict mode is '\\0'";
return false;
return StringCannotBeParsed;
}
if (shouldBuildStrings)
record16(0);
......@@ -1090,7 +1090,7 @@ template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenDat
shift();
} else {
m_lexErrorMessage = "Unterminated string constant";
return false;
return StringUnterminated;
}
stringStart = currentCharacter();
......@@ -1103,7 +1103,7 @@ template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenDat
// New-line or end of input is not allowed
if (atEnd() || isLineTerminator(m_current)) {
m_lexErrorMessage = "Unexpected EOF";
return false;
return atEnd() ? StringUnterminated : StringCannotBeParsed;
}
// Anything else is just a normal character
}
......@@ -1118,7 +1118,7 @@ template <bool shouldBuildStrings> bool Lexer<T>::parseStringSlowCase(JSTokenDat
tokenData->ident = 0;
m_buffer16.resize(0);
return true;
return StringParsedSuccessfully;
}
template <typename T>
......@@ -1462,6 +1462,7 @@ start:
if (parseMultilineComment())
goto start;
m_lexErrorMessage = "Multiline comment was not closed properly";
token = UNTERMINATED_MULTILINE_COMMENT_ERRORTOK;
goto returnError;
}
if (m_current == '=') {
......@@ -1581,6 +1582,7 @@ start:
if (parseOctal(tokenData->doubleValue)) {
if (strictMode) {
m_lexErrorMessage = "Octal escapes are forbidden in strict mode";
token = INVALID_OCTAL_NUMBER_ERRORTOK;
goto returnError;
}
token = NUMBER;
......@@ -1599,6 +1601,7 @@ inNumberAfterDecimalPoint:
if ((m_current | 0x20) == 'e') {
if (!parseNumberAfterExponentIndicator()) {
m_lexErrorMessage = "Non-number found after exponent indicator";
token = atEnd() ? UNTERMINATED_NUMERIC_LITERAL_ERRORTOK : INVALID_NUMERIC_LITERAL_ERRORTOK;
goto returnError;
}
}
......@@ -1611,17 +1614,24 @@ inNumberAfterDecimalPoint:
// No identifiers allowed directly after numeric literal, e.g. "3in" is bad.
if (UNLIKELY(isIdentStart(m_current))) {
m_lexErrorMessage = "At least one digit must occur after a decimal point";
token = atEnd() ? UNTERMINATED_NUMERIC_LITERAL_ERRORTOK : INVALID_NUMERIC_LITERAL_ERRORTOK;
goto returnError;
}
m_buffer8.resize(0);
break;
case CharacterQuote:
if (lexerFlags & LexerFlagsDontBuildStrings) {
if (UNLIKELY(!parseString<false>(tokenData, strictMode)))
StringParseResult result = parseString<false>(tokenData, strictMode);
if (UNLIKELY(result != StringParsedSuccessfully)) {
token = result == StringUnterminated ? UNTERMINATED_STRING_LITERAL_ERRORTOK : INVALID_STRING_LITERAL_ERRORTOK;
goto returnError;
}
} else {
if (UNLIKELY(!parseString<true>(tokenData, strictMode)))
StringParseResult result = parseString<true>(tokenData, strictMode);
if (UNLIKELY(result != StringParsedSuccessfully)) {
token = result == StringUnterminated ? UNTERMINATED_STRING_LITERAL_ERRORTOK : INVALID_STRING_LITERAL_ERRORTOK;
goto returnError;
}
}
shift();
token = STRING;
......@@ -1643,10 +1653,12 @@ inNumberAfterDecimalPoint:
goto start;
case CharacterInvalid:
m_lexErrorMessage = invalidCharacterMessage();
token = ERRORTOK;
goto returnError;
default:
RELEASE_ASSERT_NOT_REACHED();
m_lexErrorMessage = "Internal Error";
token = ERRORTOK;
goto returnError;
}
......@@ -1678,7 +1690,8 @@ returnError:
m_error = true;
tokenLocation->line = m_lineNumber;
tokenLocation->endOffset = currentOffset();
return ERRORTOK;
RELEASE_ASSERT(token & ErrorTokenFlag);
return token;
}
template <typename T>
......
......@@ -134,7 +134,36 @@ private:
ALWAYS_INLINE void shift();
ALWAYS_INLINE bool atEnd() const;
ALWAYS_INLINE T peek(int offset) const;
int parseFourDigitUnicodeHex();
struct UnicodeHexValue {
enum ValueType { ValidHex, IncompleteHex, InvalidHex };
explicit UnicodeHexValue(int value)
: m_value(value)
{
}
explicit UnicodeHexValue(ValueType type)
: m_value(type == IncompleteHex ? -2 : -1)
{
}
ValueType valueType() const
{
if (m_value >= 0)
return ValidHex;
return m_value == -2 ? IncompleteHex : InvalidHex;
}
bool isValid() const { return m_value >= 0; }
int value() const
{
ASSERT(m_value >= 0);
return m_value;
}
private:
int m_value;
};
UnicodeHexValue parseFourDigitUnicodeHex();
void shiftLineTerminator();
String invalidCharacterMessage() const;
......@@ -157,8 +186,13 @@ private:
template <bool shouldCreateIdentifier> ALWAYS_INLINE JSTokenType parseKeyword(JSTokenData*);
template <bool shouldBuildIdentifiers> ALWAYS_INLINE JSTokenType parseIdentifier(JSTokenData*, unsigned lexerFlags, bool strictMode);
template <bool shouldBuildIdentifiers> NEVER_INLINE JSTokenType parseIdentifierSlowCase(JSTokenData*, unsigned lexerFlags, bool strictMode);
template <bool shouldBuildStrings> ALWAYS_INLINE bool parseString(JSTokenData*, bool strictMode);
template <bool shouldBuildStrings> NEVER_INLINE bool parseStringSlowCase(JSTokenData*, bool strictMode);
enum StringParseResult {
StringParsedSuccessfully,
StringUnterminated,
StringCannotBeParsed
};
template <bool shouldBuildStrings> ALWAYS_INLINE StringParseResult parseString(JSTokenData*, bool strictMode);
template <bool shouldBuildStrings> NEVER_INLINE StringParseResult parseStringSlowCase(JSTokenData*, bool strictMode);
ALWAYS_INLINE void parseHex(double& returnValue);
ALWAYS_INLINE bool parseOctal(double& returnValue);
ALWAYS_INLINE bool parseDecimal(double& returnValue);
......
......@@ -705,9 +705,19 @@ private:
case RESERVED:
case NUMBER:
case IDENT:
case STRING:
case STRING:
case UNTERMINATED_IDENTIFIER_ESCAPE_ERRORTOK:
case UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK:
case UNTERMINATED_MULTILINE_COMMENT_ERRORTOK:
case UNTERMINATED_NUMERIC_LITERAL_ERRORTOK:
case UNTERMINATED_STRING_LITERAL_ERRORTOK:
case INVALID_IDENTIFIER_ESCAPE_ERRORTOK:
case INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK:
case INVALID_NUMERIC_LITERAL_ERRORTOK:
case INVALID_OCTAL_NUMBER_ERRORTOK:
case INVALID_STRING_LITERAL_ERRORTOK:
case ERRORTOK:
case EOFTOK:
case EOFTOK:
return 0;
case LastUntaggedToken:
break;
......@@ -734,7 +744,36 @@ private:
case STRING:
m_errorMessage = "Unexpected string " + getToken();
return;
case ERRORTOK:
case UNTERMINATED_IDENTIFIER_ESCAPE_ERRORTOK:
case UNTERMINATED_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK:
m_errorMessage = "Incomplete unicode escape in identifier: '" + getToken() + '\'';
return;
case UNTERMINATED_MULTILINE_COMMENT_ERRORTOK:
m_errorMessage = "Unterminated multiline comment";
return;
case UNTERMINATED_NUMERIC_LITERAL_ERRORTOK:
m_errorMessage = "Unterminated numeric literal '" + getToken() + '\'';
return;
case UNTERMINATED_STRING_LITERAL_ERRORTOK:
m_errorMessage = "Unterminated string literal '" + getToken() + '\'';
return;
case INVALID_IDENTIFIER_ESCAPE_ERRORTOK:
m_errorMessage = "Invalid escape in identifier: '" + getToken() + '\'';
return;
case INVALID_IDENTIFIER_UNICODE_ESCAPE_ERRORTOK:
m_errorMessage = "Invalid unicode escape in identifier: '" + getToken() + '\'';
return;
case INVALID_NUMERIC_LITERAL_ERRORTOK:
m_errorMessage = "Invalid numeric literal: '" + getToken() + '\'';
return;
case INVALID_OCTAL_NUMBER_ERRORTOK:
m_errorMessage = "Invalid use of octal: '" + getToken() + '\'';
return;
case INVALID_STRING_LITERAL_ERRORTOK:
m_errorMessage = "Invalid string literal: '" + getToken() + '\'';
return;
case ERRORTOK: