Commit 0fff87c0 authored by tsepez@chromium.org

decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).

https://bugs.webkit.org/show_bug.cgi?id=71316

Reviewed by Daniel Bates.

Source/WebCore:

Pass trailing unescaped bytes into the character set decoder to get correct
results in the presence of encodings which re-use ASCII values in sequences.
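
A minimal sketch of the run-boundary rule described above, written against std::string rather than WebCore's String. The function name findEndOfRunSketch is hypothetical and is not part of the patch; it assumes, as the patch's own comment does, that at most two unescaped bytes in the range 0x40 - 0x7F may trail an escape sequence and still belong to the encoded run.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical standalone analogue of URLEscapeSequence::findEndOfRun().
// Returns the index one past the end of the run that begins at `start`.
std::size_t findEndOfRunSketch(const std::string& s, std::size_t start)
{
    const std::size_t sequenceSize = 3; // e.g. %41
    std::size_t runEnd = start;
    int numberOfTrailingCharacters = 0;
    while (runEnd < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[runEnd]);
        if (c == '%') {
            // A %-sign extends the run only if two hex digits follow it.
            if (s.size() - runEnd >= sequenceSize
                && std::isxdigit(static_cast<unsigned char>(s[runEnd + 1]))
                && std::isxdigit(static_cast<unsigned char>(s[runEnd + 2]))) {
                runEnd += sequenceSize;
                numberOfTrailingCharacters = 0;
            } else
                break;
        } else if (c >= 0x40 && c <= 0x7F && numberOfTrailingCharacters < 2) {
            // Possible trail byte of a multi-byte sequence: keep it in the run.
            ++runEnd;
            ++numberOfTrailingCharacters;
        } else
            break;
    }
    return runEnd;
}
```

For the input `%89g>alert` this returns 4, so the unescaped `g` is handed to the decoder together with the byte 0x89 instead of being treated as a separate ASCII character.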

Tests: http/tests/navigation/anchor-frames-gbk.html
       http/tests/security/xssAuditor/iframe-onload-GBK-char.html
       http/tests/security/xssAuditor/img-onerror-GBK-char.html
       http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html
       http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html
       http/tests/security/xssAuditor/script-tag-Big5-char.html
       http/tests/security/xssAuditor/script-tag-Big5-char2.html

* platform/text/DecodeEscapeSequences.h:
(WebCore::Unicode16BitEscapeSequence::findInString):
(WebCore::Unicode16BitEscapeSequence::findEndOfRun):
(WebCore::Unicode16BitEscapeSequence::decodeRun):
(WebCore::URLEscapeSequence::findInString):
(WebCore::URLEscapeSequence::findEndOfRun):
(WebCore::URLEscapeSequence::decodeRun):
(WebCore::decodeEscapeSequences):

LayoutTests:

* http/tests/navigation/anchor-frames-gbk-expected.txt: Added.
* http/tests/navigation/anchor-frames-gbk.html: Added.
* http/tests/navigation/resources/frame-with-anchor-gbk.html: Added.
* http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt: Added.
* http/tests/security/xssAuditor/iframe-onload-GBK-char.html: Added.
* http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt: Added.
* http/tests/security/xssAuditor/img-onerror-GBK-char.html: Added.
* http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl:
* http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char2.html: Added.
* platform/chromium/test_expectations.txt:


git-svn-id: http://svn.webkit.org/repository/webkit/trunk@105691 268f45cc-cd09-0410-ab3c-d52691b4dbfc
parent 129ec5f5
2012-01-23 Tom Sepez <tsepez@chromium.org>
decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
https://bugs.webkit.org/show_bug.cgi?id=71316
Reviewed by Daniel Bates.
* http/tests/navigation/anchor-frames-gbk-expected.txt: Added.
* http/tests/navigation/anchor-frames-gbk.html: Added.
* http/tests/navigation/resources/frame-with-anchor-gbk.html: Added.
* http/tests/security/xssAuditor/iframe-onload-GBK-char-expected.txt: Added.
* http/tests/security/xssAuditor/iframe-onload-GBK-char.html: Added.
* http/tests/security/xssAuditor/img-onerror-GBK-char-expected.txt: Added.
* http/tests/security/xssAuditor/img-onerror-GBK-char.html: Added.
* http/tests/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl:
* http/tests/security/xssAuditor/script-tag-Big5-char-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char.html: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char2-expected.txt: Added.
* http/tests/security/xssAuditor/script-tag-Big5-char2.html: Added.
* platform/chromium/test_expectations.txt:
2012-01-23 Zan Dobersek <zandobersek@gmail.com>
[GTK] editing/deleting/5408255.html results are incorrect
--------
Frame: 'main'
--------
Tests that loading a frame with a URL that contains a fragment pointed at a named anchor actually scrolls to that anchor.
On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
PASS document.body.offsetHeight > document.documentElement.clientHeight is true
PASS document.body.scrollTop > 0 is true
PASS document.body.scrollTop + document.documentElement.clientHeight > 2000 is true
PASS successfullyParsed is true
TEST COMPLETE
This is an anchor point named as the Unicode equivalent of the GBK sequence %a9g (test trailing low byte).
--------
Frame: 'footer'
--------
<!DOCTYPE html>
<html>
<meta http-equiv="Content-Type" content="text/html; charset=gbk"/>
<!-- See resources/frame-with-anchor-gbk.html for description of test -->
<!-- See also https://bugs.webkit.org/show_bug.cgi?id=71316 -->
<script>
if (window.layoutTestController)
layoutTestController.dumpChildFramesAsText();
</script>
<frameset rows="90%,10%">
<frame src="resources/frame-with-anchor-gbk.html#%89g" name="main">
<frame src="about:blank" name="footer">
</frameset>
</html>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gbk"/>
<script src="../../../js-test-resources/js-test-pre.js"></script>
<script>
function runTest() {
description('Tests that loading a frame with a URL that contains a fragment pointed at a named anchor actually scrolls to that anchor.');
// Check scroll position in a timeout to make sure that the anchor has
// been scrolled to.
setTimeout(function() {
// Make sure that the body is taller than the viewport (i.e. scrolling is
// required).
shouldBeTrue('document.body.offsetHeight > document.documentElement.clientHeight');
// We should be scrolled at least a little bit
shouldBeTrue('document.body.scrollTop > 0');
// And the bottom of the viewable area should be at least 2000 pixels from the top, due to the spacer element above.
shouldBeTrue('document.body.scrollTop + document.documentElement.clientHeight > 2000');
finishJSTest();
}, 0);
}
var jsTestIsAsync = true;
</script>
</head>
<body onload="runTest()">
<p id="description"></p>
<div id="console"></div>
<div style="height: 2000px">
<!-- Spacer to make sure that the named anchor below requires scrolling -->
</div>
<a name="&#x586f">This is an anchor point named as the Unicode equivalent of the GBK sequence %a9g (test trailing low byte)</a>.
<script src="../../../js-test-resources/js-test-post.js"></script>
</body>
</html>
CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
<!DOCTYPE html>
<html>
<head>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.setXSSAuditorEnabled(true);
}
</script>
</head>
<body>
<iframe src="http://localhost:8000/security/xssAuditor/resources/echo-intertag.pl?charset=GBK&q=<iframe%20onload=%C7Ojavascript:alert(document.domain)></iframe>">
</iframe>
</body>
</html>
CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
<!DOCTYPE html>
<html>
<head>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.setXSSAuditorEnabled(true);
}
</script>
</head>
<body>
<iframe src="http://localhost:8000/security/xssAuditor/resources/echo-intertag.pl?charset=GBK&q=<img%20src=%201%20onerror=%C7Ojavascript:alert(document.domain)>">
</iframe>
</body>
</html>
......@@ -67,7 +67,8 @@ sub decode16BitUnicodeEscapeSequences
     return $result;
 }
-print "Content-Type: text/html; charset=UTF-8\n\n";
+my $charsetToUse = $cgi->param('charset') ? $cgi->param('charset') : "UTF-8";
+print "Content-Type: text/html; charset=$charsetToUse\n\n";
 print "<!DOCTYPE html>\n";
 print "<html>\n";
......
CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
<!DOCTYPE html>
<html>
<head>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.setXSSAuditorEnabled(true);
}
</script>
</head>
<body>
<iframe src="http://localhost:8000/security/xssAuditor/resources/echo-intertag-decode-16bit-unicode.pl?charset=Big5&q=<script>alert(/XS%u00252581SS/)</script>">
</iframe>
</body>
</html>
CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
<!DOCTYPE html>
<html>
<head>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.setXSSAuditorEnabled(true);
}
</script>
</head>
<body>
<iframe src="http://localhost:8000/security/xssAuditor/resources/echo-intertag.pl?charset=Big5&q=<script>alert(/XS%2581SS/)</script>">
</iframe>
</body>
</html>
<!DOCTYPE html>
<html>
<head>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.setXSSAuditorEnabled(true);
}
</script>
</head>
<body>
<iframe src="http://localhost:8000/security/xssAuditor/resources/echo-intertag.pl?charset=Big5&q=<script%20%89g>alert(location)</script>">
</iframe>
</body>
</html>
CONSOLE MESSAGE: Refused to execute a JavaScript script. Source code of script found within request.
<!DOCTYPE html>
<html>
<head>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.setXSSAuditorEnabled(true);
}
</script>
</head>
<body>
<iframe src="http://localhost:8000/security/xssAuditor/resources/echo-intertag.pl?charset=Big5&q=<script>alert(/XS%81SS/)</script>">
</iframe>
</body>
</html>
......@@ -1930,6 +1930,9 @@ BUG_JAPHET WIN : http/tests/xmlhttprequest/xmlhttprequest-50ms-download-dispatch
 // Note: this test was also marked as flaky on WIN RELEASE above, BUGCR31342.
 BUGCR39423 : security/block-test.html = TIMEOUT
+// Due to the differences in handling text encodings in KURL and googleurl.
+BUGWK20559 : http/tests/navigation/anchor-frames-gbk.html = TEXT
 BUGWK36666 : storage/open-database-over-quota.html = TEXT
 BUGWK37283 : fast/overflow/scrollbar-restored-and-then-locked.html = TEXT
......
2012-01-23 Tom Sepez <tsepez@chromium.org>
decodeEscapeSequences() not correct for some encodings (GBK, Big5, ...).
https://bugs.webkit.org/show_bug.cgi?id=71316
Reviewed by Daniel Bates.
Pass trailing unescaped bytes into the character set decoder to get correct
results in the presence of encodings which re-use ASCII values in sequences.
Tests: http/tests/navigation/anchor-frames-gbk.html
http/tests/security/xssAuditor/iframe-onload-GBK-char.html
http/tests/security/xssAuditor/img-onerror-GBK-char.html
http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode-16bit-unicode.html
http/tests/security/xssAuditor/script-tag-Big5-char-twice-url-encode.html
http/tests/security/xssAuditor/script-tag-Big5-char.html
http/tests/security/xssAuditor/script-tag-Big5-char2.html
* platform/text/DecodeEscapeSequences.h:
(WebCore::Unicode16BitEscapeSequence::findInString):
(WebCore::Unicode16BitEscapeSequence::findEndOfRun):
(WebCore::Unicode16BitEscapeSequence::decodeRun):
(WebCore::URLEscapeSequence::findInString):
(WebCore::URLEscapeSequence::findEndOfRun):
(WebCore::URLEscapeSequence::decodeRun):
(WebCore::decodeEscapeSequences):
2012-01-23 Adam Barth <abarth@webkit.org>
Fix a build break in a clean compile of the Chromium port (at least
 /*
  * Copyright (C) 2011 Daniel Bates (dbates@intudata.com). All Rights Reserved.
+ * Copyright (c) 2012 Google, inc. All Rights Reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
......@@ -9,6 +10,9 @@
  * 2. Redistributions in binary form must reproduce the above copyright
  * notice, this list of conditions and the following disclaimer in the
  * documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of Google Inc. nor the names of its
+ * contributors may be used to endorse or promote products derived from
+ * this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
......@@ -36,52 +40,81 @@ namespace WebCore {
 // See <http://en.wikipedia.org/wiki/Percent-encoding#Non-standard_implementations>.
 struct Unicode16BitEscapeSequence {
-    enum { size = 6 }; // e.g. %u26C4
-    static size_t findInString(const String& string, unsigned start = 0) { return string.find("%u", start); }
-    static bool matchStringPrefix(const String& string, unsigned start = 0)
+    enum { sequenceSize = 6 }; // e.g. %u26C4
+    static size_t findInString(const String& string, size_t startPosition) { return string.find("%u", startPosition); }
+    static size_t findEndOfRun(const String& string, size_t startPosition, size_t endPosition)
     {
-        if (string.length() - start < size)
-            return false;
-        return string[start] == '%' && string[start + 1] == 'u'
-            && isASCIIHexDigit(string[start + 2]) && isASCIIHexDigit(string[start + 3])
-            && isASCIIHexDigit(string[start + 4]) && isASCIIHexDigit(string[start + 5]);
+        size_t runEnd = startPosition;
+        while (endPosition - runEnd >= sequenceSize && string[runEnd] == '%' && string[runEnd + 1] == 'u'
+               && isASCIIHexDigit(string[runEnd + 2]) && isASCIIHexDigit(string[runEnd + 3])
+               && isASCIIHexDigit(string[runEnd + 4]) && isASCIIHexDigit(string[runEnd + 5])) {
+            runEnd += sequenceSize;
+        }
+        return runEnd;
     }
     static String decodeRun(const UChar* run, size_t runLength, const TextEncoding&)
     {
         // Each %u-escape sequence represents a UTF-16 code unit.
         // See <http://www.w3.org/International/iri-edit/draft-duerst-iri.html#anchor29>.
-        size_t numberOfSequences = runLength / size;
+        // For 16-bit escape sequences, we know that findEndOfRun() has given us a contiguous run of sequences
+        // without any intervening characters, so decode the run without additional checks.
+        size_t numberOfSequences = runLength / sequenceSize;
         StringBuilder builder;
         builder.reserveCapacity(numberOfSequences);
         while (numberOfSequences--) {
             UChar codeUnit = (toASCIIHexValue(run[2]) << 12) | (toASCIIHexValue(run[3]) << 8) | (toASCIIHexValue(run[4]) << 4) | toASCIIHexValue(run[5]);
             builder.append(codeUnit);
-            run += size;
+            run += sequenceSize;
         }
         return builder.toString();
     }
 };
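
The %u decoding step above can be illustrated outside WebCore. This is only a sketch: decodeUnicodeRunSketch and hexValue are hypothetical names, std::u16string stands in for WebCore's String and StringBuilder, and the input is assumed to be a contiguous run of valid %uXXXX sequences, as findEndOfRun() guarantees in the patch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one ASCII hex digit (input assumed valid).
static unsigned hexValue(char16_t c)
{
    if (c >= u'0' && c <= u'9')
        return c - u'0';
    if (c >= u'a' && c <= u'f')
        return c - u'a' + 10;
    return c - u'A' + 10;
}

// Decodes a contiguous run of %uXXXX sequences, one UTF-16 code unit per sequence.
std::u16string decodeUnicodeRunSketch(const std::u16string& run)
{
    const std::size_t sequenceSize = 6; // e.g. %u26C4
    std::u16string result;
    result.reserve(run.size() / sequenceSize);
    for (std::size_t i = 0; i + sequenceSize <= run.size(); i += sequenceSize) {
        unsigned codeUnit = (hexValue(run[i + 2]) << 12) | (hexValue(run[i + 3]) << 8)
            | (hexValue(run[i + 4]) << 4) | hexValue(run[i + 5]);
        result.push_back(static_cast<char16_t>(codeUnit));
    }
    return result;
}
```

Because each sequence already names a UTF-16 code unit, no character set decoder is involved here, which is why decodeRun() in this struct ignores its TextEncoding argument.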
 struct URLEscapeSequence {
-    enum { size = 3 }; // e.g. %41
-    static size_t findInString(const String& string, unsigned start = 0) { return string.find('%', start); }
-    static bool matchStringPrefix(const String& string, unsigned start = 0)
+    enum { sequenceSize = 3 }; // e.g. %41
+    static size_t findInString(const String& string, size_t startPosition) { return string.find('%', startPosition); }
+    static size_t findEndOfRun(const String& string, size_t startPosition, size_t endPosition)
     {
-        if (string.length() - start < size)
-            return false;
-        return string[start] == '%' && isASCIIHexDigit(string[start + 1]) && isASCIIHexDigit(string[start + 2]);
+        // Make the simplifying assumption that supported encodings may have up to two unescaped characters
+        // in the range 0x40 - 0x7F as the trailing bytes of their sequences which need to be passed into the
+        // decoder as part of the run. In other words, we end the run at the first value outside of the
+        // 0x40 - 0x7F range, after two values in this range, or at a %-sign that does not introduce a valid
+        // escape sequence.
+        size_t runEnd = startPosition;
+        int numberOfTrailingCharacters = 0;
+        while (runEnd < endPosition) {
+            if (string[runEnd] == '%') {
+                if (endPosition - runEnd >= sequenceSize && isASCIIHexDigit(string[runEnd + 1]) && isASCIIHexDigit(string[runEnd + 2])) {
+                    runEnd += sequenceSize;
+                    numberOfTrailingCharacters = 0;
+                } else
+                    break;
+            } else if (string[runEnd] >= 0x40 && string[runEnd] <= 0x7F && numberOfTrailingCharacters < 2) {
+                runEnd += 1;
+                numberOfTrailingCharacters += 1;
+            } else
+                break;
+        }
+        return runEnd;
     }
     static String decodeRun(const UChar* run, size_t runLength, const TextEncoding& encoding)
     {
-        size_t numberOfSequences = runLength / size;
+        // For URL escape sequences, we know that findEndOfRun() has given us a run where every %-sign introduces
+        // a valid escape sequence, but there may be characters between the sequences.
         Vector<char, 512> buffer;
-        buffer.resize(numberOfSequences);
+        buffer.resize(runLength); // Unescaping hex sequences only makes the length smaller.
         char* p = buffer.data();
-        while (numberOfSequences--) {
-            *p++ = (toASCIIHexValue(run[1]) << 4) | toASCIIHexValue(run[2]);
-            run += size;
+        const UChar* runEnd = run + runLength;
+        while (run < runEnd) {
+            if (run[0] == '%') {
+                *p++ = (toASCIIHexValue(run[1]) << 4) | toASCIIHexValue(run[2]);
+                run += sequenceSize;
+            } else {
+                *p++ = run[0];
+                run += 1;
+            }
         }
-        ASSERT(buffer.size() == static_cast<size_t>(p - buffer.data()));
+        ASSERT(buffer.size() >= static_cast<size_t>(p - buffer.data())); // Prove buffer not overrun.
         return (encoding.isValid() ? encoding : UTF8Encoding()).decode(buffer.data(), p - buffer.data());
     }
 };
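
As an illustration of the new decodeRun() loop, the sketch below unescapes a run into raw bytes, copying unescaped characters through unchanged. The names unescapeRunSketch and hexDigitValue are hypothetical, std::string replaces the Vector<char, 512> buffer, and the final hand-off to the character set decoder is omitted. It assumes the run came from a findEndOfRun()-style scan, so every %-sign is followed by two hex digits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one ASCII hex digit (input assumed valid).
static unsigned hexDigitValue(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return c - 'A' + 10;
}

// Turns a run such as "%89g" into the byte sequence 0x89 0x67.
std::string unescapeRunSketch(const std::string& run)
{
    std::string bytes;
    bytes.reserve(run.size()); // Unescaping only makes the length smaller.
    std::size_t i = 0;
    while (i < run.size()) {
        if (run[i] == '%' && run.size() - i >= 3) {
            bytes.push_back(static_cast<char>((hexDigitValue(run[i + 1]) << 4) | hexDigitValue(run[i + 2])));
            i += 3;
        } else {
            // Trailing unescaped byte: pass it through so the decoder sees the full sequence.
            bytes.push_back(run[i]);
            i += 1;
        }
    }
    return bytes;
}
```

The resulting byte string is what would then be passed to the decoder for the page's encoding, so that a lead byte and its unescaped trail byte are interpreted as one character.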
......@@ -95,9 +128,7 @@ String decodeEscapeSequences(const String& string, const TextEncoding& encoding)
     size_t searchPosition = 0;
     size_t encodedRunPosition;
     while ((encodedRunPosition = EscapeSequence::findInString(string, searchPosition)) != notFound) {
-        unsigned encodedRunEnd = encodedRunPosition;
-        while (length - encodedRunEnd >= EscapeSequence::size && EscapeSequence::matchStringPrefix(string, encodedRunEnd))
-            encodedRunEnd += EscapeSequence::size;
+        size_t encodedRunEnd = EscapeSequence::findEndOfRun(string, encodedRunPosition, length);
         searchPosition = encodedRunEnd;
         if (encodedRunEnd == encodedRunPosition) {
             ++searchPosition;
......