2010-12-09  Jenn Braithwaite  <jennb@chromium.org>

        Reviewed by Adam Barth.

        TextResourceDecoder::checkForHeadCharset can look way past the limit.
        https://bugs.webkit.org/show_bug.cgi?id=47397

        Replaced charset detection algorithm with real parser.
        Added tests for parser bugs mentioned in the thread for this bug report.
        Converted hixie's encoding parsing tests to a layout test.

        Tests: fast/encoding/bracket-in-script.html
               fast/encoding/bracket-in-tag.html
               fast/encoding/escaped-bracket.html
               fast/encoding/meta-in-body.html
               fast/encoding/meta-in-script.html
               fast/encoding/meta-in-title.html
               fast/encoding/mismatched-end-tag.html
               fast/encoding/namespace-meta.html
               fast/encoding/not-http-equiv-content.html
               fast/encoding/parser-tests.html
               fast/encoding/quotes-in-title.html
               fast/encoding/tag-name-digit.html
               http/tests/misc/charset-sniffer-end-sniffing.html

        * Android.mk:
        * CMakeLists.txt:
        * GNUmakefile.am:
        * WebCore.gypi:
        * WebCore.pro:
        * WebCore.vcproj/WebCore.vcproj:
        * WebCore.xcodeproj/project.pbxproj:
        * html/parser/HTMLMetaCharsetParser.cpp: Added.
        (WebCore::HTMLMetaCharsetParser::HTMLMetaCharsetParser):
        (WebCore::HTMLMetaCharsetParser::~HTMLMetaCharsetParser):
        (WebCore::HTMLMetaCharsetParser::extractCharset):
        (WebCore::HTMLMetaCharsetParser::processMeta):
        (WebCore::HTMLMetaCharsetParser::checkForMetaCharset):
        * html/parser/HTMLMetaCharsetParser.h: Added.
        (WebCore::HTMLMetaCharsetParser::create):
        (WebCore::HTMLMetaCharsetParser::encoding):
        * loader/TextResourceDecoder.cpp:
        (WebCore::TextResourceDecoder::checkForHeadCharset):
        (WebCore::TextResourceDecoder::checkForMetaCharset):
        * loader/TextResourceDecoder.h:

2010-12-09  Jenn Braithwaite  <jennb@chromium.org>

        Reviewed by Adam Barth.

        TextResourceDecoder::checkForHeadCharset can look way past the limit.
        https://bugs.webkit.org/show_bug.cgi?id=47397

        Replaced charset detection algorithm with real parser.
        Added tests for parser bugs mentioned in the thread for this bug report.
        Converted hixie's encoding parsing tests to a layout test.
        Added http-equiv attribute to meta tag in 2 existing tests.

        * fast/encoding/bracket-in-script-expected.txt: Added.
        * fast/encoding/bracket-in-script.html: Added.
        * fast/encoding/bracket-in-tag-expected.txt: Added.
        * fast/encoding/bracket-in-tag.html: Added.
        * fast/encoding/escaped-bracket-expected.txt: Added.
        * fast/encoding/escaped-bracket.html: Added.
        * fast/encoding/meta-in-body-expected.txt: Added.
        * fast/encoding/meta-in-body.html: Added.
        * fast/encoding/meta-in-script-expected.txt: Added.
        * fast/encoding/meta-in-script.html: Added.
        * fast/encoding/meta-in-title-expected.txt: Added.
        * fast/encoding/meta-in-title.html: Added.
        * fast/encoding/mismatched-end-tag-expected.txt: Added.
        * fast/encoding/mismatched-end-tag.html: Added.
        * fast/encoding/namespace-meta-expected.txt: Added.
        * fast/encoding/namespace-meta.html: Added.
        * fast/encoding/namespace-tolerance.html:
        * fast/encoding/not-http-equiv-content-expected.txt: Added.
        * fast/encoding/not-http-equiv-content.html: Added.
        * fast/encoding/parser-tests-expected.txt: Added.
        * fast/encoding/parser-tests.html: Added.
        * fast/encoding/quotes-in-title-expected.txt: Added.
        * fast/encoding/quotes-in-title.html: Added.
        * fast/encoding/resources/001.html: Added.
        * fast/encoding/resources/002.html: Added.
        * fast/encoding/resources/003.html: Added.
        * fast/encoding/resources/004.html: Added.
        * fast/encoding/resources/005.html: Added.
        * fast/encoding/resources/006.html: Added.
        * fast/encoding/resources/007.html: Added.
        * fast/encoding/resources/008.html: Added.
        * fast/encoding/resources/009.html: Added.
        * fast/encoding/resources/010.html: Added.
        * fast/encoding/resources/011.html: Added.
        * fast/encoding/resources/012.html: Added.
        * fast/encoding/resources/013.html: Added.
        * fast/encoding/resources/014.html: Added.
        * fast/encoding/resources/015.html: Added.
        * fast/encoding/resources/016.html: Added.
        * fast/encoding/resources/017.html: Added.
        * fast/encoding/resources/018.html: Added.
        * fast/encoding/resources/019.html: Added.
        * fast/encoding/resources/020.html: Added.
        * fast/encoding/resources/021.html: Added.
        * fast/encoding/resources/022.html: Added.
        * fast/encoding/resources/023.html: Added.
        * fast/encoding/resources/024.html: Added.
        * fast/encoding/resources/025.html: Added.
        * fast/encoding/resources/026.html: Added.
        * fast/encoding/resources/027.html: Added.
        * fast/encoding/resources/028.html: Added.
        * fast/encoding/resources/029.html: Added.
        * fast/encoding/resources/030.html: Added.
        * fast/encoding/resources/031.html: Added.
        * fast/encoding/resources/032.html: Added.
        * fast/encoding/resources/033.html: Added.
        * fast/encoding/resources/034.html: Added.
        * fast/encoding/resources/035.html: Added.
        * fast/encoding/resources/036.html: Added.
        * fast/encoding/resources/037.html: Added.
        * fast/encoding/resources/038.html: Added.
        * fast/encoding/resources/039.html: Added.
        * fast/encoding/resources/040.html: Added.
        * fast/encoding/resources/041.html: Added.
        * fast/encoding/resources/042.html: Added.
        * fast/encoding/resources/043.html: Added.
        * fast/encoding/resources/044.html: Added.
        * fast/encoding/resources/045.html: Added.
        * fast/encoding/resources/046.html: Added.
        * fast/encoding/resources/047.html: Added.
        * fast/encoding/resources/048.html: Added.
        * fast/encoding/resources/049.html: Added.
        * fast/encoding/resources/050.html: Added.
        * fast/encoding/resources/051.html: Added.
        * fast/encoding/resources/052.html: Added.
        * fast/encoding/resources/053.html: Added.
        * fast/encoding/resources/054.html: Added.
        * fast/encoding/resources/055.html: Added.
        * fast/encoding/resources/056.html: Added.
        * fast/encoding/resources/057.html: Added.
        * fast/encoding/resources/058.html: Added.
        * fast/encoding/resources/059.html: Added.
        * fast/encoding/resources/060.html: Added.
        * fast/encoding/resources/061.html: Added.
        * fast/encoding/resources/062.html: Added.
        * fast/encoding/resources/063.html: Added.
        * fast/encoding/resources/064.html: Added.
        * fast/encoding/resources/065.html: Added.
        * fast/encoding/resources/066.html: Added.
        * fast/encoding/resources/067.html: Added.
        * fast/encoding/resources/068.html: Added.
        * fast/encoding/resources/069.html: Added.
        * fast/encoding/resources/070.html: Added.
        * fast/encoding/resources/071.html: Added.
        * fast/encoding/resources/072.html: Added.
        * fast/encoding/resources/073.html: Added.
        * fast/encoding/resources/074.html: Added.
        * fast/encoding/resources/075.html: Added.
        * fast/encoding/resources/076.html: Added.
        * fast/encoding/resources/077.html: Added.
        * fast/encoding/resources/078.html: Added.
        * fast/encoding/resources/079.html: Added.
        * fast/encoding/resources/080.html: Added.
        * fast/encoding/resources/081.html: Added.
        * fast/encoding/resources/082.html: Added.
        * fast/encoding/resources/083.html: Added.
        * fast/encoding/resources/084.html: Added.
        * fast/encoding/resources/085.html: Added.
        * fast/encoding/resources/086.html: Added.
        * fast/encoding/resources/087.html: Added.
        * fast/encoding/resources/088.html: Added.
        * fast/encoding/resources/089.html: Added.
        * fast/encoding/resources/090.html: Added.
        * fast/encoding/resources/091.html: Added.
        * fast/encoding/resources/092.html: Added.
        * fast/encoding/resources/093.html: Added.
        * fast/encoding/resources/094.html: Added.
        * fast/encoding/resources/095.html: Added.
        * fast/encoding/resources/096.html: Added.
        * fast/encoding/resources/097.html: Added.
        * fast/encoding/resources/098.html: Added.
        * fast/encoding/resources/099.html: Added.
        * fast/encoding/resources/100.html: Added.
        * fast/encoding/resources/101.html: Added.
        * fast/encoding/resources/102.html: Added.
        * fast/encoding/resources/103.html: Added.
        * fast/encoding/resources/104.html: Added.
        * fast/encoding/resources/105.html: Added.
        * fast/encoding/resources/106.html: Added.
        * fast/encoding/resources/107.html: Added.
        * fast/encoding/resources/108.html: Added.
        * fast/encoding/resources/109.html: Added.
        * fast/encoding/resources/110.html: Added.
        * fast/encoding/resources/111.html: Added.
        * fast/encoding/resources/112.html: Added.
        * fast/encoding/resources/113.html: Added.
        * fast/encoding/resources/114.html: Added.
        * fast/encoding/resources/115.html: Added.
        * fast/encoding/resources/116.html: Added.
        * fast/encoding/resources/117.html: Added.
        * fast/encoding/resources/118.html: Added.
        * fast/encoding/resources/119.html: Added.
        * fast/encoding/resources/120.html: Added.
        * fast/encoding/resources/121.html: Added.
        * fast/encoding/resources/122.html: Added.
        * fast/encoding/resources/123.html: Added.
        * fast/encoding/tag-name-digit-expected.txt: Added.
        * fast/encoding/tag-name-digit.html: Added.
        * fast/text/international/bidi-innertext.html:
        * http/tests/misc/charset-sniffer-end-sniffing-expected.txt: Added.
        * http/tests/misc/charset-sniffer-end-sniffing.html: Added.
        * http/tests/misc/resources/charset-sniffer-end-sniffing.php: Added.


git-svn-id: http://svn.webkit.org/repository/webkit/trunk@73756 268f45cc-cd09-0410-ab3c-d52691b4dbfc
parent a8969aab
2010-12-10 Mihai Parparita <mihaip@chromium.org>
Unreviewed Chromium test expectation update.
......
PASS: windows-1255
This test checks that charset sniffer does not get confused by the left angle bracket that is not part of a tag. There was a bug where the less-than caused all text after it until the next closing bracket to be consumed as the tag, resulting in the closing script tag being missed by the charset sniffer.
The charset sniffer would think it was still in the script mode and ignore the meta tag. This test relies on the charset sniffer ignoring meta tags inside script and checking at least 1024 bytes of data for a meta tag.
<html>
<head>
<script>
if (2 < 1) foo = bar;
</script>
</head>
<body>
<meta charset=windows-1255>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "windows-1255")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>This test checks that charset sniffer does not get confused by the left angle bracket that is not part of a tag. There was a bug where the less-than caused all text after it until the next closing bracket to be consumed as the tag, resulting in the closing script tag being missed by the charset sniffer.</p>
<p>The charset sniffer would think it was still in the script mode and ignore the meta tag. This test relies on the charset sniffer ignoring meta tags inside script and checking at least 1024 bytes of data for a meta tag.</p>
</body>
</html>
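The behaviour these tests describe (skip the contents of script and title, do not treat a bare `<` as a tag, stop after a bounded prefix) can be sketched roughly as follows. This is an illustrative approximation only, not WebKit's HTMLMetaCharsetParser: the function name `sniffMetaCharset`, the constants, and the choice to count the 1024 limit in characters rather than bytes are all assumptions, and `http-equiv`/`content` handling is left out.

```javascript
// Illustrative sketch only: hypothetical names, not WebKit's implementation.
// Elements whose content is skipped until a matching end tag.
const TEXT_ONLY_ELEMENTS = new Set(["script", "style", "title"]);

// Matches one attribute: a name, then an optional quoted or unquoted value.
const ATTRIBUTE = /([^\s=\/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g;

function sniffMetaCharset(html, limit = 1024) {
  // Assumption: the limit is counted in characters here, not bytes.
  const text = html.slice(0, limit);
  let pos = 0;
  while (pos < text.length) {
    const lt = text.indexOf("<", pos);
    if (lt === -1) break;
    // Only "<" followed by an ASCII letter opens a start tag, so the "<"
    // in "2 < 1" and the "</" of a stray end tag are stepped over.
    const start = /^<([a-zA-Z][^\s\/>]*)/.exec(text.slice(lt));
    if (!start) {
      pos = lt + 1;
      continue;
    }
    // Simplification: a ">" inside a quoted attribute value is not handled.
    const gt = text.indexOf(">", lt);
    if (gt === -1) break;
    const name = start[1].toLowerCase();
    if (name === "meta") {
      const body = text.slice(lt + start[0].length, gt);
      for (const attr of body.matchAll(ATTRIBUTE)) {
        const value = attr[2] || attr[3] || attr[4];
        if (attr[1].toLowerCase() === "charset" && value) return value;
      }
    }
    pos = gt + 1;
    if (TEXT_ONLY_ELEMENTS.has(name)) {
      // Skip to the matching end tag. The tag name must be followed by
      // whitespace, "/" or ">", so "</title\>" does not close a title.
      const endTag = new RegExp("</" + name + "[\\s/>]", "i");
      const offset = text.slice(pos).search(endTag);
      if (offset === -1) break;
      pos += offset;
    }
  }
  return null;
}
```

Because the tag name runs up to the next whitespace, `/` or `>`, input such as `<foo<meta charset=...>` yields a tag named `foo<meta` and `<testme:meta ...>` yields `testme:meta`; neither is treated as a meta tag in this sketch.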
PASS: ISO-8859-1
This test baselines charset sniffer behavior where the opening bracket inside a tag is consumed as part of the tag data, causing the meta tag to be missed.
<html>
<head>
<foo<meta charset=windows-1255>
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "windows-1255")
log("FAIL: " + document.inputEncoding);
else
log("PASS: " + document.inputEncoding);
</script>
<p>This test baselines charset sniffer behavior where the opening bracket inside a tag is consumed as part of the tag data, causing the meta tag to be missed.
</body>
</html>
PASS: KOI8-R
This test checks whether charset sniffer skips over escaped characters correctly.
<html>
<head>
<title>
foo="\</title\>"
<meta charset=windows-1255>
</title>
<meta charset=KOI8-R>
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "KOI8-R")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>
This test checks whether charset sniffer skips over escaped characters correctly.
</body>
</html>
PASS: ISO-2022-JP
This test checks that the charset sniffer scans at least 1024 bytes of data to find a meta tag, even if it is not in the head section.
<html>
<head>
</head>
<body>
<meta charset=ISO-2022-JP>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "ISO-2022-JP")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>
This test checks that the charset sniffer scans at least 1024 bytes of data to find a meta tag, even if it is not in the head section.
</body>
</html>
CONSOLE MESSAGE: line 4: SyntaxError: Parse error
PASS: windows-1255
This test passes if the charset is parsed from the meta tag outside the script.
<html>
<head>
<script>
<meta charset=koi8-r>
</script>
<meta charset=windows-1255>
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "windows-1255")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>This test passes if the charset is parsed from the meta tag outside the script.
</body>
</html>
PASS: KOI8-R
This test verifies that charset sniffer ignores meta tag in a title.
<html>
<head>
<title>
<meta charset=windows-1255>
</title>
<meta charset=KOI8-R>
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "KOI8-R")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>
This test verifies that charset sniffer ignores meta tag in a title.
</body>
</html>
PASS: windows-1255
This test checks that charset sniffer does not get confused by the extraneous end script tag and ignore the meta tag, thinking it is inside a script.
<html>
<head>
</script>
<meta charset=windows-1255>
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "windows-1255")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>This test checks that charset sniffer does not get confused by the extraneous end script tag and ignore the meta tag, thinking it is inside a script.</p>
</body>
</html>
This test checks that the charset sniffer ignores namespaced meta tags.
PASS: ISO-2022-JP
<html>
<head>
<testme:meta charset=windows-1252>
<meta charset=ISO-2022-JP>
</head>
<body>
This test checks that the charset sniffer ignores namespaced meta tags.
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "ISO-2022-JP")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
</body>
</html>
<xhtml:html xmlns:xhtml="">
<meta content="charset=UTF-8">
<meta content="charset=UTF-8" http-equiv="Content-Type">
This test ensures a UTF-8 encoding is properly set on documents that:
<br>
......
PASS: windows-1255
This test checks that charset sniffer does not get confused by the text that contains charset wording.
<html>
<head>
<meta name="description" content="how to set charset=utf-8 on HTML docs">
<meta http-equiv="content-type" content="text/html; charset=windows-1255">
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "windows-1255")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>This test checks that charset sniffer does not get confused by the text that contains charset wording.</p>
</body>
</html>
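The rule this test exercises, that a `content` attribute only contributes a charset when the same meta tag also carries `http-equiv="content-type"`, might be sketched like this. The helper names `extractCharset` and `charsetFromMeta`, and the plain-object representation of attributes, are assumptions made for illustration; they are not the signatures of the methods listed in the ChangeLog.

```javascript
// Illustrative sketch only: hypothetical helpers, not WebKit's API.

// Pulls the charset value out of a Content-Type style string such as
// "text/html; charset=windows-1255". Returns null when there is none.
function extractCharset(content) {
  const match = /charset\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s;"']+))/i.exec(content);
  if (!match) return null;
  return match[1] || match[2] || match[3] || null;
}

// attrs is assumed to be a plain object of lower-cased attribute names.
function charsetFromMeta(attrs) {
  // A direct charset attribute takes effect on its own.
  if (attrs.charset) return attrs.charset;
  // The content attribute is consulted only alongside http-equiv=content-type.
  const httpEquiv = (attrs["http-equiv"] || "").toLowerCase();
  if (httpEquiv === "content-type" && attrs.content) {
    return extractCharset(attrs.content);
  }
  return null;
}
```

With these helpers, the description meta tag in the test above yields no charset even though its text contains `charset=utf-8`, while the `http-equiv` tag yields `windows-1255`.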
This test suite was converted from http://www.hixie.ch/tests/adhoc/html/parsing/encoding/all.html
Expected failures:
56, 57, 58, 59 - we do not run scripts during encoding detection phase and parser treats meta inside a script as text, not a tag.
60 - parser treats meta inside style as text, not a tag.
97, 99, 102 - we do not run scripts during encoding detection.
Status: Tests ran.
Serious failures:
test 056: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<script>document.write('<meta charset="ISO-8859-' + '9">')</script>
test 057: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<script>var s = '9">'; document.write('<meta charset="ISO-8859-' + s)</script>
test 058: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<script>document.write('<meta charset="ISO-8859-9">')</script>
test 059: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<script type="text/plain"><meta charset="ISO-8859-9"></script>
test 060: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<style type="text/plain"><meta charset="ISO-8859-9"></style>
test 097: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<script>document.write(atob('PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7Y2hhcnNldD1JU08tODg1OS05Ij4='))</script>
test 099: expected Windows-1252; used Windows-1254
<!DOCTYPE HTML>
<script>document.write(atob('PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7Y2hhcnNldD1JU08tODg1OS0xIj4='))</script>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-9">
test 102: expected Windows-1254; used Windows-1252
<!DOCTYPE HTML>
<script>document.write(atob('PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7Y2hhcnNldD1JU08tODg1OS05Ij4='))</script>
<script>document.write(atob('PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7Y2hhcnNldD1JU08tODg1OS0xIj4='))</script>
(Tests are considered to pass even if they treat Win1254 and ISO-8859-4 as separate.)
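To see why tests 97, 99 and 102 are listed as expected failures, it may help to decode the base64 payload they pass to `atob`. The sketch below assumes a Node.js environment and uses `Buffer` instead of `atob`; the helper name `decodeBase64` is made up for this example.

```javascript
// Illustrative sketch, assuming Node.js (Buffer). The tests themselves call atob().
const payload =
  "PG1ldGEgaHR0cC1lcXVpdj0iQ29udGVudC1UeXBlIiBjb250ZW50PSJ0ZXh0L2h0bWw7Y2hhcnNldD1JU08tODg1OS05Ij4=";

// Decodes a base64 string to text, one byte per character.
function decodeBase64(encoded) {
  return Buffer.from(encoded, "base64").toString("latin1");
}

const decoded = decodeBase64(payload);
console.log(decoded);
// prints: <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-9">
```

The meta tag only comes into existence once the script runs and calls `document.write`, so a sniffer that does not execute scripts during encoding detection never sees it.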
<!DOCTYPE HTML>
<html>
<head>
<title>Harness</title>
</head>
<body onload="runtests()">
<p>This test suite was converted from http://www.hixie.ch/tests/adhoc/html/parsing/encoding/all.html</p>
<ul>Expected failures:
<li>56, 57, 58, 59 - we do not run scripts during encoding detection phase and parser treats meta inside a script as text, not a tag.</li>
<li>60 - parser treats meta inside style as text, not a tag.</li>
<li>97, 99, 102 - we do not run scripts during encoding detection.</li>
</ul>
<p>Status: <span id="status">Tests did not run.</span></p>
<p>Serious failures:</p>
<ol id="failures">
</ol>
<p>(Tests are considered to pass even if they treat Win1254 and ISO-8859-4 as separate.)</p>
<p><iframe id="test"></iframe></p>
<script>
if (window.layoutTestController) {
layoutTestController.dumpAsText();
layoutTestController.waitUntilDone();
}
var frame = document.getElementById('test');
var current = 1;
var max = 123;
function dotest() {
var s = current + '';
while (s.length < 3) s = '0' + s;
frame.src = 'resources/' + s + '.html';
window.receivedResults = function () {
var want = frame.contentWindow.document.getElementById('expected').firstChild.data;
var have = frame.contentWindow.document.getElementById('encoding').firstChild.data;
if (have == 'ISO-8859-9')
have = 'Windows-1254';
if (want != have) {
var li = document.createElement('li');
var a = document.createElement('a');
a.appendChild(document.createTextNode('test ' + s));
a.href = 'resources/' + s + '.html';
li.appendChild(a);
li.appendChild(document.createTextNode(': expected ' + want + '; used ' + have));
var pre = document.createElement('pre');
pre.appendChild(document.createTextNode(frame.contentWindow.document.getElementsByTagName('pre')[0].firstChild.data));
li.appendChild(pre);
li.value = current;
document.getElementById('failures').appendChild(li);
}
current += 1;
if (current <= max)
dotest();
else {
frame.parentNode.removeChild(frame);
document.getElementById('status').innerText = "Tests ran.";
if (window.layoutTestController)
layoutTestController.notifyDone();
}
};
}
function runtests() {
dotest();
}
function alert() { }
</script>
</body>
</html>
PASS: KOI8-R
This test checks whether charset sniffer skips over quoted elements in a title tag correctly. Tests a bug in the charset sniffer that would consume all characters after the bracket in the quoted text until the next closing bracket, causing the closing title tag to be missed.
<html>
<head>
<title>
foo="<a"
</title>
<meta charset=KOI8-R>
</head>
<body>
<pre id="log"></pre>
<script>
function log(message)
{
document.getElementById("log").innerText += message + "\n";
}
if (window.layoutTestController)
layoutTestController.dumpAsText();
if (document.inputEncoding == "KOI8-R")
log("PASS: " + document.inputEncoding);
else
log("FAIL: " + document.inputEncoding);
</script>
<p>
This test checks whether charset sniffer skips over quoted elements in a title tag correctly. Tests a bug in the charset sniffer that would consume all characters after the bracket in the quoted text until the next closing bracket, causing the closing title tag to be missed.
</body>
</html>
<!DOCTYPE HTML>
<!-- (control test - for the other tests to work, this should pass - you may have to set your defaults appropriately) -->
<p>Test:
<pre>&lt;!DOCTYPE HTML>
&lt;!-- (control test - for the other tests to work, this should pass - you may have to set your defaults appropriately) --></pre>
<p>Expected result: <span id="expected">Windows-1252</span>
<div>
<style scoped>
.pass { background: green; color: white; padding: 0.5em; font-weight: bold; }
.fail { background: red; color: yellow; padding: 0.5em; font-weight: bold; }