    LayoutTests: · b3547a37
    darin authored
            Reviewed by Alexey.
    
            - test for bug where the charset in a link element for a CSS stylesheet is ignored.
              I noticed this while working on new text encoding machinery.
    
            * fast/encoding/css-link-charset-expected.txt: Added.
            * fast/encoding/css-link-charset.css: Added.
            * fast/encoding/css-link-charset.html: Added.
    
            - test for http://bugzilla.opendarwin.org/show_bug.cgi?id=10681
              REGRESSION: Reproducible crash at Wikipedia
              (Alexey wrote this one, I reviewed.)
    
            * fast/forms/form-data-encoding-normalization-overrun-expected.txt: Added.
            * fast/forms/form-data-encoding-normalization-overrun.html: Added.
    
            - and a tweak to an existing test
    
            * fast/forms/form-data-encoding.html: Changed to dump encoded URL so it's easier to
            diagnose this when it fails.
    
    WebCore:
    
            Reviewed by Alexey.
    
            - http://bugzilla.opendarwin.org/show_bug.cgi?id=10728
              text encodings should work without a numeric ID
    
            - includes a fix for http://bugzilla.opendarwin.org/show_bug.cgi?id=10681
              REGRESSION: Reproducible crash at Wikipedia
    
            - fixed a bug where link elements would not set the charset properly for
              CSS stylesheets they loaded
    
            - converted DeprecatedString code paths that are related to decoding web
              pages to use String instead, to ensure that conversion back and forth won't
              hurt performance
    
            Test: fast/encoding/css-link-charset.html
            Test: fast/forms/form-data-encoding-normalization-overrun.html
    
            Coverage for encoding issues is pretty good, so we probably don't need more
            tests to land this. Our existing tests did find issues with this patch while
            it was under development. And I suppose it would be nice to have even more tests.
    
            * platform/TextEncoding.h:
            * platform/TextEncoding.cpp:
            (WebCore::addEncodingName): Added. Used to build up the set used by isJapanese.
            (WebCore::TextEncoding::TextEncoding): Removed boolean "eight bit only" parameter and
            added an overload for String as well as const char*. Simplified because now the only
            data member is m_name -- calls the registry's atomicCanonicalTextEncodingName function
            to make the name canonical (resolve aliases) and atomic (use a single pointer for each
            encoding name so we can compare and hash efficiently).
            (WebCore::TextEncoding::decode): Renamed from toUnicode. Just a simple wrapper on top
            of TextDecoder that can be used when the data to decode is all present at once.
            (WebCore::TextEncoding::encode): Renamed from fromUnicode. Handles the normalization and
            then uses the registry to get a codec to handle the rest.
            (WebCore::TextEncoding::usesVisualOrdering): New implementation that compares with the
            name of the only encoding that uses visual ordering. We blur the concepts a bit so that
            we treat the visual ordering and logical ordering variations as two separate encodings.
            (WebCore::TextEncoding::isJapanese): New implementation that uses a set to efficiently
            determine if an encoding is Japanese.
            (WebCore::TextEncoding::backslashAsCurrencySymbol): New implementation that compares
            with the names of the two encodings that have the strange backslash.
            (WebCore::TextEncoding::closest8BitEquivalent): Added. Replaces the old "eight bit only"
            boolean parameter to the constructor.
            (WebCore::ASCIIEncoding): Added.
            (WebCore::Latin1Encoding): Added.
            (WebCore::UTF16BigEndianEncoding): Added.
            (WebCore::UTF16LittleEndianEncoding): Added.
            (WebCore::UTF8Encoding): Added.
            (WebCore::WindowsLatin1Encoding): Added.
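
The "canonical and atomic" name idea described for the TextEncoding constructor can be sketched roughly as follows. This is a minimal illustration, not the WebCore code: the `Encoding` class, the `atomicCanonicalName` function, and the small alias table are hypothetical stand-ins for TextEncoding and the registry. Each alias resolves to one shared pointer, so comparing two encodings is a pointer comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Lowercase a name so that alias lookup is case-insensitive.
static std::string lowercased(const char* s)
{
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Hypothetical registry lookup: every alias maps to the single pointer that
// represents its canonical name. Unknown names yield nullptr.
const char* atomicCanonicalName(const char* alias)
{
    static const char* const utf8 = "UTF-8";
    static const char* const latin1 = "ISO-8859-1";
    // Illustrative alias table only; a real registry would hold many more.
    static const std::unordered_map<std::string, const char*> aliases = {
        { "utf-8", utf8 }, { "utf8", utf8 },
        { "iso-8859-1", latin1 }, { "latin1", latin1 }, { "l1", latin1 },
    };
    if (!alias)
        return nullptr;
    auto it = aliases.find(lowercased(alias));
    return it == aliases.end() ? nullptr : it->second;
}

// Hypothetical encoding class whose only data member is the atomic name.
class Encoding {
public:
    explicit Encoding(const char* name) : m_name(atomicCanonicalName(name)) { }
    bool isValid() const { return m_name != nullptr; }
    const char* name() const { return m_name; }
    // Pointer comparison is enough because names are atomic.
    bool operator==(const Encoding& other) const { return m_name == other.m_name; }
private:
    const char* m_name;
};
```

With this shape, predicates such as usesVisualOrdering or backslashAsCurrencySymbol can compare `m_name` against one or two known pointers, and a set of pointers can back isJapanese.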
    
            * platform/TextEncodingRegistry.h: Added.
            * platform/TextEncodingRegistry.cpp: Added. Keeps a table of all the character set
            aliases and names and another of all the codecs and parameters for each name.
    
            * platform/TextDecoder.h: Added.
            * platform/TextDecoder.cpp: Added. Contains logic to look for a BOM and hand the data
            to the proper codec, based on code that used to be in both the ICU and Mac codecs.
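
As a rough sketch of the BOM check that TextDecoder centralizes, the function below inspects the first bytes of a buffer and reports which BOM, if any, is present. The names `BOMKind`, `BOMResult`, and `detectBOM` are hypothetical and not taken from the patch. The sketch assumes the leading bytes are all available at once and does not distinguish a UTF-32 BOM from a UTF-16 little-endian one.

```cpp
#include <cassert>
#include <cstddef>

// Which byte order mark was found at the start of the data.
enum class BOMKind { None, UTF8, UTF16BigEndian, UTF16LittleEndian };

struct BOMResult {
    BOMKind kind;       // detected BOM type
    std::size_t length; // number of BOM bytes to skip before decoding
};

// Look for a BOM at the start of the buffer. A caller would then pick the
// matching codec and hand it the data that follows the BOM.
BOMResult detectBOM(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return { BOMKind::UTF8, 3 };
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return { BOMKind::UTF16BigEndian, 2 };
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return { BOMKind::UTF16LittleEndian, 2 };
    return { BOMKind::None, 0 };
}
```

A streaming decoder would also need to buffer input until enough bytes have arrived to make this decision; that part is omitted here.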
    
            * platform/StreamingTextDecoder.h: Renamed class to TextCodec. We'll rename
            the files in a later check-in. Moved creation functions into TextEncodingRegistry.h.
            Added typedefs of the registrar function types so classes derived from this one
            can use them without including the TextEncodingRegistry header. Renamed toUnicode
            and fromUnicode to decode and encode. Changed the parameter and return types so that
            the parameters are pointers to the data and the return types are String and CString.
            * platform/StreamingTextDecoder.cpp:
            (WebCore::TextCodec::appendOmittingBOM): Added. Helper function used by multiple
            classes derived from this one.
    
            * platform/TextCodecLatin1.h: Added.
            * platform/TextCodecLatin1.cpp: Added. Contains logic to handle encoding and decoding
            Windows Latin-1, based on code that used to be in both the ICU and Mac codecs.
    
            * platform/TextCodecUTF16.h: Added.
            * platform/TextCodecUTF16.cpp: Added. Contains logic to handle encoding and decoding
            UTF-16, based on code that used to be in both the ICU and Mac codecs.
    
            * platform/StreamingTextDecoderICU.h: Renamed class to TextCodecICU. We'll rename
            the files in a later check-in. Removed all the functions having to do with handling
            BOM, UTF-16, and Latin-1; those are now handled elsewhere. Removed textEncodingSupported
            because that's superseded by the registry. Added registry hook functions.
            * platform/StreamingTextDecoderICU.cpp:
            (WebCore::TextCodecICU::registerEncodingNames): Added. Registers all encodings that
            ICU can handle with the "IANA" standard. Also includes a special case for a particular
            type of encoding for Hebrew that uses logical ordering. Also includes aliases that are
            not in ICU but that were historically known to WebKit for encodings that ICU handles. 
            (WebCore::newTextCodecICU): Added. Used by registerCodecs.
            (WebCore::TextCodecICU::registerCodecs): Added. Registers codecs for the same encodings
            as above.
            (WebCore::TextCodecICU::TextCodecICU): Much simplified since this now only handles the
            actual ICU encoding and decoding.
            (WebCore::TextCodecICU::~TextCodecICU): Renamed.
            (WebCore::TextCodecICU::releaseICUConverter): Changed to be a const member function.
            (WebCore::TextCodecICU::createICUConverter): Changed to be a const member function and
            to check if the cached converter can be reused in a simpler way.
            (WebCore::TextCodecICU::decode): Updated for changes to types.
            (WebCore::TextCodecICU::encode): Updated for changes to types, and removed normalization
            since this is now handled by the caller.
    
            * platform/mac/StreamingTextDecoderMac.h: Renamed class to TextCodecMac. We'll rename
            the files in a later check-in. Removed all the functions having to do with handling
            BOM, UTF-16, and Latin-1; those are now handled elsewhere. Removed textEncodingSupported
            because that's superseded by the registry. Added registry hook functions.
            * platform/mac/StreamingTextDecoderMac.cpp:
            (WebCore::TextCodecMac::registerEncodingNames): Added. Registers encodings based on
            the charset table generated by the make-charset-table.pl perl script.
            (WebCore::newTextCodecMac): Added. Used by registerCodecs.
            (WebCore::TextCodecMac::registerCodecs): Added. Registers codecs for the same encodings
            as above.
            (WebCore::TextCodecMac::TextCodecMac): Much simplified since this now only handles the
            actual TEC/CF encoding and decoding.
            (WebCore::TextCodecMac::~TextCodecMac): Renamed.
            (WebCore::TextCodecMac::releaseTECConverter): Changed to be a const member function.
            (WebCore::TextCodecMac::createTECConverter): Changed to be a const member function.
            (WebCore::TextCodecMac::decode): Updated for changes to types.
            (WebCore::TextCodecMac::encode): Updated for changes to types, and removed normalization
            since this is now handled by the caller.
    
            * platform/mac/mac-encodings.txt: Removed most of the names in this file. This now
            only includes encodings where we want to use the Mac OS X Carbon Text Encoding
            Converter, that is, only encodings that are not supported by ICU.
            * platform/make-charset-table.pl: Removed flags from output. We don't use them any more.
            * platform/CharsetData.h: Changed from a platform-independent header into a
            Macintosh-specific one. A later patch should move this and rename it. Also
            subsumes ExtraCFEncodings.h.
    
            * WebCore.xcodeproj/project.pbxproj: Added new files. Changed the prefix on the
            "make character sets" rule to be kTextEncoding instead of kCFStringEncoding.
    
            * loader/Decoder.h: Changed the default encoding parameter to the constructor to be
            a TextEncoding object. Renamed setEncodingName to setEncoding, and made it take a
            TextEncoding for the encoding. Removed the encodingName and visuallyOrdered functions,
            since TextEncoding supports both directly in a straightforward way. Changed both
            decode and flush functions to return String instead of DeprecatedString. Added a
            number of private functions to factor this class a bit more so it's easier to read.
            Got rid of a number of redundant data members. Changed the buffer to a Vector<char>.
            * loader/Decoder.cpp:
            (WebCore::Decoder::determineContentType): Added. Used by constructor to determine
            the content type based on the passed-in MIME type.
            (WebCore::Decoder::defaultEncoding): Added. Used by constructor to determine the
            default encoding based on the passed-in default and the content type.
            (WebCore::Decoder::Decoder): Changed to use the functions above. Also renamed
            m_reachedBody to m_checkedForHeadCharset.
            (WebCore::Decoder::setEncoding): Renamed and changed to take an encoding rather
            than an encoding name.
            (WebCore::Decoder::checkForBOM): Factored out of decode.
            (WebCore::Decoder::checkForCSSCharset): Factored out of decode.
            (WebCore::Decoder::checkForHeadCharset): Factored out of decode.
            (WebCore::Decoder::detectJapaneseEncoding): Factored out of decode.
            (WebCore::Decoder::decode): Refactored so it's no longer one huge function.
            Changed to use the new Vector<char> and the new API for TextDecoder.
            (WebCore::Decoder::flush): Added code to empty out the buffer. Not an issue in
            practice since we don't re-use the decoder after flushing it.
    
            * platform/UChar.h: Added. Has the type named WebCore::UChar that we'll be switching
            to. We'll switch away from the ICU ::UChar type, because we don't want to be so
            closely tied to ICU -- include this instead of <unicode/umachine.h>.
    
            * platform/PlatformString.h:
            * platform/String.cpp:
            (WebCore::String::latin1): Updated for changes to TextEncoding.
            (WebCore::String::utf8): Ditto.
            (WebCore::String::newUninitialized): Added. Gives a way to create a String and
            then write directly into its buffer.
    
            * platform/StringImpl.h: Changed return value for charactersWithNullTermination to
            be a const UChar*. While it's true that this function changes the underlying
            StringImpl, the characters still shouldn't be modified with the returned pointer.
            * platform/StringImpl.cpp:
            (WebCore::StringImpl::charactersWithNullTermination): Updated for change above.
            (WebCore::StringImpl::newUninitialized): Added. Gives a way to create a StringImpl
            and then write directly into its buffer.
    
            * platform/CString.h:
            * platform/CString.cpp: (WebCore::CString::newUninitialized): Added. Gives a way
            to create a CString and then write directly into its buffer.
    
            * bridge/mac/WebCoreFrameBridge.h: Removed textEncoding method, and replaced
            +[WebCoreFrameBridge stringWithData:textEncoding:] with
            -[WebCoreFrameBridge stringWithData:] to avoid having to pass text encoding
            IDs around.
            * bridge/mac/WebCoreFrameBridge.mm:
            (-[WebCoreFrameBridge setEncoding:userChosen:]): Removed now-unneeded conversion
            to DeprecatedString.
            (-[WebCoreFrameBridge stringByEvaluatingJavaScriptFromString:forceUserGesture:]):
            Ditto.
            (-[WebCoreFrameBridge aeDescByEvaluatingJavaScriptFromString:]): Ditto.
            (-[WebCoreFrameBridge referrer]): Removed now-unneeded call to getNSString.
            (-[WebCoreFrameBridge stringWithData:]): Added. Asks the document's decoder
            what its encoding is, and decodes using that.
            (+[WebCoreFrameBridge stringWithData:textEncodingName:]): Simplified so it
            no longer involves a text encoding ID number.
            (-[WebCoreFrameBridge smartInsertForString:replacingRange:beforeString:afterString:]):
            Changed to use UChar instead of DeprecatedChar.
            (-[WebCoreFrameBridge documentFragmentWithMarkupString:baseURLString:]): Removed
            now-unneeded conversion to DeprecatedString.
            (-[WebCoreFrameBridge documentFragmentWithText:inContext:]): Ditto.
    
            * html/HTMLFormElement.cpp:
            (WebCore::encodeCString): Changed parameter to CString.
            (WebCore::HTMLFormElement::formData): Updated code for improvements to TextEncoding.
    
            * loader/CachedCSSStyleSheet.h:
            * loader/CachedCSSStyleSheet.cpp:
            (WebCore::CachedCSSStyleSheet::CachedCSSStyleSheet): Fixed mistake where the
            decoder was created without passing in the character set. Also changed from
            DeprecatedString to String.
            (WebCore::CachedCSSStyleSheet::setCharset): More of the same.
    
            * bindings/js/kjs_window.h: (KJS::ScheduledAction::ScheduledAction): Changed
            to use String instead of DeprecatedString, UChar instead of DeprecatedChar,
            CString instead of DeprecatedCString, etc.
            * bridge/mac/FormDataMac.mm: (WebCore::arrayFromFormData): Ditto.
            * bridge/mac/FrameMac.h: Ditto.
            * bridge/mac/FrameMac.mm: (WebCore::FrameMac::isCharacterSmartReplaceExempt):
            Ditto.
            * bridge/mac/WebCoreAXObject.mm:
            (-[WebCoreAXObject helpText]): Ditto.
            (-[WebCoreAXObject value]): Ditto.
            (-[WebCoreAXObject accessibilityDescription]): Ditto.
            (-[WebCoreAXObject doAXStringForTextMarkerRange:]): Ditto.
            * bridge/mac/WebCoreEncodings.mm: (+[WebCoreEncodings decodeData:]): Ditto.
            Also fixed code that does a deref without a ref to use RefPtr instead.
            * bridge/mac/WebCoreScriptDebugger.mm:
            (-[WebCoreScriptCallFrame evaluateWebScript:]): Ditto.
            * bridge/mac/WebCoreSettings.mm:
            (-[WebCoreSettings setDefaultTextEncoding:]): Ditto.
            * css/CSSImportRule.cpp: (WebCore::CSSImportRule::insertedIntoParent): Ditto.
            * css/cssparser.cpp: (WebCore::CSSParser::lex): Ditto.
            * dom/Document.h:
            * dom/Document.cpp:
            (WebCore::Document::setCharset): Ditto.
            (WebCore::Document::write): Ditto.
            (WebCore::Document::determineParseMode): Ditto.
            * dom/ProcessingInstruction.cpp:
            (WebCore::ProcessingInstruction::checkStyleSheet): Ditto.
            * dom/XMLTokenizer.h:
            * dom/XMLTokenizer.cpp:
            (WebCore::shouldAllowExternalLoad): Ditto.
            (WebCore::createStringParser): Ditto.
            (WebCore::XMLTokenizer::write): Ditto.
            (WebCore::toString): Ditto.
            (WebCore::handleElementAttributes): Ditto.
            (WebCore::XMLTokenizer::startElementNs): Ditto.
            (WebCore::XMLTokenizer::endElementNs): Ditto.
            (WebCore::XMLTokenizer::characters): Ditto.
            (WebCore::XMLTokenizer::processingInstruction): Ditto.
            (WebCore::XMLTokenizer::cdataBlock): Ditto.
            (WebCore::XMLTokenizer::comment): Ditto.
            (WebCore::XMLTokenizer::internalSubset): Ditto.
            (WebCore::getXHTMLEntity): Ditto.
            (WebCore::externalSubsetHandler): Ditto.
            (WebCore::XMLTokenizer::initializeParserContext): Ditto.
            (WebCore::XMLTokenizer::notifyFinished): Ditto.
            (WebCore::xmlDocPtrForString): Ditto.
            (WebCore::parseXMLDocumentFragment): Ditto.
            (WebCore::attributesStartElementNsHandler): Ditto.
            (WebCore::parseAttributes): Ditto.
            * html/FormDataList.h:
            * html/FormDataList.cpp:
            (WebCore::FormDataList::appendString): Ditto. Also changed to call the
            encoding function by its new name and with new parameters.
            (WebCore::FormDataList::appendFile): Ditto.
            * html/HTMLDocument.h:
            * html/HTMLDocument.cpp:
            (WebCore::parseDocTypePart): Ditto.
            (WebCore::containsString): Ditto.
            (WebCore::parseDocTypeDeclaration): Ditto.
            (WebCore::HTMLDocument::determineParseMode): Ditto.
            * html/HTMLInputElement.cpp: (WebCore::HTMLInputElement::appendFormData): Ditto.
            * html/HTMLScriptElement.cpp:
            (WebCore::HTMLScriptElement::parseMappedAttribute): Ditto.
            * html/HTMLTokenizer.h:
            * html/HTMLTokenizer.cpp:
            (WebCore::HTMLTokenizer::scriptHandler): Ditto.
            (WebCore::HTMLTokenizer::parseTag): Ditto.
            (WebCore::HTMLTokenizer::write): Ditto.
            (WebCore::HTMLTokenizer::finish): Ditto.
            (WebCore::parseHTMLDocumentFragment): Ditto.
            * loader/Cache.h:
            * loader/Cache.cpp:
            (WebCore::Cache::requestStyleSheet): Ditto.
            (WebCore::Cache::requestScript): Ditto.
            * loader/CachedResource.h: Ditto.
            * loader/CachedScript.h:
            * loader/CachedScript.cpp:
            (WebCore::CachedScript::CachedScript): Ditto.
            (WebCore::CachedScript::ref): Ditto.
            (WebCore::CachedScript::deref): Ditto.
            (WebCore::CachedScript::setCharset): Ditto.
            (WebCore::CachedScript::data): Ditto.
            (WebCore::CachedScript::checkNotify): Ditto.
            * loader/CachedXBLDocument.h:
            * loader/CachedXBLDocument.cpp:
            (WebCore::CachedXBLDocument::setCharset): Ditto.
            * loader/CachedXSLStyleSheet.h:
            * loader/CachedXSLStyleSheet.cpp:
            (WebCore::CachedXSLStyleSheet::setCharset): Ditto.
            * loader/DocLoader.cpp:
            (WebCore::DocLoader::requestStyleSheet): Ditto.
            (WebCore::DocLoader::requestScript): Ditto.
            * loader/DocLoader.h: Ditto.
            * loader/FormData.h:
            * loader/FormData.cpp:
            (WebCore::FormData::FormData): Ditto.
            (WebCore::FormData::appendFile): Ditto.
            (WebCore::FormData::flattenToString): Ditto.
            * page/Frame.h:
            * page/FramePrivate.h:
            * page/Frame.cpp:
            (WebCore::UserStyleSheetLoader::setStyleSheet): Ditto.
            (WebCore::getString): Ditto.
            (WebCore::Frame::replaceContentsWithScriptResult): Ditto.
            (WebCore::Frame::executeScript): Ditto.
            (WebCore::Frame::clear): Ditto.
            (WebCore::Frame::write): Ditto.
            (WebCore::Frame::endIfNotLoading): Ditto.
            (WebCore::Frame::baseTarget): Ditto.
            (WebCore::Frame::scheduleRedirection): Ditto.
            (WebCore::Frame::scheduleLocationChange): Ditto.
            (WebCore::Frame::scheduleHistoryNavigation): Ditto.
            (WebCore::Frame::changeLocation): Ditto.
            (WebCore::Frame::redirectionTimerFired): Ditto.
            (WebCore::Frame::encoding): Ditto.
            (WebCore::Frame::submitForm): Ditto.
            (WebCore::Frame::referrer): Ditto.
            (WebCore::Frame::isCharacterSmartReplaceExempt): Ditto.
            (WebCore::Frame::setEncoding): Ditto.
            * page/Settings.h: Ditto.
            * platform/SegmentedString.h: Ditto.
            * platform/SegmentedString.cpp: Ditto.
            * xml/XSLStyleSheet.cpp: (WebCore::XSLStyleSheet::parseString): Ditto.
            * xml/XSLTProcessor.cpp:
            (WebCore::transformTextStringToXHTMLDocumentString): Ditto.
            (WebCore::XSLTProcessor::createDocumentFromSource): Ditto.
            * xml/xmlhttprequest.h:
            * xml/xmlhttprequest.cpp:
            (WebCore::XMLHttpRequest::open): Ditto.
            (WebCore::XMLHttpRequest::send): Ditto.
            (WebCore::XMLHttpRequest::receivedData): Ditto.
    
            * platform/DeprecatedString.cpp:
            (WebCore::DeprecatedString::fromUtf8): Updated for changes to TextEncoding.
            (WebCore::DeprecatedString::utf8): Ditto.
    
            * platform/KURL.h:
            * platform/KURL.cpp:
            (WebCore::KURL::KURL): Updated to overload based on presence or absence of
            TextEncoding rather than having a default.
            (WebCore::KURL::init): Moved body of constructor in here. Updated to use
            the new TextEncoding interface.
            (WebCore::KURL::decode_string): Updated to overload based on presence or
            absence of TextEncoding rather than having a default. Updated to use
            the new TextEncoding interface.
            (WebCore::encodeRelativeString): Updated to use the new TextEncoding interface.
    
            * platform/Font.cpp: (WebCore::WidthIterator::normalizeVoicingMarks): Fixed
            code to use U_ZERO_ERROR instead of a typecast.
    
            * bindings/js/kjs_proxy.h: Removed unneeded declaration of DeprecatedString.
            * platform/GraphicsContext.h: Ditto.
    
            * platform/GraphicsContext.cpp: Removed unneeded include of "DeprecatedString.h".
            * rendering/break_lines.cpp: Ditto.
            * xml/XMLSerializer.cpp: Ditto.
    
            * platform/mac/FontDataMac.mm: Removed unneeded include of <unicode/unorm.h>.
    
            * platform/CharsetNames.h: Emptied out this file. A later patch could remove it.
            * platform/CharsetNames.cpp: Ditto.
            * platform/mac/ExtraCFEncodings.h: Ditto.
    
    WebKit:
    
            Reviewed by Alexey.
    
            - WebKit side of changes to encoding
    
            * WebView/WebHTMLRepresentation.m: (-[WebHTMLRepresentation documentSource]):
            Changed to call the new -[WebCoreFrameBridge stringWithData:] instead of calling
            the old methods that used a CFStringEncoding: -[WebCoreFrameBridge textEncoding]
            and +[WebCoreFrameBridge stringWithData:textEncoding:].
    
            * WebView/WebResource.m: (-[WebResource _stringValue]): Removed special case for
            nil encoding name. The bridge itself now has the rule that "nil encoding name
            means Latin-1", so we don't need to check for nil.
    
            * WebView/WebFrame.m: (-[WebFrame _checkLoadComplete]): Retain the frame until
            we get the parent frame while walking up parent frames, because it's possible
            for _checkLoadCompleteForThisFrame to release the last reference to the frame.
            (Not reviewed; needed to run performance tests successfully.)
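
The retain-while-walking pattern described for _checkLoadComplete can be illustrated in C++ with shared ownership. This is only a sketch under stated assumptions: the real code is Objective-C, and the `Frame` struct and both functions here are hypothetical. The point is that a strong reference to the current frame is held until its parent has been fetched, so the per-frame check cannot destroy the frame out from under the loop.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical frame node with a strong reference to its parent.
struct Frame {
    std::shared_ptr<Frame> parent;
    int checks = 0;
    // Stand-in for the per-frame check, which in the real code may release
    // the last outside reference to this frame.
    void checkLoadCompleteForThisFrame() { ++checks; }
};

// Walk from the given frame up to the root. The local shared_ptr keeps the
// current frame alive until its parent pointer has been read.
int checkLoadComplete(std::shared_ptr<Frame> frame)
{
    int visited = 0;
    while (frame) {
        frame->checkLoadCompleteForThisFrame();
        std::shared_ptr<Frame> parent = frame->parent; // frame is still retained here
        frame = std::move(parent);                     // now the old frame may be released
        ++visited;
    }
    return visited;
}
```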
    
    
    
    git-svn-id: http://svn.webkit.org/repository/webkit/trunk@16245 268f45cc-cd09-0410-ab3c-d52691b4dbfc