Skip to content
  • fpizlo@apple.com's avatar
    Infer constant global variables · 33961712
    fpizlo@apple.com authored
    https://bugs.webkit.org/show_bug.cgi?id=124464
    
    Source/JavaScriptCore: 
    
    Reviewed by Sam Weinig.
            
    All global variables that are candidates for watchpoint-based constant inference (i.e.
    not 'const' variables) will now have WatchpointSet's associated with them and those
    are used to drive the inference by tracking three states of each variable:
            
    Uninitialized: the variable's value is Undefined and the WatchpointSet state is
        ClearWatchpoint.
            
    Initialized: the variable's value was set to something (could even be explicitly set
        to Undefined) and the WatchpointSet state is IsWatching.
            
    Invalidated: the variable's value was set to something else (could even be the same
        thing as before but the point is that a put operation did execute again) and the
        WatchpointSet is IsInvalidated.
            
    If the compiler tries to compile a GetGlobalVar and the WatchpointSet state is
    IsWatching, then the current value of the variable can be folded in place of the get,
    and a watchpoint on the variable can be registered.
            
    We handle race conditions between the mutator and compiler by mandating that:
            
    - The mutator changes the WatchpointSet state after executing the put.
            
    - There is no opportunity to install code or call functions between when the mutator
      executes a put and changes the WatchpointSet state.
            
    - The compiler checks the WatchpointSet state prior to reading the value.
            
    The concrete algorithm used by the mutator is:
            
        1. Store the new value into the variable.
        --- Execute a store-store fence.
        2. Bump the state (ClearWatchpoing becomes IsWatching, IsWatching becomes
           IsInvalidated); the IsWatching->IsInvalidated transition may end up firing
           watchpoints.
            
    The concrete algorithm that the compiler uses is:
            
        1. Load the state. If it's *not* IsWatching, then give up on constant inference.
        --- Execute a load-load fence.
        2. Load the value of the variable and use that for folding, while also registering
           a DesiredWatchpoint. The various parts of this step can be done in any order.
            
    The desired watchpoint registration will fail if the watchpoint set is already
    invalidated. Now consider the following interesting interleavings:
            
    Uninitialized->M1->M2->C1->C2: Compiler sees IsWatching because of the mutator's store
        operation, and the variable is folded. The fencing ensures that C2 sees the value
        stored in M1 - i.e. we fold on the value that will actually be watchpointed. If
        before the compilation is installed the mutator executes another store then we
        will be sure that it will be a complete sequence of M1+M2 since compilations get
        installed at safepoints and never "in the middle" of a put_to_scope. Hence that
        compilation installation will be invalidated. If the M1+M2 sequence happens after
        the code is installed, then the code will be invalidated by triggering a jettison.
            
    Uninitialized->M1->C1->C2->M2: Compiler sees Uninitialized and will not fold. This is
        a sensible outcome since if the compiler read the variable's value, it would have
        seen Undefined.
            
    Uninitialized->C1->C2->M1->M2: Compiler sees Uninitialized and will not fold.
    Uninitialized->C1->M1->C2->M2: Compiler sees Uninitialized and will not fold.
    Uninitialized->C1->M1->M2->C2: Compiler sees Uninitialized and will not fold.
    Uninitialized->M1->C1->M2->C2: Compiler sees Uninitialized and will not fold.
            
    IsWatched->M1->M2->C1->C2: Compiler sees IsInvalidated and will not fold.
            
    IsWatched->M1->C1->C2->M2: Compiler will fold, but will also register a desired
        watchpoint, and that watchpoint will get invalidated before the code is installed.
            
    IsWatched->M1->C1->M2->C2: As above, will fold but the code will get invalidated.
    IsWatched->C1->C2->M1->M2: As above, will fold but the code will get invalidated.
    IsWatched->C1->M1->C2->M2: As above, will fold but the code will get invalidated.
    IsWatched->C1->M1->M2->C2: As above, will fold but the code will get invalidated.
            
    Note that this kind of reasoning shows why having the mutator first bump the state and
    then store the new value would be wrong. If we had done that (M1 = bump state, M2 =
    execute put) then we could have the following deadly interleavings:
            
    Uninitialized->M1->C1->C2->M2:
    Uninitialized->M1->C1->M2->C2: Mutator bumps the state to IsWatched and then the
        compiler folds Undefined, since M2 hasn't executed yet. Although C2 will set the
        watchpoint, M1 didn't notify it - it mearly initiated watching. M2 then stores a
        value other than Undefined, and you're toast.
            
    You could fix this sort of thing by making the Desired Watchpoints machinery more
    sophisticated, for example having it track the value that was folded; if the global
    variable's value was later found to be different then we could invalidate the
    compilation. You could also fix it by having the compiler also check that the value of
    the variable is not Undefined before folding. While those all sound great, I decided
    to instead just use the right interleaving since that results in less code and feels
    more intuitive.
            
    This is a 0.5% speed-up on SunSpider, mostly due to a 20% speed-up on math-cordic.
    It's a 0.6% slow-down on LongSpider, mostly due to a 25% slow-down on 3d-cube. This is
    because 3d-cube takes global variable assignment slow paths very often. Note that this
    3d-cube slow-down doesn't manifest as much in SunSpider (only 6% there). This patch is
    also a 1.5% speed-up on V8v7 and a 2.8% speed-up on Octane v1, mostly due to deltablue
    (3.7%), richards (4%), and mandreel (26%). This is a 2% speed-up on Kraken, mostly due
    to a 17.5% speed-up on imaging-gaussian-blur. Something that really illustrates the
    slam-dunk-itude of this patch is the wide range of speed-ups on JSRegress. Casual JS
    programming often leads to global-var-based idioms and those variables tend to be
    assigned once, leading to excellent constant folding opportunities in an optimizing
    JIT. This is very evident in the speed-ups on JSRegress.
    
    * assembler/ARM64Assembler.h:
    (JSC::ARM64Assembler::dmbSY):
    * assembler/ARMv7Assembler.h:
    (JSC::ARMv7Assembler::dmbSY):
    * assembler/MacroAssemblerARM64.h:
    (JSC::MacroAssemblerARM64::memfence):
    * assembler/MacroAssemblerARMv7.h:
    (JSC::MacroAssemblerARMv7::load8):
    (JSC::MacroAssemblerARMv7::memfence):
    * assembler/MacroAssemblerX86.h:
    (JSC::MacroAssemblerX86::load8):
    (JSC::MacroAssemblerX86::store8):
    * assembler/MacroAssemblerX86Common.h:
    (JSC::MacroAssemblerX86Common::getUnusedRegister):
    (JSC::MacroAssemblerX86Common::store8):
    (JSC::MacroAssemblerX86Common::memoryFence):
    * assembler/MacroAssemblerX86_64.h:
    (JSC::MacroAssemblerX86_64::load8):
    (JSC::MacroAssemblerX86_64::store8):
    * assembler/X86Assembler.h:
    (JSC::X86Assembler::movb_rm):
    (JSC::X86Assembler::movzbl_mr):
    (JSC::X86Assembler::mfence):
    (JSC::X86Assembler::X86InstructionFormatter::threeByteOp):
    (JSC::X86Assembler::X86InstructionFormatter::oneByteOp8):
    * bytecode/CodeBlock.cpp:
    (JSC::CodeBlock::CodeBlock):
    * bytecode/Watchpoint.cpp:
    (JSC::WatchpointSet::WatchpointSet):
    (JSC::WatchpointSet::add):
    (JSC::WatchpointSet::notifyWriteSlow):
    * bytecode/Watchpoint.h:
    (JSC::WatchpointSet::state):
    (JSC::WatchpointSet::isStillValid):
    (JSC::WatchpointSet::addressOfSetIsNotEmpty):
    * dfg/DFGAbstractInterpreterInlines.h:
    (JSC::DFG::::executeEffects):
    * dfg/DFGByteCodeParser.cpp:
    (JSC::DFG::ByteCodeParser::getJSConstantForValue):
    (JSC::DFG::ByteCodeParser::getJSConstant):
    (JSC::DFG::ByteCodeParser::parseBlock):
    * dfg/DFGClobberize.h:
    (JSC::DFG::clobberize):
    * dfg/DFGFixupPhase.cpp:
    (JSC::DFG::FixupPhase::fixupNode):
    * dfg/DFGNode.h:
    (JSC::DFG::Node::isStronglyProvedConstantIn):
    (JSC::DFG::Node::hasIdentifierNumberForCheck):
    (JSC::DFG::Node::hasRegisterPointer):
    * dfg/DFGNodeFlags.h:
    * dfg/DFGNodeType.h:
    * dfg/DFGOperations.cpp:
    * dfg/DFGOperations.h:
    * dfg/DFGPredictionPropagationPhase.cpp:
    (JSC::DFG::PredictionPropagationPhase::propagate):
    * dfg/DFGSafeToExecute.h:
    (JSC::DFG::safeToExecute):
    * dfg/DFGSpeculativeJIT.cpp:
    (JSC::DFG::SpeculativeJIT::compileNotifyPutGlobalVar):
    * dfg/DFGSpeculativeJIT.h:
    (JSC::DFG::SpeculativeJIT::callOperation):
    * dfg/DFGSpeculativeJIT32_64.cpp:
    (JSC::DFG::SpeculativeJIT::compile):
    * dfg/DFGSpeculativeJIT64.cpp:
    (JSC::DFG::SpeculativeJIT::compile):
    * ftl/FTLAbbreviatedTypes.h:
    * ftl/FTLAbbreviations.h:
    (JSC::FTL::buildFence):
    * ftl/FTLCapabilities.cpp:
    (JSC::FTL::canCompile):
    * ftl/FTLIntrinsicRepository.h:
    * ftl/FTLLowerDFGToLLVM.cpp:
    (JSC::FTL::LowerDFGToLLVM::compileNode):
    (JSC::FTL::LowerDFGToLLVM::compileNotifyPutGlobalVar):
    * ftl/FTLOutput.h:
    (JSC::FTL::Output::fence):
    * jit/JIT.h:
    * jit/JITOperations.h:
    * jit/JITPropertyAccess.cpp:
    (JSC::JIT::emitPutGlobalVar):
    (JSC::JIT::emit_op_put_to_scope):
    (JSC::JIT::emitSlow_op_put_to_scope):
    * jit/JITPropertyAccess32_64.cpp:
    (JSC::JIT::emitPutGlobalVar):
    (JSC::JIT::emit_op_put_to_scope):
    (JSC::JIT::emitSlow_op_put_to_scope):
    * llint/LowLevelInterpreter32_64.asm:
    * llint/LowLevelInterpreter64.asm:
    * llvm/LLVMAPIFunctions.h:
    * offlineasm/arm.rb:
    * offlineasm/arm64.rb:
    * offlineasm/cloop.rb:
    * offlineasm/instructions.rb:
    * offlineasm/x86.rb:
    * runtime/JSGlobalObject.cpp:
    (JSC::JSGlobalObject::addGlobalVar):
    (JSC::JSGlobalObject::addFunction):
    * runtime/JSGlobalObject.h:
    (JSC::JSGlobalObject::addVar):
    (JSC::JSGlobalObject::addConst):
    * runtime/JSScope.cpp:
    (JSC::abstractAccess):
    * runtime/JSSymbolTableObject.h:
    (JSC::symbolTablePut):
    (JSC::symbolTablePutWithAttributes):
    * runtime/SymbolTable.cpp:
    (JSC::SymbolTableEntry::couldBeWatched):
    (JSC::SymbolTableEntry::prepareToWatch):
    (JSC::SymbolTableEntry::notifyWriteSlow):
    * runtime/SymbolTable.h:
    
    LayoutTests: 
    
    Reviewed by Sam Weinig.
    
    * js/regress/global-var-const-infer-expected.txt: Added.
    * js/regress/global-var-const-infer-fire-from-opt-expected.txt: Added.
    * js/regress/global-var-const-infer-fire-from-opt.html: Added.
    * js/regress/global-var-const-infer.html: Added.
    * js/regress/script-tests/global-var-const-infer-fire-from-opt.js: Added.
    (foo):
    (setA):
    (setB):
    (check):
    * js/regress/script-tests/global-var-const-infer.js: Added.
    (foo):
    (check):
    
    
    
    git-svn-id: http://svn.webkit.org/repository/webkit/trunk@159545 268f45cc-cd09-0410-ab3c-d52691b4dbfc
    33961712