1. 04 Sep, 2013 1 commit
    • fpizlo@apple.com's avatar
      The DFG should be able to tier-up and OSR enter into the FTL · 532f1e51
      fpizlo@apple.com authored
      https://bugs.webkit.org/show_bug.cgi?id=112838
      
      Source/JavaScriptCore: 
      
      Reviewed by Mark Hahnenberg.
              
      This adds the ability for the DFG to tier-up into the FTL. This works in both
      of the expected tier-up modes:
              
      Replacement: frequently called functions eventually have their entrypoint
      replaced with one that goes into FTL-compiled code. Note, this will be a
      slow-down for now since we don't yet have LLVM calling convention integration.
              
      OSR entry: code stuck in hot loops gets OSR'd into the FTL from the DFG.
              
      This means that if the DFG detects that a function is an FTL candidate, it
      inserts execution counting code similar to the kind that the baseline JIT
      would use. If you trip on a loop count in a loop header that is an OSR
      candidate (it's not an inlined loop), we do OSR; otherwise we do replacement.
      OSR almost always also implies future replacement.
              
      OSR entry into the FTL is really cool. It uses a specialized FTL compile of
      the code, where early in the DFG pipeline we replace the original root block
      with an OSR entrypoint block that jumps to the pre-header of the hot loop.
      The OSR entrypoint loads all live state at the loop pre-header using loads
      from a scratch buffer, which gets populated by the runtime's OSR entry
      preparation code (FTL::prepareOSREntry()). This approach appears to work well
      with all of our subsequent optimizations, including prediction propagation,
      CFA, and LICM. LLVM seems happy with it, too. Best of all, it works naturally
      with concurrent compilation: when we hit the tier-up trigger we spawn a
      compilation plan at the bytecode index from which we triggered; once the
      compilation finishes the next trigger will try to enter, at that bytecode
      index. If it can't - for example because the code has moved on to another
      loop - then we just try again. Loops that get hot enough for OSR entry (about
      25,000 iterations) will probably still be running when a concurrent compile
      finishes, so this doesn't appear to be a big problem.
              
      This immediately gives us a 70% speed-up on imaging-gaussian-blur. We could
      get a bigger speed-up by adding some more intelligence and tweaking LLVM to
      compile code faster. Those things will happen eventually but this is a good
      start. Probably this code will see more tuning as we get more coverage in the
      FTL JIT, but I'll worry about that in future patches.
      
      * CMakeLists.txt:
      * GNUmakefile.list.am:
      * JavaScriptCore.xcodeproj/project.pbxproj:
      * Target.pri:
      * bytecode/CodeBlock.cpp:
      (JSC::CodeBlock::CodeBlock):
      (JSC::CodeBlock::hasOptimizedReplacement):
      (JSC::CodeBlock::setOptimizationThresholdBasedOnCompilationResult):
      * bytecode/CodeBlock.h:
      * dfg/DFGAbstractInterpreterInlines.h:
      (JSC::DFG::::executeEffects):
      * dfg/DFGByteCodeParser.cpp:
      (JSC::DFG::ByteCodeParser::parseBlock):
      (JSC::DFG::ByteCodeParser::parse):
      * dfg/DFGCFGSimplificationPhase.cpp:
      (JSC::DFG::CFGSimplificationPhase::run):
      * dfg/DFGClobberize.h:
      (JSC::DFG::clobberize):
      * dfg/DFGDriver.cpp:
      (JSC::DFG::compileImpl):
      (JSC::DFG::compile):
      * dfg/DFGDriver.h:
      * dfg/DFGFixupPhase.cpp:
      (JSC::DFG::FixupPhase::fixupNode):
      * dfg/DFGGraph.cpp:
      (JSC::DFG::Graph::dump):
      (JSC::DFG::Graph::killBlockAndItsContents):
      (JSC::DFG::Graph::killUnreachableBlocks):
      * dfg/DFGGraph.h:
      * dfg/DFGInPlaceAbstractState.cpp:
      (JSC::DFG::InPlaceAbstractState::initialize):
      * dfg/DFGJITCode.cpp:
      (JSC::DFG::JITCode::reconstruct):
      (JSC::DFG::JITCode::checkIfOptimizationThresholdReached):
      (JSC::DFG::JITCode::optimizeNextInvocation):
      (JSC::DFG::JITCode::dontOptimizeAnytimeSoon):
      (JSC::DFG::JITCode::optimizeAfterWarmUp):
      (JSC::DFG::JITCode::optimizeSoon):
      (JSC::DFG::JITCode::forceOptimizationSlowPathConcurrently):
      (JSC::DFG::JITCode::setOptimizationThresholdBasedOnCompilationResult):
      * dfg/DFGJITCode.h:
      * dfg/DFGJITFinalizer.cpp:
      (JSC::DFG::JITFinalizer::finalize):
      (JSC::DFG::JITFinalizer::finalizeFunction):
      (JSC::DFG::JITFinalizer::finalizeCommon):
      * dfg/DFGLoopPreHeaderCreationPhase.cpp:
      (JSC::DFG::createPreHeader):
      (JSC::DFG::LoopPreHeaderCreationPhase::run):
      * dfg/DFGLoopPreHeaderCreationPhase.h:
      * dfg/DFGNode.h:
      (JSC::DFG::Node::hasUnlinkedLocal):
      (JSC::DFG::Node::unlinkedLocal):
      * dfg/DFGNodeType.h:
      * dfg/DFGOSREntry.cpp:
      (JSC::DFG::prepareOSREntry):
      * dfg/DFGOSREntrypointCreationPhase.cpp: Added.
      (JSC::DFG::OSREntrypointCreationPhase::OSREntrypointCreationPhase):
      (JSC::DFG::OSREntrypointCreationPhase::run):
      (JSC::DFG::performOSREntrypointCreation):
      * dfg/DFGOSREntrypointCreationPhase.h: Added.
      * dfg/DFGOperations.cpp:
      * dfg/DFGOperations.h:
      * dfg/DFGPlan.cpp:
      (JSC::DFG::Plan::Plan):
      (JSC::DFG::Plan::compileInThread):
      (JSC::DFG::Plan::compileInThreadImpl):
      * dfg/DFGPlan.h:
      * dfg/DFGPredictionInjectionPhase.cpp:
      (JSC::DFG::PredictionInjectionPhase::run):
      * dfg/DFGPredictionPropagationPhase.cpp:
      (JSC::DFG::PredictionPropagationPhase::propagate):
      * dfg/DFGSafeToExecute.h:
      (JSC::DFG::safeToExecute):
      * dfg/DFGSpeculativeJIT32_64.cpp:
      (JSC::DFG::SpeculativeJIT::compile):
      * dfg/DFGSpeculativeJIT64.cpp:
      (JSC::DFG::SpeculativeJIT::compile):
      * dfg/DFGTierUpCheckInjectionPhase.cpp: Added.
      (JSC::DFG::TierUpCheckInjectionPhase::TierUpCheckInjectionPhase):
      (JSC::DFG::TierUpCheckInjectionPhase::run):
      (JSC::DFG::performTierUpCheckInjection):
      * dfg/DFGTierUpCheckInjectionPhase.h: Added.
      * dfg/DFGToFTLDeferredCompilationCallback.cpp: Added.
      (JSC::DFG::ToFTLDeferredCompilationCallback::ToFTLDeferredCompilationCallback):
      (JSC::DFG::ToFTLDeferredCompilationCallback::~ToFTLDeferredCompilationCallback):
      (JSC::DFG::ToFTLDeferredCompilationCallback::create):
      (JSC::DFG::ToFTLDeferredCompilationCallback::compilationDidBecomeReadyAsynchronously):
      (JSC::DFG::ToFTLDeferredCompilationCallback::compilationDidComplete):
      * dfg/DFGToFTLDeferredCompilationCallback.h: Added.
      * dfg/DFGToFTLForOSREntryDeferredCompilationCallback.cpp: Added.
      (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::ToFTLForOSREntryDeferredCompilationCallback):
      (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::~ToFTLForOSREntryDeferredCompilationCallback):
      (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::create):
      (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::compilationDidBecomeReadyAsynchronously):
      (JSC::DFG::ToFTLForOSREntryDeferredCompilationCallback::compilationDidComplete):
      * dfg/DFGToFTLForOSREntryDeferredCompilationCallback.h: Added.
      * dfg/DFGWorklist.cpp:
      (JSC::DFG::globalWorklist):
      * dfg/DFGWorklist.h:
      * ftl/FTLCapabilities.cpp:
      (JSC::FTL::canCompile):
      * ftl/FTLCapabilities.h:
      * ftl/FTLForOSREntryJITCode.cpp: Added.
      (JSC::FTL::ForOSREntryJITCode::ForOSREntryJITCode):
      (JSC::FTL::ForOSREntryJITCode::~ForOSREntryJITCode):
      (JSC::FTL::ForOSREntryJITCode::ftlForOSREntry):
      (JSC::FTL::ForOSREntryJITCode::initializeEntryBuffer):
      * ftl/FTLForOSREntryJITCode.h: Added.
      (JSC::FTL::ForOSREntryJITCode::entryBuffer):
      (JSC::FTL::ForOSREntryJITCode::setBytecodeIndex):
      (JSC::FTL::ForOSREntryJITCode::bytecodeIndex):
      (JSC::FTL::ForOSREntryJITCode::countEntryFailure):
      (JSC::FTL::ForOSREntryJITCode::entryFailureCount):
      * ftl/FTLJITFinalizer.cpp:
      (JSC::FTL::JITFinalizer::finalizeFunction):
      * ftl/FTLLink.cpp:
      (JSC::FTL::link):
      * ftl/FTLLowerDFGToLLVM.cpp:
      (JSC::FTL::LowerDFGToLLVM::compileBlock):
      (JSC::FTL::LowerDFGToLLVM::compileNode):
      (JSC::FTL::LowerDFGToLLVM::compileExtractOSREntryLocal):
      (JSC::FTL::LowerDFGToLLVM::compileGetLocal):
      (JSC::FTL::LowerDFGToLLVM::addWeakReference):
      * ftl/FTLOSREntry.cpp: Added.
      (JSC::FTL::prepareOSREntry):
      * ftl/FTLOSREntry.h: Added.
      * ftl/FTLOutput.h:
      (JSC::FTL::Output::crashNonTerminal):
      (JSC::FTL::Output::crash):
      * ftl/FTLState.cpp:
      (JSC::FTL::State::State):
      * interpreter/Register.h:
      (JSC::Register::unboxedDouble):
      * jit/JIT.cpp:
      (JSC::JIT::emitEnterOptimizationCheck):
      * jit/JITCode.cpp:
      (JSC::JITCode::ftlForOSREntry):
      * jit/JITCode.h:
      * jit/JITStubs.cpp:
      (JSC::DEFINE_STUB_FUNCTION):
      * runtime/Executable.cpp:
      (JSC::ScriptExecutable::newReplacementCodeBlockFor):
      * runtime/Options.h:
      * runtime/VM.cpp:
      (JSC::VM::ensureWorklist):
      * runtime/VM.h:
      
      LayoutTests: 
      
      Reviewed by Mark Hahnenberg.
              
      Fix marsaglia to check the result instead of printing, and add a second
      version that relies on OSR entry.
      
      * fast/js/regress/marsaglia-osr-entry-expected.txt: Added.
      * fast/js/regress/marsaglia-osr-entry.html: Added.
      * fast/js/regress/script-tests/marsaglia-osr-entry.js: Added.
      (marsaglia):
      * fast/js/regress/script-tests/marsaglia.js:
      
      
      
      git-svn-id: http://svn.webkit.org/repository/webkit/trunk@155023 268f45cc-cd09-0410-ab3c-d52691b4dbfc
      532f1e51
  2. 25 Jul, 2013 2 commits
    • oliver@apple.com's avatar
      fourthTier: Add a phase to create loop pre-headers · 5663e31c
      oliver@apple.com authored
      https://bugs.webkit.org/show_bug.cgi?id=118778
      
      Reviewed by Oliver Hunt.
      
      Add a loop pre-header creation phase. Any loop that doesn't already have
      just one predecessor that isn't part of the loop has a pre-header
      prepended. All non-loop predecessors then jump to that pre-header.
      
      Also fix a handful of bugs:
      
      - DFG::Analysis should set m_valid before running the analysis, since that
        makes it easier to use ASSERT(m_valid) in the analysis' methods, which
        may be called by the analysis before the analysis completes. NaturalLoops
        does this with loopsOf().
      
      - NaturalLoops::headerOf() was missing a check for innerMostLoopOf()
        returning 0, since that'll happen if the block isn't in any loop.
      
      - Change BlockInsertionSet to dethread the graph, since anyone using it
        will want to do so.
      
      - Change dethreading to ignore SSA form graphs.
      
      This also adds NaturalLoops::belongsTo(), which I always used in the
      pre-header creation phase. I didn't end up using it but I'll probably use
      it in the near future.
      
      * JavaScriptCore.xcodeproj/project.pbxproj:
      * dfg/DFGAnalysis.h:
      (JSC::DFG::Analysis::computeIfNecessary):
      * dfg/DFGBlockInsertionSet.cpp:
      (JSC::DFG::BlockInsertionSet::execute):
      * dfg/DFGCriticalEdgeBreakingPhase.cpp:
      (JSC::DFG::CriticalEdgeBreakingPhase::breakCriticalEdge):
      * dfg/DFGGraph.cpp:
      (JSC::DFG::Graph::dethread):
      * dfg/DFGLoopPreHeaderCreationPhase.cpp: Added.
      (DFG):
      (LoopPreHeaderCreationPhase):
      (JSC::DFG::LoopPreHeaderCreationPhase::LoopPreHeaderCreationPhase):
      (JSC::DFG::LoopPreHeaderCreationPhase::run):
      (JSC::DFG::performLoopPreHeaderCreation):
      * dfg/DFGLoopPreHeaderCreationPhase.h: Added.
      (DFG):
      * dfg/DFGNaturalLoops.h:
      (NaturalLoop):
      (JSC::DFG::NaturalLoops::headerOf):
      (JSC::DFG::NaturalLoops::innerMostLoopOf):
      (JSC::DFG::NaturalLoops::innerMostOuterLoop):
      (JSC::DFG::NaturalLoops::belongsTo):
      (NaturalLoops):
      * dfg/DFGPlan.cpp:
      (JSC::DFG::Plan::compileInThreadImpl):
      
      Conflicts:
      	Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj
      
      git-svn-id: http://svn.webkit.org/repository/webkit/trunk@153279 268f45cc-cd09-0410-ab3c-d52691b4dbfc
      5663e31c
    • oliver@apple.com's avatar
      fourthTier: DFG should know how to find natural loops · fc3f1177
      oliver@apple.com authored
      https://bugs.webkit.org/show_bug.cgi?id=118152
      
      Reviewed by Mark Hahnenberg.
      
      There are a bunch of things we can do when we know where the loops are.
      Previously we didn't. With this patch, we do.
      
      This patch adds the classic dominator based natural loop finder.
      
      The only client of this right now is the DFG::Disassembler. It prints out
      a summary of the analysis for each block.
      
      This will become more important when I do
      https://bugs.webkit.org/show_bug.cgi?id=118151, which definitely requires
      this kind of analysis, at least if we want to do the optimization over
      DFG IR (and I'm pretty sure we do).
      
      * JavaScriptCore.xcodeproj/project.pbxproj:
      * dfg/DFGAnalysis.h: Added.
      (DFG):
      (Analysis):
      (JSC::DFG::Analysis::Analysis):
      (JSC::DFG::Analysis::invalidate):
      (JSC::DFG::Analysis::computeIfNecessary):
      (JSC::DFG::Analysis::isValid):
      * dfg/DFGCFGSimplificationPhase.cpp:
      (JSC::DFG::CFGSimplificationPhase::run):
      * dfg/DFGDisassembler.cpp:
      (JSC::DFG::Disassembler::createDumpList):
      * dfg/DFGDominators.cpp:
      (JSC::DFG::Dominators::Dominators):
      (JSC::DFG::Dominators::compute):
      * dfg/DFGDominators.h:
      (Dominators):
      * dfg/DFGGraph.cpp:
      (JSC::DFG::Graph::dumpBlockHeader):
      (JSC::DFG::Graph::invalidateCFG):
      (DFG):
      * dfg/DFGGraph.h:
      (Graph):
      * dfg/DFGNaturalLoops.cpp: Added.
      (DFG):
      (JSC::DFG::NaturalLoop::dump):
      (JSC::DFG::NaturalLoops::NaturalLoops):
      (JSC::DFG::NaturalLoops::~NaturalLoops):
      (JSC::DFG::NaturalLoops::compute):
      (JSC::DFG::NaturalLoops::loopsOf):
      (JSC::DFG::NaturalLoops::dump):
      * dfg/DFGNaturalLoops.h: Added.
      (DFG):
      (NaturalLoop):
      (JSC::DFG::NaturalLoop::NaturalLoop):
      (JSC::DFG::NaturalLoop::addBlock):
      (JSC::DFG::NaturalLoop::header):
      (JSC::DFG::NaturalLoop::size):
      (JSC::DFG::NaturalLoop::at):
      (JSC::DFG::NaturalLoop::operator[]):
      (JSC::DFG::NaturalLoop::contains):
      (NaturalLoops):
      (JSC::DFG::NaturalLoops::numLoops):
      (JSC::DFG::NaturalLoops::loop):
      (JSC::DFG::NaturalLoops::headerOf):
      
      git-svn-id: http://svn.webkit.org/repository/webkit/trunk@153257 268f45cc-cd09-0410-ab3c-d52691b4dbfc
      fc3f1177
  3. 22 May, 2012 1 commit
    • fpizlo@apple.com's avatar
      DFG should be able to compute dominators · ba79d1f9
      fpizlo@apple.com authored
      https://bugs.webkit.org/show_bug.cgi?id=85269
      
      Source/JavaScriptCore: 
      
      Reviewed by Oliver Hunt.
              
      Merged r115754 from dfgopt.
              
      Implements a naive dominator calculator, which is currently just used to
      print information in graph dumps. I've enabled it by default mainly to
      be able to track its performance impact. So far it appears that there is
      none, which is unsurprising given that the number of basic blocks in most
      procedures is small.
              
      Also tweaked bytecode dumping to reveal more useful information about the
      nature of the code block.
      
      * CMakeLists.txt:
      * GNUmakefile.list.am:
      * JavaScriptCore.xcodeproj/project.pbxproj:
      * Target.pri:
      * bytecode/CodeBlock.cpp:
      (JSC::CodeBlock::dump):
      * dfg/DFGDominators.cpp: Added.
      (DFG):
      (JSC::DFG::Dominators::Dominators):
      (JSC::DFG::Dominators::~Dominators):
      (JSC::DFG::Dominators::compute):
      (JSC::DFG::Dominators::iterateForBlock):
      * dfg/DFGDominators.h: Added.
      (DFG):
      (Dominators):
      (JSC::DFG::Dominators::invalidate):
      (JSC::DFG::Dominators::computeIfNecessary):
      (JSC::DFG::Dominators::isValid):
      (JSC::DFG::Dominators::dominates):
      * dfg/DFGDriver.cpp:
      (JSC::DFG::compile):
      * dfg/DFGGraph.cpp:
      (JSC::DFG::Graph::dump):
      * dfg/DFGGraph.h:
      (Graph):
      
      Source/WTF: 
      
      Reviewed by Oliver Hunt.
              
      Merged r115754 from dfgopt.
              
      Added a bitvector class suitable for cheap static analysis. This class
      differs from BitVector in that instead of optimizing for space, it
      optimizes for execution time. Its API is also somewhat less friendly,
      which is intentional; it's meant to be used in places where you know
      up front how bit your bitvectors are going to be.
      
      * GNUmakefile.list.am:
      * WTF.vcproj/WTF.vcproj:
      * WTF.xcodeproj/project.pbxproj:
      * wtf/FastBitVector.h: Added.
      (WTF):
      (FastBitVector):
      (WTF::FastBitVector::FastBitVector):
      (WTF::FastBitVector::operator=):
      (WTF::FastBitVector::numBits):
      (WTF::FastBitVector::resize):
      (WTF::FastBitVector::setAll):
      (WTF::FastBitVector::clearAll):
      (WTF::FastBitVector::set):
      (WTF::FastBitVector::setAndCheck):
      (WTF::FastBitVector::equals):
      (WTF::FastBitVector::merge):
      (WTF::FastBitVector::filter):
      (WTF::FastBitVector::exclude):
      (WTF::FastBitVector::clear):
      (WTF::FastBitVector::get):
      (WTF::FastBitVector::arrayLength):
      
      
      
      git-svn-id: http://svn.webkit.org/repository/webkit/trunk@117861 268f45cc-cd09-0410-ab3c-d52691b4dbfc
      ba79d1f9