oliver@apple.com authored
https://bugs.webkit.org/show_bug.cgi?id=112839

Source/JavaScriptCore:

Reviewed by Geoffrey Garen.

This is the final bit of concurrent JITing. The idea is that there is a single global worklist, and a single global thread, that does all optimizing compilation. This is the DFG::Worklist. It contains a queue of DFG::Plans, and a map from CodeBlock* (the baseline code block we're trying to optimize) to DFG::Plan. If the DFGDriver tries to concurrently compile something, it puts the Plan on the Worklist. The Worklist's thread will compile that Plan eventually, and when it's done, it will signal its completion by (1) notifying anyone waiting for the Worklist to be done, and (2) forcing the CodeBlock::m_jitExecuteCounter to take the slow path. The next Baseline JIT cti_optimize call will then install all ready (i.e. compiled) Plans for that VM.

Note that (1) is only for the GC and VM shutdown, which will want to ensure that there aren't any outstanding async compilations before proceeding. They do so by simply waiting for all of the plans for the current VM to complete. (2) is the actual way that code typically gets installed.

This is all very racy by design. For example, just as we try to force the execute counter to take the slow path, the main thread may be setting the execute counter to some other value. The main thread must set it to another value because (a) JIT code is constantly incrementing the counter in a racy way, (b) the cti_optimize slow path will set it to some large-ish negative value to ensure that cti_optimize isn't called repeatedly, and (c) OSR exits from previously jettisoned code blocks may still want to reset the counter values. This "race" is made benign by ensuring that while there is an asynchronous compilation, we at worst set the counter to optimizeAfterWarmUp and never to deferIndefinitely. Hence if the race happens then the worst case is that we wait another ~1000 counts before installing the optimized code.
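To make the benign race above concrete, here is a minimal sketch. It is not the actual ExecutionCounter code: SketchCounter, compileInFlight, warmUpCount, tick and ticksUntilSlowPath are hypothetical names, and it assumes the counter counts up from a negative value and that the slow path is taken once it is no longer negative.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a baseline code block's execute counter.
// Assumption: baseline code increments the counter on every execution and
// takes the slow path (cti_optimize) once the counter is no longer negative.
struct SketchCounter {
    static constexpr int32_t warmUpCount = 1000; // stands in for "~1000 counts"
    static constexpr int32_t deferIndefinitely = std::numeric_limits<int32_t>::min();

    std::atomic<int32_t> value{-warmUpCount};
    std::atomic<bool> compileInFlight{false};

    // Compilation thread: request the slow path on the next execution.
    void forceSlowPathConcurrently() { value.store(0, std::memory_order_relaxed); }

    // Main thread: ask for another warm-up period before trying again.
    void optimizeAfterWarmUp() { value.store(-warmUpCount, std::memory_order_relaxed); }

    // Main thread: normally defers indefinitely, but while a compilation is
    // in flight it is downgraded to optimizeAfterWarmUp, so a lost race
    // costs at most one more warm-up period.
    void dontOptimizeAnytimeSoon() {
        if (compileInFlight.load(std::memory_order_relaxed))
            optimizeAfterWarmUp();
        else
            value.store(deferIndefinitely, std::memory_order_relaxed);
    }

    // Baseline code: count one execution; returns true when the slow path
    // should be taken (the incremented counter is >= 0).
    bool tick() {
        int32_t before = value.fetch_add(1, std::memory_order_relaxed);
        return before >= -1;
    }
};

// Helper for experimenting: number of executions until the slow path fires.
inline int ticksUntilSlowPath(SketchCounter& counter) {
    int ticks = 1;
    while (!counter.tick())
        ++ticks;
    return ticks;
}
```

If the main thread overwrites the forced value while compileInFlight is set, the counter lands on -warmUpCount rather than deferIndefinitely, so the installation is delayed by a bounded number of executions instead of being lost.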
Another defense is that if any CodeBlock calls into cti_optimize, then it will check for all ready plans for the VM, so even if a code block has to wait another ~1000 executions before it calls cti_optimize to do the installation, it may actually end up being installed sooner because a different code block had called cti_optimize, potentially for an unrelated reason.

Special care is taken to ensure that installing plans informs the GC about the increased memory usage, but also ensures that we don't recurse infinitely, since at the start of GC we try to install outstanding plans. This is done by introducing a new GC deferral mechanism (the DeferGC block-scoped thingy), which will ensure that GCs don't happen in the scope but are allowed to happen after. This still leaves the strange corner case that cti_optimize may install outstanding plans, then GC, and that GC may jettison the code block that was installed. This, and the fact that the plan that we took slow path to install could have been a failed or invalid compile, mean that we have to take special precautions in cti_optimize.

This patch also fixes a number of small concurrency bugs that I found when things started running. There are probably more of those bugs still left to fix. This patch just fixes the ones I know about.

Concurrent compilation is right now only enabled on X86_64 Mac. We need platforms that are sufficiently CAStastic so that we can do the various memory fence and CAS tricks that make this safe. We also need a platform that uses JSVALUE64. And we need pthread_once. So, that pretty much means just X86_64 for now. Enabling Linux-X86_64 should be a breeze, but I'll leave that up to the Qt and GTK+ ports to do at their discretion.

This is a solid speed-up on SunSpider (8-9%) and V8Spider (16%), our two main compile-time benchmarks. Most peculiarly, this also appears to reduce measurement noise, rather than increasing it as you would have expected. I don't understand that result but I like it anyway.
On the other hand, this is a slight (1%) slow-down on V8v7. I will continue to investigate this but I think that the results are already good enough that we should land this as-is. So far, it appears that the slow-down is due to this breaking the don't-compile-inlineables heuristics. See investigation in https://bugs.webkit.org/show_bug.cgi?id=116556 and the bug https://bugs.webkit.org/show_bug.cgi?id=116557.

* JavaScriptCore.xcodeproj/project.pbxproj:
* bytecode/CodeBlock.cpp:
(JSC):
(JSC::CodeBlock::finalizeUnconditionally):
(JSC::CodeBlock::resetStubInternal):
(JSC::CodeBlock::baselineVersion):
(JSC::CodeBlock::hasOptimizedReplacement):
(JSC::CodeBlock::optimizationThresholdScalingFactor):
(JSC::CodeBlock::checkIfOptimizationThresholdReached):
(JSC::CodeBlock::optimizeNextInvocation):
(JSC::CodeBlock::dontOptimizeAnytimeSoon):
(JSC::CodeBlock::optimizeAfterWarmUp):
(JSC::CodeBlock::optimizeAfterLongWarmUp):
(JSC::CodeBlock::optimizeSoon):
(JSC::CodeBlock::forceOptimizationSlowPathConcurrently):
(JSC::CodeBlock::setOptimizationThresholdBasedOnCompilationResult):
(JSC::CodeBlock::updateAllPredictionsAndCountLiveness):
(JSC::CodeBlock::updateAllArrayPredictions):
(JSC::CodeBlock::shouldOptimizeNow):
* bytecode/CodeBlock.h:
(CodeBlock):
(JSC::CodeBlock::jitCompile):
* bytecode/CodeBlockLock.h:
(JSC):
* bytecode/ExecutionCounter.cpp:
(JSC::ExecutionCounter::forceSlowPathConcurrently):
(JSC):
(JSC::ExecutionCounter::setThreshold):
* bytecode/ExecutionCounter.h:
(ExecutionCounter):
* debugger/Debugger.cpp:
(JSC::Debugger::recompileAllJSFunctions):
* dfg/DFGByteCodeParser.cpp:
(JSC::DFG::ByteCodeParser::injectLazyOperandSpeculation):
(JSC::DFG::ByteCodeParser::getArrayMode):
(JSC::DFG::ByteCodeParser::getArrayModeAndEmitChecks):
* dfg/DFGCommon.h:
(JSC::DFG::enableConcurrentJIT):
(DFG):
* dfg/DFGDriver.cpp:
(JSC::DFG::compile):
* dfg/DFGGraph.cpp:
(JSC::DFG::Graph::Graph):
* dfg/DFGGraph.h:
(Graph):
* dfg/DFGOSREntry.cpp:
(JSC::DFG::prepareOSREntry):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::Plan):
(JSC::DFG::Plan::compileInThread):
(JSC::DFG::Plan::key):
(DFG):
* dfg/DFGPlan.h:
(DFG):
(Plan):
* dfg/DFGWorklist.cpp: Added.
(DFG):
(JSC::DFG::Worklist::Worklist):
(JSC::DFG::Worklist::~Worklist):
(JSC::DFG::Worklist::finishCreation):
(JSC::DFG::Worklist::create):
(JSC::DFG::Worklist::enqueue):
(JSC::DFG::Worklist::compilationState):
(JSC::DFG::Worklist::waitUntilAllPlansForVMAreReady):
(JSC::DFG::Worklist::removeAllReadyPlansForVM):
(JSC::DFG::Worklist::completeAllReadyPlansForVM):
(JSC::DFG::Worklist::completeAllPlansForVM):
(JSC::DFG::Worklist::queueLength):
(JSC::DFG::Worklist::dump):
(JSC::DFG::Worklist::runThread):
(JSC::DFG::Worklist::threadFunction):
(JSC::DFG::initializeGlobalWorklistOnce):
(JSC::DFG::globalWorklist):
* dfg/DFGWorklist.h: Added.
(DFG):
(Worklist):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::allocateBlock):
* heap/DeferGC.h: Added.
(JSC):
(DeferGC):
(JSC::DeferGC::DeferGC):
(JSC::DeferGC::~DeferGC):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::reportExtraMemoryCostSlowCase):
(JSC::Heap::collectAllGarbage):
(JSC::Heap::collect):
(JSC::Heap::collectIfNecessaryOrDefer):
(JSC):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
* heap/Heap.h:
(Heap):
(JSC::Heap::isCollecting):
(JSC):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::allocateSlowCase):
* jit/JIT.cpp:
(JSC::JIT::privateCompile):
* jit/JIT.h:
* jit/JITStubs.cpp:
(JSC::DEFINE_STUB_FUNCTION):
* llint/LLIntSlowPaths.cpp:
(JSC::LLInt::jitCompileAndSetHeuristics):
(JSC::LLInt::entryOSR):
(JSC::LLInt::LLINT_SLOW_PATH_DECL):
* profiler/ProfilerBytecodes.h:
* runtime/ConcurrentJITLock.h: Added.
(JSC):
* runtime/ExecutionHarness.h:
(JSC::replaceWithDeferredOptimizedCode):
* runtime/JSSegmentedVariableObject.cpp:
(JSC::JSSegmentedVariableObject::findRegisterIndex):
(JSC::JSSegmentedVariableObject::addRegisters):
* runtime/JSSegmentedVariableObject.h:
(JSSegmentedVariableObject):
* runtime/Options.h:
(JSC):
* runtime/Structure.h:
(Structure):
* runtime/StructureInlines.h:
(JSC::Structure::propertyTable):
* runtime/SymbolTable.h:
(SymbolTable):
* runtime/VM.cpp:
(JSC::VM::VM):
(JSC::VM::~VM):
(JSC::VM::prepareToDiscardCode):
(JSC):
(JSC::VM::discardAllCode):
(JSC::VM::releaseExecutableMemory):
* runtime/VM.h:
(DFG):
(VM):

Source/WTF:

Reviewed by Geoffrey Garen.

* wtf/ByteSpinLock.h: Make it non-copyable. We previously had bugs where we used ByteSpinLock as a locker. Clearly that's bad.
* wtf/MetaAllocatorHandle.h: Make it thread-safe ref-counted, since we may now be passing them between the concurrent JIT thread and the main thread.
* wtf/Vector.h:
(WTF::Vector::takeLast): I've wanted this method for ages, and now I finally added it.

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@153169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
284cc3d6
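The block-scoped GC deferral described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the contents of heap/DeferGC.h or heap/Heap.cpp: SketchHeap and SketchDeferGC are hypothetical classes, the method names are borrowed from the file list above, and their bodies are guesses at the simplest behavior matching "GCs don't happen in the scope but are allowed to happen after".

```cpp
#include <cassert>

// Hypothetical, heavily simplified heap. collect() only counts how many
// collections ran so the deferral behavior can be observed.
class SketchHeap {
public:
    // Collect now, unless a deferral scope is active, in which case the
    // request is remembered and honored when the outermost scope ends.
    void collectIfNecessaryOrDefer() {
        if (m_deferralDepth > 0) {
            m_collectionWasDeferred = true;
            return;
        }
        collect();
    }

    void incrementDeferralDepth() { ++m_deferralDepth; }

    void decrementDeferralDepthAndGCIfNeeded() {
        assert(m_deferralDepth > 0);
        --m_deferralDepth;
        if (m_deferralDepth == 0 && m_collectionWasDeferred) {
            m_collectionWasDeferred = false;
            collect();
        }
    }

    int collections() const { return m_collections; }

private:
    void collect() { ++m_collections; }

    int m_deferralDepth = 0;
    bool m_collectionWasDeferred = false;
    int m_collections = 0;
};

// RAII guard: no collection runs while an instance is alive; a collection
// requested in the meantime runs when the last guard goes out of scope.
class SketchDeferGC {
public:
    explicit SketchDeferGC(SketchHeap& heap) : m_heap(heap) {
        m_heap.incrementDeferralDepth();
    }
    ~SketchDeferGC() { m_heap.decrementDeferralDepthAndGCIfNeeded(); }

    SketchDeferGC(const SketchDeferGC&) = delete;
    SketchDeferGC& operator=(const SketchDeferGC&) = delete;

private:
    SketchHeap& m_heap;
};
```

Under this sketch, code that installs plans would hold a SketchDeferGC for the duration of the installation, so reporting extra memory cost cannot start a GC that re-enters plan installation; the deferred collection runs once when the guard is destroyed.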