Commit 284cc3d6 authored by oliver@apple.com

fourthTier: DFG should be able to run on a separate thread

https://bugs.webkit.org/show_bug.cgi?id=112839

Source/JavaScriptCore:

Reviewed by Geoffrey Garen.

This is the final bit of concurrent JITing. The idea is that there is a
single global worklist, and a single global thread, that does all
optimizing compilation. This is the DFG::Worklist. It contains a queue of
DFG::Plans, and a map from CodeBlock* (the baseline code block we're
trying to optimize) to DFG::Plan. If the DFGDriver tries to concurrently
compile something, it puts the Plan on the Worklist. The Worklist's
thread will compile that Plan eventually, and when it's done, it will
signal its completion by (1) notifying anyone waiting for the Worklist to
be done, and (2) forcing the CodeBlock::m_jitExecuteCounter to take the
slow path. The next Baseline JIT cti_optimize call will then install all ready
(i.e. compiled) Plans for that VM. Note that (1) is only for the GC and
VM shutdown, which will want to ensure that there aren't any outstanding
async compilations before proceeding. They do so by simply waiting for
all of the plans for the current VM to complete. (2) is the actual way
that code typically gets installed.
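
To make the shape concrete, here is a minimal, self-contained C++ sketch of
the queue-plus-map structure described above. It uses standard-library
primitives rather than WTF's, and the names and members are illustrative
only; the real interface is in dfg/DFGWorklist.h.

    // Illustrative sketch, not the actual DFG::Worklist API.
    #include <condition_variable>
    #include <deque>
    #include <memory>
    #include <mutex>
    #include <unordered_map>

    struct CodeBlock; // stand-in for JSC::CodeBlock

    struct Plan {
        CodeBlock* codeBlock;      // the baseline code block being optimized
        bool isCompiled { false }; // set by the worklist thread when done
    };

    class Worklist {
    public:
        void enqueue(std::shared_ptr<Plan> plan)
        {
            std::lock_guard<std::mutex> locker(m_lock);
            m_plans[plan->codeBlock] = plan;
            m_queue.push_back(plan);
            m_planEnqueued.notify_one(); // wake the single compilation thread
        }

        // Polled from cti_optimize: is the plan for this baseline block ready
        // to be installed?
        bool isCompiled(CodeBlock* codeBlock)
        {
            std::lock_guard<std::mutex> locker(m_lock);
            auto it = m_plans.find(codeBlock);
            return it != m_plans.end() && it->second->isCompiled;
        }

    private:
        std::mutex m_lock;
        std::condition_variable m_planEnqueued; // waited on by the thread loop (not shown)
        std::deque<std::shared_ptr<Plan>> m_queue;                     // plans awaiting compilation
        std::unordered_map<CodeBlock*, std::shared_ptr<Plan>> m_plans; // baseline block -> plan
    };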

This is all very racy by design. For example, just as we try to force the
execute counter to take the slow path, the main thread may be setting the
execute counter to some other value. The main thread must set it to
another value because (a) JIT code is constantly incrementing the counter
in a racy way, (b) the cti_optimize slow path will set it to some
large-ish negative value to ensure that cti_optimize isn't called
repeatedly, and (c) OSR exits from previously jettisoned code blocks may
still want to reset the counter values. This "race" is made benign by
ensuring that while there is an asynchronous compilation, we at worst set
the counter to optimizeAfterWarmUp and never to deferIndefinitely. Hence
if the race happens then the worst case is that we wait another ~1000
counts before installing the optimized code. Another defense is that if
any CodeBlock calls into cti_optimize, then it will check for all ready
plans for the VM - so even if a code block has to wait another ~1000
executions before it calls cti_optimize to do the installation, it may
actually end up being installed sooner because a different code block had
called cti_optimize, potentially for an unrelated reason.
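
The following is a sketch of that counter discipline only, not the real
ExecutionCounter code (which lives in bytecode/ExecutionCounter.cpp); the
names and concrete values are placeholders that match the description above.

    #include <atomic>
    #include <cstdint>

    struct ExecutionCounterSketch {
        std::atomic<int32_t> counter { -1000 }; // counts up toward zero

        // Racily bumped from JIT code; crossing zero takes the slow path
        // (cti_optimize), which installs any ready plans for the VM.
        bool checkIfThresholdCrossed() { return counter.fetch_add(1) >= 0; }

        // Called by the worklist thread when a plan becomes ready: force the
        // very next check to take the slow path.
        void forceSlowPathConcurrently() { counter.store(0); }

        // The main thread may race with the call above, but while a compile
        // is outstanding it only ever resets to a bounded warm-up value, so
        // the worst case is ~1000 more counts before installation.
        void optimizeAfterWarmUp() { counter.store(-1000); }

        // Never used while an async compile is in flight, so the forced slow
        // path cannot be deferred forever by the race.
        void deferIndefinitely() { counter.store(INT32_MIN); }
    };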

Special care is taken to ensure that installing plans informs the GC
about the increased memory usage, but also ensures that we don't recurse
infinitely - since at the start of a GC we try to install outstanding plans.
This is done by introducing a new GC deferral mechanism (the DeferGC
block-scoped thingy), which will ensure that GCs don't happen in the
scope but are allowed to happen after. This still leaves the strange
corner case that cti_optimize may install outstanding plans, then GC, and
that GC may jettison the code block that was installed. This, and the
fact that the plan that we took the slow path to install could have been a
failed or invalid compile, mean that we have to take special precautions
in cti_optimize.
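
A sketch of that deferral mechanism is below. The DeferGC class and the Heap
methods named here are the ones this patch adds (heap/DeferGC.h, heap/Heap.cpp),
but the bodies are illustrative only.

    class Heap {
    public:
        void incrementDeferralDepth() { ++m_deferralDepth; }
        void decrementDeferralDepthAndGCIfNeeded()
        {
            // Leaving the outermost DeferGC scope runs any collection that
            // was requested while deferral was active.
            if (!--m_deferralDepth && m_gcRequested)
                collect();
        }
        void collectIfNecessaryOrDefer()
        {
            if (m_deferralDepth)
                m_gcRequested = true; // remember the request, don't collect yet
            else
                collect();
        }
    private:
        void collect() { m_gcRequested = false; /* ... actual collection ... */ }
        unsigned m_deferralDepth { 0 };
        bool m_gcRequested { false };
    };

    class DeferGC {
    public:
        explicit DeferGC(Heap& heap) : m_heap(heap) { m_heap.incrementDeferralDepth(); }
        ~DeferGC() { m_heap.decrementDeferralDepthAndGCIfNeeded(); }
    private:
        Heap& m_heap;
    };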

This patch also fixes a number of small concurrency bugs that I found
when things started running. There are probably more of those bugs still
left to fix. This patch just fixes the ones I know about.

Concurrent compilation is right now only enabled on X86_64 Mac. We need
platforms that are sufficiently CAStastic so that we can do the various
memory fence and CAS tricks that make this safe. We also need a platform
that uses JSVALUE64. And we need pthread_once. So, that pretty much means
just X86_64 for now. Enabling X86_64 Linux should be a breeze, but I'll
leave that up to the Qt and GTK+ ports to do at their discretion.

This is a solid speed-up on SunSpider (8-9%) and V8Spider (16%), our two
main compile-time benchmarks. Most peculiarly, this also appears to
reduce measurement noise, rather than increasing it as you would have
expected. I don't understand that result but I like it anyway. On the
other hand, this is a slight (1%) slow-down on V8v7. I will continue to
investigate this but I think that the results are already good enough
that we should land this as-is. So far, it appears that the slow-down is
due to this breaking the don't-compile-inlineables heuristics. See
investigation in https://bugs.webkit.org/show_bug.cgi?id=116556 and the
bug https://bugs.webkit.org/show_bug.cgi?id=116557.

* JavaScriptCore.xcodeproj/project.pbxproj:
* bytecode/CodeBlock.cpp:
(JSC):
(JSC::CodeBlock::finalizeUnconditionally):
(JSC::CodeBlock::resetStubInternal):
(JSC::CodeBlock::baselineVersion):
(JSC::CodeBlock::hasOptimizedReplacement):
(JSC::CodeBlock::optimizationThresholdScalingFactor):
(JSC::CodeBlock::checkIfOptimizationThresholdReached):
(JSC::CodeBlock::optimizeNextInvocation):
(JSC::CodeBlock::dontOptimizeAnytimeSoon):
(JSC::CodeBlock::optimizeAfterWarmUp):
(JSC::CodeBlock::optimizeAfterLongWarmUp):
(JSC::CodeBlock::optimizeSoon):
(JSC::CodeBlock::forceOptimizationSlowPathConcurrently):
(JSC::CodeBlock::setOptimizationThresholdBasedOnCompilationResult):
(JSC::CodeBlock::updateAllPredictionsAndCountLiveness):
(JSC::CodeBlock::updateAllArrayPredictions):
(JSC::CodeBlock::shouldOptimizeNow):
* bytecode/CodeBlock.h:
(CodeBlock):
(JSC::CodeBlock::jitCompile):
* bytecode/CodeBlockLock.h:
(JSC):
* bytecode/ExecutionCounter.cpp:
(JSC::ExecutionCounter::forceSlowPathConcurrently):
(JSC):
(JSC::ExecutionCounter::setThreshold):
* bytecode/ExecutionCounter.h:
(ExecutionCounter):
* debugger/Debugger.cpp:
(JSC::Debugger::recompileAllJSFunctions):
* dfg/DFGByteCodeParser.cpp:
(JSC::DFG::ByteCodeParser::injectLazyOperandSpeculation):
(JSC::DFG::ByteCodeParser::getArrayMode):
(JSC::DFG::ByteCodeParser::getArrayModeAndEmitChecks):
* dfg/DFGCommon.h:
(JSC::DFG::enableConcurrentJIT):
(DFG):
* dfg/DFGDriver.cpp:
(JSC::DFG::compile):
* dfg/DFGGraph.cpp:
(JSC::DFG::Graph::Graph):
* dfg/DFGGraph.h:
(Graph):
* dfg/DFGOSREntry.cpp:
(JSC::DFG::prepareOSREntry):
* dfg/DFGOperations.cpp:
* dfg/DFGPlan.cpp:
(JSC::DFG::Plan::Plan):
(JSC::DFG::Plan::compileInThread):
(JSC::DFG::Plan::key):
(DFG):
* dfg/DFGPlan.h:
(DFG):
(Plan):
* dfg/DFGWorklist.cpp: Added.
(DFG):
(JSC::DFG::Worklist::Worklist):
(JSC::DFG::Worklist::~Worklist):
(JSC::DFG::Worklist::finishCreation):
(JSC::DFG::Worklist::create):
(JSC::DFG::Worklist::enqueue):
(JSC::DFG::Worklist::compilationState):
(JSC::DFG::Worklist::waitUntilAllPlansForVMAreReady):
(JSC::DFG::Worklist::removeAllReadyPlansForVM):
(JSC::DFG::Worklist::completeAllReadyPlansForVM):
(JSC::DFG::Worklist::completeAllPlansForVM):
(JSC::DFG::Worklist::queueLength):
(JSC::DFG::Worklist::dump):
(JSC::DFG::Worklist::runThread):
(JSC::DFG::Worklist::threadFunction):
(JSC::DFG::initializeGlobalWorklistOnce):
(JSC::DFG::globalWorklist):
* dfg/DFGWorklist.h: Added.
(DFG):
(Worklist):
* heap/CopiedSpaceInlines.h:
(JSC::CopiedSpace::allocateBlock):
* heap/DeferGC.h: Added.
(JSC):
(DeferGC):
(JSC::DeferGC::DeferGC):
(JSC::DeferGC::~DeferGC):
* heap/Heap.cpp:
(JSC::Heap::Heap):
(JSC::Heap::reportExtraMemoryCostSlowCase):
(JSC::Heap::collectAllGarbage):
(JSC::Heap::collect):
(JSC::Heap::collectIfNecessaryOrDefer):
(JSC):
(JSC::Heap::incrementDeferralDepth):
(JSC::Heap::decrementDeferralDepthAndGCIfNeeded):
* heap/Heap.h:
(Heap):
(JSC::Heap::isCollecting):
(JSC):
* heap/MarkedAllocator.cpp:
(JSC::MarkedAllocator::allocateSlowCase):
* jit/JIT.cpp:
(JSC::JIT::privateCompile):
* jit/JIT.h:
* jit/JITStubs.cpp:
(JSC::DEFINE_STUB_FUNCTION):
* llint/LLIntSlowPaths.cpp:
(JSC::LLInt::jitCompileAndSetHeuristics):
(JSC::LLInt::entryOSR):
(JSC::LLInt::LLINT_SLOW_PATH_DECL):
* profiler/ProfilerBytecodes.h:
* runtime/ConcurrentJITLock.h: Added.
(JSC):
* runtime/ExecutionHarness.h:
(JSC::replaceWithDeferredOptimizedCode):
* runtime/JSSegmentedVariableObject.cpp:
(JSC::JSSegmentedVariableObject::findRegisterIndex):
(JSC::JSSegmentedVariableObject::addRegisters):
* runtime/JSSegmentedVariableObject.h:
(JSSegmentedVariableObject):
* runtime/Options.h:
(JSC):
* runtime/Structure.h:
(Structure):
* runtime/StructureInlines.h:
(JSC::Structure::propertyTable):
* runtime/SymbolTable.h:
(SymbolTable):
* runtime/VM.cpp:
(JSC::VM::VM):
(JSC::VM::~VM):
(JSC::VM::prepareToDiscardCode):
(JSC):
(JSC::VM::discardAllCode):
(JSC::VM::releaseExecutableMemory):
* runtime/VM.h:
(DFG):
(VM):

Source/WTF:

Reviewed by Geoffrey Garen.

* wtf/ByteSpinLock.h:
Make it non-copyable. We previously had bugs where we used ByteSpinLock as a locker.
Clearly that's bad.
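
A sketch of the change and of the bug class it prevents is below; WTF has a
macro (WTF_MAKE_NONCOPYABLE) for this, the explicit deletions are shown only
to make the effect obvious.

    #include <cstdint>

    class ByteSpinLock {
    public:
        ByteSpinLock() : m_lock(0) { }
        void lock();   // spin until acquired (body elided)
        void unlock();

        ByteSpinLock(const ByteSpinLock&) = delete;            // copying a lock
        ByteSpinLock& operator=(const ByteSpinLock&) = delete; // is now a compile error
    private:
        uint8_t m_lock;
    };

    // The mistake this catches: writing the lock type where a RAII locker was
    // meant, which used to silently copy-construct a brand-new, unlocked lock:
    //
    //     ByteSpinLock locker(m_lock); // compiled before this change; locks nothing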

* wtf/MetaAllocatorHandle.h:
Make it thread-safe ref-counted, since we may now be passing them between the
concurrent JIT thread and the main thread.
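
For illustration, this is what the thread-safe variant boils down to: the
reference count becomes an atomic, so ref()/deref() can be called from the
concurrent JIT thread and the main thread without a lock. (WTF provides this
as ThreadSafeRefCounted; the class below is only a sketch of the idea, not
WTF's implementation.)

    #include <atomic>

    class ThreadSafeRefCountedSketch {
    public:
        void ref() { m_refCount.fetch_add(1, std::memory_order_relaxed); }
        void deref()
        {
            // Acquire/release ordering so the thread that frees the object
            // sees all writes made by threads that previously held a ref.
            if (m_refCount.fetch_sub(1, std::memory_order_acq_rel) == 1)
                delete this;
        }
    protected:
        virtual ~ThreadSafeRefCountedSketch() { }
    private:
        std::atomic<unsigned> m_refCount { 1 };
    };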

* wtf/Vector.h:
(WTF::Vector::takeLast):
I've wanted this method for ages, and now I've finally added it.
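
The semantics, sketched over std::vector for self-containedness (WTF's version
is a member of Vector itself): remove the last element and return it.

    #include <cassert>
    #include <utility>
    #include <vector>

    template<typename T>
    T takeLast(std::vector<T>& vector)
    {
        assert(!vector.empty());
        T result = std::move(vector.back()); // grab the final element
        vector.pop_back();                   // then drop it from the vector
        return result;
    }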

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@153169 268f45cc-cd09-0410-ab3c-d52691b4dbfc
2013-05-20 Filip Pizlo <fpizlo@apple.com>
fourthTier: DFG should be able to run on a separate thread
https://bugs.webkit.org/show_bug.cgi?id=112839
Reviewed by Geoffrey Garen.
JavaScriptCore.xcodeproj/project.pbxproj
@@ -75,6 +75,7 @@
0F0CD4C415F6B6BB0032F1C0 /* SparseArrayValueMap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F0CD4C315F6B6B50032F1C0 /* SparseArrayValueMap.cpp */; };
0F0D85B21723455400338210 /* CodeBlockLock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F0D85B11723455100338210 /* CodeBlockLock.h */; settings = {ATTRIBUTES = (Private, ); }; };
0F0FC45A14BD15F500B81154 /* LLIntCallLinkInfo.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F0FC45814BD15F100B81154 /* LLIntCallLinkInfo.h */; settings = {ATTRIBUTES = (Private, ); }; };
0F136D4D174AD69E0075B354 /* DeferGC.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F136D4B174AD69B0075B354 /* DeferGC.h */; settings = {ATTRIBUTES = (Private, ); }; };
0F13912916771C33009CCB07 /* ProfilerBytecodeSequence.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F13912416771C30009CCB07 /* ProfilerBytecodeSequence.cpp */; };
0F13912A16771C36009CCB07 /* ProfilerBytecodeSequence.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F13912516771C30009CCB07 /* ProfilerBytecodeSequence.h */; settings = {ATTRIBUTES = (Private, ); }; };
0F13912B16771C3A009CCB07 /* ProfilerProfiledBytecodes.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F13912616771C30009CCB07 /* ProfilerProfiledBytecodes.cpp */; };
@@ -314,6 +315,9 @@
0FD82E86141F3FF100179C94 /* SpeculatedType.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FD82E84141F3FDA00179C94 /* SpeculatedType.cpp */; };
0FDB2CC9173DA520007B3C1B /* FTLAbbreviatedTypes.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDB2CC7173DA51E007B3C1B /* FTLAbbreviatedTypes.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FDB2CCA173DA523007B3C1B /* FTLValueFromBlock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDB2CC8173DA51E007B3C1B /* FTLValueFromBlock.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FDB2CE7174830A2007B3C1B /* DFGWorklist.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FDB2CE5174830A2007B3C1B /* DFGWorklist.cpp */; };
0FDB2CE8174830A2007B3C1B /* DFGWorklist.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDB2CE6174830A2007B3C1B /* DFGWorklist.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FDB2CEA174896C7007B3C1B /* ConcurrentJITLock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDB2CE9174896C7007B3C1B /* ConcurrentJITLock.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FDDBFB51666EED800C55FEF /* DFGVariableAccessDataDump.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0FDDBFB21666EED500C55FEF /* DFGVariableAccessDataDump.cpp */; };
0FDDBFB61666EEDA00C55FEF /* DFGVariableAccessDataDump.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FDDBFB31666EED500C55FEF /* DFGVariableAccessDataDump.h */; settings = {ATTRIBUTES = (Private, ); }; };
0FE228ED1436AB2700196C48 /* Options.h in Headers */ = {isa = PBXBuildFile; fileRef = 0FE228EB1436AB2300196C48 /* Options.h */; settings = {ATTRIBUTES = (Private, ); }; };
@@ -1081,6 +1085,7 @@
0F0CD4C315F6B6B50032F1C0 /* SparseArrayValueMap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SparseArrayValueMap.cpp; sourceTree = "<group>"; };
0F0D85B11723455100338210 /* CodeBlockLock.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; path = CodeBlockLock.h; sourceTree = "<group>"; };
0F0FC45814BD15F100B81154 /* LLIntCallLinkInfo.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = LLIntCallLinkInfo.h; sourceTree = "<group>"; };
0F136D4B174AD69B0075B354 /* DeferGC.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = DeferGC.h; sourceTree = "<group>"; };
0F13912416771C30009CCB07 /* ProfilerBytecodeSequence.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = ProfilerBytecodeSequence.cpp; path = profiler/ProfilerBytecodeSequence.cpp; sourceTree = "<group>"; };
0F13912516771C30009CCB07 /* ProfilerBytecodeSequence.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = ProfilerBytecodeSequence.h; path = profiler/ProfilerBytecodeSequence.h; sourceTree = "<group>"; };
0F13912616771C30009CCB07 /* ProfilerProfiledBytecodes.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = ProfilerProfiledBytecodes.cpp; path = profiler/ProfilerProfiledBytecodes.cpp; sourceTree = "<group>"; };
@@ -1334,6 +1339,9 @@
0FD82E84141F3FDA00179C94 /* SpeculatedType.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = SpeculatedType.cpp; sourceTree = "<group>"; };
0FDB2CC7173DA51E007B3C1B /* FTLAbbreviatedTypes.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = FTLAbbreviatedTypes.h; path = ftl/FTLAbbreviatedTypes.h; sourceTree = "<group>"; };
0FDB2CC8173DA51E007B3C1B /* FTLValueFromBlock.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = FTLValueFromBlock.h; path = ftl/FTLValueFromBlock.h; sourceTree = "<group>"; };
0FDB2CE5174830A2007B3C1B /* DFGWorklist.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGWorklist.cpp; path = dfg/DFGWorklist.cpp; sourceTree = "<group>"; };
0FDB2CE6174830A2007B3C1B /* DFGWorklist.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGWorklist.h; path = dfg/DFGWorklist.h; sourceTree = "<group>"; };
0FDB2CE9174896C7007B3C1B /* ConcurrentJITLock.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = ConcurrentJITLock.h; sourceTree = "<group>"; };
0FDDBFB21666EED500C55FEF /* DFGVariableAccessDataDump.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = DFGVariableAccessDataDump.cpp; path = dfg/DFGVariableAccessDataDump.cpp; sourceTree = "<group>"; };
0FDDBFB31666EED500C55FEF /* DFGVariableAccessDataDump.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGVariableAccessDataDump.h; path = dfg/DFGVariableAccessDataDump.h; sourceTree = "<group>"; };
0FE228EA1436AB2300196C48 /* Options.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Options.cpp; sourceTree = "<group>"; };
@@ -2374,6 +2382,7 @@
C2239D1316262BDD005AC5FD /* CopyVisitor.h */,
C2239D1416262BDD005AC5FD /* CopyVisitorInlines.h */,
C218D13F1655CFD50062BB81 /* CopyWorkList.h */,
0F136D4B174AD69B0075B354 /* DeferGC.h */,
0F2C556D14738F2E00121E4F /* DFGCodeBlocks.cpp */,
0F2C556E14738F2E00121E4F /* DFGCodeBlocks.h */,
BCBE2CAD14E985AA000593AD /* GCAssertions.h */,
@@ -2681,6 +2690,7 @@
0F15F15D14B7A73A005DE37D /* CommonSlowPaths.h */,
969A09220ED1E09C00F1F681 /* Completion.cpp */,
F5BB2BC5030F772101FCFE1D /* Completion.h */,
0FDB2CE9174896C7007B3C1B /* ConcurrentJITLock.h */,
BCA62DFF0E2826310004F30D /* ConstructData.cpp */,
BC8F3CCF0DAF17BA00577A80 /* ConstructData.h */,
BCD203450E17135E002C7E82 /* DateConstructor.cpp */,
@@ -3079,6 +3089,8 @@
0F85A31E16AB76AE0077571E /* DFGVariadicFunction.h */,
0FFFC95314EF909500C72532 /* DFGVirtualRegisterAllocationPhase.cpp */,
0FFFC95414EF909500C72532 /* DFGVirtualRegisterAllocationPhase.h */,
0FDB2CE5174830A2007B3C1B /* DFGWorklist.cpp */,
0FDB2CE6174830A2007B3C1B /* DFGWorklist.h */,
);
name = dfg;
sourceTree = "<group>";
@@ -3533,6 +3545,7 @@
0F5EF91F16878F7D003E5C25 /* JITThunks.h in Headers */,
A76F54A313B28AAB00EF2BCE /* JITWriteBarrier.h in Headers */,
BC18C4160E16F5CD00B34460 /* JSActivation.h in Headers */,
0FDB2CEA174896C7007B3C1B /* ConcurrentJITLock.h in Headers */,
840480131021A1D9008E7F01 /* JSAPIValueWrapper.h in Headers */,
C2CF39C216E15A8100DD69BE /* JSAPIWrapperObject.h in Headers */,
BC18C4170E16F5CD00B34460 /* JSArray.h in Headers */,
@@ -3655,6 +3668,7 @@
BC18C43F0E16F5CD00B34460 /* Nodes.h in Headers */,
BC18C4410E16F5CD00B34460 /* NumberConstructor.h in Headers */,
BC18C4420E16F5CD00B34460 /* NumberConstructor.lut.h in Headers */,
0FDB2CE8174830A2007B3C1B /* DFGWorklist.h in Headers */,
BC18C4430E16F5CD00B34460 /* NumberObject.h in Headers */,
BC18C4440E16F5CD00B34460 /* NumberPrototype.h in Headers */,
142D3939103E4560007DCB52 /* NumericStrings.h in Headers */,
@@ -3804,6 +3818,7 @@
86704B8812DBA33700A9FE7B /* YarrParser.h in Headers */,
86704B8A12DBA33700A9FE7B /* YarrPattern.h in Headers */,
86704B4312DB8A8100A9FE7B /* YarrSyntaxChecker.h in Headers */,
0F136D4D174AD69E0075B354 /* DeferGC.h in Headers */,
);
runOnlyForDeploymentPostprocessing = 0;
};
@@ -4433,6 +4448,7 @@
14469DE8107EC7E700650446 /* PropertySlot.cpp in Sources */,
ADE39FFF16DD144B0003CD4A /* PropertyTable.cpp in Sources */,
1474C33C16AA2D9B0062F01D /* PrototypeMap.cpp in Sources */,
0FDB2CE7174830A2007B3C1B /* DFGWorklist.cpp in Sources */,
0F9332A314CA7DD70085F3C6 /* PutByIdStatus.cpp in Sources */,
0FF60AC316740F8800029779 /* ReduceWhitespace.cpp in Sources */,
14280841107EC0930013E7B2 /* RegExp.cpp in Sources */,
bytecode/CodeBlock.cpp
@@ -36,6 +36,7 @@
#include "DFGCommon.h"
#include "DFGNode.h"
#include "DFGRepatch.h"
#include "DFGWorklist.h"
#include "Debugger.h"
#include "FTLJITCode.h"
#include "Interpreter.h"
@@ -2187,12 +2188,6 @@ void CodeBlock::visitWeakReferences(SlotVisitor& visitor)
performTracingFixpointIteration(visitor);
}
#if ENABLE(JIT_VERBOSE_OSR)
static const bool verboseUnlinking = true;
#else
static const bool verboseUnlinking = false;
#endif
void CodeBlock::finalizeUnconditionally()
{
#if ENABLE(LLINT)
@@ -2208,7 +2203,7 @@ void CodeBlock::finalizeUnconditionally()
case op_put_by_id_out_of_line:
if (!curInstruction[4].u.structure || Heap::isMarked(curInstruction[4].u.structure.get()))
break;
if (verboseUnlinking)
if (Options::verboseOSR())
dataLogF("Clearing LLInt property access with structure %p.\n", curInstruction[4].u.structure.get());
curInstruction[4].u.structure.clear();
curInstruction[5].u.operand = 0;
@@ -2221,7 +2216,7 @@ void CodeBlock::finalizeUnconditionally()
&& Heap::isMarked(curInstruction[6].u.structure.get())
&& Heap::isMarked(curInstruction[7].u.structureChain.get()))
break;
if (verboseUnlinking) {
if (Options::verboseOSR()) {
dataLogF("Clearing LLInt put transition with structures %p -> %p, chain %p.\n",
curInstruction[4].u.structure.get(),
curInstruction[6].u.structure.get(),
@@ -2241,7 +2236,7 @@ void CodeBlock::finalizeUnconditionally()
for (unsigned i = 0; i < m_llintCallLinkInfos.size(); ++i) {
if (m_llintCallLinkInfos[i].isLinked() && !Heap::isMarked(m_llintCallLinkInfos[i].callee.get())) {
if (verboseUnlinking)
if (Options::verboseOSR())
dataLog("Clearing LLInt call from ", *this, "\n");
m_llintCallLinkInfos[i].unlink();
}
@@ -2254,7 +2249,7 @@ void CodeBlock::finalizeUnconditionally()
#if ENABLE(DFG_JIT)
// Check if we're not live. If we are, then jettison.
if (!(shouldImmediatelyAssumeLivenessDuringScan() || m_jitCode->dfgCommon()->livenessHasBeenProved)) {
if (verboseUnlinking)
if (Options::verboseOSR())
dataLog(*this, " has dead weak references, jettisoning during GC.\n");
if (DFG::shouldShowDisassembly()) {
@@ -2284,7 +2279,7 @@ void CodeBlock::finalizeUnconditionally()
for (size_t size = m_putToBaseOperations.size(), i = 0; i < size; ++i) {
if (m_putToBaseOperations[i].m_structure && !Heap::isMarked(m_putToBaseOperations[i].m_structure.get())) {
if (verboseUnlinking)
if (Options::verboseOSR())
dataLog("Clearing putToBase info in ", *this, "\n");
m_putToBaseOperations[i].m_structure.clear();
}
@@ -2298,7 +2293,7 @@ void CodeBlock::finalizeUnconditionally()
#endif
m_resolveOperations[i].last().m_structure.clear();
if (m_resolveOperations[i].last().m_structure && !Heap::isMarked(m_resolveOperations[i].last().m_structure.get())) {
if (verboseUnlinking)
if (Options::verboseOSR())
dataLog("Clearing resolve info in ", *this, "\n");
m_resolveOperations[i].last().m_structure.clear();
}
@@ -2313,7 +2308,7 @@ void CodeBlock::finalizeUnconditionally()
if (ClosureCallStubRoutine* stub = callLinkInfo(i).stub.get()) {
if (!Heap::isMarked(stub->structure())
|| !Heap::isMarked(stub->executable())) {
if (verboseUnlinking) {
if (Options::verboseOSR()) {
dataLog(
"Clearing closure call from ", *this, " to ",
stub->executable()->hashFor(callLinkInfo(i).specializationKind()),
@@ -2322,7 +2317,7 @@ void CodeBlock::finalizeUnconditionally()
callLinkInfo(i).unlink(*m_vm, repatchBuffer);
}
} else if (!Heap::isMarked(callLinkInfo(i).callee.get())) {
if (verboseUnlinking) {
if (Options::verboseOSR()) {
dataLog(
"Clearing call from ", *this, " to ",
RawPointer(callLinkInfo(i).callee.get()), " (",
@@ -2363,7 +2358,7 @@ void CodeBlock::resetStubInternal(RepatchBuffer& repatchBuffer, StructureStubInf
{
AccessType accessType = static_cast<AccessType>(stubInfo.accessType);
if (verboseUnlinking)
if (Options::verboseOSR())
dataLog("Clearing structure cache (kind ", static_cast<int>(stubInfo.accessType), ") in ", *this, ".\n");
switch (getJITType()) {
@@ -2434,6 +2429,42 @@ void CodeBlock::stronglyVisitWeakReferences(SlotVisitor& visitor)
#endif
}
CodeBlock* CodeBlock::baselineVersion()
{
#if ENABLE(JIT)
// When we're initializing the original baseline code block, we won't be able
// to get its replacement. But we'll know that it's the original baseline code
// block because it won't have JIT code yet and it won't have an alternative.
if (getJITType() == JITCode::None && !alternative())
return this;
CodeBlock* result = replacement();
ASSERT(result);
while (result->alternative())
result = result->alternative();
ASSERT(result);
ASSERT(JITCode::isBaselineCode(result->getJITType()));
return result;
#else
return this;
#endif
}
#if ENABLE(JIT)
bool CodeBlock::hasOptimizedReplacement()
{
ASSERT(JITCode::isBaselineCode(getJITType()));
bool result = JITCode::isHigherTier(replacement()->getJITType(), getJITType());
if (result)
ASSERT(JITCode::isOptimizingJIT(replacement()->getJITType()));
else {
ASSERT(JITCode::isBaselineCode(replacement()->getJITType()));
ASSERT(replacement() == this);
}
return result;
}
#endif
HandlerInfo* CodeBlock::handlerForBytecodeOffset(unsigned bytecodeOffset)
{
RELEASE_ASSERT(bytecodeOffset < instructions().size());
@@ -3015,9 +3046,12 @@ double CodeBlock::optimizationThresholdScalingFactor()
ASSERT(instructionCount); // Make sure this is called only after we have an instruction stream; otherwise it'll just return the value of d, which makes no sense.
double result = d + a * sqrt(instructionCount + b) + c * instructionCount;
#if ENABLE(JIT_VERBOSE_OSR)
dataLog(*this, ": instruction count is ", instructionCount, ", scaling execution counter by ", result, " * ", codeTypeThresholdMultiplier(), "\n");
#endif
if (Options::verboseOSR()) {
dataLog(
*this, ": instruction count is ", instructionCount,
", scaling execution counter by ", result, " * ", codeTypeThresholdMultiplier(),
"\n");
}
return result * codeTypeThresholdMultiplier();
}
@@ -3058,34 +3092,90 @@ int32_t CodeBlock::counterValueForOptimizeSoon()
bool CodeBlock::checkIfOptimizationThresholdReached()
{
#if ENABLE(DFG_JIT)
if (m_vm->worklist
&& m_vm->worklist->compilationState(this) == DFG::Worklist::Compiled) {
optimizeNextInvocation();
return true;
}
#endif
return m_jitExecuteCounter.checkIfThresholdCrossedAndSet(this);
}
void CodeBlock::optimizeNextInvocation()
{
if (Options::verboseOSR())
dataLog(*this, ": Optimizing next invocation.\n");
m_jitExecuteCounter.setNewThreshold(0, this);
}
void CodeBlock::dontOptimizeAnytimeSoon()
{
if (Options::verboseOSR())
dataLog(*this, ": Not optimizing anytime soon.\n");
m_jitExecuteCounter.deferIndefinitely();
}
void CodeBlock::optimizeAfterWarmUp()
{
if (Options::verboseOSR())
dataLog(*this, ": Optimizing after warm-up.\n");
m_jitExecuteCounter.setNewThreshold(counterValueForOptimizeAfterWarmUp(), this);
}
void CodeBlock::optimizeAfterLongWarmUp()
{
if (Options::verboseOSR())
dataLog(*this, ": Optimizing after long warm-up.\n");
m_jitExecuteCounter.setNewThreshold(counterValueForOptimizeAfterLongWarmUp(), this);
}
void CodeBlock::optimizeSoon()
{
if (Options::verboseOSR())
dataLog(*this, ": Optimizing soon.\n");
m_jitExecuteCounter.setNewThreshold(counterValueForOptimizeSoon(), this);
}
void CodeBlock::forceOptimizationSlowPathConcurrently()
{
if (Options::verboseOSR())
dataLog(*this, ": Forcing slow path concurrently.\n");
m_jitExecuteCounter.forceSlowPathConcurrently();
}
void CodeBlock::setOptimizationThresholdBasedOnCompilationResult(CompilationResult result)
{
RELEASE_ASSERT(getJITType() == JITCode::BaselineJIT);
RELEASE_ASSERT((result == CompilationSuccessful) == (replacement() != this));
switch (result) {
case CompilationSuccessful:
RELEASE_ASSERT(JITCode::isOptimizingJIT(replacement()->getJITType()));
optimizeNextInvocation();
break;
case CompilationFailed:
dontOptimizeAnytimeSoon();
break;
case CompilationDeferred:
// We'd like to do dontOptimizeAnytimeSoon() but we cannot because
// forceOptimizationSlowPathConcurrently() is inherently racy. It won't
// necessarily guarantee anything. So, we make sure that even if that
// function ends up being a no-op, we still eventually retry and realize
// that we have optimized code ready.
optimizeAfterWarmUp();
break;
case CompilationInvalidated:
// Retry with exponential backoff.
countReoptimization();
optimizeAfterWarmUp();
break;
default:
RELEASE_ASSERT_NOT_REACHED();
break;
}
}
#if ENABLE(JIT)
uint32_t CodeBlock::adjustedExitCountThreshold(uint32_t desiredThreshold)
{
@@ -3144,7 +3234,7 @@ ArrayProfile* CodeBlock::getOrAddArrayProfile(unsigned bytecodeOffset)
void CodeBlock::updateAllPredictionsAndCountLiveness(
OperationInProgress operation, unsigned& numberOfLiveNonArgumentValueProfiles, unsigned& numberOfSamplesInProfiles)
{
CodeBlockLock locker(m_lock);
CodeBlockLocker locker(m_lock);
numberOfLiveNonArgumentValueProfiles = 0;
numberOfSamplesInProfiles = 0; // If this divided by ValueProfile::numberOfBuckets equals numberOfValueProfiles() then value profiles are full.
@@ -3176,7 +3266,7 @@ void CodeBlock::updateAllValueProfilePredictions(OperationInProgress operation)
void CodeBlock::updateAllArrayPredictions(OperationInProgress operation)
{
CodeBlockLock locker(m_lock);
CodeBlockLocker locker(m_lock);
for (unsigned i = m_arrayProfiles.size(); i--;)
m_arrayProfiles[i].computeUpdatedPrediction(locker, this, operation);
@@ -3194,9 +3284,8 @@ void CodeBlock::updateAllPredictions(OperationInProgress operation)
bool CodeBlock::shouldOptimizeNow()
{
#if ENABLE(JIT_VERBOSE_OSR)
dataLog("Considering optimizing ", *this, "...\n");
#endif
if (Options::verboseOSR())
dataLog("Considering optimizing ", *this, "...\n");
#if ENABLE(VERBOSE_VALUE_PROFILE)
dumpValueProfiles();
@@ -3211,9 +3300,14 @@ bool CodeBlock::shouldOptimizeNow()
unsigned numberOfSamplesInProfiles;
updateAllPredictionsAndCountLiveness(NoOperation, numberOfLiveNonArgumentValueProfiles, numberOfSamplesInProfiles);
#if ENABLE(JIT_VERBOSE_OSR)
dataLogF("Profile hotness: %lf (%u / %u), %lf (%u / %u)\n", (double)numberOfLiveNonArgumentValueProfiles / numberOfValueProfiles(), numberOfLiveNonArgumentValueProfiles, numberOfValueProfiles(), (double)numberOfSamplesInProfiles / ValueProfile::numberOfBuckets / numberOfValueProfiles(), numberOfSamplesInProfiles, ValueProfile::numberOfBuckets * numberOfValueProfiles());
#endif
if (Options::verboseOSR()) {
dataLogF(
"Profile hotness: %lf (%u / %u), %lf (%u / %u)\n",
(double)numberOfLiveNonArgumentValueProfiles / numberOfValueProfiles(),
numberOfLiveNonArgumentValueProfiles, numberOfValueProfiles(),
(double)numberOfSamplesInProfiles / ValueProfile::numberOfBuckets / numberOfValueProfiles(),
numberOfSamplesInProfiles, ValueProfile::numberOfBuckets * numberOfValueProfiles());
}