Commit 28e8eb90 authored by darin@apple.com

Reviewed by Maciej.

        - http://bugs.webkit.org/show_bug.cgi?id=16438
        - removed some more unused code
        - changed quite a few more names to WebKit-style
        - moved more things out of pcre_internal.h
        - changed some indentation to WebKit-style
        - improved design of the functions for reading and writing
          2-byte values from the opcode stream (in pcre_internal.h)

        * pcre/dftables.cpp:
        (main): Added the kjs prefix in the normal way instead of using macros.

        * pcre/pcre_compile.cpp: Moved some definitions here from pcre_internal.h.
        (errorText): Name changes, fewer typedefs.
        (checkEscape): Ditto. Changed uppercase conversion to use toASCIIUpper.
        (isCountedRepeat): Name change.
        (readRepeatCounts): Name change.
        (firstSignificantOpcode): Got rid of the use of OP_lengths, which is
        very lightly used here. Hard-coded the length of OP_BRANUMBER.
        (firstSignificantOpcodeSkippingAssertions): Ditto. Also changed to
        use the advanceToEndOfBracket function.
        (getOthercaseRange): Name changes.
        (encodeUTF8): Ditto.
        (compileBranch): Name changes. Removed unused after_manual_callout and
        the code to handle it. Removed code to handle OP_ONCE since we never
        emit this opcode. Changed to use advanceToEndOfBracket in more places.
        (compileBracket): Name changes.
        (branchIsAnchored): Removed code to handle OP_ONCE since we never emit
        this opcode.
        (bracketIsAnchored): Name changes.
        (branchNeedsLineStart): More of the same.
        (bracketNeedsLineStart): Ditto.
        (branchFindFirstAssertedCharacter): Removed OP_ONCE code.
        (bracketFindFirstAssertedCharacter): More of the same.
        (calculateCompiledPatternLengthAndFlags): Ditto.
        (returnError): Name changes.
        (jsRegExpCompile): Ditto.

        * pcre/pcre_exec.cpp: Moved some definitions here from pcre_internal.h.
        (matchRef): Updated names.
        Improved macros to use the do { } while(0) idiom so they expand to single
        statements rather than to blocks or multiple statements. Also refactored
        the recursive match macros.
        (MatchStack::pushNewFrame): Name changes.
        (getUTF8CharAndIncrementLength): Name changes.
        (match): Name changes. Removed the ONCE opcode.
        (jsRegExpExecute): Name changes.
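
The do { } while (0) idiom mentioned for pcre_exec.cpp can be illustrated with a minimal sketch. BUMP_TWICE and bumpIfEnabled are hypothetical names invented for this example; they are not macros or functions from the commit, and the real macros in pcre_exec.cpp are not reproduced here.

```cpp
#include <cassert>

// Hypothetical macro, not taken from pcre_exec.cpp. Wrapping the body in
// do { } while (0) makes "BUMP_TWICE(x);" expand to exactly one statement.
#define BUMP_TWICE(counter) \
    do { \
        assert((counter) >= 0); \
        (counter) += 2; \
    } while (0)

// Hypothetical caller. If the macro expanded to a bare { } block or to two
// separate statements, the semicolon after the macro call would terminate
// the if statement early and the else below would not compile.
static int bumpIfEnabled(bool enabled, int counter)
{
    if (enabled)
        BUMP_TWICE(counter);
    else
        --counter;
    return counter;
}
```

Calling bumpIfEnabled(true, 0) takes the macro branch and bumpIfEnabled(false, 0) takes the else branch; both compile only because the macro behaves as a single statement.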

        * pcre/pcre_internal.h: Removed quite a few unneeded includes. Rewrote
        quite a few comments. Removed the macros that add kjs prefixes to the
        functions with external linkage; instead renamed the functions. Removed
        the unneeded typedefs pcre_uint16, pcre_uint32, and uschar. Removed the
        dead and not-all-working code for LINK_SIZE values other than 2, although
        we aim to keep the abstraction working. Removed the OP_LENGTHS macro.
        (put2ByteValue): Replaces put2ByteOpcodeValueAtOffset.
        (get2ByteValue): Replaces get2ByteOpcodeValueAtOffset.
        (put2ByteValueAndAdvance): Replaces put2ByteOpcodeValueAtOffsetAndAdvance.
        (putLinkValueAllowZero): Replaces putOpcodeValueAtOffset; doesn't do the
        addition, since a comma is really no better than a plus sign. Added an
        assertion to catch out of range values and changed the parameter type to
        int rather than unsigned.
        (getLinkValueAllowZero): Replaces getOpcodeValueAtOffset.
        (putLinkValue): New function that most former callers of the
        putOpcodeValueAtOffset function can use; asserts the value that is
        being stored is non-zero and then calls putLinkValueAllowZero.
        (getLinkValue): Ditto.
        (putLinkValueAndAdvance): Replaces putOpcodeValueAtOffsetAndAdvance. No
        caller was using an offset, which makes sense given the advancing behavior.
        (putLinkValueAllowZeroAndAdvance): Ditto.
        (isBracketStartOpcode): Added. For use in an assertion.
        (advanceToEndOfBracket): Renamed from moveOpcodePtrPastAnyAlternateBranches,
        and removed comments about how it's not well designed. This function takes
        a pointer to the beginning of a bracket and advances to the end of the
        bracket.
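
As a rough illustration of the 2-byte helpers described for pcre_internal.h, the sketch below stores values in big-endian order and advances a write pointer. The three put/get functions follow the shape described in this entry, but this is a standalone approximation: the standard assert stands in for WebKit's ASSERT, the explicit casts are an addition, and writePairAndMeasure is a hypothetical helper that exists only to exercise the code.

```cpp
#include <cassert>

// Stores value as two bytes, high byte first (big-endian).
static inline void put2ByteValue(unsigned char* opcodePtr, int value)
{
    assert(value >= 0 && value <= 0xFFFF); // stand-in for WebKit's ASSERT
    opcodePtr[0] = static_cast<unsigned char>(value >> 8);
    opcodePtr[1] = static_cast<unsigned char>(value & 0xFF);
}

// Reads back a value written by put2ByteValue.
static inline int get2ByteValue(const unsigned char* opcodePtr)
{
    return (opcodePtr[0] << 8) | opcodePtr[1];
}

// Writes at the current position, then moves the pointer past the value.
// No offset parameter is needed because the pointer itself advances.
static inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
{
    put2ByteValue(opcodePtr, value);
    opcodePtr += 2;
}

// Hypothetical helper: writes two values back to back and returns how many
// bytes the cursor moved.
static int writePairAndMeasure(unsigned char* buffer, int first, int second)
{
    unsigned char* cursor = buffer;
    put2ByteValueAndAdvance(cursor, first);
    put2ByteValueAndAdvance(cursor, second);
    return static_cast<int>(cursor - buffer);
}
```

Taking the pointer by reference in the advancing variant is what lets callers write a sequence of values without tracking an offset themselves.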

        * pcre/pcre_tables.cpp: Updated names.
        * pcre/pcre_ucp_searchfuncs.cpp:
        (kjs_pcre_ucp_othercase): Ditto.
        * pcre/pcre_xclass.cpp:
        (getUTF8CharAndAdvancePointer): Ditto.
        (kjs_pcre_xclass): Ditto.
        * pcre/ucpinternal.h: Ditto.

        * wtf/ASCIICType.h:
        (WTF::isASCIIAlpha): Added an int overload, like the one we already have for
        isASCIIDigit.
        (WTF::isASCIIAlphanumeric): Ditto.
        (WTF::isASCIIHexDigit): Ditto.
        (WTF::isASCIILower): Ditto.
        (WTF::isASCIISpace): Ditto.
        (WTF::toASCIILower): Ditto.
        (WTF::toASCIIUpper): Ditto.
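
One likely motivation for the int overloads (an assumption; the entry does not state it) is that a call with an int argument is ambiguous when only char, unsigned short and wchar_t overloads exist, because each candidate needs an integral conversion of equal rank. The sketch below uses a hypothetical namespace named sketch rather than WTF, and its toASCIIUpper is a simplified stand-in, not the definition in ASCIICType.h.

```cpp
#include <cassert>

namespace sketch { // hypothetical namespace, not WTF

inline bool isASCIIAlpha(char c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIAlpha(unsigned short c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
// With this overload an int argument is an exact match; without it, the
// call in upperCaseIfAlpha below would be ambiguous between the two above.
inline bool isASCIIAlpha(int c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }

// Simplified stand-in for an int overload of toASCIIUpper.
inline int toASCIIUpper(int c) { return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c; }

} // namespace sketch

// Hypothetical caller holding a character as an int, as code that works on
// decoded character values often does.
static int upperCaseIfAlpha(int c)
{
    return sketch::isASCIIAlpha(c) ? sketch::toASCIIUpper(c) : c;
}
```

Values above the ASCII range fall through unchanged, since ORing with 0x20 cannot bring them into the 'a' to 'z' interval.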



git-svn-id: http://svn.webkit.org/repository/webkit/trunk@28793 268f45cc-cd09-0410-ab3c-d52691b4dbfc
parent 4fd3685a
2007-12-16 Darin Adler <darin@apple.com>
Reviewed by Maciej.
- http://bugs.webkit.org/show_bug.cgi?id=16438
- removed some more unused code
- changed quite a few more names to WebKit-style
- moved more things out of pcre_internal.h
- changed some indentation to WebKit-style
- improved design of the functions for reading and writing
2-byte values from the opcode stream (in pcre_internal.h)
* pcre/dftables.cpp:
(main): Added the kjs prefix in the normal way instead of using macros.
* pcre/pcre_compile.cpp: Moved some definitions here from pcre_internal.h.
(errorText): Name changes, fewer typedefs.
(checkEscape): Ditto. Changed uppercase conversion to use toASCIIUpper.
(isCountedRepeat): Name change.
(readRepeatCounts): Name change.
(firstSignificantOpcode): Got rid of the use of OP_lengths, which is
very lightly used here. Hard-coded the length of OP_BRANUMBER.
(firstSignificantOpcodeSkippingAssertions): Ditto. Also changed to
use the advanceToEndOfBracket function.
(getOthercaseRange): Name changes.
(encodeUTF8): Ditto.
(compileBranch): Name changes. Removed unused after_manual_callout and
the code to handle it. Removed code to handle OP_ONCE since we never
emit this opcode. Changed to use advanceToEndOfBracket in more places.
(compileBracket): Name changes.
(branchIsAnchored): Removed code to handle OP_ONCE since we never emit
this opcode.
(bracketIsAnchored): Name changes.
(branchNeedsLineStart): More of the same.
(bracketNeedsLineStart): Ditto.
(branchFindFirstAssertedCharacter): Removed OP_ONCE code.
(bracketFindFirstAssertedCharacter): More of the same.
(calculateCompiledPatternLengthAndFlags): Ditto.
(returnError): Name changes.
(jsRegExpCompile): Ditto.
* pcre/pcre_exec.cpp: Moved some definitions here from pcre_internal.h.
(matchRef): Updated names.
Improved macros to use the do { } while(0) idiom so they expand to single
statements rather than to blocks or multiple statements. Also refactored
the recursive match macros.
(MatchStack::pushNewFrame): Name changes.
(getUTF8CharAndIncrementLength): Name changes.
(match): Name changes. Removed the ONCE opcode.
(jsRegExpExecute): Name changes.
* pcre/pcre_internal.h: Removed quite a few unneeded includes. Rewrote
quite a few comments. Removed the macros that add kjs prefixes to the
functions with external linkage; instead renamed the functions. Removed
the unneeded typedefs pcre_uint16, pcre_uint32, and uschar. Removed the
dead and not-all-working code for LINK_SIZE values other than 2, although
we aim to keep the abstraction working. Removed the OP_LENGTHS macro.
(put2ByteValue): Replaces put2ByteOpcodeValueAtOffset.
(get2ByteValue): Replaces get2ByteOpcodeValueAtOffset.
(put2ByteValueAndAdvance): Replaces put2ByteOpcodeValueAtOffsetAndAdvance.
(putLinkValueAllowZero): Replaces putOpcodeValueAtOffset; doesn't do the
addition, since a comma is really no better than a plus sign. Added an
assertion to catch out of range values and changed the parameter type to
int rather than unsigned.
(getLinkValueAllowZero): Replaces getOpcodeValueAtOffset.
(putLinkValue): New function that most former callers of the
putOpcodeValueAtOffset function can use; asserts the value that is
being stored is non-zero and then calls putLinkValueAllowZero.
(getLinkValue): Ditto.
(putLinkValueAndAdvance): Replaces putOpcodeValueAtOffsetAndAdvance. No
caller was using an offset, which makes sense given the advancing behavior.
(putLinkValueAllowZeroAndAdvance): Ditto.
(isBracketStartOpcode): Added. For use in an assertion.
(advanceToEndOfBracket): Renamed from moveOpcodePtrPastAnyAlternateBranches,
and removed comments about how it's not well designed. This function takes
a pointer to the beginning of a bracket and advances to the end of the
bracket.
* pcre/pcre_tables.cpp: Updated names.
* pcre/pcre_ucp_searchfuncs.cpp:
(kjs_pcre_ucp_othercase): Ditto.
* pcre/pcre_xclass.cpp:
(getUTF8CharAndAdvancePointer): Ditto.
(kjs_pcre_xclass): Ditto.
* pcre/ucpinternal.h: Ditto.
* wtf/ASCIICType.h:
(WTF::isASCIIAlpha): Added an int overload, like the one we already have for
isASCIIDigit.
(WTF::isASCIIAlphanumeric): Ditto.
(WTF::isASCIIHexDigit): Ditto.
(WTF::isASCIILower): Ditto.
(WTF::isASCIISpace): Ditto.
(WTF::toASCIILower): Ditto.
(WTF::toASCIIUpper): Ditto.
2007-12-16 Darin Adler <darin@apple.com>
Reviewed by Maciej.
......@@ -1170,7 +1266,9 @@
Reviewed by Maciej.
Centralize code for subjectPtr adjustments using inlines, only ever check for a single trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char classes and garbled UTF16 strings.
Centralize code for subjectPtr adjustments using inlines, only ever check for a single
trailing surrogate (as UTF16 only allows one), possibly fix PCRE bugs involving char
classes and garbled UTF16 strings.
* pcre/pcre_exec.cpp:
(match):
......
......@@ -78,7 +78,7 @@ fprintf(f,
"This file contains the default tables for characters with codes less than\n"
"128 (ASCII characters). These tables are used when no external tables are\n"
"passed to PCRE. */\n\n"
"const unsigned char _pcre_default_tables[%d] = {\n\n"
"const unsigned char kjs_pcre_default_tables[%d] = {\n\n"
"/* This table is a lower casing table. */\n\n", tables_length);
if (lcc_offset != 0)
......
......@@ -76,34 +76,14 @@ total length. */
#pragma warning(disable: 4244)
#endif
#include "pcre.h"
/* The value of LINK_SIZE determines the number of bytes used to store links as
offsets within the compiled regex. The default is 2, which allows for compiled
patterns up to 64K long. This covers the vast majority of cases. However, PCRE
can also be compiled to use 3 or 4 bytes instead. This allows for longer
patterns in extreme cases. On systems that support it, "configure" can be used
to override this default. */
patterns up to 64K long. */
#define LINK_SIZE 2
/* The below limit restricts the number of recursive match calls in order to
limit the maximum amount of stack (or heap, if NO_RECURSE is defined) that is used. The
value of MATCH_LIMIT_RECURSION applies only to recursive calls of match().
This limit is tied to the size of MatchFrame. Right now we allow PCRE to allocate up
to MATCH_LIMIT_RECURSION - 16 * sizeof(MatchFrame) bytes of "stack" space before we give up.
Currently that's 100000 - 16 * (23 * 4) ~ 90MB
*/
#define MATCH_LIMIT_RECURSION 100000
#define _pcre_default_tables kjs_pcre_default_tables
#define _pcre_ord2utf8 kjs_pcre_ord2utf8
#define _pcre_utf8_table1 kjs_pcre_utf8_table1
#define _pcre_utf8_table2 kjs_pcre_utf8_table2
#define _pcre_utf8_table3 kjs_pcre_utf8_table3
#define _pcre_utf8_table4 kjs_pcre_utf8_table4
#define _pcre_xclass kjs_pcre_xclass
/* Define DEBUG to get debugging output on stdout. */
#if 0
......@@ -121,118 +101,72 @@ all, it had only been about 10 years then... */
#define DPRINTF(p) /*nothing*/
#endif
/* Standard C headers plus the external interface definition. The only time
setjmp and stdarg are used is when NO_RECURSE is set. */
#include <ctype.h>
#include <limits.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Include the public PCRE header and the definitions of UCP character property
values. */
#include "pcre.h"
typedef unsigned short pcre_uint16;
typedef unsigned pcre_uint32;
typedef unsigned char uschar;
/* PCRE keeps offsets in its compiled code as 2-byte quantities (always stored
in big-endian order) by default. These are used, for example, to link from the
start of a subpattern to its alternatives and its end. The use of 2 bytes per
offset limits the size of the compiled regex to around 64K, which is big enough
for almost everybody. However, I received a request for an even bigger limit.
For this reason, and also to make the code easier to maintain, the storing and
loading of offsets from the byte string is now handled by the macros that are
defined here.
The macros are controlled by the value of LINK_SIZE. This defaults to 2 in
the config.h file, but can be overridden by using -D on the command line. This
is automated on Unix systems via the "configure" command. */
loading of offsets from the byte string is now handled by the functions that are
defined here. */
#if LINK_SIZE == 2
static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned short value)
{
opcodePtr[offset] = value >> 8;
opcodePtr[offset + 1] = value & 255;
}
/* PCRE uses some other 2-byte quantities that do not change when the size of
offsets changes. There are used for repeat counts and for other things such as
capturing parenthesis numbers in back references. */
static inline short getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
static inline void put2ByteValue(unsigned char* opcodePtr, int value)
{
return ((opcodePtr[offset] << 8) | opcodePtr[offset + 1]);
ASSERT(value >= 0 && value <= 0xFFFF);
opcodePtr[0] = value >> 8;
opcodePtr[1] = value;
}
#define MAX_PATTERN_SIZE (1 << 16)
#elif LINK_SIZE == 3
static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned value)
static inline int get2ByteValue(const unsigned char* opcodePtr)
{
ASSERT(!(value & 0xFF000000)); // This function only allows values < 2^24
opcodePtr[offset] = value >> 16;
opcodePtr[offset + 1] = value >> 8;
opcodePtr[offset + 2] = value & 255;
return (opcodePtr[0] << 8) | opcodePtr[1];
}
static inline int getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
static inline void put2ByteValueAndAdvance(unsigned char*& opcodePtr, int value)
{
return ((opcodePtr[offset] << 16) | (opcodePtr[offset + 1] << 8) | opcodePtr[offset + 2]);
put2ByteValue(opcodePtr, value);
opcodePtr += 2;
}
#define MAX_PATTERN_SIZE (1 << 24)
#elif LINK_SIZE == 4
static inline void putOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned value)
static inline void putLinkValueAllowZero(unsigned char* opcodePtr, int value)
{
opcodePtr[offset] = value >> 24;
opcodePtr[offset + 1] = value >> 16;
opcodePtr[offset + 2] = value >> 8;
opcodePtr[offset + 3] = value & 255;
put2ByteValue(opcodePtr, value);
}
static inline int getOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
static inline int getLinkValueAllowZero(const unsigned char* opcodePtr)
{
return ((opcodePtr[offset] << 24) | (opcodePtr[offset + 1] << 16) | (opcodePtr[offset + 2] << 8) | opcodePtr[offset + 3]);
return get2ByteValue(opcodePtr);
}
#define MAX_PATTERN_SIZE (1 << 30) /* Keep it positive */
#else
#error LINK_SIZE must be either 2, 3, or 4
#endif
#define MAX_PATTERN_SIZE (1 << 16)
static inline void putOpcodeValueAtOffsetAndAdvance(uschar*& opcodePtr, size_t offset, unsigned short value)
static inline void putLinkValue(unsigned char* opcodePtr, int value)
{
putOpcodeValueAtOffset(opcodePtr, offset, value);
opcodePtr += LINK_SIZE;
ASSERT(value);
putLinkValueAllowZero(opcodePtr, value);
}
/* PCRE uses some other 2-byte quantities that do not change when the size of
offsets changes. There are used for repeat counts and for other things such as
capturing parenthesis numbers in back references. */
static inline void put2ByteOpcodeValueAtOffset(uschar* opcodePtr, size_t offset, unsigned short value)
static inline int getLinkValue(const unsigned char* opcodePtr)
{
opcodePtr[offset] = value >> 8;
opcodePtr[offset + 1] = value & 255;
int value = getLinkValueAllowZero(opcodePtr);
ASSERT(value);
return value;
}
static inline short get2ByteOpcodeValueAtOffset(const uschar* opcodePtr, size_t offset)
static inline void putLinkValueAndAdvance(unsigned char*& opcodePtr, int value)
{
return ((opcodePtr[offset] << 8) | opcodePtr[offset + 1]);
putLinkValue(opcodePtr, value);
opcodePtr += LINK_SIZE;
}
static inline void put2ByteOpcodeValueAtOffsetAndAdvance(uschar*& opcodePtr, size_t offset, unsigned short value)
static inline void putLinkValueAllowZeroAndAdvance(unsigned char*& opcodePtr, int value)
{
put2ByteOpcodeValueAtOffset(opcodePtr, offset, value);
opcodePtr += 2;
putLinkValueAllowZero(opcodePtr, value);
opcodePtr += LINK_SIZE;
}
// FIXME: These are really more of a "compiled regexp state" than "regexp options"
......@@ -245,16 +179,6 @@ enum RegExpOptions {
MatchAcrossMultipleLinesOption = 0x00000002
};
/* Negative values for the firstchar and reqchar variables */
#define REQ_UNSET (-2)
#define REQ_NONE (-1)
/* The maximum remaining length of subject we are prepared to search for a
req_byte match. */
#define REQ_BYTE_MAX 1000
/* Flags added to firstbyte or reqbyte; a "non-literal" item is either a
variable-length repeat, or a anything other than literal characters. */
......@@ -366,8 +290,6 @@ must also be updated to match. */
macro(ASSERT) \
macro(ASSERT_NOT) \
\
macro(ONCE) \
\
macro(BRAZERO) \
macro(BRAMINZERO) \
macro(BRANUMBER) \
......@@ -381,57 +303,15 @@ study.c that all opcodes are less than 128 in value. This makes handling UTF-8
character sequences easier. */
/* The highest extraction number before we have to start using additional
bytes. (Originally PCRE didn't have support for extraction counts highter than
bytes. (Originally PCRE didn't have support for extraction counts higher than
this number.) The value is limited by the number of opcodes left after OP_BRA,
i.e. 255 - OP_BRA. We actually set it a bit lower to leave room for additional
opcodes. */
#define EXTRACT_BASIC_MAX 100
/* This macro defines the length of fixed length operations in the compiled
regex. The lengths are used when searching for specific things, and also in the
debugging printing of a compiled regex. We use a macro so that it can be
defined close to the definitions of the opcodes themselves.
As things have been extended, some of these are no longer fixed lenths, but are
minima instead. For example, the length of a single-character repeat may vary
in UTF-8 mode. The code that uses this table must know about such things. */
#define OP_LENGTHS \
1, /* End */ \
1, 1, 1, 1, 1, 1, 1, 1, /* \B, \b, \D, \d, \S, \s, \W, \w */ \
1, /* Any */ \
1, 1, /* ^, $ */ \
2, 2, /* Char, Charnc - minimum lengths */ \
2, 2, /* ASCII char or non-cased */ \
2, /* not */ \
/* Positive single-char repeats ** These are */ \
2, 2, 2, 2, 2, 2, /* *, *?, +, +?, ?, ?? ** minima in */ \
4, 4, 4, /* upto, minupto, exact ** UTF-8 mode */ \
/* Negative single-char repeats - only for chars < 256 */ \
2, 2, 2, 2, 2, 2, /* NOT *, *?, +, +?, ?, ?? */ \
4, 4, 4, /* NOT upto, minupto, exact */ \
/* Positive type repeats */ \
2, 2, 2, 2, 2, 2, /* Type *, *?, +, +?, ?, ?? */ \
4, 4, 4, /* Type upto, minupto, exact */ \
/* Character class & ref repeats */ \
1, 1, 1, 1, 1, 1, /* *, *?, +, +?, ?, ?? */ \
5, 5, /* CRRANGE, CRMINRANGE */ \
33, /* CLASS */ \
33, /* NCLASS */ \
0, /* XCLASS - variable length */ \
3, /* REF */ \
1 + LINK_SIZE, /* Alt */ \
1 + LINK_SIZE, /* Ket */ \
1 + LINK_SIZE, /* KetRmax */ \
1 + LINK_SIZE, /* KetRmin */ \
1 + LINK_SIZE, /* Assert */ \
1 + LINK_SIZE, /* Assert not */ \
1 + LINK_SIZE, /* Once */ \
1, 1, /* BRAZERO, BRAMINZERO */ \
3, /* BRANUMBER */ \
1 + LINK_SIZE /* BRA */ \
/* FIXME: Note that OP_BRA + 100 is > 128, so the two comments above
are in conflict! */
#define EXTRACT_BASIC_MAX 100
/* The index of names and the
code vector run on as long as necessary after the end. We store an explicit
......@@ -442,16 +322,13 @@ pointer that is always NULL.
*/
struct JSRegExp {
pcre_uint32 options;
unsigned options;
pcre_uint16 top_bracket;
pcre_uint16 top_backref;
unsigned short top_bracket;
unsigned short top_backref;
// jsRegExpExecute && jsRegExpCompile currently only know how to handle ASCII
// chars for these optimizations, however it would be trivial to add support
// for optimized UChar first_byte/req_byte scans
pcre_uint16 first_byte;
pcre_uint16 req_byte;
unsigned short first_byte;
unsigned short req_byte;
};
/* Internal shared data tables. These are tables that are used by more than one
......@@ -459,73 +336,82 @@ struct JSRegExp {
but are not part of the PCRE public API. The data for these tables is in the
pcre_tables.c module. */
#define _pcre_utf8_table1_size 6
#define kjs_pcre_utf8_table1_size 6
extern const int _pcre_utf8_table1[6];
extern const int _pcre_utf8_table2[6];
extern const int _pcre_utf8_table3[6];
extern const uschar _pcre_utf8_table4[0x40];
extern const int kjs_pcre_utf8_table1[6];
extern const int kjs_pcre_utf8_table2[6];
extern const int kjs_pcre_utf8_table3[6];
extern const unsigned char kjs_pcre_utf8_table4[0x40];
extern const uschar _pcre_default_tables[tables_length];
extern const unsigned char kjs_pcre_default_tables[tables_length];
static inline uschar toLowerCase(uschar c)
static inline unsigned char toLowerCase(unsigned char c)
{
static const uschar* lowerCaseChars = _pcre_default_tables + lcc_offset;
static const unsigned char* lowerCaseChars = kjs_pcre_default_tables + lcc_offset;
return lowerCaseChars[c];
}
static inline uschar flipCase(uschar c)
static inline unsigned char flipCase(unsigned char c)
{
static const uschar* flippedCaseChars = _pcre_default_tables + fcc_offset;
static const unsigned char* flippedCaseChars = kjs_pcre_default_tables + fcc_offset;
return flippedCaseChars[c];
}
static inline uschar classBitmapForChar(uschar c)
static inline unsigned char classBitmapForChar(unsigned char c)
{
static const uschar* charClassBitmaps = _pcre_default_tables + cbits_offset;
static const unsigned char* charClassBitmaps = kjs_pcre_default_tables + cbits_offset;
return charClassBitmaps[c];
}
static inline uschar charTypeForChar(uschar c)
static inline unsigned char charTypeForChar(unsigned char c)
{
const uschar* charTypeMap = _pcre_default_tables + ctypes_offset;
const unsigned char* charTypeMap = kjs_pcre_default_tables + ctypes_offset;
return charTypeMap[c];
}
static inline bool isWordChar(UChar c)
{
/* UTF8 Characters > 128 are assumed to be "non-word" characters. */
return (c < 128 && (charTypeForChar(c) & ctype_word));
return c < 128 && (charTypeForChar(c) & ctype_word);
}
static inline bool isSpaceChar(UChar c)
{
return (c < 128 && (charTypeForChar(c) & ctype_space));
return c < 128 && (charTypeForChar(c) & ctype_space);
}
/* Internal shared functions. These are functions that are used by more than
one of the exported public functions. They have to be "external" in the C
sense, but are not part of the PCRE public API. */
extern int _pcre_ucp_othercase(const unsigned int);
extern bool _pcre_xclass(int, const uschar*);
static inline bool isNewline(UChar nl)
{
return (nl == 0xA || nl == 0xD || nl == 0x2028 || nl == 0x2029);
}
// FIXME: It's unclear to me if this moves the opcode ptr to the start of all branches
// or to the end of all branches -- ecs
// FIXME: This abstraction is poor since it assumes that you want to jump based on whatever
// the next value in the stream is, and *then* follow any OP_ALT branches.
static inline void moveOpcodePtrPastAnyAlternateBranches(const uschar*& opcodePtr)
static inline bool isBracketStartOpcode(unsigned char opcode)
{
if (opcode >= OP_BRA)
return true;
switch (opcode) {
case OP_ASSERT:
case OP_ASSERT_NOT:
return true;
default:
return false;
}
}
static inline void advanceToEndOfBracket(const unsigned char*& opcodePtr)
{
do {
opcodePtr += getOpcodeValueAtOffset(opcodePtr, 1);
} while (*opcodePtr == OP_ALT);
ASSERT(isBracketStartOpcode(*opcodePtr) || *opcodePtr == OP_ALT);
do
opcodePtr += getLinkValue(opcodePtr + 1);
while (*opcodePtr == OP_ALT);
}
/* Internal shared functions. These are functions that are used in more
than one of the source files. They have to have external linkage, but
are not part of the public API and so not exported from the library. */
extern int kjs_pcre_ucp_othercase(unsigned);
extern bool kjs_pcre_xclass(int, const unsigned char*);
#endif
#endif
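The advanceToEndOfBracket function in the pcre_internal.h changes above walks from the start of a bracket, over each alternative, to the closing opcode. The sketch below traces that walk on a toy opcode stream. The opcode values, the toyPattern layout for (a|b), and the offsetOfBracketEnd helper are all hypothetical and much simpler than the real compiled form; the standard assert stands in for WebKit's ASSERT.

```cpp
#include <cassert>

// Hypothetical opcode numbering for this sketch only.
enum { OP_END, OP_CHAR, OP_ALT, OP_KET, OP_BRA };

// Reads a 2-byte big-endian link and insists that it is non-zero.
static inline int getLinkValue(const unsigned char* opcodePtr)
{
    int value = (opcodePtr[0] << 8) | opcodePtr[1];
    assert(value);
    return value;
}

// Follows the link after the current opcode, and keeps following links for
// as long as the pointer lands on another alternative.
static inline void advanceToEndOfBracket(const unsigned char*& opcodePtr)
{
    assert(*opcodePtr == OP_BRA || *opcodePtr == OP_ALT);
    do
        opcodePtr += getLinkValue(opcodePtr + 1);
    while (*opcodePtr == OP_ALT);
}

// Toy stream for (a|b): each bracket opcode is followed by a 2-byte link.
static const unsigned char toyPattern[] = {
    OP_BRA, 0, 5,   // offset 0: link to the OP_ALT at offset 5
    OP_CHAR, 'a',   // offset 3
    OP_ALT, 0, 5,   // offset 5: link to the OP_KET at offset 10
    OP_CHAR, 'b',   // offset 8
    OP_KET, 0, 10,  // offset 10: link back to the start of the bracket
    OP_END          // offset 13
};

// Hypothetical helper: returns the offset where the walk stops.
static int offsetOfBracketEnd(int startOffset)
{
    const unsigned char* cursor = toyPattern + startOffset;
    advanceToEndOfBracket(cursor);
    return static_cast<int>(cursor - toyPattern);
}
```

Starting at either the opening opcode or the alternative, the walk stops on the closing opcode at offset 10 in this toy stream.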
......
......@@ -49,20 +49,20 @@ PCRE code modules. */
/* These are the breakpoints for different numbers of bytes in a UTF-8
character. */
const int _pcre_utf8_table1[6] =
const int kjs_pcre_utf8_table1[6] =
{ 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff};
/* These are the indicator bits and the mask for the data bits to set in the
first byte of a character, indexed by the number of additional bytes. */
const int _pcre_utf8_table2[6] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
const int _pcre_utf8_table3[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
const int kjs_pcre_utf8_table2[6] = { 0, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
const int kjs_pcre_utf8_table3[6] = { 0xff, 0x1f, 0x0f, 0x07, 0x03, 0x01};
/* Table of the number of extra characters, indexed by the first character
masked with 0x3f. The highest number for a valid UTF-8 character is in fact
0x3d. */
const uschar _pcre_utf8_table4[0x40] = {
const unsigned char kjs_pcre_utf8_table4[0x40] = {
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,
......
......@@ -59,7 +59,7 @@ Arguments:
Returns: the other case or -1 if none
*/
int _pcre_ucp_othercase(const unsigned c)
int kjs_pcre_ucp_othercase(unsigned c)
{
int bot = 0;
int top = sizeof(ucp_table) / sizeof(cnode);
......
......@@ -59,13 +59,13 @@ Returns: true if character matches, else false
/* Get the next UTF-8 character, advancing the pointer. This is called when we
know we are in UTF-8 mode. */
static inline void getUTF8CharAndAdvancePointer(int& c, const uschar*& subjectPtr)
static inline void getUTF8CharAndAdvancePointer(int& c, const unsigned char*& subjectPtr)
{
c = *subjectPtr++;
if ((c & 0xc0) == 0xc0) {
int gcaa = _pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
int gcaa = kjs_pcre_utf8_table4[c & 0x3f]; /* Number of additional bytes */
int gcss = 6 * gcaa;
c = (c & _pcre_utf8_table3[gcaa]) << gcss;
c = (c & kjs_pcre_utf8_table3[gcaa]) << gcss;
while (gcaa-- > 0) {
gcss -= 6;
c |= (*subjectPtr++ & 0x3f) << gcss;
......@@ -73,7 +73,7 @@ static inline void getUTF8CharAndAdvancePointer(int& c, const uschar*& subjectPt
}
}
bool _pcre_xclass(int c, const uschar* data)
bool kjs_pcre_xclass(int c, const unsigned char* data)
{
bool negated = (*data & XCL_NOT);
......
......@@ -45,8 +45,8 @@ POSSIBILITY OF SUCH DAMAGE.
words that form a data item in the table. */
typedef struct cnode {
pcre_uint32 f0;
pcre_uint32 f1;
unsigned f0;
unsigned f1;
} cnode;
/* Things for the f0 field */
......
......@@ -48,12 +48,14 @@ namespace WTF {
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIIAlpha(wchar_t c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
#endif
inline bool isASCIIAlpha(int c) { return (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIAlphanumeric(char c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIAlphanumeric(unsigned short c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIIAlphanumeric(wchar_t c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
#endif
inline bool isASCIIAlphanumeric(int c) { return c >= '0' && c <= '9' || (c | 0x20) >= 'a' && (c | 0x20) <= 'z'; }
inline bool isASCIIDigit(char c) { return (c >= '0') & (c <= '9'); }
inline bool isASCIIDigit(unsigned short c) { return (c >= '0') & (c <= '9'); }
......@@ -67,30 +69,35 @@ namespace WTF {
#if !COMPILER(MSVC) || defined(_NATIVE_WCHAR_T_DEFINED)
inline bool isASCIIHexDigit(wchar_t c) { retu