Drupal 8  8.0.2
Unicode Class Reference

Static Public Member Functions

static getStatus ()
 
static setStatus ($status)
 
static check ()
 
static encodingFromBOM ($data)
 
static convertToUtf8 ($data, $encoding)
 
static truncateBytes ($string, $len)
 
static strlen ($text)
 
static strtoupper ($text)
 
static strtolower ($text)
 
static ucfirst ($text)
 
static lcfirst ($text)
 
static ucwords ($text)
 
static substr ($text, $start, $length=NULL)
 
static truncate ($string, $max_length, $wordsafe=FALSE, $add_ellipsis=FALSE, $min_wordsafe_length=1)
 
static strcasecmp ($str1, $str2)
 
static mimeHeaderEncode ($string)
 
static mimeHeaderDecode ($header)
 
static caseFlip ($matches)
 
static validateUtf8 ($text)
 
static strpos ($haystack, $needle, $offset=0)
 

Data Fields

const PREG_CLASS_WORD_BOUNDARY
 
const STATUS_SINGLEBYTE = 0
 
const STATUS_MULTIBYTE = 1
 
const STATUS_ERROR = -1
 

Static Protected Attributes

static $status = 0
 

Detailed Description

Provides Unicode-related conversions and operations.

Member Function Documentation

static caseFlip (   $matches)
static

Flips U+C0-U+DE to U+E0-U+FE and back. Can be used as a preg_replace_callback() callback.

Parameters
array $matches An array of matches by preg_replace_callback().
Returns
string The flipped text.
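
A minimal sketch of how such a callback can work on UTF-8 input. It assumes each affected character is encoded as the lead byte 0xC3 plus one trail byte, so toggling bit 0x20 of the trail byte switches case. The name sketch_case_flip() and the pattern are illustrative assumptions, not necessarily what the class does internally.

```php
<?php
// Hypothetical stand-in for a case-flipping callback; not Drupal's own code.
function sketch_case_flip(array $matches): string {
  // $matches[0] is a two-byte UTF-8 sequence: 0xC3 plus one trail byte.
  // Upper- and lowercase Latin-1 letters differ only in bit 0x20 of that byte.
  return $matches[0][0] . chr(ord($matches[0][1]) ^ 0x20);
}

// Uppercase the lowercase Latin-1 letters, skipping the division sign (U+F7).
$upper = preg_replace_callback(
  '/\xC3[\xA0-\xB6\xB8-\xBE]/',
  'sketch_case_flip',
  "caf\xC3\xA9"
);
```

The pattern is byte-oriented (no /u modifier), so the callback always receives exactly two bytes.
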
static check ( )
static

Checks for Unicode support in PHP and sets the proper settings if possible.

Because of the need to be able to handle text in various encodings, we do not support mbstring function overloading. HTTP input/output conversion must be disabled for similar reasons.

Returns
string A string identifier of a failed multibyte extension check, if any. Otherwise, an empty string.

Referenced by DrupalKernel\bootEnvironment(), and UnicodeTest\setUp().

static convertToUtf8 (   $data,
  $encoding 
)
static

Converts data to UTF-8.

Requires the iconv, GNU recode or mbstring PHP extension.

Parameters
string $data The data to be converted.
string $encoding The encoding that the data is in.
Returns
string|bool Converted data or FALSE.
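
A rough sketch of the fallback idea: use whichever conversion extension is loaded and return FALSE otherwise. The helper name sketch_convert_to_utf8() is hypothetical, the GNU recode path is left out for brevity, and the order in which the real method tries extensions is an assumption.

```php
<?php
// Hypothetical helper; tries whichever conversion extension is loaded.
function sketch_convert_to_utf8(string $data, string $encoding) {
  if (function_exists('iconv')) {
    // iconv() returns FALSE when the conversion fails.
    return @iconv($encoding, 'utf-8', $data);
  }
  if (function_exists('mb_convert_encoding')) {
    return @mb_convert_encoding($data, 'utf-8', $encoding);
  }
  // No supported extension is available.
  return false;
}

// "caf\xE9" is "café" in ISO-8859-1.
$utf8 = sketch_convert_to_utf8("caf\xE9", 'ISO-8859-1');
```

Callers should compare the result against FALSE with === before using it, since an empty string is a valid conversion result.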

Referenced by CssOptimizer\loadFile(), JsOptimizer\optimize(), and UnicodeTest\testConvertToUtf8().

static encodingFromBOM (   $data)
static

Decodes UTF byte-order mark (BOM) into the encoding's name.

Parameters
string $data The data possibly containing a BOM. This can be the entire contents of a file, or just a fragment containing at least the first five bytes.
Returns
string|bool The name of the encoding, or FALSE if no byte order mark was present.
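
As an illustration of BOM sniffing, the sketch below maps a few common byte-order marks to encoding names. The helper name sketch_encoding_from_bom() and the subset of BOMs are assumptions; the real method may recognise more encodings and check them in a different order.

```php
<?php
// Hypothetical helper illustrating BOM sniffing; not Drupal's own code.
function sketch_encoding_from_bom(string $data) {
  // Longer BOMs come first so that UTF-32LE (FF FE 00 00) is not
  // mistaken for UTF-16LE (FF FE).
  $boms = [
    "\x00\x00\xFE\xFF" => 'UTF-32BE',
    "\xFF\xFE\x00\x00" => 'UTF-32LE',
    "\xEF\xBB\xBF" => 'UTF-8',
    "\xFE\xFF" => 'UTF-16BE',
    "\xFF\xFE" => 'UTF-16LE',
  ];
  foreach ($boms as $bom => $encoding) {
    // Compare only the leading bytes of $data against the BOM.
    if (strncmp($data, $bom, strlen($bom)) === 0) {
      return $encoding;
    }
  }
  return false;
}
```

Because the return value is either a string or FALSE, callers should test it with === FALSE.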

References Unicode\strpos().

Referenced by CssOptimizer\loadFile(), and JsOptimizer\optimize().

static getStatus ( )
static

Gets the current status of Unicode/multibyte support in this environment.

Returns
int The status of multibyte support. It can be one of:
  • ::STATUS_MULTIBYTE: Full Unicode support using an extension.
  • ::STATUS_SINGLEBYTE: Standard PHP (emulated) Unicode support.
  • ::STATUS_ERROR: An error occurred. No Unicode support.

Referenced by UnicodeTest\testStatus().

static mimeHeaderDecode (   $header)
static

Decodes MIME/HTTP encoded header values.

Parameters
string $header The header to decode.
Returns
string The mime-decoded header.
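
A simplified sketch of decoding RFC 2047 encoded-words. It assumes the charset is UTF-8 and leaves anything else untouched; the helper name sketch_mime_header_decode() is hypothetical and the real method may handle other charsets and edge cases.

```php
<?php
// Hypothetical decoder for UTF-8 encoded-words only; not Drupal's own code.
function sketch_mime_header_decode(string $header): string {
  return preg_replace_callback(
    '/=\?UTF-8\?([BQ])\?([^?]*)\?=/i',
    function (array $m): string {
      if (strtoupper($m[1]) === 'B') {
        // "B" encoding is plain base64.
        return base64_decode($m[2]);
      }
      // "Q" encoding: underscores stand for spaces, =XX for raw bytes.
      return quoted_printable_decode(str_replace('_', ' ', $m[2]));
    },
    $header
  );
}
```

For instance, passing '=?UTF-8?B?dMOpc3QudHh0?=' yields the UTF-8 string "tést.txt".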

References Unicode\strtolower().

Referenced by UnicodeTest\testMimeHeader().

static mimeHeaderEncode (   $string)
static

Encodes MIME/HTTP headers that contain incorrectly encoded characters.

For example, Unicode::mimeHeaderEncode('tést.txt') returns "=?UTF-8?B?dMOpc3QudHh0?=".

See http://www.rfc-editor.org/rfc/rfc2047.txt for more information.

Notes:

  • Only encode strings that contain non-ASCII characters.
  • We progressively cut off a chunk with self::truncateBytes(). This ensures each chunk starts and ends on a character boundary.
  • Using \r\n as the chunk separator may cause problems on some systems and may have to be changed to \n or \r.
Parameters
string $string The header to encode.
Returns
string The mime-encoded header.
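
The documented example can be reproduced with a single base64 encoded-word. This sketch deliberately skips the chunking described in the notes, so it only suits short values; the helper name sketch_mime_header_encode() is hypothetical.

```php
<?php
// Hypothetical single-chunk encoder; long headers would need to be split
// into several encoded-words to respect RFC 2047 length limits.
function sketch_mime_header_encode(string $string): string {
  // Strings made only of printable ASCII need no encoding.
  if (!preg_match('/[^\x20-\x7E]/', $string)) {
    return $string;
  }
  return '=?UTF-8?B?' . base64_encode($string) . '?=';
}

// "t\xC3\xA9st.txt" is the UTF-8 byte sequence for "tést.txt".
$encoded = sketch_mime_header_encode("t\xC3\xA9st.txt");
// $encoded is "=?UTF-8?B?dMOpc3QudHh0?="
```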

References Unicode\strlen(), and Unicode\substr().

Referenced by PhpMail\mail(), and UnicodeTest\testMimeHeader().

static setStatus (   $status)
static

Sets the value for multibyte support status for the current environment.

The following status keys are supported:

  • ::STATUS_MULTIBYTE: Full Unicode support using an extension.
  • ::STATUS_SINGLEBYTE: Standard PHP (emulated) Unicode support.
  • ::STATUS_ERROR: An error occurred. No Unicode support.
Parameters
int $status The new status of multibyte support.

Referenced by UnicodeTest\testLcfirst(), UnicodeTest\testStatus(), UnicodeTest\testStrlen(), UnicodeTest\testStrpos(), UnicodeTest\testStrtolower(), UnicodeTest\testStrtoupper(), UnicodeTest\testSubstr(), and UnicodeTest\testUcwords().

static strcasecmp (   $str1,
  $str2 
)
static

Compares UTF-8-encoded strings in a binary safe case-insensitive manner.

Parameters
string $str1 The first string.
string $str2 The second string.
Returns
int Less than 0 if $str1 is less than $str2, greater than 0 if $str1 is greater than $str2, and 0 if they are equal.
static strlen (   $text)
static
static strpos (   $haystack,
  $needle,
  $offset = 0 
)
static

Finds the position of the first occurrence of a string in another string.

Parameters
string $haystack The string to search in.
string $needle The string to find in $haystack.
int $offset If specified, start the search at this number of characters from the beginning (default 0).
Returns
int|false The position where $needle occurs in $haystack, always relative to the beginning (independent of $offset), or FALSE if not found. Note that a return value of 0 is not the same as FALSE.
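
To show the difference between character and byte positions, here is a sketch that emulates a character-based search without the mbstring extension. The helper name sketch_strpos() is hypothetical, and it assumes valid UTF-8 input and a small non-negative $offset.

```php
<?php
// Hypothetical character-based strpos(); not Drupal's own code.
function sketch_strpos(string $haystack, string $needle, int $offset = 0) {
  // Convert the character offset into a byte offset by matching
  // the first $offset characters.
  preg_match('/^.{0,' . $offset . '}/us', $haystack, $m);
  $byte_pos = strpos($haystack, $needle, strlen($m[0]));
  if ($byte_pos === false) {
    return false;
  }
  // Count the characters that precede the match.
  return preg_match_all('/./us', substr($haystack, 0, $byte_pos));
}

// "héllo wörld": "ö" is character 7 but starts at byte 8.
$position = sketch_strpos("h\xC3\xA9llo w\xC3\xB6rld", "\xC3\xB6");
```

As with PHP's strpos(), compare the result with === FALSE, because a match at position 0 is falsy.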

Referenced by Unicode\encodingFromBOM(), and UnicodeTest\testStrpos().

static strtolower (   $text)
static

Converts a UTF-8 string to lowercase.

Parameters
string $text The string to run the operation on.
Returns
string The string in lowercase.

Referenced by StringFieldTest\_testTextfieldWidgets(), HtmlToTextTest\assertHtmlToText(), HandlerBase\caseTransform(), Condition\compile(), NegotiationConfigureForm\configureFormTable(), FieldUnitTestBase\createFieldWithStorage(), ConfigTranslationListUiTest\doBlockListTest(), ConfigTranslationListUiTest\doContactFormsListTest(), ConfigTranslationListUiTest\doContentTypeListTest(), ConfigTranslationListUiTest\doCustomContentTypeListTest(), ConfigTranslationListUiTest\doFieldListTest(), ConfigTranslationListUiTest\doFormatsListTest(), ConfigTranslationListUiTest\doMenuListTest(), ConfigTranslationListUiTest\doShortcutListTest(), ConfigTranslationListUiTest\doUserRoleListTest(), ConfigTranslationListUiTest\doVocabularyListTest(), RequestPath\evaluate(), ViewsDataHelper\fetchedFieldSort(), Html\getClass(), ConfigTranslationFieldListBuilder\getFilterLabels(), Html\getId(), EntityType\getLowercaseLabel(), BlockBase\getMachineNameSuggestion(), BlockContentTranslationUITest\getNewEntityValues(), AnnotatedClassDiscovery\getProviderFromNamespace(), PhpSelection\getReferenceableEntities(), TestToolkit\getSupportedExtensions(), GDToolkit\getSupportedExtensions(), ConfigTranslationContextualLink\getTitle(), ConfigTranslationLocalTask\getTitle(), EntityAutocompleteController\handleAutocomplete(), Sql\init(), Condition\match(), PhpSelection\matchLabel(), Unicode\mimeHeaderDecode(), Schema\processField(), StringArgument\query(), ManageDisplayTest\setUp(), ViewEntityDependenciesTest\setUp(), BooleanFormatterSettingsTest\setUp(), TaxonomyTermViewTest\setUp(), EntityReferenceSettingsTest\setUp(), TimestampFormatterTest\setUp(), BooleanFormatterTest\setUp(), RawStringFormatterTest\setUp(), StringFormatterTest\setUp(), DateTimeFieldTest\setUp(), EntityReferenceItemTest\setUp(), TranslationTest\setUp(), BlockContentCreationTest\testBlockDelete(), BlockLanguageCacheTest\testBlockLinks(), BooleanFieldTest\testBooleanField(), EntityQueryTest\testCaseSensitivity(), 
BlockContentCreationTest\testConfigDependencies(), EntityReferenceItemTest\testConfigEntityReferenceItem(), EntityReferenceSettingsTest\testConfigTargetBundleDeletion(), ConfigTranslationUiTest\testContactConfigEntityTranslation(), FieldCrudTest\testCreateFieldCustomStorage(), NumberFieldTest\testCreateNumberDecimalField(), NumberFieldTest\testCreateNumberFloatField(), EntityReferenceSettingsTest\testCustomTargetBundleDeletion(), DateFormatsMachineNameTest\testDateFormatsMachineNameAllowedValues(), FilterDefaultFormatTest\testDefaultTextFormats(), FilterAdminTest\testDisabledFormat(), EditorAdminTest\testDisableFormatWithEditor(), EmailFieldTest\testEmailField(), EntityReferenceFieldDefaultValueTest\testEntityReferenceDefaultConfigValue(), EntityReferenceFieldDefaultValueTest\testEntityReferenceDefaultValue(), LinkFieldUITest\testFieldUI(), FilterAdminTest\testFilterAdmin(), FilterHooksTest\testFilterHooks(), FilterAdminTest\testFormatAdmin(), ImageFieldDefaultImagesTest\testInvalidDefaultImage(), LinkFieldTest\testLinkFormatter(), LinkFieldTest\testLinkSeparateFormatter(), LinkFieldTest\testLinkTitle(), MenuLanguageTest\testMenuLanguage(), MultiStepNodeFormBasicOptionsTest\testMultiStepNodeFormBasicOptions(), NodeTypeTranslationTest\testNodeTypeTitleLabelTranslation(), NodeTypeTranslationTest\testNodeTypeTranslation(), NumberFieldTest\testNumberDecimalField(), NumberFieldTest\testNumberFloatField(), NumberFieldTest\testNumberFormatter(), NumberFieldTest\testNumberIntegerField(), PageEditTest\testPageEdit(), ConfigImportRenameValidationTest\testRenameValidation(), ResponsiveImageFieldDisplayTest\testResponsiveImageFieldFormattersEmptyMediaQuery(), EntityReferenceItemTest\testSelectionHandlerSettings(), ContactSitewideTest\testSiteWideContact(), UnicodeTest\testStrtolower(), VocabularyUiTest\testTaxonomyAdminDeletingVocabulary(), TextFieldTest\testTextFieldValidation(), EditorFilterIntegrationTest\testTextFormatIntegration(), SearchTokenizerTest\testTokenizer(), 
VocabularyCrudTest\testUninstallReinstall(), UriItemTest\testUriField(), LinkFieldTest\testURLValidation(), VocabularyLanguageTest\testVocabularyDefaultLanguageForTerms(), VocabularyUiTest\testVocabularyInterface(), VocabularyTranslationTest\testVocabularyLanguage(), VocabularyLanguageTest\testVocabularyLanguage(), and MachineNameController\transliterate().

static strtoupper (   $text)
static

Converts a UTF-8 string to uppercase.

Parameters
string $text The string to run the operation on.
Returns
string The string in uppercase.

Referenced by HandlerBase\caseTransform(), ConvertImageEffect\getSummary(), TestDiscovery\isUnitTest(), Schema\processField(), PathAliasTest\testAdminAlias(), EntityQueryTest\testCaseSensitivity(), HtmlToTextTest\testDrupalHtmlToTextBlockTagToNewline(), GlossaryTest\testGlossaryView(), PathAliasTest\testNodeAlias(), UnicodeTest\testStrtoupper(), and Unicode\ucwords().

static substr (   $text,
  $start,
  $length = NULL 
)
static

Cuts off a piece of a string based on character indices and counts.

Follows the same behavior as PHP's own substr() function. Note that for cutting off a string at a known character/substring location, the usage of PHP's normal strpos/substr is safe and much faster.

Parameters
string $text The input string.
int $start The position at which to start reading.
int $length The number of characters to read.
Returns
string The shortened string.

References Unicode\strlen().

Referenced by Updater\findInfoFile(), UriItem\generateSampleValue(), Sql\init(), ConfigInstaller\installCollectionDefaultConfig(), TestDiscovery\isUnitTest(), CssOptimizer\loadFile(), DbLog\log(), PhpSelection\matchLabel(), Unicode\mimeHeaderEncode(), JsOptimizer\optimize(), EntityUser\processStubRow(), EntityFile\processStubRow(), EntityQueryTest\testCaseSensitivity(), DedupeEntityTest\testDedupe(), QuickEditLoadingTest\testImageField(), SearchSimplifyTest\testSearchSimplifyUnicode(), AttachedAssetsTest\testSettings(), UnicodeTest\testSubstr(), TermTest\testTaxonomyGetTermByName(), QuickEditLoadingTest\testTitleBaseField(), QuickEditLoadingTest\testUserWithPermission(), DedupeBase\transform(), and Unicode\truncateBytes().

static truncate (   $string,
  $max_length,
  $wordsafe = FALSE,
  $add_ellipsis = FALSE,
  $min_wordsafe_length = 1 
)
static

Truncates a UTF-8-encoded string safely to a number of characters.

Parameters
string $string The string to truncate.
int $max_length An upper limit on the returned string length, including trailing ellipsis if $add_ellipsis is TRUE.
bool $wordsafe If TRUE, attempt to truncate on a word boundary. Word boundaries are spaces, punctuation, and Unicode characters used as word boundaries in non-Latin languages; see Unicode::PREG_CLASS_WORD_BOUNDARY for more information. If a word boundary cannot be found that would make the length of the returned string fall within length guidelines (see parameters $max_length and $min_wordsafe_length), word boundaries are ignored.
bool $add_ellipsis If TRUE, add '...' to the end of the truncated string (defaults to FALSE). The string length will still fall within $max_length.
int $min_wordsafe_length If $wordsafe is TRUE, the minimum acceptable length for truncation (before adding an ellipsis, if $add_ellipsis is TRUE). Has no effect if $wordsafe is FALSE. This can be used to prevent having a very short resulting string that will not be understandable. For instance, if you are truncating the string "See myverylongurlexample.com for more information" to a word-safe return length of 20, the only available word boundary within 20 characters is after the word "See", which wouldn't leave a very informative string. If you had set $min_wordsafe_length to 10, though, the function would realise that "See" alone is too short, and would then just truncate ignoring word boundaries, giving you "See myverylongurl..." (assuming you had set $add_ellipsis to TRUE).
Returns
string The truncated string.
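
The interaction of $wordsafe, $add_ellipsis and $min_wordsafe_length can be sketched as follows. This is a simplified stand-in, not the method itself: the helper name sketch_truncate() is hypothetical, it uses the PCRE classes \s and \p{P} in place of Unicode::PREG_CLASS_WORD_BOUNDARY, and it assumes a three-dot ellipsis as described above.

```php
<?php
// Hypothetical simplified truncation; not Drupal's own code.
function sketch_truncate(string $s, int $max, bool $wordsafe = false, bool $ellipsis = false, int $min_wordsafe = 1): string {
  // Count characters rather than bytes.
  if (preg_match_all('/./us', $s) <= $max) {
    return $s;
  }
  $suffix = '';
  if ($ellipsis) {
    // Reserve room for the ellipsis inside the length limit.
    $suffix = substr('...', 0, $max);
    $max = max($max - strlen($suffix), 0);
  }
  $cut = null;
  if ($wordsafe && $max > $min_wordsafe) {
    // Longest prefix of $min_wordsafe..$max characters that is
    // immediately followed by whitespace or punctuation.
    $pattern = '/^(.{' . $min_wordsafe . ',' . $max . '})[\s\p{P}]/us';
    if (preg_match($pattern, $s, $m)) {
      $cut = $m[1];
    }
  }
  if ($cut === null) {
    // No usable boundary: cut at exactly $max characters.
    preg_match('/^.{0,' . $max . '}/us', $s, $m);
    $cut = $m[0];
  }
  return $ellipsis ? rtrim($cut) . $suffix : $cut;
}

$text = 'See myverylongurlexample.com for more information';
$long = sketch_truncate($text, 20, true, true, 10);
$short = sketch_truncate($text, 20, true, true);
```

With a minimum word-safe length of 10 the sketch gives "See myverylongurl...", while the default of 1 gives "See...".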

References Unicode\PREG_CLASS_WORD_BOUNDARY.

Referenced by PathController\adminOverview(), InOperator\adminSummary(), DbLogTest\assertLogMessage(), SchemaTest\checkSchemaComment(), DbLogController\overview(), MenuParentFormSelector\parentSelectOptionsTreeWalk(), MenuLink\prepareRow(), DefaultProcessor\process(), BookManager\recurseTableOfContents(), MenuLinkSourceTest\setUp(), SearchPluginBase\suggestedTitle(), LinkFieldTest\testLinkFormatter(), LinkFieldTest\testLinkSeparateFormatter(), SearchPageTextTest\testSearchText(), UnicodeTest\testTruncate(), LinkSeparateFormatter\viewElements(), and LinkFormatter\viewElements().

static truncateBytes (   $string,
  $len 
)
static

Truncates a UTF-8-encoded string safely to a number of bytes.

If the end position is in the middle of a UTF-8 sequence, it scans backwards until the beginning of the byte sequence.

Use this function whenever you want to chop off a string at an unsure location. On the other hand, if you're sure that you're splitting on a character boundary (e.g. after using strpos() or similar), you can safely use substr() instead.

Parameters
string $string The string to truncate.
int $len An upper limit on the returned string length.
Returns
string The truncated string.
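
The backwards scan described above can be sketched like this. The helper name sketch_truncate_bytes() is hypothetical, and the sketch assumes valid UTF-8 input and a non-negative $len.

```php
<?php
// Hypothetical byte-safe truncation; not Drupal's own code.
function sketch_truncate_bytes(string $string, int $len): string {
  if (strlen($string) <= $len) {
    return $string;
  }
  // If the byte at the cut position is a continuation byte (10xxxxxx),
  // the cut would split a character, so back up to its lead byte.
  while ($len > 0 && (ord($string[$len]) & 0xC0) === 0x80) {
    $len--;
  }
  return substr($string, 0, $len);
}

// "tést" is 5 bytes; cutting at 2 bytes would split "é", so only "t" is kept.
$cut = sketch_truncate_bytes("t\xC3\xA9st", 2);
```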

References Unicode\strlen(), and Unicode\substr().

Referenced by Connection\query(), and UnicodeTest\testTruncateBytes().

static ucfirst (   $text)
static

Capitalizes the first character of a UTF-8 string.

Parameters
string $text The string to convert.
Returns
string The string with the first character as uppercase.

Referenced by PathBasedBreadcrumbBuilder\build(), ModulesListForm\buildRow(), HandlerBase\caseTransform(), PathProcessorLanguage\initProcessors(), DependencyTest\testMissingModules(), and UnicodeTest\testUcfirst().

static validateUtf8 (   $text)
static

Checks whether a string is valid UTF-8.

All functions designed to filter input should use this method (the successor to drupal_validate_utf8()) to ensure they operate on valid UTF-8 strings and so prevent bypass of the filter.

When text containing an invalid UTF-8 lead byte (0xC0 - 0xFF) is presented as UTF-8 to Internet Explorer 6, the program may misinterpret subsequent bytes. When these subsequent bytes are HTML control characters such as quotes or angle brackets, parts of the text that were deemed safe by filters end up in locations that are potentially unsafe. For example, an onerror attribute that is outside of a tag, and thus deemed safe by a filter, can be interpreted by the browser as if it were inside the tag.

The function does not return FALSE for strings containing character codes above U+10FFFF, even though these are prohibited by RFC 3629.

Parameters
string $text The text to check.
Returns
bool TRUE if the text is valid UTF-8, FALSE if not.
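
One common way to perform such a check is to let PCRE validate the subject string. The sketch below assumes that approach; the helper name sketch_validate_utf8() is hypothetical, and edge cases such as code points above U+10FFFF depend on the PCRE version, so results may differ from the method documented here.

```php
<?php
// Hypothetical validity check built on PCRE; not Drupal's own code.
function sketch_validate_utf8(string $text): bool {
  // With the /u modifier, preg_match() returns FALSE instead of 1
  // when the subject is not well-formed UTF-8.
  return preg_match('//u', $text) === 1;
}

$ok = sketch_validate_utf8("t\xC3\xA9st");   // Well-formed two-byte sequence.
$bad = sketch_validate_utf8("t\xC3(st");     // Lead byte without a trail byte.
```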

References Unicode\strlen().

Referenced by Xss\filter(), and UnicodeTest\testValidateUtf8().

Field Documentation

const PREG_CLASS_WORD_BOUNDARY

Matches Unicode characters that are word boundaries.

Characters with the following General_category (gc) property values are used as word boundaries. While this does not fully conform to the Word Boundaries algorithm described in http://unicode.org/reports/tr29, as PCRE does not contain the Word_Break property table, this simpler algorithm has to do.

  • Cc, Cf, Cn, Co, Cs: Other.
  • Pc, Pd, Pe, Pf, Pi, Po, Ps: Punctuation.
  • Sc, Sk, Sm, So: Symbols.
  • Zl, Zp, Zs: Separators.

Non-boundary characters include the following General_category (gc) property values:

  • Ll, Lm, Lo, Lt, Lu: Letters.
  • Mc, Me, Mn: Combining Marks.
  • Nd, Nl, No: Numbers.

Note that the PCRE property matcher is not used because we wanted to be compatible with Unicode 5.2.0 regardless of the PCRE version used (and any bugs in PCRE property tables).

See Also
http://unicode.org/glossary

Referenced by Unicode\truncate().

const STATUS_ERROR = -1

Indicates an error during the check for PHP Unicode support.

Referenced by UnicodeTest\providerTestStatus().

const STATUS_MULTIBYTE = 1

The documentation for this class was generated from the following file: