[FEATURE] Use utf8mb4 on mysql for new instances 40/56440/10
authorLienhart Woitok <lienhart.woitok@netlogix.de>
Fri, 23 Mar 2018 12:54:40 +0000 (13:54 +0100)
committerJigal van Hemert <jigal.van.hemert@typo3.org>
Tue, 11 Sep 2018 15:30:33 +0000 (17:30 +0200)
If installing a new TYPO3 instance on mysql, utf8mb4 is now used as
default charset for the database connection and as default collation.

Upgraders may change LocalConfiguration to use utf8mb4, too. They
however need to take care of changing their collations and setting
according table detaults on their own.

A reports status check verifies there is no mixed collation.

Resolves: #80398
Resolves: #82080
Resolves: #82551
Releases: master
Change-Id: I6bf464a22c6ed74631bf5aacff9c2cfe670077da
Reviewed-on: https://review.typo3.org/56440
Reviewed-by: Christian Kuhn <lolli@schwarzbu.ch>
Tested-by: Christian Kuhn <lolli@schwarzbu.ch>
Tested-by: TYPO3com <no-reply@typo3.com>
Reviewed-by: Lienhart Woitok <lienhart.woitok@netlogix.de>
Tested-by: Lienhart Woitok <lienhart.woitok@netlogix.de>
Reviewed-by: Georg Gro├čberger <garfieldius67@gmail.com>
Reviewed-by: Jigal van Hemert <jigal.van.hemert@typo3.org>
Tested-by: Jigal van Hemert <jigal.van.hemert@typo3.org>
typo3/sysext/core/Classes/Database/Schema/ConnectionMigrator.php
typo3/sysext/core/Documentation/Changelog/master/Feature-80398-Utf8mb4OnMysqlByDefaultForNewInstances.rst [new file with mode: 0644]
typo3/sysext/core/Resources/Private/Sql/Cache/Backend/Typo3DatabaseBackendCache.sql
typo3/sysext/core/Resources/Private/Sql/Cache/Backend/Typo3DatabaseBackendTags.sql
typo3/sysext/core/ext_tables.sql
typo3/sysext/frontend/ext_tables.sql
typo3/sysext/install/Classes/Controller/InstallerController.php
typo3/sysext/redirects/ext_tables.sql
typo3/sysext/reports/Classes/Report/Status/ConfigurationStatus.php
typo3/sysext/reports/Resources/Private/Language/locallang_reports.xlf

index 990fb6c..68b1343 100644 (file)
@@ -311,6 +311,9 @@ class ConnectionMigrator
 
         $schemaConfig = GeneralUtility::makeInstance(SchemaConfig::class);
         $schemaConfig->setName($this->connection->getDatabase());
+        if (isset($this->connection->getParams()['tableoptions'])) {
+            $schemaConfig->setDefaultTableOptions($this->connection->getParams()['tableoptions']);
+        }
 
         return GeneralUtility::makeInstance(Schema::class, $tablesForConnection, [], $schemaConfig);
     }
@@ -1097,6 +1100,7 @@ class ConnectionMigrator
      */
     protected function transformTablesForDatabasePlatform(array $tables, Connection $connection): array
     {
+        $defaultTableOptions = $connection->getParams()['tableoptions'] ?? [];
         foreach ($tables as &$table) {
             $indexes = [];
             foreach ($table->getIndexes() as $key => $index) {
@@ -1141,7 +1145,7 @@ class ConnectionMigrator
                 $indexes,
                 $table->getForeignKeys(),
                 0,
-                $table->getOptions()
+                array_merge($defaultTableOptions, $table->getOptions())
             );
         }
 
diff --git a/typo3/sysext/core/Documentation/Changelog/master/Feature-80398-Utf8mb4OnMysqlByDefaultForNewInstances.rst b/typo3/sysext/core/Documentation/Changelog/master/Feature-80398-Utf8mb4OnMysqlByDefaultForNewInstances.rst
new file mode 100644 (file)
index 0000000..5c3dfe4
--- /dev/null
@@ -0,0 +1,59 @@
+.. include:: ../../Includes.txt
+
+===============================================================
+Feature: #80398 - utf8mb4 on mysql by default for new instances
+===============================================================
+
+See :issue:`80398`
+
+Description
+===========
+
+New instances created by the TYPO3 installer now set `utf8mb4` as charset and `utf8mb4_unicode_ci`
+collation by default for instances running on MySQL. This allows 4 byte unicode characters
+like emojis in MySQL.
+
+If upgrading instances, admins may change :file:`LocalConfiguration.php` to use this feature.
+The core does not provide mechanisms to update the collation of existing tables
+from `utf8_unicode_ci` to `utf8mb4_unicode_ci` for existing instances, though. Admins need
+to manage that on their own if needed, the reports module shows an information if the
+table schema use mixed collations. This should be avoided and fixed after manually configuring
+`utf8mb4` to avoid SQL errors when joining tables having different collations.
+
+Also note that manually upgrading to `utf8mb4` may lead to index length issues: The maximum key
+length on InnoDB tables is often 767 bytes and options to increase that have even been actively
+removed, for instance in recent MariaDB versions.
+A typical case is an index on a varchar(255) field: The DBMS assumes the worst case for the index
+length, which is 3 bytes per character for a utf8 (utf8mb3), but 4 bytes for utf8mb4: With utf8,
+the maximum index length is 3*255 + 1 = 766 bytes which fits into 767, but with utf8mb4, this
+is 4*255 + 1 = 1021 bytes, which exceeds the maximum length and leads to SQL errors when setting
+such an index.
+This scenario gets more complex with combined indexes and may need manual investigation when
+upgrading an existing instance from from `utf8` to `utf8mb4`. One solution is to restrict the
+index length in ext_tables.sql of the affected extension: :php:`KEY myKey (myField(191))`, which
+in this case leads to 4*191 + 1 = 764 bytes as maximum used length.
+
+The basic settings to use `utf8mb4` in :file:`LocalConfiguration` are::
+
+'DB' => [
+    'Connections' => [
+        'Default' => [
+            'driver' => 'mysqli',
+            ...
+            'charset' => 'utf8mb4',
+            'tableoptions' => [
+                 'charset' => 'utf8mb4',
+                 'collate' => 'utf8mb4_unicode_ci',
+            ],
+        ],
+    ],
+],
+
+
+Impact
+======
+
+`utf8mb4` is an allowed charset and `utf8mb4_unicode_ci` is an allowed collation and
+used by default for new instances running on MySQL.
+
+.. index:: PHP-API, LocalConfiguration, Database
index 386bbce..e3754e4 100644 (file)
@@ -4,5 +4,5 @@ CREATE TABLE ###CACHE_TABLE### (
        expires int(11) unsigned DEFAULT '0' NOT NULL,
        content longblob,
        PRIMARY KEY (id),
-       KEY cache_id (identifier,expires)
+       KEY cache_id (identifier(180),expires)
 ) ENGINE=InnoDB;
\ No newline at end of file
index d9b3672..00eb9f2 100644 (file)
@@ -3,6 +3,6 @@ CREATE TABLE ###TAGS_TABLE### (
        identifier varchar(250) DEFAULT '' NOT NULL,
        tag varchar(250) DEFAULT '' NOT NULL,
        PRIMARY KEY (id),
-       KEY cache_id (identifier),
-       KEY cache_tag (tag)
+       KEY cache_id (identifier(191)),
+       KEY cache_tag (tag(191))
 ) ENGINE=InnoDB;
\ No newline at end of file
index b84d29a..de02f5c 100644 (file)
@@ -254,8 +254,8 @@ CREATE TABLE sys_file_processedfile (
        height int(11) DEFAULT '0',
 
        PRIMARY KEY (uid),
-       KEY combined_1 (original,task_type,configurationsha1),
-       KEY identifier (storage,identifier(199))
+       KEY combined_1 (original,task_type(100),configurationsha1),
+       KEY identifier (storage,identifier(180))
 );
 
 #
@@ -344,8 +344,8 @@ CREATE TABLE sys_history (
        history_data mediumtext,
        workspace int(11) DEFAULT '0',
 
-       KEY recordident_1 (tablename,recuid),
-       KEY recordident_2 (tablename,tstamp)
+       KEY recordident_1 (tablename(100),recuid),
+       KEY recordident_2 (tablename(100),tstamp)
 ) ENGINE=InnoDB;
 
 #
@@ -383,9 +383,9 @@ CREATE TABLE sys_refindex (
        ref_string varchar(1024) DEFAULT '' NOT NULL,
 
        PRIMARY KEY (hash),
-       KEY lookup_rec (tablename(240),recuid),
-       KEY lookup_uid (ref_table(240),ref_uid),
-       KEY lookup_string (ref_string(255))
+       KEY lookup_rec (tablename(100),recuid),
+       KEY lookup_uid (ref_table(100),ref_uid),
+       KEY lookup_string (ref_string(191))
 );
 
 #
index 317be4c..419869d 100644 (file)
@@ -66,8 +66,8 @@ CREATE TABLE fe_users (
        lastlogin int(10) unsigned DEFAULT '0' NOT NULL,
        is_online int(10) unsigned DEFAULT '0' NOT NULL,
 
-       KEY parent (pid,username),
-       KEY username (username),
+       KEY parent (pid,username(100)),
+       KEY username (username(100)),
        KEY is_online (is_online)
 );
 
@@ -124,7 +124,7 @@ CREATE TABLE sys_domain (
        domainName varchar(255) DEFAULT '' NOT NULL,
 
        KEY getSysDomain (hidden),
-       KEY getDomainStartPage (pid,hidden,domainName)
+       KEY getDomainStartPage (pid,hidden,domainName(100))
 );
 
 #
index 4bd732e..192ae52 100644 (file)
@@ -442,6 +442,14 @@ class InstallerController
                     $defaultConnectionSettings['path'] = Environment::getConfigPath() . $dbFilename;
                 }
             }
+            // For mysql, set utf8mb4 as default charset
+            if (isset($postValues['driver']) && in_array($postValues['driver'], ['mysqli', 'pdo_mysql'])) {
+                $defaultConnectionSettings['charset'] = 'utf8mb4';
+                $defaultConnectionSettings['tableoptions'] = [
+                    'charset' => 'utf8mb4',
+                    'collate' => 'utf8mb4_unicode_ci',
+                ];
+            }
         }
 
         $success = false;
@@ -450,7 +458,10 @@ class InstallerController
             try {
                 $connectionParams = $defaultConnectionSettings;
                 $connectionParams['wrapperClass'] = Connection::class;
-                $connectionParams['charset'] = 'utf-8';
+                if (!isset($connectionParams['charset'])) {
+                    // utf-8 as default for non mysql
+                    $connectionParams['charset'] = 'utf-8';
+                }
                 DriverManager::getConnection($connectionParams)->ping();
                 $success = true;
             } catch (DBALException $e) {
@@ -1054,7 +1065,7 @@ For each website you need a TypoScript template on the main page of your website
         if (strpos($defaultDatabaseCharset, 'utf8') !== 0) {
             $result = new FlashMessage(
                 'Your database uses character set "' . $defaultDatabaseCharset . '", '
-                . 'but only "utf8" is supported with TYPO3. You probably want to change this before proceeding.',
+                . 'but only "utf8" and "utf8mb4" are supported with TYPO3. You probably want to change this before proceeding.',
                 'Invalid Charset',
                 FlashMessage::ERROR
             );
index ed440fe..5b6d631 100644 (file)
@@ -15,5 +15,5 @@ CREATE TABLE sys_redirect (
        lasthiton int(11) DEFAULT '0' NOT NULL,
        disable_hitcount tinyint(1) unsigned DEFAULT '0' NOT NULL,
 
-       KEY index_source (source_host,source_path)
+       KEY index_source (source_host(80),source_path(80))
 );
index c3f280e..c31527d 100644 (file)
@@ -244,6 +244,7 @@ class ConfigurationStatus implements StatusProviderInterface
         if (strpos($defaultDatabaseCharset, 'utf8') !== 0) {
             // If the default character set is e.g. latin1, BUT all tables in the system are UTF-8,
             // we assume that TYPO3 has the correct charset for adding tables, and everything is fine
+            $queryBuilder = $connection->createQueryBuilder();
             $nonUtf8TableCollationsFound = $queryBuilder->select('table_collation')
                 ->from('information_schema.tables')
                 ->where(
@@ -265,6 +266,61 @@ class ConfigurationStatus implements StatusProviderInterface
                 $severity = ReportStatus::INFO;
                 $statusValue = $this->getLanguageService()->getLL('status_info');
             }
+        } elseif (isset($GLOBALS['TYPO3_CONF_VARS']['DB']['Connections'][ConnectionPool::DEFAULT_CONNECTION_NAME]['tableoptions'])) {
+            $message = $this->getLanguageService()->getLL('status_MysqlDatabaseCharacterSet_Ok');
+
+            $tableOptions = $GLOBALS['TYPO3_CONF_VARS']['DB']['Connections'][ConnectionPool::DEFAULT_CONNECTION_NAME]['tableoptions'];
+            if (isset($tableOptions['collate'])) {
+                $collationConstraint = $queryBuilder->expr()->neq('table_collation', $queryBuilder->quote($tableOptions['collate']));
+                $charset = $tableOptions['collate'];
+            } elseif (isset($tableOptions['charset'])) {
+                $collationConstraint = $queryBuilder->expr()->notLike('table_collation', $queryBuilder->quote($tableOptions['charset'] . '%'));
+                $charset = $tableOptions['charset'];
+            }
+
+            if (isset($collationConstraint)) {
+                $queryBuilder = $connection->createQueryBuilder();
+                $wrongCollationTablesFound = $queryBuilder->select('table_collation')
+                    ->from('information_schema.tables')
+                    ->where(
+                        $queryBuilder->expr()->andX(
+                            $queryBuilder->expr()->eq('table_schema', $queryBuilder->quote($connection->getDatabase())),
+                            $collationConstraint
+                        )
+                    )
+                    ->setMaxResults(1)
+                    ->execute();
+
+                if ($wrongCollationTablesFound->rowCount() > 0) {
+                    $message = sprintf($this->getLanguageService()->getLL('status_MysqlDatabaseCharacterSet_MixedCollations'), $charset);
+                    $severity = ReportStatus::ERROR;
+                    $statusValue = $this->getLanguageService()->getLL('status_checkFailed');
+                } else {
+                    if (isset($tableOptions['collate'])) {
+                        $collationConstraint = $queryBuilder->expr()->neq('collation_name', $queryBuilder->quote($tableOptions['collate']));
+                    } elseif (isset($tableOptions['charset'])) {
+                        $collationConstraint = $queryBuilder->expr()->notLike('collation_name', $queryBuilder->quote($tableOptions['charset'] . '%'));
+                    }
+
+                    $queryBuilder = $connection->createQueryBuilder();
+                    $wrongCollationColumnsFound = $queryBuilder->select('collation_name')
+                        ->from('information_schema.columns')
+                        ->where(
+                            $queryBuilder->expr()->andX(
+                                $queryBuilder->expr()->eq('table_schema', $queryBuilder->quote($connection->getDatabase())),
+                                $collationConstraint
+                            )
+                        )
+                        ->setMaxResults(1)
+                        ->execute();
+
+                    if ($wrongCollationColumnsFound->rowCount() > 0) {
+                        $message = sprintf($this->getLanguageService()->getLL('status_MysqlDatabaseCharacterSet_MixedCollations'), $charset);
+                        $severity = ReportStatus::ERROR;
+                        $statusValue = $this->getLanguageService()->getLL('status_checkFailed');
+                    }
+                }
+            }
         } else {
             $message = $this->getLanguageService()->getLL('status_MysqlDatabaseCharacterSet_Ok');
         }
index 9d383f5..a02dea1 100644 (file)
                        <trans-unit id="status_MysqlDatabaseCharacterSet_Info">
                                <source>Your default database uses a different charset, but all tables uses utf-8. All good. But consider fixing your database collation and check the table creation settings.</source>
                        </trans-unit>
+                       <trans-unit id="status_MysqlDatabaseCharacterSet_MixedCollations">
+                               <source>Your default database is set to create tables with character set "%1$s", but contains tables or columns with different collations. Please fix these tables to avoid "illegal mix of collations" errors.</source>
+                       </trans-unit>
                        <trans-unit id="status_encryptedConnectionStatus">
                                <source>Encrypted backend connection (HTTPS)</source>
                        </trans-unit>