iljitsch.com

blog-onderwerpen: alles · BGP / IPv6 / meer · instellingen · zw/w · mijn bedrijf: inet⁶ consult · Twitter · Mastodon · LinkedIn · email · 🇺🇸 🇳🇱

Hoi, ik ben Iljitsch van Beijnum. Op deze pagina staan alle posts over alle onderwerpen.

MySQL Unicode weirdness

After looking at the SQLite Unicode behavior, it's now time to do the same for MySQL. Coincidentally, I'm currently migrating some old databases that were created in the very early 2000s to a more modern environment. I think those old databases were from the MySQL 3.x days, before MySQL gained any sort of Unicode support. Those old tables are thus still in the latin1 (ISO 8859-1) character set.

But I encountered some MySQL/Unicode weirdness...

Full article / permalink - posted 2023-09-21

Looking at MySQL Unicode behavior

https://en.wikipedia.org/wiki/European_ordering_rules

But as the transition from 8-bit characters to 16-bit characters was already well underway in the Windows world and in languages such as Java and Javascript,

# mysql --default_character_set=utf8mb4 muada Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or g. Your MySQL connection id is 98 Server version: 8.0.34 MySQL Community Server - GPL

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.

mysql> SHOW SESSION VARIABLES LIKE 'character_set%'; +--------------------------+--------------------------------+ | Variable_name | Value | +--------------------------+--------------------------------+ | character_set_client | utf8mb4 | | character_set_connection | utf8mb4 | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | utf8mb4 | | character_set_server | utf8mb4 | | character_set_system | utf8mb3 | | character_sets_dir | /usr/share/mysql-8.0/charsets/ | +--------------------------+--------------------------------+

DUCET

https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table https://www.unicode.org/reports/tr10/

http://www.unicode.org/Public/UCA/latest/allkeys.txt

https://en.wikipedia.org/wiki/European_ordering_rules https://en.wikipedia.org/wiki/ISO/IEC_14651

mysql> insert into test (id, text) values (5, concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4))); Query OK, 1 row affected (0.00 sec)

mysql> insert into test (id, text) values (6, concat('Zoe', char(12*16 + 12, 8*16 + 8 using utf8mb4)));

mysql> insert into test (id, text) values (7, 'Zoe');

mysql> select * from test order by text collate utf8mb4_bin; +----+-------+ | id | text | +----+-------+ | 1 | ZOE | | 4 | ZoA | | 7 | Zoe | | 6 | Zoë | | 3 | Zoy | | 5 | Zoë | | 2 | zoe | +----+-------+ 7 rows in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_bin) as sortweight from test order by text colla +----+-------+----------------------------+ | id | text | sortweight | +----+-------+----------------------------+ | 1 | ZOE | 0x00005A00004F000045 | | 4 | ZoA | 0x00005A00006F000041 | | 7 | Zoe | 0x00005A00006F000065 | | 6 | Zoë | 0x00005A00006F000065000308 | | 3 | Zoy | 0x00005A00006F000079 | | 5 | Zoë | 0x00005A00006F0000EB | | 2 | zoe | 0x00007A00006F000065 | +----+-------+----------------------------+

mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test order by text collate utf8mb4_0900_ai_ci; +----+-------+------------------------+ | id | text | sortweight | +----+-------+------------------------+ | 4 | ZoA | 0x1F211DDD1C47 | | 1 | ZOE | 0x1F211DDD1CAA | | 2 | zoe | 0x1F211DDD1CAA | | 5 | Zoë | 0x1F211DDD1CAA | | 6 | Zoë | 0x1F211DDD1CAA | | 7 | Zoe | 0x1F211DDD1CAA | | 3 | Zoy | 0x1F211DDD1F0B | +----+-------+------------------------+

mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test order by text collate utf8mb4_0900_ai_ci desc; +----+-------+------------------------+ | id | text | sortweight | +----+-------+------------------------+ | 3 | Zoy | 0x1F211DDD1F0B | | 1 | ZOE | 0x1F211DDD1CAA | | 2 | zoe | 0x1F211DDD1CAA | | 5 | Zoë | 0x1F211DDD1CAA | | 6 | Zoë | 0x1F211DDD1CAA | | 7 | Zoe | 0x1F211DDD1CAA | | 4 | ZoA | 0x1F211DDD1C47 | +----+-------+------------------------+

mysql> select *, weight_string(text collate utf8mb4_0900_as_ci) as sortweight from test order by text collate utf8mb4_0900_as_ci; +----+-------+------------------------------------+ | id | text | sortweight | +----+-------+------------------------------------+ | 4 | ZoA | 0x1F211DDD1C470000002000200020 | | 1 | ZOE | 0x1F211DDD1CAA0000002000200020 | | 2 | zoe | 0x1F211DDD1CAA0000002000200020 | | 7 | Zoe | 0x1F211DDD1CAA0000002000200020 | | 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B | | 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B | | 3 | Zoy | 0x1F211DDD1F0B0000002000200020 | +----+-------+------------------------------------+

mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test order by text collate utf8mb4_0900_as_cs; +----+-------+--------------------------------------------------------+ | id | text | sortweight | +----+-------+--------------------------------------------------------+ | 4 | ZoA | 0x1F211DDD1C4700000020002000200000000800020008 | | 2 | zoe | 0x1F211DDD1CAA00000020002000200000000200020002 | | 7 | Zoe | 0x1F211DDD1CAA00000020002000200000000800020002 | | 1 | ZOE | 0x1F211DDD1CAA00000020002000200000000800080008 | | 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 | | 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 | | 3 | Zoy | 0x1F211DDD1F0B00000020002000200000000800020002 | +----+-------+--------------------------------------------------------+

ê

mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_bin; Empty set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_as_cs; Empty set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_ai_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_cs; ERROR 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_cs' mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci; +----+-------+------------------------+ | id | text | sortweight | +----+-------+------------------------+ | 1 | ZOE | 0x1F211DDD1CAA | | 2 | zoe | 0x1F211DDD1CAA | | 5 | Zoë | 0x1F211DDD1CAA | | 6 | Zoë | 0x1F211DDD1CAA | | 7 | Zoe | 0x1F211DDD1CAA | +----+-------+------------------------+ 5 rows in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci order by text utfmb4_0900_as_cs; ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'utfmb4_0900_as_cs' at line 1 mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci order by text collate utfmb4_0900_as_cs; ERROR 1273 (HY000): Unknown collation: 'utfmb4_0900_as_cs' mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 10 using utf8mb4)) collate utf8mb4_0900_ai_ci order by text collate utf8mb4_0900_as_cs; +----+-------+------------------------+ | id | text | sortweight | +----+-------+------------------------+ | 2 | zoe | 0x1F211DDD1CAA | | 7 | Zoe | 0x1F211DDD1CAA | | 1 | ZOE | 0x1F211DDD1CAA | | 5 | Zoë | 0x1F211DDD1CAA | | 6 | Zoë | 0x1F211DDD1CAA | +----+-------+------------------------+ 5 rows in set (0.00 sec)

mysql> mysql> mysql> mysql> mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_bin; +----+------+------------------------+ | id | text | sortweight | +----+------+------------------------+ | 5 | Zoë | 0x5A6FC3AB | +----+------+------------------------+ 1 row in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_as_cs; +----+-------+--------------------------------------------------------+ | id | text | sortweight | +----+-------+--------------------------------------------------------+ | 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 | | 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 | +----+-------+--------------------------------------------------------+ 2 rows in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_ai_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_ai_cs; ERROR 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_cs' mysql> mysql> mysql> mysql> mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_bin; +----+------+------------------------+ | id | text | sortweight | +----+------+------------------------+ | 5 | Zoë | 0x5A6FC3AB | +----+------+------------------------+ 1 row in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_as_ci) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_as_ci; +----+-------+------------------------------------+ | id | text | sortweight | +----+-------+------------------------------------+ | 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B | | 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B | +----+-------+------------------------------------+ 2 rows in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test where text = concat('Zo', char(12*16 + 3, 10*16 + 11 using utf8mb4)) collate utf8mb4_0900_as_cs; +----+-------+--------------------------------------------------------+ | id | text | sortweight | +----+-------+--------------------------------------------------------+ | 5 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 | | 6 | Zoë | 0x1F211DDD1CAA0000002000200020002B00000008000200020002 | +----+-------+--------------------------------------------------------+ 2 rows in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_bin) as sortweight from test where text = 'zoe' collate utf8mb4_0900_bin; +----+------+------------------------+ | id | text | sortweight | +----+------+------------------------+ | 2 | zoe | 0x7A6F65 | +----+------+------------------------+ 1 row in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_as_cs) as sortweight from test w here text = 'zoe' collate utf8mb4_0900_as_cs; +----+------+------------------------------------------------+ | id | text | sortweight | +----+------+------------------------------------------------+ | 2 | zoe | 0x1F211DDD1CAA00000020002000200000000200020002 | +----+------+------------------------------------------------+ 1 row in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_as_ci) as sortweight from test where text = 'zoe' collate utf8mb4_0900_as_ci; +----+------+--------------------------------+ | id | text | sortweight | +----+------+--------------------------------+ | 1 | ZOE | 0x1F211DDD1CAA0000002000200020 | | 2 | zoe | 0x1F211DDD1CAA0000002000200020 | | 7 | Zoe | 0x1F211DDD1CAA0000002000200020 | +----+------+--------------------------------+ 3 rows in set (0.00 sec)

mysql> select *, weight_string(text collate utf8mb4_0900_ai_ci) as sortweight from test where text = 'zoe' collate utf8mb4_0900_ai_ci; +----+-------+------------------------+ | id | text | sortweight | +----+-------+------------------------+ | 1 | ZOE | 0x1F211DDD1CAA | | 2 | zoe | 0x1F211DDD1CAA | | 5 | Zoë | 0x1F211DDD1CAA | | 6 | Zoë | 0x1F211DDD1CAA | | 7 | Zoe | 0x1F211DDD1CAA | +----+-------+------------------------+

mysql> show collation like '%utf8mb4_uni%'; +------------------------+---------+-----+---------+----------+---------+---------------+ | Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | +------------------------+---------+-----+---------+----------+---------+---------------+ | utf8mb4_unicode_520_ci | utf8mb4 | 246 | | Yes | 8 | PAD SPACE | | utf8mb4_unicode_ci | utf8mb4 | 224 | | Yes | 8 | PAD SPACE | +------------------------+---------+-----+---------+----------+---------+---------------+

mysql> show collation like '%utf8mb4_0900%'; +--------------------+---------+-----+---------+----------+---------+---------------+ | Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute | +--------------------+---------+-----+---------+----------+---------+---------------+ | utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD | | utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD | | utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD | | utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD | +--------------------+---------+-----+---------+----------+---------+---------------+

Permalink - posted 2023-09-16

Should the datacenter be in the middle?

The other day, I landed on this article: In Focus: Subsea Network Architecture: IXPs. The article takes some time to arrive at the point that undersea internet exchanges would be a good idea. The most eyecatching part is a variation on this image:

But should the datacenter and/or internet exhange in the middle between multiple users?

Full article / permalink - posted 2023-09-07

Looking at SQLite Unicode behavior

In this post, I want to have a look at how SQLite interacts with Unicode. (Also see my post The (dark) magic of Unicode.) As explained here, SQLite doesn't have full Unicode support unless that support is explicitly included when SQLite is compiled.

So what does this mean in practice?

Full article / permalink - posted 2023-09-05

SQLite: add a powerful database engine to any app

When I was 24, I decided to give up my job and go to college and study computer science. If I'd have known how many database classes that involved, maybe I would have reconsidered.

Back then, we had a big server that ran a RDBMS (relational database management system) that hundreds of students all used together. These systems were big, complex and expensive. (Oracle made its fortune selling RDBMSes.) MySQL and PostgresQL are somewhat more streamlined free and open source RDBMSes. Much better, but firewalling, user authentication and backups are still somewhat of a headache. But hey, if you need a database, you need a database.

Enter SQLite.

Full article / permalink - posted 2023-09-04

Skyline #20, regenboog-editie

Image link - posted 2023-08-27 in

oudere posts - nieuwere posts

Search for:
RSS feed

Archives: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024