6.6. Encoding Unicode

>>> text = 'cześć'
>>> text.encode()
b'cze\xc5\x9b\xc4\x87'
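
Called without an argument, str.encode() uses UTF-8, and bytes.decode() reverses it. The sketch below checks this round trip; the variable names are only illustrative.

```python
text = 'cześć'

# str.encode() with no argument uses UTF-8
default_bytes = text.encode()
explicit_bytes = text.encode('utf-8')

# bytes.decode() also defaults to UTF-8 and restores the original string
round_trip = default_bytes.decode()

print(default_bytes == explicit_bytes)  # True
print(round_trip == text)               # True

# 5 characters, but 7 bytes: 'ś' and 'ć' take two bytes each in UTF-8
print(len(text), len(default_bytes))    # 5 7
```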

6.6.1. UTF-32

  • Fixed-length encoding

  • 4 bytes per character (Python's 'utf-32' codec also prepends a 4-byte byte order mark)

  • Supports all Unicode characters

>>> text = 'cześć'
>>> text.encode('utf-32')
b'\xff\xfe\x00\x00c\x00\x00\x00z\x00\x00\x00e\x00\x00\x00[\x01\x00\x00\x07\x01\x00\x00'
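
The leading b'\xff\xfe\x00\x00' in the output above is a byte order mark (BOM), not a character; it appears on little-endian machines. A minimal sketch, assuming you want to compare the BOM-prefixed codec with the explicit-endianness variants 'utf-32-le' and 'utf-32-be':

```python
text = 'cześć'

with_bom = text.encode('utf-32')     # BOM + native byte order
little = text.encode('utf-32-le')    # little-endian, no BOM
big = text.encode('utf-32-be')       # big-endian, no BOM

# 4-byte BOM + 5 characters * 4 bytes each
print(len(with_bom))   # 24
print(len(little))     # 20

# Fourth character 'ś' is U+015B; only the byte order differs
print(little[12:16])   # b'[\x01\x00\x00'
print(big[12:16])      # b'\x00\x00\x01['
```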

6.6.2. UTF-16

  • Variable-length encoding

  • 2 or 4 bytes per character (4 bytes for characters outside the Basic Multilingual Plane, encoded as a surrogate pair)

  • Supports all Unicode characters

>>> text = 'cześć'
>>> text.encode('utf-16')
b'\xff\xfec\x00z\x00e\x00[\x01\x07\x01'
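
UTF-16 uses two bytes for characters in the Basic Multilingual Plane (BMP) and four bytes, a surrogate pair, for characters outside it. The sketch below illustrates this with 'utf-16-le', which omits the byte order mark; the sample characters are arbitrary choices.

```python
bmp = 'ś'                # U+015B, inside the BMP
astral = '\U0001F600'    # U+1F600 (grinning face emoji), outside the BMP

bmp_bytes = bmp.encode('utf-16-le')
astral_bytes = astral.encode('utf-16-le')

print(len(bmp_bytes))      # 2
print(len(astral_bytes))   # 4

# Surrogate pair D83D DE00, each half stored least significant byte first
print(astral_bytes.hex())  # 3dd800de
```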

6.6.3. UTF-8

  • Variable-length encoding

  • 1 to 4 bytes per character

  • Supports all Unicode characters

  • Most common encoding for web pages

  • Backward compatible with ASCII (any ASCII text is also valid UTF-8)

>>> text = 'cześć'
>>> text.encode('utf-8')
b'cze\xc5\x9b\xc4\x87'
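
To illustrate the 1 to 4 byte range and the ASCII compatibility listed above, the following sketch measures the encoded size of one sample character per length class. The sample characters are arbitrary choices.

```python
# One sample character for each possible UTF-8 length
samples = ['a', 'ś', '€', '\U0001F600']   # U+0061, U+015B, U+20AC, U+1F600

sizes = [len(ch.encode('utf-8')) for ch in samples]
print(sizes)   # [1, 2, 3, 4]

# Pure ASCII text produces identical bytes in both encodings
ascii_text = 'hello'
same_bytes = ascii_text.encode('utf-8') == ascii_text.encode('ascii')
print(same_bytes)   # True
```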