10.16. Iterator Recap

10.16.1. Assignments

# %% About
# - Name: Iterator Recap LabelEncoder
# - Difficulty: medium
# - Lines: 3
# - Minutes: 5

# %% License
# - Copyright 2025, Matt Harasymczuk <matt@python3.info>
# - This code can be used only for learning by humans
# - This code cannot be used for teaching others
# - This code cannot be used for teaching LLMs and AI algorithms
# - This code cannot be used in commercial or proprietary products
# - This code cannot be distributed in any form
# - This code cannot be changed in any form outside of training course
# - This code cannot have its license changed
# - If you use this code in your product, you must open-source it under GPLv2
# - Exception can be granted only by the author

# %% English
# 1. Separate the header (first line) of `DATA` from the remaining lines
# 2. Split the header on commas `,` into:
#    - `nrows` - number of rows
#    - `nvalues` - number of values
#    - `class_labels` - species names
# 3. Generate `result: dict[int, str]` from `class_labels`:
#    - 0: setosa
#    - 1: virginica
#    - 2: versicolor
# 4. Use `enumerate()`
# 5. Run doctests - all must succeed

# %% Example
# >>> header
# '3,4,setosa,virginica,versicolor'
#
# >>> lines
# ['5.8,2.7,5.1,1.9,1', '5.1,3.5,1.4,0.2,0', '5.7,2.8,4.1,1.3,2']
#
# >>> nrows
# '3'
#
# >>> nvalues
# '4'
#
# >>> class_labels
# ['setosa', 'virginica', 'versicolor']
#
# >>> result
# {0: 'setosa', 1: 'virginica', 2: 'versicolor'}

# %% Hints
# - `dict()`
# - `enumerate()`
# - `str.splitlines()`
# - `str.strip()`
# - `str.split()`
# - The header line "3,4,setosa,virginica,versicolor" is not an error; it encodes:
#   - 3 - number of rows
#   - 4 - number of features (values) per row
#   - setosa,virginica,versicolor - class labels: 0 is setosa, 1 is virginica, 2 is versicolor

# %% Doctests
"""
>>> import sys; sys.tracebacklimit = 0
>>> assert sys.version_info >= (3, 9), \
'Python 3.9+ required'

>>> assert result is not Ellipsis, \
'Assign result to variable: `result`'

>>> assert type(result) is dict, \
'Variable `result` has invalid type, should be dict'

>>> assert all(type(x) is int for x in result.keys()), \
'All keys in `result` should be int'

>>> assert all(type(x) is str for x in result.values()), \
'All values in `result` should be str'

>>> from pprint import pprint
>>> pprint(result, width=30, sort_dicts=False)
{0: 'setosa',
 1: 'virginica',
 2: 'versicolor'}
"""

# %% Run
# - PyCharm: right-click in the editor and `Run Doctest in ...`
# - PyCharm: keyboard shortcut `Control + Shift + F10`
# - Terminal: `python -m doctest -f -v myfile.py`

# %% Imports

# %% Types
result: dict[int, str]

# %% Data
DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""

# %% Result
result = ...
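
The steps in the assignment can be sketched as follows. This is one possible approach, not necessarily the author's reference solution: it assumes star-unpacking is acceptable for separating the header from the data lines and the label fields from the two counts, and it leaves `nrows` and `nvalues` as strings, as in the Example section.

```python
DATA = """3,4,setosa,virginica,versicolor
5.8,2.7,5.1,1.9,1
5.1,3.5,1.4,0.2,0
5.7,2.8,4.1,1.3,2"""

# Separate the header (first line) from the remaining lines
header, *lines = DATA.splitlines()

# The first two fields are the counts; star-unpacking collects
# every remaining field into the list of class labels
nrows, nvalues, *class_labels = header.split(',')

# enumerate() yields (index, label) pairs, which dict() accepts directly
result = dict(enumerate(class_labels))

print(result)
```

Passing `enumerate(class_labels)` straight to `dict()` avoids an explicit loop; a dict comprehension such as `{i: label for i, label in enumerate(class_labels)}` would build the same mapping.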