6.9. Idiom Filter

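  • filter(function, iterable)

  • Select elements of an iterable for which the function returns a true value

  • Returns a lazy iterator (a filter object, not a generator)

  • required function - callable that takes one element and returns True or False (or None)

  • required iterable - exactly one sequence or iterator object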

>>> def even(x):
...     return x % 2 == 0
>>>
>>> result = (x for x in range(0,5) if even(x))
>>> result = filter(even, range(0,5))
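
The first argument may also be None, in which case filter() keeps only the elements that are truthy. A minimal sketch; the `values` list is made up for illustration:

```python
# Hypothetical sample data mixing truthy and falsy values.
values = [0, 1, '', 'a', None, [], [0], False, True]

# With None as the first argument, filter() keeps truthy elements only.
truthy = list(filter(None, values))

# Equivalent comprehension.
same = [x for x in values if x]

print(truthy)
```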

6.9.1. Not-a-Generator

>>> from inspect import isgeneratorfunction, isgenerator
>>>
>>>
>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> isgeneratorfunction(filter)
False
>>>
>>> result = filter(even, [1,2,3])
>>> isgenerator(result)
False
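
Although the filter object is not a generator, it still follows the iterator protocol. The sketch below checks this with `collections.abc.Iterator`; the variable names are illustrative only:

```python
from collections.abc import Iterator
from inspect import isgenerator


def even(x):
    return x % 2 == 0


result = filter(even, [1, 2, 3, 4])

is_gen = isgenerator(result)              # not a generator object
is_iter = isinstance(result, Iterator)    # has __iter__ and __next__
is_own_iter = iter(result) is result      # iter() returns the object itself
has_send = hasattr(result, 'send')        # generator-only method is missing

print(is_gen, is_iter, is_own_iter, has_send)
```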

6.9.2. Problem

Plain code:

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = []
>>>
>>> for x in DATA:
...     if even(x):
...         result.append(x)
>>>
>>> print(result)
[2, 4, 6]

Comprehension:

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = [x for x in DATA if even(x)]
>>>
>>> print(result)
[2, 4, 6]

6.9.3. Solution

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = filter(even, DATA)
>>>
>>> list(result)
[2, 4, 6]
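
For a short, one-off predicate the function can also be written inline as a lambda expression. A minimal sketch of the same selection:

```python
DATA = [1, 2, 3, 4, 5, 6]

# Predicate written inline; no separate def is needed.
result = filter(lambda x: x % 2 == 0, DATA)

evens = list(result)
print(evens)
```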

6.9.4. Lazy Evaluation

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> DATA = [1, 2, 3, 4, 5, 6]
>>> result = filter(even, DATA)
>>>
>>> next(result)
2
>>> next(result)
4
>>> next(result)
6
>>> next(result)
Traceback (most recent call last):
StopIteration
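
One way to observe the laziness is to record every call of the predicate. In the sketch below the `calls` list is a helper added only for illustration; it shows that nothing is evaluated when the filter object is created, and that the object is exhausted after one pass:

```python
calls = []


def even(x):
    calls.append(x)       # record each time the predicate runs
    return x % 2 == 0


DATA = [1, 2, 3, 4, 5, 6]
result = filter(even, DATA)

after_create = list(calls)   # predicate has not run yet
first = next(result)         # consumes 1 and 2, returns 2
after_first = list(calls)    # only the elements needed so far
rest = list(result)          # evaluates the remaining elements
again = list(result)         # the iterator is now exhausted

print(after_create, first, after_first, rest, again)
```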

6.9.5. Performance

>>> def even(x):
...     return x % 2 == 0
>>>
>>>
>>> data = [1, 2, 3, 4, 5, 6]
>>> 
... %%timeit -r 1000 -n 1000
... result = [x for x in data if even(x)]
1.11 µs ± 139 ns per loop (mean ± std. dev. of 1000 runs, 1,000 loops each)
>>> 
... %%timeit -r 1000 -n 1000
... result = list(filter(even, data))
921 ns ± 112 ns per loop (mean ± std. dev. of 1000 runs, 1,000 loops each)
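
The `%%timeit` magic works only in IPython or Jupyter. Outside of them, a comparable measurement can be sketched with the standard `timeit` module. The wrapper functions and the `NUMBER` constant are assumptions made for this example, and the timings will differ between machines and Python versions:

```python
from timeit import timeit


def even(x):
    return x % 2 == 0


data = [1, 2, 3, 4, 5, 6]


def with_comprehension():
    return [x for x in data if even(x)]


def with_filter():
    return list(filter(even, data))


NUMBER = 10_000  # how many times each variant is executed

t_comprehension = timeit(with_comprehension, number=NUMBER)
t_filter = timeit(with_filter, number=NUMBER)

print(f'comprehension: {t_comprehension / NUMBER * 1e9:.0f} ns per call')
print(f'filter:        {t_filter / NUMBER * 1e9:.0f} ns per call')
```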

6.9.6. Use Case - 0x01

>>> users = [
...     {'age': 41, 'username': 'mwatney'},
...     {'age': 40, 'username': 'mlewis'},
...     {'age': 39, 'username': 'rmartinez'},
...     {'age': 40, 'username': 'avogel'},
...     {'age': 29, 'username': 'bjohanssen'},
...     {'age': 36, 'username': 'cbeck'},
... ]
>>> def above40(user):
...     return user['age'] >= 40
>>>
>>> def under40(user):
...     return user['age'] < 40
>>> result = filter(above40, users)
>>> list(result)  
[{'age': 41, 'username': 'mwatney'},
 {'age': 40, 'username': 'mlewis'},
 {'age': 40, 'username': 'avogel'}]
>>> result = filter(under40, users)
>>> list(result)  
[{'age': 39, 'username': 'rmartinez'},
 {'age': 29, 'username': 'bjohanssen'},
 {'age': 36, 'username': 'cbeck'}]
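
When the second predicate is simply the negation of the first, `itertools.filterfalse()` can select the complement without defining another function. A sketch using the same data; the name `is_40_or_older` is chosen only for this example:

```python
from itertools import filterfalse

users = [
    {'age': 41, 'username': 'mwatney'},
    {'age': 40, 'username': 'mlewis'},
    {'age': 39, 'username': 'rmartinez'},
    {'age': 40, 'username': 'avogel'},
    {'age': 29, 'username': 'bjohanssen'},
    {'age': 36, 'username': 'cbeck'},
]


def is_40_or_older(user):
    return user['age'] >= 40


# filter() keeps elements for which the predicate is true,
# filterfalse() keeps those for which it is false.
older = [u['username'] for u in filter(is_40_or_older, users)]
younger = [u['username'] for u in filterfalse(is_40_or_older, users)]

print(older)
print(younger)
```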

6.9.7. Use Case - 0x02

>>> users = [
...     {'is_admin': False, 'name': 'Mark Watney'},
...     {'is_admin': True,  'name': 'Melissa Lewis'},
...     {'is_admin': False, 'name': 'Rick Martinez'},
...     {'is_admin': False, 'name': 'Alex Vogel'},
...     {'is_admin': True,  'name': 'Beth Johanssen'},
...     {'is_admin': False, 'name': 'Chris Beck'},
... ]
>>>
>>>
>>> def admin(user):
...     return user['is_admin'] is True
>>>
>>>
>>> result = filter(admin, users)
>>> list(result)  
[{'is_admin': True, 'name': 'Melissa Lewis'},
 {'is_admin': True, 'name': 'Beth Johanssen'}]

6.9.8. Use Case - 0x03

>>> users = [
...     'mwatney',
...     'mlewis',
...     'rmartinez',
...     'avogel',
...     'bjohanssen',
...     'cbeck',
... ]
>>>
>>> admins = [
...     'mlewis',
...     'bjohanssen',
... ]
>>>
>>>
>>> def is_admin(user):
...     return user in admins
>>>
>>>
>>> result = filter(is_admin, users)
>>> list(result)
['mlewis', 'bjohanssen']
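
For larger collections it may be worth storing `admins` as a set, because membership tests on a set do not scan every element. In that case the bound method `admins.__contains__` can serve directly as the predicate. A sketch under that assumption:

```python
users = ['mwatney', 'mlewis', 'rmartinez', 'avogel', 'bjohanssen', 'cbeck']

# A set instead of a list: `in` checks use hashing.
admins = {'mlewis', 'bjohanssen'}

# The bound method behaves like `lambda user: user in admins`.
result = list(filter(admins.__contains__, users))

print(result)
```

The order of the result follows `users`, not the set, because filter() iterates over its second argument.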

6.9.9. Use Case - 0x04

>>> class User:
...     firstname: str
...     lastname: str
...     groups: list[str]
...
...     def __init__(self, firstname, lastname, groups):
...         self.firstname = firstname
...         self.lastname = lastname
...         self.groups = groups
...
...     def __repr__(self):
...         return f'{self.firstname}'
...
>>> DATABASE = [
...     User('Mark', 'Watney', groups=['user', 'staff']),
...     User('Melissa', 'Lewis', groups=['user', 'staff', 'admin']),
...     User('Rick', 'Martinez', groups=['user', 'staff']),
...     User('Alex', 'Vogel', groups=['user']),
...     User('Beth', 'Johanssen', groups=['user', 'staff', 'admin']),
...     User('Chris', 'Beck', groups=['user', 'staff']),
... ]
>>> def is_user(user: User) -> bool:
...     return 'user' in user.groups
>>>
>>> def is_staff(user: User) -> bool:
...     return 'staff' in user.groups
>>>
>>> def is_admin(user: User) -> bool:
...     return 'admin' in user.groups
>>> users = filter(is_user, DATABASE)
>>> staff = filter(is_staff, DATABASE)
>>> admins = filter(is_admin, DATABASE)
>>> list(users)
[Mark, Melissa, Rick, Alex, Beth, Chris]
>>>
>>> list(staff)
[Mark, Melissa, Rick, Beth, Chris]
>>>
>>> list(admins)
[Melissa, Beth]
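
The three predicates above differ only in the group name, so they could be produced by one function that returns a predicate. The sketch below uses a simplified `User` class and a hypothetical helper `in_group()`; neither is part of the original example:

```python
class User:
    def __init__(self, firstname, groups):
        self.firstname = firstname
        self.groups = groups

    def __repr__(self):
        return self.firstname


def in_group(name):
    """Return a predicate that checks membership in group `name`."""
    def predicate(user):
        return name in user.groups
    return predicate


DATABASE = [
    User('Mark', groups=['user', 'staff']),
    User('Melissa', groups=['user', 'staff', 'admin']),
    User('Alex', groups=['user']),
]

staff = list(filter(in_group('staff'), DATABASE))
admins = list(filter(in_group('admin'), DATABASE))

print(staff)
print(admins)
```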

6.9.10. Assignments

Code 6.28. Solution
"""
* Assignment: Idiom Filter Apply
* Type: class assignment
* Complexity: easy
* Lines of code: 3 lines
* Time: 2 min

English:
    1. Define function `odd()`:
       a. takes one argument
       b. returns True if argument is odd
       c. returns False if argument is even
    2. Use `filter()` to apply function `odd()` to DATA
    3. Define `result: filter` with result
    4. Run doctests - all must succeed

Polish:
    1. Zdefiniuj funkcję `odd()`:
       a. przyjmuje jeden argument
       b. zwraca True jeżeli argument jest nieparzysty
       c. zwraca False jeżeli argument jest parzysty
    2. Użyj `filter()` aby zaaplikować funkcję `odd()` do DATA
    3. Zdefiniuj `result: filter` z wynikiem
    4. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * filter()

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from inspect import isfunction

    >>> assert isfunction(odd), \
    'Object `odd` must be a function'

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'

    >>> assert type(result) is filter, \
    'Variable `result` has invalid type, should be filter'

    >>> result = list(result)
    >>> assert type(result) is list, \
    'Evaluated `result` has invalid type, should be list'

    >>> assert all(type(x) is int for x in result), \
    'All rows in `result` should be int'

    >>> result
    [1, 3, 5, 7, 9]
"""


DATA = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


# Return True if the number is odd (division by 2 leaves a remainder)
# type: Callable[[int], bool]
def odd(x):
    ...

# Use `filter()` to apply function `odd()` to DATA
# type: filter
result = ...


Code 6.29. Solution
"""
* Assignment: Idiom Filter Apply
* Type: class assignment
* Complexity: easy
* Lines of code: 7 lines
* Time: 5 min

English:
    1. Filter-out lines from `DATA` when:
        a. line is empty
        b. line has only spaces
        c. starts with # (comment)
    2. Use `filter()` to apply function `valid()` to DATA
    3. Define `result: filter` with result
    4. Run doctests - all must succeed

Polish:
    1. Odfiltruj linie z `DATA` gdy:
        a. linia jest pusta
        b. linia ma tylko spacje
        c. zaczyna się od # (komentarz)
    2. Użyj `filter()` aby zaaplikować funkcję `valid()` do DATA
    3. Zdefiniuj `result: filter` z wynikiem
    4. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * filter()
    * str.splitlines()
    * str.startswith()
    * len()

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from inspect import isfunction

    >>> assert isfunction(valid), \
    'Object `valid` must be a function'

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'

    >>> assert type(result) is filter, \
    'Variable `result` has invalid type, should be filter'

    >>> result = list(result)
    >>> assert type(result) is list, \
    'Evaluated `result` has invalid type, should be list'

    >>> assert all(type(x) is str for x in result), \
    'All rows in `result` should be str'

    >>> list(result)  # doctest: +NORMALIZE_WHITESPACE
    ['127.0.0.1       localhost',
     '127.0.0.1       astromatt',
     '10.13.37.1      nasa.gov esa.int',
     '255.255.255.255 broadcasthost',
     '::1             localhost']
"""

DATA = """##
# `/etc/hosts` structure:
#   - IPv4 or IPv6
#   - Hostnames
##

127.0.0.1       localhost
127.0.0.1       astromatt
10.13.37.1      nasa.gov esa.int
255.255.255.255 broadcasthost
::1             localhost"""

# Filter-out lines from `DATA` when:
# - line is empty
# - line has only spaces
# - starts with # (comment)
# type: Callable[[str], bool]
def valid(line):
    ...

# Use `filter()` to apply function `valid()` to DATA
# type: filter
result = ...

Code 6.30. Solution
"""
* Assignment: Idiom Filter Apply
* Type: class assignment
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Filter-out non-numeric (int or float) values from `DATA`
    2. Define `result: filter` with result
    3. Run doctests - all must succeed

Polish:
    1. Odfiltruj nie numeryczne (int lub float) wartości z `DATA`
    2. Zdefiniuj `result: filter` z wynikiem
    3. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * filter()
    * isinstance()
    * type()

Tests:
    >>> import sys; sys.tracebacklimit = 0
    >>> from inspect import isfunction

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'

    >>> assert type(result) is filter, \
    'Variable `result` has invalid type, should be filter'

    >>> result = list(result)
    >>> assert type(result) is list, \
    'Evaluated `result` has invalid type, should be list'

    >>> assert all(type(x) in (int,float) for x in result), \
    'All rows in `result` should be int or float'

    >>> result
    [0, 2.0, 4, 5.0]
"""

DATA = [0, True, 2.0, 'three', 4, 5.0, ['six']]

# Filter-out non-numeric (int or float) values from `DATA`
# type: filter
result = ...