Item 16: Prefer Catch-All Unpacking over Slicing

Notes

  • A limitation of basic unpacking is that you must know the length of the sequence being unpacked
    • For example, let’s say we want to extract the two oldest cars from a list
    • The code below fails because the unpacking expects exactly two items but the list contains ten
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest, second_oldest = car_ages_descending
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 3
      1 car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
      2 car_ages_descending = sorted(car_ages, reverse=True)
----> 3 oldest, second_oldest = car_ages_descending

ValueError: too many values to unpack (expected 2, got 10)
  • It seems natural to attempt to resolve the above with indices and slicing (see Item 14)
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest = car_ages_descending[0]
second_oldest = car_ages_descending[1]
others = car_ages_descending[2:]
print(f"oldest: {oldest}, second oldest: {second_oldest}, others: {others}")
oldest: 20, second oldest: 19, others: [15, 9, 8, 7, 6, 4, 1, 0]
  • Works but is noisy
    • Would be nice to have the clean syntax of the unpacking
  • Also prone to off-by-one errors
    • If we change a boundary on one line, we have to update the other lines to keep them synchronised
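  • A minimal sketch of how the slicing version can drift out of sync when it is extended to pull out a third value. The `third_oldest` variable is a hypothetical addition for illustration and is not part of the original example.

```python
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)

oldest = car_ages_descending[0]
second_oldest = car_ages_descending[1]
third_oldest = car_ages_descending[2]  # hypothetical new requirement
others = car_ages_descending[2:]       # bug: the boundary should now be 3

# The third-oldest car is counted twice because the slice was not updated.
print(f"third oldest: {third_oldest}, others: {others}")
```

  • Nothing fails at runtime here; the duplicated value simply goes unnoticed, which is what makes this kind of mistake easy to miss.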
  • Python provides catch-all unpacking via the * operator
    • Lets one part of an unpacking assignment receive all the values that aren’t matched by the other parts
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest, second_oldest, *others = car_ages_descending
print(f"oldest: {oldest}, second oldest: {second_oldest}, others: {others}")
oldest: 20, second oldest: 19, others: [15, 9, 8, 7, 6, 4, 1, 0]
  • Code is shorter, easier to read and less brittle to changes
  • Starred expression can appear at any point in an unpacking
    • Start, end, middle etc.
    • Benefits any time we have one optional slice
    • E.g. if we instead wanted to extract the oldest and the youngest car
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
oldest, *others, youngest = car_ages_descending

print(f"oldest: {oldest}, youngest: {youngest}, others: {others}")
oldest: 20, youngest: 0, others: [19, 15, 9, 8, 7, 6, 4, 1]
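  • As a further sketch of the "any position" point, the starred expression can also come first and collect everything before the named trailing values. The name `second_youngest` is introduced here for illustration only.

```python
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)

# Starred expression at the start: the last two values go to the named
# targets, and everything before them is collected into a list.
*others, second_youngest, youngest = car_ages_descending

print(f"youngest: {youngest}, second youngest: {second_youngest}, others: {others}")
```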
  • Whenever you use a starred expression, you must have at least one required match
    • E.g. the code below raises a syntax error
car_ages = [0, 9, 4, 8, 7, 20, 19, 1, 6, 15]
car_ages_descending = sorted(car_ages, reverse=True)
*others = car_ages_descending
  Cell In[5], line 3
    *others = car_ages_descending
    ^
SyntaxError: starred assignment target must be in a list or tuple
  • You can only use one catch-all expression in an unpacking at the same level
    • E.g. the code below fails too
first, *middle, *second_middle, last = [1,2,3,4]
  Cell In[6], line 1
    first, *middle, *second_middle, last = [1,2,3,4]
    ^
SyntaxError: multiple starred expressions in assignment
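  • Both of the failures above are `SyntaxError`s, so they are reported when the code is compiled rather than when the assignment runs. A small sketch that checks this with the built-in `compile()`; the `bad_snippets` list and the `"<sketch>"` filename are illustrative choices.

```python
# Each snippet is compiled but never executed. A SyntaxError here shows that
# the problem is detected before any code runs, so wrapping the assignment
# itself in try/except would not help.
bad_snippets = [
    "*others = [1, 2, 3]",
    "first, *middle, *second_middle, last = [1, 2, 3, 4]",
]

errors = []
for source in bad_snippets:
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError as exc:
        errors.append(exc.msg)

print(errors)
```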
  • You can use multiple catch-alls for different levels in a nested structure
    • Generally try to avoid this
    • It can make things hard to read
car_inventory = {
    "Downtown": ("Silver Shadow", "Pinto", "DMC"),
    "Airport" : ("Skyline", "Viper", "Gremlin", "Nova"),
}

((loc1, (best1, *rest1)),
 (loc2, (best2, *rest2))) = car_inventory.items()

print(f"Best at {loc1} is {best1}, others are {rest1}")
print(f"Best at {loc2} is {best2}, others are {rest2}")
Best at Downtown is Silver Shadow, others are ['Pinto', 'DMC']
Best at Airport is Skyline, others are ['Viper', 'Gremlin', 'Nova']
  • Starred expressions always become lists
  • If there is nothing left to unpack then the list is empty
    • Very useful when working with lists of at least N elements
short_list = [1, 2]
first, second, *rest = short_list

print(f"First: {first}, Second: {second}, Rest: {rest}")
First: 1, Second: 2, Rest: []
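  • A short sketch of two related behaviours: the catch-all is a list even when the source is not one, and the required targets must still be satisfied. The variable names are illustrative.

```python
# The catch-all is a list even when the source is a tuple or a string.
first, *rest = (1, 2, 3)
head, *tail = "abc"
print(type(rest).__name__, rest)
print(type(tail).__name__, tail)

# Only the starred part may be empty. If the required targets cannot all
# be filled, the unpacking raises ValueError.
error_message = None
try:
    a, b, *c = [1]
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```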
  • You can unpack arbitrary iterators
    • Typically not very useful with basic multiple assignment
    • Here we unpack values iterating over a range
    • In this case it is probably simpler to assign a static list that matches the unpacking pattern
it = iter(range(1, 3))
first, second = it
print(f"{first} and {second}")
1 and 2
  • Much more useful when we have starred expressions
  • E.g. iterating over CSV data
    • Iterator yields rows from the CSV (+ the header)

  • We could process this using indices and slices
def generate_csv():
    yield ("Date", "Make", "Model", "Year", "Price")
    for i in range(100):
        yield ("2019-03-25", "Honda", "Fit", "2010", "$3400")
        yield ("2019-03-26", "Ford", "F150", "2008", "$2400")

all_csv_rows = list(generate_csv())
header = all_csv_rows[0]
rows = all_csv_rows[1:]
print("CSV header:", header)
print("Row count:", len(rows))
CSV header: ('Date', 'Make', 'Model', 'Year', 'Price')
Row count: 200
  • But we can also easily unpack using a starred expression
def generate_csv():
    yield ("Date", "Make", "Model", "Year", "Price")
    for i in range(100):
        yield ("2019-03-25", "Honda", "Fit", "2010", "$3400")
        yield ("2019-03-26", "Ford", "F150", "2008", "$2400")


it = generate_csv()
header, *rows = it
print("CSV header:", header)
print("Row count:", len(rows))
CSV header: ('Date', 'Make', 'Model', 'Year', 'Price')
Row count: 200
  • One thing to be careful of is that a starred expression always produces a list, so the unpacking assignment reads the entire iterator into memory
    • This could cause your program to run out of memory and crash
    • Only use catch-all unpacking on iterators when you have good reason to believe the result will fit in memory
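  • When the number of rows is unknown or large, one alternative is to take the header with the built-in `next()` and consume the remaining rows lazily. This is a different technique from catch-all unpacking, sketched here as an assumption rather than taken from the notes above; `generate_csv` is the same generator used earlier and `row_count` is an illustrative name.

```python
def generate_csv():
    yield ("Date", "Make", "Model", "Year", "Price")
    for _ in range(100):
        yield ("2019-03-25", "Honda", "Fit", "2010", "$3400")
        yield ("2019-03-26", "Ford", "F150", "2008", "$2400")


it = generate_csv()
header = next(it)  # consumes only the first row

# Rows are handled one at a time instead of being collected into a list.
row_count = 0
for row in it:
    row_count += 1

print("CSV header:", header)
print("Row count:", row_count)
```

  • The trade-off is that `next()` raises `StopIteration` on an empty iterator, so this version needs its own handling for that case.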

Things to Remember

  • Unpacking assignments may include a starred expression to store all values not matched by any other part of the unpacking pattern
  • Starred expressions may appear in any position of the unpacking
    • But only once per level of unpacking
    • There must be at least one variable assignment that is not to the starred expression
  • The variable assigned by a starred expression will always be a list
    • If nothing is assigned to the variable, the list is empty
  • When dividing a list into non-overlapping parts, catch-all unpacking is less error-prone and cleaner to read than using separate slicing and indexing statements