Item 28: Know How to Construct Key-Dependent Default Values with __missing__

Notes

  • setdefault can be used to set a default value when a key is missing (Item 26)
  • defaultdict allows the default value to be generated by an instance-specific function (Item 27)
  • Both have limitations in the full general case
  • Consider managing social media profile pictures
    • A dictionary needs to map profile picture paths to open file handles so the files can be read and written as required
  • An approach using a normal dictionary might be implemented as below
import os

pictures = {}
path = "foo_image.png"

if (handle := pictures.get(path)) is None:
    try:
        handle = open(path, "a+b")
    except OSError:
        print(f"Failed to open file {path}")
        raise
    else:
        pictures[path] = handle

handle.seek(0)
image_data = handle.read()

if os.path.exists(path):
    os.remove(path)
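  • If the file handle is already in the dictionary

    • One dictionary access is performed (by get)
  • If the file handle is not yet in the dictionary

    • One access is performed by get
    • Then an assignment is performed in the else clause of the try/except
  • The call to read is kept clearly separate from the code that opens the file and handles exceptions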

  • We could also use the in operator, or handle a KeyError

    • Neither improves readability or reduces the number of dictionary accesses, and both add nesting
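The two alternatives mentioned above can be sketched as follows. The helper names get_handle_with_in and get_handle_with_keyerror are hypothetical, and a temporary directory is assumed so that the example leaves no files behind.

```python
import os
import tempfile


def get_handle_with_in(pictures, path):
    # Hypothetical helper: membership test first, then a lookup or an assignment.
    # This costs two dictionary accesses whether or not the key is present.
    if path in pictures:
        handle = pictures[path]
    else:
        try:
            handle = open(path, "a+b")
        except OSError:
            print(f"Failed to open file {path}")
            raise
        else:
            pictures[path] = handle
    return handle


def get_handle_with_keyerror(pictures, path):
    # Hypothetical helper: one access on a hit, but a miss needs a failed
    # lookup plus an assignment, and the error handling is nested two deep.
    try:
        handle = pictures[path]
    except KeyError:
        try:
            handle = open(path, "a+b")
        except OSError:
            print(f"Failed to open file {path}")
            raise
        else:
            pictures[path] = handle
    return handle


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "foo_image.png")
    pictures = {}
    first = get_handle_with_in(pictures, path)         # miss: opens the file
    second = get_handle_with_keyerror(pictures, path)  # hit: reuses the handle
    same_handle = first is second
    first.close()
```

Both helpers behave the same as the get version; they are simply longer and more deeply nested.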
  • One could also try a setdefault approach

import os

pictures = {}
path = "foo_image.png"

try:
    handle = pictures.setdefault(path, open(path, "a+b"))
except OSError:
    print(f"Failed to open file {path}")
else:
    handle.seek(0)
    image_data = handle.read()

if os.path.exists(path):
    os.remove(path)
  • The approach above has problems
    • open is called every time we query the dictionary for path, even when the key is already present
      • The extra file handle can cause conflicts with existing open handles
    • It is hard to differentiate exceptions raised by open from exceptions raised by setdefault
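The first problem comes from Python evaluating the argument to setdefault before the call is made. A minimal sketch, using a hypothetical fake_open stand-in instead of real file I/O so that the calls can be counted:

```python
calls = []


def fake_open(path):
    # Hypothetical stand-in for open(): records every call it receives.
    calls.append(path)
    return f"handle-{len(calls)}-for-{path}"


pictures = {}

# The argument expression runs before setdefault, on every call.
first = pictures.setdefault("foo_image.png", fake_open("foo_image.png"))
second = pictures.setdefault("foo_image.png", fake_open("foo_image.png"))

print(len(calls))       # 2: fake_open ran both times
print(first is second)  # True: the second result was created, then discarded
```

With a real open call, the discarded second handle would still have been opened on the same file.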
  • A third option, which we’ve already explored, is to use a defaultdict instead
from collections import defaultdict
import os

def open_picture(profile_path):
    try:
        return open(profile_path, "a+b")
    except OSError:
        print(f"Failed to open file {profile_path}")
        raise

path = "foo_image.png"
pictures = defaultdict(open_picture)
handle = pictures[path]
handle.seek(0)
image_data = handle.read()

if os.path.exists(path):
    os.remove(path)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 13
     11 path = "foo_image.png"
     12 pictures = defaultdict(open_picture)
---> 13 handle = pictures[path]
     14 handle.seek(0)
     15 image_data = handle.read()

TypeError: open_picture() missing 1 required positional argument: 'profile_path'
  • However, this doesn’t work
    • defaultdict calls its default factory function with no arguments
    • This means we can’t make the default value parameterised by the path
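The limitation can be reproduced without touching the file system. In this sketch the describe function is a hypothetical factory that wants the key, and list is used as an example of a factory that works because it needs no arguments.

```python
from collections import defaultdict


def describe(key):
    # Hypothetical factory that needs the key; defaultdict cannot supply it.
    return f"value for {key}"


needs_key = defaultdict(describe)
try:
    needs_key["foo_image.png"]
except TypeError as error:
    # The factory was called as describe(), with no arguments.
    message = str(error)

# A zero-argument factory such as list works as intended.
no_args = defaultdict(list)
no_args["foo_image.png"].append("data")
```

Because the factory raised, the failed lookup leaves needs_key empty.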
  • The fourth solution is to define the __missing__ method on a dict subclass
    • __missing__ controls how the dictionary handles missing keys
    • This requires us to subclass dict
import os


def open_picture(profile_path):
    try:
        return open(profile_path, "a+b")
    except OSError:
        print(f"Failed to open file {profile_path}")
        raise


class Pictures(dict):
    def __missing__(self, key):
        value = open_picture(key)
        self[key] = value
        return value

path = "foo_image.png"
pictures = Pictures()
handle = pictures[path]
handle.seek(0)
image_data = handle.read()

if os.path.exists(path):
    os.remove(path)
  • __missing__ is called when pictures[path] accesses a non-existent key
    • It delegates to the open_picture function
      • This creates a new value
    • Then assigns it to the key
    • Then returns the value
  • __missing__ has three requirements
    1. It must construct the new value
    2. It must assign that value to the key
    3. It must return that value
  • Subsequent accesses will not call __missing__ again as the key now exists
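The once-per-key behaviour can be checked with a small counter. This is a sketch only: the Cached class and its misses attribute are hypothetical and use strings rather than file handles. It also shows that only square-bracket lookup triggers __missing__; get and the in operator do not.

```python
class Cached(dict):
    def __init__(self):
        super().__init__()
        self.misses = 0  # Hypothetical counter of __missing__ calls

    def __missing__(self, key):
        self.misses += 1
        value = f"value for {key}"  # 1. construct the new value
        self[key] = value           # 2. assign it to the key
        return value                # 3. return it


cache = Cached()
cache["a"]  # Missing key: __missing__ runs and stores the value
cache["a"]  # Key now exists: __missing__ is not called again

print(cache.misses)    # 1
print(cache.get("b"))  # None: get does not call __missing__
print("b" in cache)    # False: the key was never created
```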

Things to Remember

  • setdefault should be avoided when creating a default value has a high cost or may raise exceptions
  • The default factory function passed to defaultdict is called with no arguments
    • Therefore the default value cannot be parameterised, including by the key being accessed
  • __missing__ is a dunder method that dict calls on a subclass when a key is missing
    • It accepts the instance (self) and the missing key
    • It can be defined in a dict subclass to construct default values parameterised by the key
    • __missing__ must create a new value, set the key to that value and return the value