Chapter 8: Storing Collections of Data

Notes

Lists and Tracking Sales

  • Consider the following vignette
  • The owner of an ice-cream stand wants a program to track sales
    • There are ten stands, each selling multiple items
    • The program should take sales data as input and then provide the following views on the data
      • Sorted from lowest to highest
      • Sorted from highest to lowest
      • Show just the highest and the lowest
      • Show the total number of sales
      • Show the average number of sales
Important

Getting the specification right: Storyboarding

Agreeing on the specification with your client is important. One useful technique for this is storyboarding, which is best done by sitting down with pen and paper (or a whiteboard).

A storyboard shows how the program should flow in response to various user inputs, e.g. depicting the menus the user might see, with a storyboard for each menu choice. The storyboard should also show what the program produces in response to each choice.

For bigger programs you can break different components out into their own storyboards, much in the same way we built up functions. Storyboards depict what needs to happen, but not how to do it.

  • Given the spec for the ice cream stand we can now outline the program

    1. Store the sales data in variables
    2. Implement a way to sort the data
    3. Provide a way to print the output
    4. Store the data globally and pass it to functions to handle the work
  • We can then construct a prototype interface, similar to the one in the Ride Selector Program

      Ice-Cream Sales
    
      1: Print the Sales
      2: Sort High to Low
      3: Sort Low to High
      4: Highest and Lowest
      5: Total Sales
      6: Average Sales
      7: Enter Figures
    
      Enter your command: 3

Limitations of Individual Variables

  • We first need to store the sales
    • For ten stores, we could theoretically use ten variables, one for each store

    • But this method becomes clunky when we want to start analysing the variables

    • E.g. the following code (FindingLargestSales.py) only checks whether the first stand had the greatest sales

        # Example 8.1 Finding the Largest Sales
        #
        # Checks if sales1 has the largest sales. Demonstrates the difficulty of using
        # individual named variables to deal with aggregate data
      
        import BTCInput
      
        sales1 = BTCInput.read_int("Enter the sales for stand 1: ")
        sales2 = BTCInput.read_int("Enter the sales for stand 2: ")
        sales3 = BTCInput.read_int("Enter the sales for stand 3: ")
        sales4 = BTCInput.read_int("Enter the sales for stand 4: ")
        sales5 = BTCInput.read_int("Enter the sales for stand 5: ")
        sales6 = BTCInput.read_int("Enter the sales for stand 6: ")
        sales7 = BTCInput.read_int("Enter the sales for stand 7: ")
        sales8 = BTCInput.read_int("Enter the sales for stand 8: ")
        sales9 = BTCInput.read_int("Enter the sales for stand 9: ")
        sales10 = BTCInput.read_int("Enter the sales for stand 10: ")
      
        if (
            sales1 > sales2
            and sales1 > sales3
            and sales1 > sales4
            and sales1 > sales5
            and sales1 > sales6
            and sales1 > sales7
            and sales1 > sales8
            and sales1 > sales9
            and sales1 > sales10
        ):
            print("Stand 1 had the best sales")
    • Problem: we would have to repeat this check for every individual sales variable

    • If we add more stands, we have to add another named variable and another big if statement

      • AND modify all the previous if statements
  • Clearly this approach is not very maintainable
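For contrast, here is a minimal sketch of the same "did stand 1 sell the most?" check once the figures are held in a list, which the next section introduces. The function name first_stand_is_best and the hard-coded figures are illustrative assumptions standing in for the BTCInput prompts; adding an eleventh stand would only mean adding one more value to the list.

```python
def first_stand_is_best(sales):
    """Return True if stand 1 (stored at index 0) sold strictly more than every other stand."""
    for index in range(1, len(sales)):
        # a tie or a higher figure anywhere means stand 1 is not the outright best
        if sales[0] <= sales[index]:
            return False
    return True


# illustrative figures for ten stands (not from the book)
sales = [120, 54, 29, 33, 22, 100, 45, 54, 89, 75]
if first_stand_is_best(sales):
    print("Stand 1 had the best sales")
```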

Lists in Python

  • A collection is a composite type
    • It stores multiple elements of another type
  • We’ve already (briefly) seen one type of collection: the tuple
  • The most common form of collection is the list
    • It is what it sounds like: a list of items
Make Something Happen: Creating a List

Open a Python interpreter and work through the following steps to learn about lists

  1. A list is created using square brackets around its contents, [], e.g.

     sales = []
    • The above defines sales as an empty list
  2. Items can be added to the end of a list using the append method

     sales.append(99)
     sales
    [99]
    • As we can see, sales now contains the value 99
  3. Calling append again adds the new item to the end of the list

     sales.append(100)
     sales
    [99, 100]
  4. As shown above, you can see the contents of a list by simply typing the variable name in the interpreter

    • In a script, typing a variable name on its own displays nothing, so we use an explicit print call

        print(sales)
      [99, 100]
  5. You can access individual items of the list using the indexing operator []

     sales[0]
    99
    • Syntax is list_name[index] where index is an integer giving the index of the item
    • Python lists are zero-indexed, i.e. the first value is stored at index \(0\)
  6. The indexing operator can be used to change the value of an item at a given index

     sales[1] = 101
     sales
    [99, 101]
    • The above changes the value of the second item in sales to \(101\)
    Warning

    Indexed elements must exist

    Whenever we use the indexing operator, the index must exist! For example, if we tried to view the (non-existent) third item, we would get an error, e.g.

     example_list = [1, 2]
     print(example_list[2])
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    Cell In[7], line 2
          1 example_list = [1, 2]
    ----> 2 print(example_list[2])
    
    IndexError: list index out of range

    The above illustrates the common off-by-one error, where we access the index one past the end of the list rather than the last element. The type of exception raised here is an IndexError

  7. A single list can store values of different types, and can replace items with new items of a different type

     sales.append("Rob")
     sales[0] = "Python"
     sales
    ['Python', 101, 'Rob']
    • The above appends a new string "Rob", replaces the integer in sales[0] with the string "Python", and leaves the number \(101\) in sales[1] untouched
    • Overall, the list now mixes string and integer types
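Returning to the IndexError warning in step 6: since indices run from 0 to len(list) - 1, the last valid index can be worked out from the list's length. The sketch below uses the built-in len function (discussed later in the chapter); get_or_none is a hypothetical helper, not part of Python or BTCInput.

```python
example_list = [1, 2]

# valid indices run from 0 up to len(example_list) - 1
last_index = len(example_list) - 1
print(example_list[last_index])  # prints 2, the last element


def get_or_none(items, index):
    """Return items[index] for an index from 0 to len(items) - 1, otherwise None (hypothetical helper)."""
    if 0 <= index < len(items):
        return items[index]
    return None


print(get_or_none(example_list, 2))  # prints None instead of raising IndexError
```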
Warning

Avoid Mixing Types in Lists

Just because you can mix types in lists doesn’t mean you should. Lists, and list processing, are typically much easier to work with when every item in a list is of the same type
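To see why mixed lists are harder to process, consider adding up the items. The minimal sketch below (total_or_none is a hypothetical helper) uses the same running-total loop that appears later in the chapter; with the mixed list from step 7, the very first addition, 0 + "Python", raises a TypeError.

```python
def total_or_none(items):
    """Add up items with a loop; return None if the items cannot all be added together."""
    total = 0
    try:
        for item in items:
            total = total + item  # an int plus a str raises TypeError
    except TypeError:
        return None
    return total


print(total_or_none([99, 101]))               # prints 200
print(total_or_none(["Python", 101, "Rob"]))  # prints None
```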

Read in a List

  • You can use loops to populate a list (see ReadAndDisplay.py)

      # Example 8.2.1 Read and Display
      #
      # Demonstrates using a loop to populate a list
    
      import BTCInput
    
      # create an empty list to populate
      sales = []
    
      for count in range(1, 11):
          prompt = "Enter the sales for stand " + str(count) + ": "
          sales.append(BTCInput.read_int(prompt))
    
      print(sales)
Code Analysis: Investigate a List Reading Loop

Examine the code given above and consider the following questions to understand how the list is processed

  1. What is the purpose of the count variable?

    • count holds the number of the stand whose sales we are currently reading (\(1\) to \(10\)). It is used in the prompt to tell the user which stand to enter the sales for
  2. Why does the range of count go from \(1\) to \(11\)?

    • The range function produces a sequence that includes the start value but excludes the stop value. Since we have stands \(1\) through \(10\), we want the range to go from \(1\) to \(11\) so that the generated numbers are \(1\) through \(10\)
  3. Which item in the list would hold the sales for stand number \(1\)?

    • The first item in the list, at index \(0\), i.e. sales[0]
  4. What part of the code would have to be changed if we instead had \(100\) stands?

    • We simply change range(1,11) to range(1,101)

    • The program below (ReadAndDisplay2.py) is a variant in which the user specifies the number of stands

        # Example 8.2.2 Read and Display 2
        #
        # Improved version of Read and Display which allows the user to specify
        # the number of stands
      
        import BTCInput
      
        # create an empty list to populate
        sales = []
      
        number_of_stands = BTCInput.read_int("Enter the number of stands: ")
        for count in range(1, number_of_stands + 1):
            prompt = "Enter the sales for stand " + str(count) + ": "
            sales.append(BTCInput.read_int(prompt))
      
        print(sales)
    • The above is more flexible, but as a result it is more complicated. The trade-off between flexibility and ease of use is one that should be decided with input from the users

  5. If I got one sales value wrong, would it be possible to edit the list to put in a corrected version?

    • This is not implemented in the current program, but we have already seen that you can reassign the list item at a given index, so we could implement this in a more complete program
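Question 5 can be sketched as a small function. Because stand numbers start at 1 but list indices start at 0, stand n is stored at index n - 1. The name correct_sale and its parameters are hypothetical, and the figures are passed in directly rather than read with BTCInput.

```python
def correct_sale(sales, stand_number, new_value):
    """Replace the sales figure for stand_number (counting from 1) with new_value."""
    index = stand_number - 1  # stand 1 lives at index 0
    if index < 0 or index >= len(sales):
        raise ValueError("There is no stand " + str(stand_number))
    sales[index] = new_value  # lists can be updated in place


sales = [50, 54, 29]
correct_sale(sales, 2, 45)
print(sales)  # prints [50, 45, 29]
```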

Display a list using a for Loop

  • We’ve already seen that print has a default way of displaying a list

  • We can use a for loop if we want custom printing for each item

      # Example 8.3 Read and Display Loop
      #
      # Uses a for loop to provide custom list printing
    
      import BTCInput
    
      sales = []
    
      for count in range(1, 11):
          prompt = "Enter the sales for stand " + str(count) + ": "
          sales.append(BTCInput.read_int(prompt))
    
      # print a heading
      print("Sales Figures")
      count = 1
      for sales_value in sales:
          print("Sales for stand", count, "are", sales_value)
          count = count + 1
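A common alternative to maintaining count by hand is Python's built-in enumerate, which pairs each item with a running number; passing start=1 makes the numbering match the stand numbers. This sketch builds the lines as strings (format_sales is a hypothetical name) so they can be checked before printing, and uses fixed figures instead of BTCInput.

```python
def format_sales(sales):
    """Return the heading and one line per stand, numbered from 1."""
    lines = ["Sales Figures"]
    for stand_number, sales_value in enumerate(sales, start=1):
        lines.append("Sales for stand " + str(stand_number) + " are " + str(sales_value))
    return lines


for line in format_sales([50, 54, 29]):
    print(line)
```

The printed lines match the output of the loop above, since print separates its arguments with single spaces.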
Make Something Happen: Read the Names of Guests for a Party

Lists can hold any type of data that you need to store, including strings. You can change the ice-cream sales program to read and store the names of guests for a party or an event you’re planning. Make a modified version of the sales program that reads in some guest names and then displays them. Make your program handle between \(5\) and \(15\) guests

  • We basically just copy the previous program with the following changes
    • sales \(\rightarrow\) guests
    • sales_value \(\rightarrow\) guest
    • We change the prompts to appropriately refer to guests rather than sales
  • The two main changes are
    1. We add an initial prompt for the number of guests
      • We use BTCInput.read_int_ranged to ensure the value is from \(5\) to \(15\)
    2. We use BTCInput.read_text instead of BTCInput.read_int to get the guest names
    # Exercise 8.1 Party Guests
    #
    # A program that receives and then prints a list of party guests
    # Works for between 5 and 15 guests

    import BTCInput

    guests = []
    number_of_guests = BTCInput.read_int_ranged(
        "Enter the number of guests (5-15): ", 5, 15
    )

    for count in range(1, number_of_guests + 1):
        prompt = "Enter the name of guest " + str(count) + ": "
        guests.append(BTCInput.read_text(prompt))

    # print a heading
    print("\nGuests attending:")
    for guest in guests:
        print("-", guest)

Refactor Programs into Functions

  • The previous examples build up our program as one long chain of events

  • However, if we think about our program, this isn’t the cleanest structure

    • There are two distinct responsibilities
      1. First we read in the data
      2. Second we display the data
    • These are natural candidates to be converted into functions
  • By coupling these behaviours together, the program locks us into one way of processing data

    • What happens if we want to read in a second set of data?
    • What if we want to print the data multiple times?
  • Refactoring is the process of restructuring existing code

    • Specifically, changing how its parts are organised and interact, without changing what the program does
  • Refactoring avoids the problem of overcomplicating the design at the start of the process

    • Instead we write the program in the simplest way we can
    • Then, once a structure emerges, or when we need to add functionality, we refactor the design
  • Let us factor out the two key components identified above into a new implementation (Functions.py)

      # Example 8.4 Functions
      #
      # Demonstrates refactoring a program into component functions
    
      import BTCInput
    
      sales = []
    
    
      def read_sales(number_of_sales):
          """
          Reads in the sales values and stores them in the sales list
    
          Parameters
          ----------
          number_of_sales : int
              Number of stands to record sales values for
    
          Returns
          -------
          None
              Results are read into the sales list
          """
          sales.clear()  # remove existing sales values
          for count in range(1, number_of_sales + 1):
              prompt = "Enter the sales for stand " + str(count) + ": "
              sales.append(BTCInput.read_int(prompt))
    
    
      def print_sales():
          """
          Prints the sales figures on the screen with a heading.
    
          Each figure is numbered in sequence
    
          Returns
          -------
          None
          """
          print("Sales Figures")
          count = 1
          for sales_value in sales:
              print("Sales for stand", count, "are", sales_value)
              count = count + 1
    
    
      read_sales(10)
      print_sales()

Code Analysis: Functions in the Sales Analysis Program

Our sales analysis program now consists of two functions, read_sales and print_sales

  1. What does the parameter for the read_sales function do?

    • We hinted in the previous section that the number of stands might change in a future version of the program. To support this, the parameter tells read_sales how many sales values it should read
  2. What does clear do?

    • We want to start with a fresh list every time we read the sales values
    • clear is a method on list objects that clears its contents
  3. Why don’t we need to tell the print_sales function how many sales figures to print?

    • The for loop goes through the contents of the sales list
    • A list tracks its own size
    • In some languages, such as C, arrays do not track their own size, so we would need to pass the size in as well
  4. Why didn’t we have to write global sales in the read_sales function?

    • Python variable names are references to objects
    • A name is distinct from the object it currently refers to
    • Assignment changes which object a name (variable) refers to
      • e.g. sales=[]
    • However, calling a method through a name does not change which object the name refers to, e.g. sales.append(99) changes the contents of the list object
      • Python only treats a name as local to a function if the function assigns to it. read_sales never assigns to sales, it only calls methods on it (sales.clear() and sales.append()), so Python uses the global sales and no global statement is needed

Create Placeholder Functions

  • A common development technique is to write placeholder functions before we can provide a complete implementation for a given behaviour
  • These placeholders are called stub functions (or stubs), e.g. the two below
def sort_high_to_low():
    """
    Print out a sales list from highest to lowest

    Returns
    -------
    None

    See Also
    --------
    sort_low_to_high : sorts from lowest to highest
    """
    pass


def sort_low_to_high():
    """
    Print out a sales list from lowest to highest

    Returns
    -------
    None

    See Also
    --------
    sort_high_to_low : sorts from highest to lowest
    """
    pass
  • Placeholders let us model the flow of the program before we have all the behaviours implemented
    • This obviously does not model the complete program, since the functions are incomplete
  • pass is a keyword for a statement that does nothing
    • It is effectively a placeholder statement
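The menu in the next section also calls highest_and_lowest, total_sales and average_sales. A minimal sketch of stubs for these, in the same style as above, might look like the following; the docstrings are assumptions about what each function will eventually do.

```python
def highest_and_lowest():
    """Print the highest and lowest sales values (not yet implemented)."""
    pass


def total_sales():
    """Print the total of all the sales values (not yet implemented)."""
    pass


def average_sales():
    """Print the average of the sales values (not yet implemented)."""
    pass


# each stub can already be called from the menu; for now it simply returns None
highest_and_lowest()
total_sales()
average_sales()
```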

Create a User Menu

  • At the start of the chapter we defined a user interface
    • Using stub functions alongside our initial functions, we can implement this menu (see the full implementation in FunctionsAndMenu.py)
      menu = """
      Ice Cream Sales
    
      1. Print the Sales
      2. Sort High to Low
      3. Sort Low to High
      4. Highest and Lowest
      5. Total Sales
      6. Average Sales
      7. Enter Figures
    
      Enter your command: """
    
      command = BTCInput.read_int_ranged(menu, 1, 7)
    
      if command == 1:
          print_sales()
      elif command == 2:
          sort_high_to_low()
      elif command == 3:
          sort_low_to_high()
      elif command == 4:
          highest_and_lowest()
      elif command == 5:
          total_sales()
      elif command == 6:
          average_sales()
      elif command == 7:
          read_sales(10)
      else:
          raise ValueError("Unexpected value " + str(command) + " found")
  • We use stub functions for the unimplemented behaviour
Tip

Using Else Clauses to Guard Against Modification

In the example above the final else clause should never run, because we expect BTCInput.read_int_ranged(menu, 1, 7) to return a value between \(1\) and \(7\) (inclusive), and every such value is handled by the if...elif chain

Why then do we include the else clause? The reason is to protect against future modifications. These could include:

  1. The author of BTCInput introduces a bug in read_int_ranged that allows invalid input to leak through
  2. Someone editing the sales program changes the allowed range of input for read_int_ranged (perhaps to introduce new functions) but forgets to handle the new values in the elif chain

In either case the else clause runs and raises an exception, which immediately tells us there’s a problem in the code. Without it, an unexpected value would simply fall through the chain and nothing would happen: a silent error that is much harder to track down. Relying on the else to handle option \(7\) instead of testing for it explicitly would be just as risky, since any unexpected value would then be treated as a \(7\)

Guarding against potential modifications in this way is a simple technique for catching errors early and confirming that your assumptions still hold
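The guard can be tried out without BTCInput by passing the command in directly. In this sketch, menu_action is a hypothetical function that returns a description of each menu choice instead of calling the sales functions; any value outside 1 to 7 reaches the else clause and raises a ValueError.

```python
def menu_action(command):
    """Return a description of the menu choice, guarding against unexpected values."""
    if command == 1:
        return "Print the Sales"
    elif command == 2:
        return "Sort High to Low"
    elif command == 3:
        return "Sort Low to High"
    elif command == 4:
        return "Highest and Lowest"
    elif command == 5:
        return "Total Sales"
    elif command == 6:
        return "Average Sales"
    elif command == 7:
        return "Enter Figures"
    else:
        # should be unreachable while the input range and this chain agree
        raise ValueError("Unexpected value " + str(command) + " found")


print(menu_action(3))  # prints Sort Low to High
try:
    menu_action(8)  # e.g. the input range was widened but no branch was added
except ValueError as error:
    print("Caught:", error)
```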

Use the elif keyword to simplify conditions
  • In many of the examples and exercises I’ve used elif to simplify cases where we would otherwise have a bunch of nested if...else conditions.
  • elif is short for "else if": its condition is only checked if the preceding if condition (and any earlier elif conditions) were all False
    • All elif conditions must come before the else
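As an illustration (the thresholds and labels are made up), the two functions below classify a sales figure identically; the elif version avoids the extra level of nesting.

```python
def rate_nested(sales_value):
    """Classify a sales figure using nested if...else."""
    if sales_value >= 80:
        return "high"
    else:
        if sales_value >= 40:
            return "medium"
        else:
            return "low"


def rate_elif(sales_value):
    """The same classification written with elif."""
    if sales_value >= 80:
        return "high"
    elif sales_value >= 40:  # only checked when sales_value < 80
        return "medium"
    else:
        return "low"


print(rate_elif(54))  # prints medium
```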

Sort Using Bubble Sort

  • Sorting is a common task for computer programs
  • It can be time-intensive
  • There are often multiple ways that we may wish to sort things, e.g.
    • Alphabetically vs Numerically
    • Increasing vs Decreasing
    • Case-sensitive vs Case-insensitive
  • Traditional sorts work on one item (or pair of items) at a time
  • An algorithm is a sequence of steps that solves a problem
    • Sorting algorithms are algorithms that sort collections
    • Programming is really the implementation of algorithms
  • Bubble sort is a simple sorting algorithm
    • Easy to follow and understand
    • Does not scale well to large data sets

Initialise a list with Test Data

  • Often when implementing an algorithm we want to use a fixed set of test data
    • i.e. data for which we already know the desired final state or output
    • This allows us to check that our algorithm behaves correctly
  • We can define a list in Python with some initial contents:
sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
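One way to know the desired final state is to let Python's built-in sorted function, which returns a new sorted list and leaves the original alone, produce the expected answer. This is only a checking aid; the chapter implements the sort itself. The names expected_high_to_low and expected_low_to_high are illustrative.

```python
sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]

# expected results, computed without modifying sales
expected_high_to_low = sorted(sales, reverse=True)
expected_low_to_high = sorted(sales)

print(expected_high_to_low)  # [100, 89, 75, 54, 54, 50, 45, 33, 29, 22]
print(expected_low_to_high)  # [22, 29, 33, 45, 50, 54, 54, 75, 89, 100]
```

After running your own sort, a line such as assert sales == expected_high_to_low confirms the result.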

Sort a List from High to Low

    Index | 0  | 1  | 2  | 3  | 4  | 5   | 6  | 7  | 8  | 9
    Value | 50 | 54 | 29 | 33 | 22 | 100 | 45 | 54 | 89 | 75

  • The above shows how the test data looks in a Python list
  • For a highest-to-lowest sort we want the largest value at index \(0\) and the lowest at index \(9\)
  • The basic idea of bubble sort is to compare neighbouring values: if the right value is larger, we swap them so the larger value is on the left
    • i.e. closer to the front of the list
Important

Swap Two Values in a List

The following code, which tries to swap two list items, is broken:

if sales[0] < sales[1]:
    # the two items are in the wrong order and must be swapped
    sales[0] = sales[1]
    sales[1] = sales[0]

Why? Let’s work through what happens:

  1. sales[0] is set to the value of sales[1]
  2. sales[1] is set to the current value of sales[0]
  3. But, sales[0] has already been set to sales[1]
    • So sales[1] is set to the same value it already has

The net result is that sales[1] is copied into sales[0], and the original value of sales[0] is lost

The correct implementation is given below,

if sales[0] < sales[1]:
    temp = sales[0]
    sales[0] = sales[1]
    sales[1] = temp

temp is used to store the value of sales[0] before it is overwritten

We don’t want to hard-code specific indices, but we can write this generically with a for loop that compares each item with its right-hand neighbour:

for count in range(0, len(sales) - 1):
    # compare each item with its right-hand neighbour at index count + 1
    if sales[count] < sales[count + 1]:
        temp = sales[count]
        sales[count] = sales[count + 1]
        sales[count + 1] = temp
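Python also offers a shorter idiom that needs no temp variable: tuple assignment. The whole right-hand side is evaluated before anything is assigned, so both original values are captured first. A minimal sketch:

```python
sales = [50, 54, 29]

if sales[0] < sales[1]:
    # the right-hand side (54, 50) is built before either item is overwritten
    sales[0], sales[1] = sales[1], sales[0]

print(sales)  # prints [54, 50, 29]
```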
Code Analysis: Work through a List using a Loop

The above code uses some new Python features. Work through the following questions to understand what’s going on

  1. Why have you used a for loop, rather than a while loop?

    • We could use either; the for loop is slightly shorter since we don’t have to manually increment count
    • Additionally, range does not build a list: it returns a range object
    • This is more memory efficient
      • Rather than creating a full list of numbers in memory, it produces each number as the for loop requests it
  2. What does the len function do on line \(1\)?

    • len returns the length of a collection, i.e. the number of items in the collection
    • This lets you write code that is insensitive to the size of the collection being worked with
    • This means our sorting code works on a list of any length
  3. Why is the limit of count the length of the list minus 1?

    • This is because bubble sort compares the current item to the item to its right, i.e. at the next index
    • If the range went up to the last index, the program would try to access an element one past the end of the list, which doesn’t exist
      • This causes an IndexError, e.g.
     a_list = [1,2]
     for count in range(0, len(a_list)):
         if a_list[count] < a_list[count + 1]:
             temp = a_list[count]
             a_list[count] = a_list[count + 1]
             a_list[count + 1] = temp
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    Cell In[9], line 3
          1 a_list = [1,2]
          2 for count in range(0, len(a_list)):
    ----> 3     if a_list[count] < a_list[count + 1]:
          4         temp = a_list[count]
          5         a_list[count] = a_list[count + 1]
    
    IndexError: list index out of range
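The range behaviour from question 1 and the loop limit from question 3 can both be seen directly in the interpreter. A short sketch:

```python
numbers = range(0, 5)
print(numbers)        # prints range(0, 5): no list of numbers has been built
print(list(numbers))  # prints [0, 1, 2, 3, 4]
print(len(numbers))   # prints 5

sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
# for a 10-item list the loop visits counts 0 to 8, so count + 1 never exceeds 9
print(list(range(0, len(sales) - 1)))
```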
# Example 8.6 Bubble Sort First Pass
#
# Implements the first pass of bubble sort and shows the impact on the list

# test data
sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]


def sort_high_to_low():
    """
    Print out a sales list from highest to lowest

    Returns
    -------
    None
    """

    for count in range(0, len(sales) - 1):
        if sales[count] < sales[count + 1]:
            temp = sales[count]
            sales[count] = sales[count + 1]
            sales[count + 1] = temp


print("Input list:", sales)

sort_high_to_low()

print("Output list:", sales)
Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
Output list: [54, 50, 33, 29, 100, 45, 54, 89, 75, 22]

after which the test data looks like this

    Index | 0  | 1  | 2  | 3  | 4   | 5  | 6  | 7  | 8  | 9
    Value | 54 | 50 | 33 | 29 | 100 | 45 | 54 | 89 | 75 | 22

  • Notice that the list has been partially sorted
    • Also notice that the smallest value \(22\) has been moved to its correct index (the end)
    • Each larger value, by contrast, moves at most one place to the left per pass, bubbling past a smaller neighbour
  • Since one pass moves the smallest value to the end, we expect a second pass to move the second-smallest value into its correct spot, and so on
    • So we repeat the pass len(sales) times, which guarantees the whole list is sorted
  • The working bubble sort implementation is then,
# Example 8.7 Bubble Sort Multiple Pass
#
# Implements a complete working version of bubble sort

# test data
sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]


def sort_high_to_low():
    """
    Print out a sales list from highest to lowest

    Returns
    -------
    None
    """
    for sort_pass in range(0, len(sales)):
        for count in range(0, len(sales) - 1):
            if sales[count] < sales[count + 1]:
                temp = sales[count]
                sales[count] = sales[count + 1]
                sales[count + 1] = temp


print("Input list:", sales)

sort_high_to_low()

print("Output list:", sales)
Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
Output list: [100, 89, 75, 54, 54, 50, 45, 33, 29, 22]
Code Analysis: Improving Performance

As seen above, the sorting program now works correctly. Once you have a working implementation, it’s worth investigating whether there are changes you can make to improve its efficiency. Work through the following questions to get the idea

  1. Is the program making more comparisons than necessary?

    • Yes, as we mentioned before, after one pass the smallest item will always be at the end of the collection

    • This means the inner loop no longer needs to compare anything against it

    • After each pass the size of this sorted section increases by at least one

    • An implementation taking this into account is,

        for sort_pass in range(0, len(sales)):
            for count in range(0, len(sales) - 1 - sort_pass):
                if sales[count] < sales[count + 1]:
                    temp = sales[count]
                    sales[count] = sales[count + 1]
                    sales[count + 1] = temp
  2. Is the program performing more passes through the list than necessary?

    • Usually, yes. A value can move left by at most one place per pass, so at most len(sales) - 1 passes are ever needed, and often far fewer (an already-sorted list needs no swaps at all)
    • We can stop doing additional passes if we work out the list is already sorted
    • How?
      • We use a flag to track if any swaps occur in a pass
      • If none do then the list is already sorted and we can stop
        # Example 8.8 Efficient Bubble Sort
        #
        # A bubble sort implementation incorporating efficiency savings to the number
        # of comparisons and passes through the list
      
        # test data
        sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
      
      
        def sort_high_to_low():
            """
            Print out a sales list from highest to lowest
      
            Returns
            -------
            None
            """
            for sort_pass in range(0, len(sales)):
                done_swap = False
                for count in range(0, len(sales) - 1 - sort_pass):
                    if sales[count] < sales[count + 1]:
                        temp = sales[count]
                        sales[count] = sales[count + 1]
                        sales[count + 1] = temp
                        done_swap = True
                if not done_swap:
                    break
      
        print("Input list:", sales)
      
        sort_high_to_low()
      
        print("Output list:", sales)
      Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
      Output list: [100, 89, 75, 54, 54, 50, 45, 33, 29, 22]
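To see what the two improvements buy, the sketch below counts comparisons. The function count_comparisons and its optimised flag are hypothetical; the function sorts a copy of the list from high to low, using the tuple-assignment swap for brevity, and returns the sorted copy together with the number of comparisons made.

```python
def count_comparisons(values, optimised):
    """Bubble sort a copy of values high to low; return (sorted copy, comparisons made)."""
    items = list(values)  # copy, so the caller's list is not changed
    comparisons = 0
    for sort_pass in range(0, len(items)):
        done_swap = False
        if optimised:
            limit = len(items) - 1 - sort_pass  # skip the already-sorted tail
        else:
            limit = len(items) - 1
        for count in range(0, limit):
            comparisons = comparisons + 1
            if items[count] < items[count + 1]:
                items[count], items[count + 1] = items[count + 1], items[count]
                done_swap = True
        if optimised and not done_swap:
            break  # no swaps means the list is already sorted
    return items, comparisons


sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
print(count_comparisons(sales, optimised=False))  # comparisons: 10 passes x 9 = 90
print(count_comparisons(sales, optimised=True))
```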
Make Something Happen: Sort Alphabetically

Bubble sort works for strings as well as integers. We saw in Chapter 5 that Python’s relational operators also work for strings. See if you can modify the Party Guest Program to display the names in alphabetical order

We can reuse our sort code, renaming the sales list to guests for the guest program.

def sort_alphabetical():
    """
    Sorts a list alphabetically

    Returns
    -------
    None
    """
    for sort_pass in range(0, len(guests)):
        done_swap = False
        for count in range(0, len(guests) - 1 - sort_pass):
            if guests[count] > guests[count + 1]:
                temp = guests[count]
                guests[count] = guests[count + 1]
                guests[count + 1] = temp
                done_swap = True
        if not done_swap:
            break

There is a second modification above: the direction of the relational operator has been reversed, e.g.

guests[count] < guests[count + 1]

has been changed to,

guests[count] > guests[count + 1]

This is because, as written, the original code puts the smallest values last. For strings the relational operators compare alphabetically, so this would put strings starting with "a", for example, after those starting with "z". Reversing the comparison sorts the list in the order a, b, …, z instead.

Why don’t we have to make more modifications? The code as written only requires that the items being sorted are stored in a list, and that the items in the list can be compared with a relational operator. A list of strings satisfies both properties, so the code works out of the box

The complete code, including the integration with reading and printing the guest list is given in SortAlphabetically.py
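One caveat, connecting back to the case-sensitive vs case-insensitive distinction mentioned earlier: Python compares strings character by character using their character codes, and the uppercase ASCII letters all come before the lowercase ones, so "Zoe" sorts before "adam". The sketch below shows the effect and a case-insensitive bubble sort that compares lowercased copies of each name (the guest names are made up).

```python
guests = ["adam", "Zoe", "bella"]

print("Zoe" < "adam")                  # prints True: "Z" has a lower code than "a"
print("Zoe".lower() < "adam".lower())  # prints False: compares "zoe" with "adam"

# a case-insensitive bubble sort, comparing lowercased copies of each name
for sort_pass in range(0, len(guests)):
    done_swap = False
    for count in range(0, len(guests) - 1 - sort_pass):
        if guests[count].lower() > guests[count + 1].lower():
            guests[count], guests[count + 1] = guests[count + 1], guests[count]
            done_swap = True
    if not done_swap:
        break

print(guests)  # prints ['adam', 'bella', 'Zoe']
```

The names keep their original capitalisation; only the comparison ignores case.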

Sort a List from Low to High

  • To flip the direction of the sort, we just need to reverse the condition that decides whether a pair is out of order
    • We do this by changing \(<\) to \(>\), i.e.

        # Example 8.9 Bubble Sort Low to High
        #
        # Implementation of Bubble Sort that sorts from low to high
      
        # test data
        sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
      
      
        def sort_low_to_high():
            """
            Print out a sales list from lowest to highest
      
            Returns
            -------
            None
            """
            for sort_pass in range(0, len(sales)):
                done_swap = False
                for count in range(0, len(sales) - 1 - sort_pass):
                    if sales[count] > sales[count + 1]:
                        temp = sales[count]
                        sales[count] = sales[count + 1]
                        sales[count + 1] = temp
                        done_swap = True
                if not done_swap:
                    break
      
      
        print("Input list:", sales)
      
        sort_low_to_high()
      
        print("Output list:", sales)
      Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
      Output list: [22, 29, 33, 45, 50, 54, 54, 75, 89, 100]
  • The code above is given in BubbleSortLowToHigh.py
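Because the high-to-low and low-to-high versions differ only in the comparison, one option is a single function that takes the list and the direction as parameters instead of relying on a global. bubble_sort and its descending parameter are hypothetical names; like the versions above, it sorts in place, and it works on any list whose items can be compared with < and >, such as the sales figures here or the guest names from the previous exercise.

```python
def bubble_sort(items, descending=False):
    """Sort items in place with bubble sort, stopping early once a pass makes no swaps."""
    for sort_pass in range(0, len(items)):
        done_swap = False
        for count in range(0, len(items) - 1 - sort_pass):
            if descending:
                out_of_order = items[count] < items[count + 1]
            else:
                out_of_order = items[count] > items[count + 1]
            if out_of_order:
                temp = items[count]
                items[count] = items[count + 1]
                items[count + 1] = temp
                done_swap = True
        if not done_swap:
            break


sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
bubble_sort(sales)
print(sales)  # prints [22, 29, 33, 45, 50, 54, 54, 75, 89, 100]
bubble_sort(sales, descending=True)
print(sales)  # prints [100, 89, 75, 54, 54, 50, 45, 33, 29, 22]
```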

Find the Highest and Lowest Sales Values

  • In comparison to sorting, finding the highest or lowest value is much easier

  • The basic outline for finding the highest is,

      for values in collection
          if(new value > highest seen so far)
              highest = new value
  • We can then write the code for the highest and lowest in Python as,

      highest = sales[0]
      for sales_value in sales:
          if sales_value > highest:
              highest = sales_value
    
      lowest = sales[0]
      for sales_value in sales:
          if sales_value < lowest:
              lowest = sales_value
  • If we want to find both at the same time, we can combine the code above, which means we only have to make one pass through the collection

      # Example 8.10 Highest and Lowest
      #
      # Function that finds the highest and lowest value in a collection
    
      # test data
      sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    
    
      def highest_and_lowest():
          """
          Print out the highest and lowest elements of a sales list
    
          Returns
          -------
          None
          """
          highest = sales[0]
          lowest = sales[0]
    
          for sales_value in sales:
              if sales_value > highest:
                  highest = sales_value
              elif sales_value < lowest:
                  lowest = sales_value
          print("The highest is:", highest)
          print("The lowest is", lowest)
    
    
      print("Input list:", sales)
    
      highest_and_lowest()
    Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    The highest is: 100
    The lowest is 22
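The function above prints its results. A hypothetical alternative, sketched below, takes the list as a parameter and returns both values, so the caller decides what to do with them (find_highest_and_lowest is our own name and, like the original, it assumes the list is not empty). Python’s built-in max and min functions give the same answers.

```python
def find_highest_and_lowest(values):
    """
    Hypothetical variant of highest_and_lowest: takes the list as a
    parameter and returns (highest, lowest) instead of printing them.
    Assumes values is not empty.
    """
    highest = values[0]
    lowest = values[0]
    for value in values:
        if value > highest:
            highest = value
        elif value < lowest:
            lowest = value
    return highest, lowest


sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
highest, lowest = find_highest_and_lowest(sales)
print(highest, lowest)  # 100 22

# the built-in max and min functions do the same search for us
print(max(sales), min(sales))  # 100 22
```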

Evaluate Total and Average Sales

  • To evaluate the total, we sum the contents of the list using the for loops we’ve already looked at (implementation in TotalSales.py):

      # Example 8.11 Total Sales
      #
      # Calculate the Total Sales
    
      # test data
      sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    
    
      def total_sales():
          """
          Print out the total sales of a sales list
    
          Returns
          -------
          None
          """
          total = 0
          for sales_value in sales:
              total = total + sales_value
          print("Total sales are:", total)
    
    
      print("Input list:", sales)
    
      total_sales()
    Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    Total sales are: 551
  • It is a simple extra step to then calculate the average (divide the total by the number of elements in the collection):

      # Example 8.12 Average Sales
      #
      # Calculate the Average Sales
    
      # test data
      sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    
    
      def average_sales():
          """
          Print out the average sales of a sales list
    
          Returns
          -------
          None
          """
          total = 0
          for sales_value in sales:
              total = total + sales_value
          average = total / len(sales)
          print("Average sales are:", average)
    
    
      print("Input list:", sales)
    
      average_sales()
    Input list: [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    Average sales are: 55.1
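One case the example does not handle is an empty list: len(sales) would be 0 and the division would raise a ZeroDivisionError. The sketch below (average_of is a hypothetical name) guards against that, and uses Python’s built-in sum function in place of the explicit total loop.

```python
def average_of(values):
    """
    Hypothetical helper: return the average of values, or None if the
    list is empty (dividing by len(values) would raise ZeroDivisionError).
    """
    if len(values) == 0:
        return None
    # sum adds up every element, just like the total loop above
    return sum(values) / len(values)


sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
print(sum(sales))         # 551
print(average_of(sales))  # 55.1
print(average_of([]))     # None
```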

Complete the Program

  • The previous examples have given us all the parts; now we want to put them together
  • The crux of our program should be a loop around the menu, through which the user selects different functions
  • First, however, we need to read in the data from the user
  • For usability, we should add the ability to quit the program
  • The final program implements this:
# Example 8.13 Complete Program
#
# A Complete implementation of the Sales Program combining all the individual
# programs that we have implemented

import BTCInput

sales = []


def read_sales(number_of_sales):
    """
    Reads in the sales values and stores them in the sales list

    Parameters
    ----------
    number_of_sales : int
        Number of Stores to record sales values for

    Returns
    -------
    None
        Results are read into the sales list
    """
    sales.clear()  # remove existing sales values
    for count in range(1, number_of_sales + 1):
        prompt = "Enter the sales for stand " + str(count) + ": "
        sales.append(BTCInput.read_int(prompt))


def print_sales():
    """
    Prints the sales figures on the screen with a heading. Each figure is
    numbered in sequence

    Returns
    -------
    None
    """
    print("Sales Figures")
    count = 1
    for sales_value in sales:
        print("Sales for stand", count, "are", sales_value)
        count = count + 1


def sort_high_to_low():
    """
    Sort the sales list in place from highest to lowest

    Returns
    -------
    None

    See Also
    --------
    sort_low_to_high : sorts from lowest to highest
    """
    for sort_pass in range(0, len(sales)):
        done_swap = False
        for count in range(0, len(sales) - 1 - sort_pass):
            if sales[count] < sales[count + 1]:
                temp = sales[count]
                sales[count] = sales[count + 1]
                sales[count + 1] = temp
                done_swap = True
        if not done_swap:
            break


def sort_low_to_high():
    """
    Sort the sales list in place from lowest to highest

    Returns
    -------
    None

    See Also
    --------
    sort_high_to_low : sorts from highest to lowest
    """
    for sort_pass in range(0, len(sales)):
        done_swap = False
        for count in range(0, len(sales) - 1 - sort_pass):
            if sales[count] > sales[count + 1]:
                temp = sales[count]
                sales[count] = sales[count + 1]
                sales[count + 1] = temp
                done_swap = True
        if not done_swap:
            break


def highest_and_lowest():
    """
    Print out the highest and lowest elements of a sales list

    Returns
    -------
    None
    """
    highest = sales[0]
    lowest = sales[0]

    for sales_value in sales:
        if sales_value > highest:
            highest = sales_value
        elif sales_value < lowest:
            lowest = sales_value
    print("The highest is:", highest)
    print("The lowest is", lowest)


def total_sales():
    """
    Print out the total sales of a sales list

    Returns
    -------
    None
    """
    total = 0
    for sales_value in sales:
        total = total + sales_value
    print("Total sales are:", total)


def average_sales():
    """
    Print out the average sales of a sales list

    Returns
    -------
    None
    """
    total = 0
    for sales_value in sales:
        total = total + sales_value
    average = total / len(sales)
    print("Average sales are:", average)


# Get initial sales list
read_sales(10)


menu = """
Ice Cream Sales

0. Quit the Program
1. Print the Sales
2. Sort High to Low
3. Sort Low to High
4. Highest and Lowest
5. Total Sales
6. Average Sales
7. Enter Figures

Enter your command: """

while True:
    command = BTCInput.read_int_ranged(menu, 0, 7)
    if command == 0:
        break
    if command == 1:
        print_sales()
    elif command == 2:
        sort_high_to_low()
    elif command == 3:
        sort_low_to_high()
    elif command == 4:
        highest_and_lowest()
    elif command == 5:
        total_sales()
    elif command == 6:
        average_sales()
    elif command == 7:
        read_sales(10)
    else:
        raise ValueError("Unexpected value " + str(command) + " found")
Warning

Keeping Information Synchronised when Sorting

While playing around with the program, you might notice one thing. The stands are numbered in the order in which they are printed. This works well when printing the original list, but once we sort the list these numbers no longer match the stand that made each sale. This is fine if we only care about the sales figures, but if we want to maintain the relationship between a stand and its sales, the program would have to be modified.

This is something you would discuss with the client.
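If the client does want each figure to stay linked to its stand, one possible approach (a sketch, not the chapter’s solution) is to keep a second list of stand numbers and to swap its entries every time the matching sales entries are swapped. The function name sort_high_to_low_with_stands and its parameters are hypothetical.

```python
def sort_high_to_low_with_stands(sales, stands):
    """
    Hypothetical helper: bubble sort sales from high to low, applying
    every swap to stands as well so the two lists stay aligned.
    """
    for sort_pass in range(0, len(sales)):
        done_swap = False
        for count in range(0, len(sales) - 1 - sort_pass):
            if sales[count] < sales[count + 1]:
                # swap the sales figures...
                temp = sales[count]
                sales[count] = sales[count + 1]
                sales[count + 1] = temp
                # ...and the stand numbers in the same two positions
                temp = stands[count]
                stands[count] = stands[count + 1]
                stands[count + 1] = temp
                done_swap = True
        if not done_swap:
            break


sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
stands = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sort_high_to_low_with_stands(sales, stands)
for count in range(0, len(sales)):
    print("Stand", stands[count], "sold", sales[count])  # first line: Stand 6 sold 100
```

Because this bubble sort only swaps when one value is strictly less than the next, stands with equal sales (stands 2 and 8 both sold 54) keep their original order.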

Store Data in a File

  • A natural extension to the program would be the ability to save the sales data to a file and load it back

  • Files allow the data to persist between sessions

  • To do this we’ll add two new options, 8. Save Sales and 9. Load Sales

  • Let us start by stubbing out our functions (the complete integration is found in LoadAndSave.py):

      def save_sales(file_path):
          """
          Saves the contents of the sales list to a file
    
          Parameters
          ----------
    
          file_path : str
              string giving the file path to save to
    
          Returns
          -------
          None
    
          Raises
          ------
          OSError
              Raised if the save fails
    
          See Also
          --------
          load_sales : load sales from a sales list file
          """
          print("Save the sales in:", file_path)
    
    
      def load_sales(file_path):
          """
          Loads the contents of a file into the sales list
    
          Parameters
          ----------
    
          file_path : str
              string giving the file path to load from
    
          Returns
          -------
          None
    
          Raises
          ------
          OSError
              Raised if the load fails
    
          See Also
          --------
          save_sales : save the sales list into a file
          """
          print("Load the sales in:", file_path)
  • Observe that by adding complete docstrings we’re also starting to document the requirements for these functions in-code

  • We also add a basic integration to the user menu, where we use BTCInput.read_text to get a file name and then call the function:

      elif command == 7:
          read_sales(10)
      elif command == 8:
          file_to_save_to = BTCInput.read_text("Enter file to save to: ")
          save_sales(file_to_save_to)
      elif command == 9:
          file_to_load_from = BTCInput.read_text("Enter file to load: ")
          load_sales(file_to_load_from)
      else:
          raise ValueError("Unexpected value " + str(command) + " found")

Write into a File

  • When a program works with a file, Python represents the file as an object in memory

    • Technically, this object represents the connection to the file
  • open creates a connection to a file. The statement below opens the file test.txt in write mode (w) and stores the resulting file object in the variable output_file

      output_file = open('test.txt', 'w')
    • The two arguments are called the file_path and the mode
      • file_path is the file you want to open
      • mode is what you want to do with it
Caution

It’s very easy to overwrite an existing file

The open function will not prevent you from modifying important files. For example, opening a file in write mode first wipes the contents of any existing file at that path; whatever you then write replaces the old contents.

Python provides the os module, which has extra functionality for handling files and directories. For example, you can check whether a file exists before opening it, so that you can ask the user whether they want to overwrite it:

import os.path
if os.path.isfile("test.txt"):
    print("The file exists")
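Building on the os.path.isfile check above, the sketch below writes a file only if nothing already exists at that path (write_if_new is a hypothetical helper name). Python’s open also supports an "x" (exclusive creation) mode, which raises FileExistsError rather than overwriting an existing file; create_only, also hypothetical, shows that alternative.

```python
import os.path


def write_if_new(file_path, text):
    """
    Hypothetical helper: write text to file_path only if no file exists
    there yet. Returns True if the file was written, False otherwise.
    """
    if os.path.isfile(file_path):
        # opening with "w" would wipe the existing contents, so stop here
        return False
    output_file = open(file_path, "w")
    output_file.write(text)
    output_file.close()
    return True


def create_only(file_path, text):
    """
    Hypothetical helper using mode "x": open itself refuses to overwrite
    an existing file by raising FileExistsError.
    """
    try:
        output_file = open(file_path, "x")
    except FileExistsError:
        return False
    output_file.write(text)
    output_file.close()
    return True
```

A real program would probably ask the user what to do when either helper returns False, rather than silently skipping the write.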
  • If we’ve opened a file in write mode, we can use the write method on the file object to write to the file

      output_file.write("First line\n")
      output_file.write("Second line\n")
      output_file.close()
  • Once you’re done with a file, you need to call close, which

    • Completes any unfinished writes (ensuring data integrity)
    • Releases the file so other programs or processes can use it
      • On some operating systems (notably Windows), a file that is open for writing can be locked so that other programs cannot use it
  • Putting everything together, our simple file-writing program is:

      # Exercise 8.15 File Output
      #
      # A simple program to demonstrate opening and writing to a file
    
      output_file = open("test.txt", "w")
      output_file.write("line 1\n")
      output_file.write("line 2\n")
      output_file.close()
Code Analysis: File Writing

Consider the following questions about file writing

  1. Why have you called the write function a method? Isn’t it a function?

    • As discussed earlier, methods are functions associated with a specific object
    • Typically, when we say function we mean a function that is defined outside of an object
    • write is a method on the file object
      • You cannot call write without a file object to call it on
      • Methods allow us to work with multiple file objects without having to worry about passing the correct one to a function
  2. What does the \n mean at the end of the strings?

    • It’s the newline character; unlike print, write doesn’t automatically end the line after each call
    • We have to add the newline ourselves
  3. Where is the file test.txt actually created?

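    • A relative file_path is resolved relative to the current working directory, which is usually (but not always) the folder containing the running Python program

    • Hence, in the usual case, the file is written to the same directory as the program

      • E.g. if we had a folder called “My Programs” with a python program “MakeFiles.py”, when we run “MakeFiles.py” from that folder the files it makes are stored in “My Programs”
    • You can use more complicated file_paths

      1. path = "./data/test.txt" would look for test.txt in the data subdirectory of the current working directory (relative path)
      2. path = "c:/data/test.txt" would look for test.txt in the data subdirectory of the c drive (absolute path)
      Note

      Denoting a Directory Separator

      On Windows, \ is used to separate directories, but in Python it is simplest to always use /, which Python accepts on Windows as well as on other operating systems. Inside a normal string, \ starts an escape sequence (e.g. \t is a tab), so a Windows-style path would need to be written as "c:\\data\\test.txt" or as the raw string r"c:\data\test.txt"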

  4. Can any program use a file written from a Python program?

    • Yes, Python uses the underlying operating system’s file-handling services
    • Any other program on the computer can access files created or modified by Python (subject to the usual file permissions)
  5. Can I add lines to the end of an existing file?

    • Yes, rather than opening the file in write mode (w), you open it in append mode (a)
    • Any writes will then be appended to the end of the file
    • A non-existent file will be created, just as in write mode
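As a minimal sketch of questions 3 and 5 (note that it creates, or overwrites, test.txt in the current working directory), the code below writes one line with mode w, appends a second with mode a, and uses the standard library’s os.path.abspath to show the full path that the relative name was resolved to.

```python
import os.path

# mode "w" creates test.txt, or wipes it if it already exists
output_file = open("test.txt", "w")
output_file.write("line 1\n")
output_file.close()

# mode "a" keeps the existing contents and adds to the end
output_file = open("test.txt", "a")
output_file.write("line 2\n")
output_file.close()

# test.txt now holds both lines; the relative name "test.txt" was
# resolved against the current working directory
print("Written to:", os.path.abspath("test.txt"))
```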
Write the Sales Figures
  • Using the above discussion, we can implement the save_sales function

      # Example 8.16 Write Sales
      #
      # Implements the Write Sales function
    
      # test data
      sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    
    
      def save_sales(file_path):
          """
          Saves the contents of the sales list to a file
    
          Parameters
          ----------
    
          file_path : str
              string giving the file path to save to
    
          Returns
          -------
          None
    
          Raises
          ------
          OSError
              Raised if the save fails
          """
          print("Save the sales in: ", file_path)
          output_file = open(file_path, "w")
          for sale in sales:
              output_file.write(str(sale) + "\n")
          output_file.close()
    
    
      save_sales("test_output.txt")
Code Analysis: The save_sales Function

The save_sales function combines several behaviours and is worth examining in detail. What is the purpose of the function? To take a list of sales figures and write those figures to a file (preferably in a format that is easy for a human to read and to load back into the program). Consider the following questions:

  1. What does the str function do? Why are we using it?

    • The str function converts the sales number to a string
    • While print can handle non-string inputs, write can only take a string
  2. Why can’t we just write out the sales list as one object?

    • A list does not provide any built-in methods for writing itself out to a file
    • We could write out its string representation (i.e. call str on the list and output that)
    • However, this doesn’t give us much control over the way the data is output
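To see the difference between the two formats, the sketch below builds both as strings in memory instead of writing them to a file (repr is only used to make the newlines visible). Reading the first format back would mean stripping the brackets and splitting on the commas, while each line of the second converts directly with int.

```python
sales = [50, 54, 29]

# the whole list as one object: a single line with brackets and commas
whole_list_text = str(sales) + "\n"
print(repr(whole_list_text))  # '[50, 54, 29]\n'

# one value per line, as save_sales writes it
one_per_line_text = ""
for sale in sales:
    one_per_line_text = one_per_line_text + str(sale) + "\n"
print(repr(one_per_line_text))  # '50\n54\n29\n'
```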

Read from a File

  • We can also use open to read from a file; we just use read mode (r)

      input_file = open("test.txt", "r")
  • We can then loop over the lines in a file using a for loop:

      for line in input_file:
          print(line)
  • We should still use close() when we’re done reading

      input_file.close()
  • The complete sample program looks like this:

      # Example 8.17 File Input
      #
      # Demonstrates reading input from a file
    
      input_file = open("test.txt", "r")
      for line in input_file:
          print(line)
      input_file.close()
Code Analysis: Reading from Files

Work through the following questions to understand how reading from files works

  1. If you look at the following output, you’ll notice there are empty lines after each line of text. Why is that?

     line 1
    
     line 2
    
    • Every time we read a line from a file, we also read its terminating newline character

    • This is included in the string stored in line, so when we call print we get that newline plus the newline added by print

    • We could fix this by modifying our print call so that print doesn’t add a newline of its own

        print(line, end='')
    • A more natural way to fix this is to remove the newline when we first read in the string

    • The strip method, when called without arguments, returns a copy of the string with all leading and trailing whitespace removed

        line = line.strip()
    • This is an example of conditioning input
    • Conditioning is the process of making sure that an input does not contain any unexpected values
    • E.g. strip also removes other invisible whitespace characters, such as tabs and stray spaces
      • lstrip and rstrip are variants of strip that only remove characters from the start or the end of the string, respectively
  2. Why do we have to close the file we’re reading?

    • When reading, forgetting to close a file usually won’t cause problems for other programs or processes that also want to read it
    • However, closing it lets other programs write to that file
    • Closing also releases the memory associated with holding the connection
    • Your computer might not let you shut down if it thinks there are still unclosed files
  3. What would happen if you tried to write to a file that had been opened for reading?

    • An exception (io.UnsupportedOperation) will be raised
    • r+ is a mode that lets you read and write to a file
    • You typically don’t want to read and write to a file at the same time
      • Hard to ensure the integrity of the data and avoid corrupting it
      • For example, overwriting a line with a longer one
        • The extra characters overwrite the start of the next line, corrupting it
    • A better pattern is to load the data, update it, then write it back to the file
      • A temporary file (often abbreviated as a tmp file) can be used if we need an intermediate file to write to
  4. Can a program read an entire file at once?

    • Yes, the read method will, by default, read the entire file
    • Line endings are preserved
    • Be careful with large files, as this may overwhelm your computer’s memory
     # Example 8.18 File Read
     #
     # Demonstrates the use of file_object.read to read
     # the contents of a file in one go
    
     input_file = open("test.txt", "r")
     total_file = input_file.read()
     print(total_file)
     input_file.close()
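To make the conditioning methods from question 1 concrete, the short sketch below applies strip, lstrip and rstrip to a made-up line of the kind a for loop over a file might produce (repr shows the quotes and any "\n", so the removed characters are visible).

```python
# a line as it might be read from a file, with stray spaces and its newline
line = "  line 1\n"

print(repr(line.strip()))   # 'line 1'
print(repr(line.lstrip()))  # 'line 1\n'
print(repr(line.rstrip()))  # '  line 1'
```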
Read the Sales Figures
  • Let’s now implement load_sales

      # Example 8.19 Load Sales
      #
      # Implements the Load Sales function
    
      sales = []
    
    
      def load_sales(file_path):
          """
          Loads the contents of a file into the sales list
    
          Parameters
          ----------
    
          file_path : str
              string giving the file path to load from
    
          Returns
          -------
          None
    
          Raises
          ------
          OSError
              Raised if the load fails
          """
          print("Load the sales in:", file_path)
          sales.clear()
          input_file = open(file_path, "r")
          for line in input_file:
              line = line.strip()
              sales.append(int(line))
          input_file.close()
Code Analysis: The load_sales Function

load_sales works as the opposite of save_sales: instead of taking a sales list and putting it into a text file, we pull the figures from a file and load them into the sales list. Consider the following questions:

  1. What does the int function do?

    • The numbers pulled out of the file are initially stored as strings
    • We need to convert them to a number, so we call int
  2. What happens if the input file was empty?

    • The function works as one would hope
    • The loop doesn’t iterate and we get an empty sales list
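A related case is a file that contains blank lines, for example an extra empty line left at the end after editing the file by hand. For such a line strip gives an empty string, and int("") raises a ValueError. Below is a minimal sketch of one way to skip such lines; load_sales_skipping_blanks is a hypothetical name, and the function otherwise mirrors load_sales above.

```python
sales = []


def load_sales_skipping_blanks(file_path):
    """
    Hypothetical variant of load_sales that ignores blank lines, which
    would otherwise make int raise a ValueError.
    """
    sales.clear()
    input_file = open(file_path, "r")
    for line in input_file:
        line = line.strip()
        # only convert lines that still contain something after stripping
        if line != "":
            sales.append(int(line))
    input_file.close()
```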

Deal with File Errors

  • Dealing with files also means dealing with the errors they can introduce
    • e.g. a file might have been deleted, a USB drive removed, or the user might simply type the wrong name
  • When an error occurs we want to ensure two things:
    1. No files are left open
    2. The user is aware that the error has occurred
  • open and the methods of file objects raise exceptions when they fail
    • This enables us to handle and report on their errors
    • We use the try ... except syntax we’ve seen before:
      try:
          output_file = open(file_path, "w")
          for sale in sales:
              output_file.write(str(sale) + "\n")
          output_file.close()
          print("File Written Successfully")
      except:
          print("Something went wrong with the file")
Code Analysis: Dealing with File Handling Exceptions

The code performing the file write is wrapped in a try...except block. If write, open or close causes an exception, it will be caught and handled by the except clause. Let’s work through the following questions to see whether this ensures that the file is closed and the user is informed.

  1. In what circumstances will the code in the except part be executed?

    • If any of the file functions, write, open, or close raise an exception, the code in the except part will be executed
    • An error message is thus only printed when an error occurs
  2. In what circumstances will the “File Written Successfully” message be printed?

    • This is only printed if every step in the file writing process is completed successfully
  3. An error message is always printed if an error is thrown, but will the file always be closed?

    • No, this is a problem, as we said that all files needed to be closed even when an error occurs!
    • We could put the close statement in the exception handling section too, but a more general solution to this problem is to use a finally block
      • A finally block contains code that is always executed after all of the try and/or except code has executed
      • Good for code that we want to run after the block whether the process succeeds or fails (such as clean-up)
     output_file = None
     try:
         output_file = open(file_path, "w")
         for sale in sales:
             output_file.write(str(sale) + "\n")
     except:
         print("Something went wrong with writing to the file")
     finally:
         # only close the file if open succeeded; if open failed there is nothing to close
         if output_file is not None:
             output_file.close()

Use the with Construction to Tidy up File Access

  • It would be great if we didn’t have to remember to manually ensure a file gets closed
    • Failing to properly close a file can lead to hard to pin down behaviour
Warning

Intermittent Faults are the Worst Kind to Fix

A piece of code that is broken all the time is annoying, but at least you can usually identify what is not working. If a program fails only some of the time, the problem can be much harder to solve. Often you need precise directions as to the steps taken up to the point of failure in order to replicate the problem. This adds significant overhead to fixing it.

  • The with construct allows the programmer to automatically manage the acquisition and release of resources
    • More generic than just file access
    • You can write your own classes (called context managers) that work with with
      • Advanced topic we can ignore for now

Breakdown of a with statement

      with expression as name:
          statements

  • with: the start of a with block
  • expression: the expression generating the resource to use
  • as name: the name to represent the resource
  • statements: the statement block that uses the resource

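  • with is followed by an expression that provides an object offering a service (here, an open file)

  • as is used to assign a semantically meaningful name to the resource

  • with activates an “enter” behaviour on its object

    • For files, the open call in the expression has already opened the file; entering simply provides the file object that as names
  • When the block is finished, with calls some exit behaviour on the object

    • For files this causes the file to be closed
  • with allows us to ensure a few things:

    1. The file is always closed, even if an exception occurs in the block
    2. The file is only open while the statements in the block are running (the name still exists afterwards, but it refers to a closed file)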
      # Example 8.20 Using with to Access Files
      #
      # Rewrites save_sales and load_sales to use the with functionality
      # implemented in python
    
      # test data
      sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
    
    
      def save_sales(file_path):
          """
          Saves the contents of the sales list to a file
    
          Parameters
          ----------
    
          file_path : str
              string giving the file path to save to
    
          Returns
          -------
          None
    
          Raises
          ------
          FileException
              Raised if the save fails
    
          See Also
          --------
          load_sales : load sales from a given file
          """
          print("Save the sales in:", file_path)
          try:
              with open(file_path, "w") as output_file:
                  for sale in sales:
                      output_file.write(str(sale) + "\n")
          except:  # noqa: E722
              print("Something went wrong with the file")
    
    
      def load_sales(file_path):
          """
          loads the contents of a file into the sales list
    
          Parameters
          ----------
    
          file_path : str
              string giving the file path to load from
    
          Returns
          -------
          None
    
          Raises
          ------
          FileException
              Raised if the load fails
    
          See Also
          --------
          save_sales : save sales to a file
          """
          print("Load the sales in:", file_path)
          sales.clear()
          try:
              with open(file_path, "r") as input_file:
                  for line in input_file:
                      line = line.strip()
                      sales.append(int(line))
          except:  # noqa: E722
              print("Something went wrong with the file")
    
    
      print("Sales before save and load:", sales)
      save_sales("test.txt")
      load_sales("test.txt")
      print("Sales after save and load:", sales)
  • Observe that we no longer have to explicitly include the close call

  • However, with does not handle exceptions, so we still have to include a try...except block

  • When an exception occurs, the with first releases the resource with its exit behaviour

    • e.g. it closes the file
    • Then execution moves to the except block
  • If we wanted to handle exceptions without releasing the resource, we would have to swap the order:

      with open(file_path, mode) as file_object:  # file_path and mode are placeholders
          try:
              pass  # do the standard work with file_object here
          except:
              pass  # handle the exception without releasing the resource
          finally:
              pass  # runs on success or failure, while the file is still open
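File objects have a closed attribute that is True once the file has been closed. The short sketch below uses it to check that a with block closes the file when the block finishes, even when an exception is raised inside it, and that the name given by as still exists afterwards (test.txt is an arbitrary file name and is overwritten).

```python
# write a line, then look at the file object inside and after the with block
with open("test.txt", "w") as output_file:
    output_file.write("line 1\n")
    print("Inside the block, closed is", output_file.closed)  # False

# the name output_file still exists here, but the file has been closed
print("After the block, closed is", output_file.closed)  # True

try:
    with open("test.txt", "r") as input_file:
        for line in input_file:
            # "line 1" is not a number, so int raises a ValueError
            int(line)
except ValueError:
    # with has already closed the file before this handler runs
    print("Error handled, closed is", input_file.closed)  # True
```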

Make Something Happen: Record a List with a save Function

Add a save function to your party guest program so that you can record a list of people who attended your party

We build on our version that generates a sorted list. We can essentially copy the save_sales function, changing it to refer to the guests list instead of sales and giving the loop variable a more appropriate name.

def save(file_path):
    """
    Saves the guest list to a file

    Parameters
    ----------

    file_path : str
        string giving the file path to save to

    Returns
    -------
    None

    Raises
    ------
    FileException
        Raised if the save fails
    """
    print("Save the guest list in:", file_path)
    try:
        with open(file_path, "w") as output_file:
            for guest in guests:
                output_file.write(str(guest) + "\n")
    except:  # noqa: E722
        print("Something went wrong with the file")

We then run the program as normal:

  1. Ask for the number of guests
  2. Read in the guests
  3. Sort the guest list
  4. Display the guest list

We then ask the user if they want to save the guest list. For simplicity, we use BTCInput.read_int_ranged to ask for a \(0\) or a \(1\), where \(1\) indicates the user wishes to save and \(0\) indicates they don’t. If the user wishes to save, we then prompt them for a file name using BTCInput.read_text and call save on the given file path.

user_wants_to_save = BTCInput.read_int_ranged(
    "Would you like to save the list? (1 for yes, 0 for no): ", min_value=0, max_value=1
)

if user_wants_to_save:
    save_file_name = BTCInput.read_text("Enter file name to save as: ")
    save(save_file_name)

Store Tables of Data

  • A list holds data in one dimension, i.e. its length
  • Often data is multi-dimensional
  • e.g. our Ice Cream Sales client might now ask for the ability to track sales by stand and by day of the week

Data Table

                Monday   Tuesday   Wednesday   ...
      Stand 1       50        80          10   ...
      Stand 2       54        98           7   ...
      Stand 3       29        40          80   ...
      ...

  • Our current implementation is effectively a vertical slice for one of the days

  • We could implement multiple lists, one per day of the week

    • But this effectively repeats the problem we had before, of needing a distinct named variable for each item
  • Instead, we want a list of lists

      mon_sales = [50, 54, 29, 33, 22, 100, 45, 54, 89, 75]
      tue_sales = [80, 98, 40, 43, 43, 80, 50, 60, 79, 30]
      wed_sales = [10, 7, 80, 43, 48, 82, 33, 55, 83, 80]
      thu_sales = [15, 20, 38, 10, 36, 50, 20, 26, 45, 20]
      fri_sales = [20, 25, 47, 18, 56, 70, 30, 36, 65, 28]
      sat_sales = [122, 140, 245, 128, 156, 163, 90, 140, 150, 128]
      sun_sales = [100, 130, 234, 114, 138, 156, 107, 132, 134, 148]
    
      week_sales = [mon_sales, tue_sales, wed_sales, thu_sales, fri_sales, sat_sales, sun_sales]
  • Think of lists of lists as a collection of rows and columns

    • We first specify the row we want, say tue_sales (index 1)

    • Then the column, say Stand 1 (index 0)

        print(week_sales[1][0])
      80

Code Analysis: Inadequate Index Values

It can be difficult to get the hang of working with multiple indices. Which of the following statements would fail when the program runs?

Statement 1: week_sales[0][0] = 50
Statement 2: week_sales[8][7] = 88
Statement 3: week_sales[7][10] = 100
  1. Statement 1 is valid
  2. Statement 2 is invalid because the first index \(8\) corresponds to the day of the week
    • The valid indices here are \(0\) to \(6\)
  3. Statement 3 is also invalid for the same reason
    • Even though there are seven days of the week
    • The list is zero indexed

Let’s see this in action

  • Statement 1:
week_sales[0][0]
50
  • Statement 2:
week_sales[8][7]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[20], line 1
----> 1 week_sales[8][7]

IndexError: list index out of range
  • Statement 3:
week_sales[7][10]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 week_sales[7][10]

IndexError: list index out of range
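One way to avoid an IndexError is to check both indices against len before using them. The sketch below uses the week_sales data from earlier; is_valid_index is a hypothetical helper of our own, not part of the chapter's programs, and it deliberately accepts only non-negative indices, even though Python itself also allows negative ones such as -1.

```python
# week_sales: one row per day (Monday to Sunday), one column per stand
week_sales = [
    [50, 54, 29, 33, 22, 100, 45, 54, 89, 75],
    [80, 98, 40, 43, 43, 80, 50, 60, 79, 30],
    [10, 7, 80, 43, 48, 82, 33, 55, 83, 80],
    [15, 20, 38, 10, 36, 50, 20, 26, 45, 20],
    [20, 25, 47, 18, 56, 70, 30, 36, 65, 28],
    [122, 140, 245, 128, 156, 163, 90, 140, 150, 128],
    [100, 130, 234, 114, 138, 156, 107, 132, 134, 148],
]


def is_valid_index(table, row, col):
    """Hypothetical helper: True if table[row][col] exists (non-negative indices only)."""
    # The row is checked first; 'and' stops early, so table[row] is only
    # evaluated when row is known to be in range
    return 0 <= row < len(table) and 0 <= col < len(table[row])


print(is_valid_index(week_sales, 0, 0))   # True  (Statement 1)
print(is_valid_index(week_sales, 8, 7))   # False (Statement 2: there is no day 8)
print(is_valid_index(week_sales, 7, 10))  # False (Statement 3: no day 7 and no stand 10)
```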
Tip

Make it easy to test your program

Testing is important, but unless it’s easy or automatic it tends to get left by the wayside.

In a small program you might write a function such as make_test_data to generate test data; larger projects might use a test framework to do this.

Whenever you find yourself repeating a pattern to test code, consider how you can automate or bypass that process.
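As a sketch of what make_test_data might look like, assuming the same shape as week_sales (seven days of ten stands): the parameter names, the 0 to 250 range and the use of a seeded random.Random generator are all our own assumptions. Passing a seed makes the "random" data repeatable, so a failing test can be re-run with exactly the same figures.

```python
import random


def make_test_data(days=7, stands=10, low=0, high=250, seed=None):
    """Hypothetical helper: build a list of lists of random sales figures.

    Each of the `days` rows holds `stands` whole numbers between low and high
    (inclusive). Using the same seed always produces the same data.
    """
    generator = random.Random(seed)
    table = []
    for _ in range(days):
        day_sales = []
        for _ in range(stands):
            day_sales.append(generator.randint(low, high))
        table.append(day_sales)
    return table


test_sales = make_test_data(seed=1)
print(len(test_sales), "days of", len(test_sales[0]), "stands")  # 7 days of 10 stands
```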

Use Loops to Work with Tables

  • We can use nested for loops to work through individual values in a list of lists

  • E.g. to calculate the total sales over a week (full code given in TablesOfSaleData.py):

      total_sales = 0
      for day_sales in week_sales:
          for sales_value in day_sales:
              total_sales = total_sales + sales_value
    
      print("Total sales for the week are", total_sales)
    Total sales for the week are 5205
    • In the outer loop, day_sales refers to each constituent list (one day’s figures) in turn
    • In the inner loop, sales_value is each value in the list currently referenced by day_sales
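The same nested-loop pattern can summarise the table along either dimension. The sketch below is our own extension rather than code from TablesOfSaleData.py: it totals each row (one day across all stands) and each column (one stand across all days). The column totals need range and an index, because each stand’s figures are spread across the separate day lists.

```python
week_sales = [
    [50, 54, 29, 33, 22, 100, 45, 54, 89, 75],
    [80, 98, 40, 43, 43, 80, 50, 60, 79, 30],
    [10, 7, 80, 43, 48, 82, 33, 55, 83, 80],
    [15, 20, 38, 10, 36, 50, 20, 26, 45, 20],
    [20, 25, 47, 18, 56, 70, 30, 36, 65, 28],
    [122, 140, 245, 128, 156, 163, 90, 140, 150, 128],
    [100, 130, 234, 114, 138, 156, 107, 132, 134, 148],
]

# Row totals: one total per day
day_totals = []
for day_sales in week_sales:
    day_total = 0
    for sales_value in day_sales:
        day_total = day_total + sales_value
    day_totals.append(day_total)
print("Day totals:", day_totals)  # Day totals: [551, 603, 521, 280, 395, 1462, 1393]

# Column totals: one total per stand, reading the same position from every day
stand_totals = []
for stand_number in range(len(week_sales[0])):
    stand_total = 0
    for day_sales in week_sales:
        stand_total = stand_total + day_sales[stand_number]
    stand_totals.append(stand_total)
print("Stand totals:", stand_totals)
```

Both sets of totals add up to the same weekly figure of 5205; they just slice the table in different directions.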
Code Analysis: Loop Counting

Consider the code for summing the sales data in the previous example. Answer the following questions to make sure you understand how it works.

  1. How many times will the statements inside the two loops be obeyed?

    • In total they will be run \(70\) times
    • The outer loop runs seven times (once for each day of the week)
    • The inner loop runs ten times (once for each stand) for each iteration of the outer loop, giving \(7 \times 10 = 70\)
  2. How would you change this program so that it could handle more than one week’s worth of sales?

    • We can add more days to the list
    • Rather than corresponding to Monday to Sunday, the rows might represent Week 1 Day 1, Week 1 Day 2, and so on
    • These would be additional rows in the list of lists; the summing loops need no changes, because they work through however many rows there are
  3. How would we add a day’s worth of sales to the list?

    • We have to read in a new list of values

    • We can then append it to the list of lists

        read_sales(10)  # read ten values into the sales list
        week_sales.append(sales)  # append the values to the weekly sales list
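One thing to watch out for here: append stores a reference to the list, not a copy of it. If read_sales refills the same sales list in place each time it is called (we are assuming that here; it depends on how read_sales was written), every row appended to week_sales would be the same list. The sketch below shows the problem with small made-up values, and the fix of appending a copy made with list(...).

```python
# Problem: the same list object is appended twice
sales = [1, 2, 3]
shared_table = []
shared_table.append(sales)
sales.clear()              # "read" the next day by refilling the same list
sales.extend([4, 5, 6])
shared_table.append(sales)
print(shared_table)        # [[4, 5, 6], [4, 5, 6]]: both rows are one list

# Fix: append a copy, so each row is a separate list
sales = [1, 2, 3]
safe_table = []
safe_table.append(list(sales))   # list(sales) builds a new list with the same items
sales.clear()
sales.extend([4, 5, 6])
safe_table.append(list(sales))
print(safe_table)          # [[1, 2, 3], [4, 5, 6]]
```

If read_sales instead creates a brand-new list on every call, the copy is unnecessary, but it does no harm.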

More than Two Dimensions

  • It is possible to work with higher dimensions

  • For example, we might want to store multiple weeks of data

    • Then we would have a list of lists of lists
  • This works just like two dimensions but with an extra index; for example, we can append a week of sales like so:

      annual_sales = []  # one item per week
      annual_sales.append(week_sales)
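With three dimensions we need three indices to reach a single figure, and one loop per dimension to visit them all. To keep the sketch short it uses made-up placeholder figures for 2 weeks of 2 days with 3 stands each, rather than real 7-day, 10-stand weeks.

```python
annual_sales = []

# Placeholder figures: 2 weeks x 2 days x 3 stands
week_1 = [[1, 2, 3], [4, 5, 6]]
week_2 = [[7, 8, 9], [10, 11, 12]]
annual_sales.append(week_1)
annual_sales.append(week_2)

# Three indices: [week][day][stand]
print(annual_sales[1][0][2])   # week 2, day 1, stand 3 -> 9

# One loop per dimension
total = 0
for week in annual_sales:
    for day_sales in week:
        for sales_value in day_sales:
            total = total + sales_value
print("Total:", total)         # Total: 78
```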
Tip

Keep your dimensions low

You should rarely need more than three dimensions. If you find yourself using deeply nested, high-dimensional structures, you might want to rethink how you’re representing your data.

One technique we will see later is the use of classes, which group related values into a single object and so make it easier to keep collections linear.

The computer itself is perfectly happy working in higher dimensions. The real difficulty is that you probably aren’t, and it can be hard to reason about high-dimensional data.

Use Lists as Lookup Tables

  • Now that we can manipulate weekly sales data, the next question is how to display that data and the prompts for the user.

  • When we enter the data we want to see something like:

      Enter the Monday sales figures for stand 2:
  • Here we need a variable to control which day is printed

    • The simplest implementation uses an integer to track the day and if statements to convert it to a name, as in DayNameIf.py
      # Example 8.22 Day Name If
      #
      # Uses an if, elif, else construction to convert an integer
      # to a string representation of the day of the week
    
      import time
    
      current_time = time.localtime()
      day_number = current_time.tm_wday
    
      if day_number == 0:
          day_name = "Monday"
      elif day_number == 1:
          day_name = "Tuesday"
      elif day_number == 2:
          day_name = "Wednesday"
      elif day_number == 3:
          day_name = "Thursday"
      elif day_number == 4:
          day_name = "Friday"
      elif day_number == 5:
          day_name = "Saturday"
      elif day_number == 6:
          day_name = "Sunday"
      else:
          raise ValueError("Unexpected day_number " + str(day_number) + " encountered")
    
      print(day_name)
    Friday
  • This works, but it is long-winded and fragile; a cleaner way to do this is to use a lookup table

    • i.e. we use day_number as an index into a list that stores the day names
  • We use the time library (just for fun) so the program prints the current day

      # Example 8.23 Day Name List
      #
      # Uses a lookup table to correctly print the day
    
      import time
    
      current_time = time.localtime()
      day_number = current_time.tm_wday
    
      day_names = [
          "Monday",
          "Tuesday",
          "Wednesday",
          "Thursday",
          "Friday",
          "Saturday",
          "Sunday",
      ]
    
      day_name = day_names[day_number]
    
      print("Today is", day_name)
    Today is Friday
  • Lookup tables are powerful for shrinking the amount of code we write

  • They are also used to create data-driven applications

    • Programs whose behaviour comes from built-in or loaded data rather than from fixed, hard-coded logic
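The same lookup table can build the data-entry prompt shown earlier. In this sketch make_prompt is a hypothetical helper of our own; a real program would pass its result to input(). Because the day names come from the table rather than from if statements, changing the wording of the days means changing only the data.

```python
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def make_prompt(day_number, stand_number):
    """Hypothetical helper: day_number counts from 0, stand_number from 1 (as the user sees it)."""
    return "Enter the " + day_names[day_number] + " sales figures for stand " + str(stand_number) + ":"


# Print the first few prompts for Monday (day_number 0)
for stand_number in range(1, 4):
    print(make_prompt(0, stand_number))
```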

Tuples

  • Lists are the standard collection type
    • They are mutable, i.e. we can change the value at a given index or add new items
  • Consider the day_names list: once it is defined, we don’t want it to change
    • We would like Python to prevent changes, to catch potential programming errors, e.g.

        day_names[5] = "Splatterday"
  • A tuple is like a list, but its contents cannot be changed
    • A tuple is said to be immutable
    • If we attempt to change a tuple we get an error (demonstrated in Example 8.24, Day Name Tuple)
      • Specifically a TypeError
      • Because the action we are trying to take (changing the value at an index) is not supported by the object type (tuple)
        # Example 8.24 Day Name Tuple
        #
        # Reimplements the Day Name lookup table with a tuple
        # and demonstrates the immutability of the data structure
      
        import time
      
        current_time = time.localtime()
        day_number = current_time.tm_wday
      
        day_names = (
            "Monday",
            "Tuesday",
            "Wednesday",
            "Thursday",
            "Friday",
            "Saturday",
            "Sunday",
        )
      
        day_name = day_names[day_number]
      
        print("Today is", day_name)
      
        print("Attempting to change the lookup table...")
      
        day_names[day_number] = "Splatterday"  # type: ignore
        print("Today is", day_names[day_number])
      Today is Friday
      Attempting to change the lookup table...
      ---------------------------------------------------------------------------
      TypeError                                 Traceback (most recent call last)
      Cell In[25], line 27
           23 print("Today is", day_name)
           25 print("Attempting to change the lookup table...")
      ---> 27 day_names[day_number] = "Splatterday"  # type: ignore
           28 print("Today is", day_names[day_number])
      
      TypeError: 'tuple' object does not support item assignment
  • A tuple is created like a list, but using parentheses () rather than square brackets [] to delimit the items
  • Tuples are good for grouping several related values into a single composite value
  • For example, consider a pirate’s treasure map
    • The treasure’s location is given by
      1. A reference landmark
      2. Number of steps north
      3. Number of steps east
  • Strictly speaking, a function can return only one value
    • But we can return multiple values packaged together in a tuple
      def get_treasure_location():
          # get the treasure's location
          return ("The old oak tree", 20, 30)
    • This returns a single tuple containing three values
      1. The string "The old oak tree"
      2. The number of steps north, 20
      3. The number of steps east, 30
  • Like lists, tuples are zero-indexed
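Because a returned tuple is zero-indexed, the caller can pick out each value by position. The sketch below reuses get_treasure_location with its fixed example values. It also shows a common surprise: the parentheses alone do not make a tuple, so a one-item tuple needs a trailing comma.

```python
def get_treasure_location():
    # Fixed example values, as in the function above
    return ("The old oak tree", 20, 30)


location = get_treasure_location()
print(location[0])   # The old oak tree
print(location[1])   # 20 (steps north)
print(location[2])   # 30 (steps east)

# A one-item tuple needs a trailing comma; without it the parentheses just group
just_a_string = ("oak")
one_item_tuple = ("oak",)
print(type(just_a_string), type(one_item_tuple))   # <class 'str'> <class 'tuple'>
```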
Warning

Take care with your tuple indices

When returning multiple items from a function via a tuple, we have to clearly specify what each position in the tuple corresponds to. This is effectively a contract between the function and any caller: if you change the order, you will break the code of anyone who relies on the current order.

The order in which values are returned should thus be clearly documented, e.g.

def get_treasure_location():
    """
    Gets the location of the treasure

    Returns
    -------
    str
        Name of a landmark to start at
    int
        Number of paces north
    int
        Number of paces east
    """

    return ("The old oak tree", 20, 30)
  • An alternative to explicitly indexing a returned tuple is tuple unpacking
    • We provide a comma-separated list of variables, and the tuple values are assigned to them in order, e.g.

        landmark, north, east = get_treasure_location()
        print("Start at", landmark, "walk", north, "paces north and", east, "paces east")
  • The complete Pirate’s Treasure program implementation is given in PiratesTreasure.py
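Tuple unpacking only works when the number of variables matches the number of items in the tuple; otherwise Python raises a ValueError. This sketch, again using the fixed example values, shows both the successful unpack and the failing one (the exact wording of the error message can vary between Python versions, so it is printed rather than relied on).

```python
def get_treasure_location():
    # Fixed example values, as in the earlier function
    return ("The old oak tree", 20, 30)


landmark, north, east = get_treasure_location()
print("Start at", landmark, "walk", north, "paces north and", east, "paces east")
# Start at The old oak tree walk 20 paces north and 30 paces east

# Two variables but three items: this unpacking fails
unpack_failed = False
try:
    landmark_only, north_only = get_treasure_location()
except ValueError as error:
    unpack_failed = True
    print("Unpacking failed:", error)
```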

Summary

  • Lists can be used to store large and arbitrarily sized collections of data
    • We refer to the individual elements of a list as items
    • append lets us add new items to the end of a list
    • len returns the number of items in a list
    • a single list can contain different types of data
    • list values are accessed via the indexing operator []
      • lists are indexed from \(0\)
      • The last index in a list is len(list) - 1
    • Nested lists allow for multi-dimensional structures
  • Python can manipulate files
    • open is used to access a file
    • files can be read from or written to
    • for can be used to loop over the lines of a file
    • when using write to write to a file, newlines ('\n') must be added explicitly
    • strip can be used to remove whitespace when reading lines from a file
    • Files must be closed using the close method once they are no longer in use
    • File operations can raise exceptions, which must be handled or reported to the user
      • The handling code must still ensure the file is closed
    • with can be used to automatically ensure a file is closed once it is no longer used, even in error scenarios
  • Tuples are immutable collections
    • Once they are defined we cannot modify or add values
    • Tuples are suitable for lookup tables or other fixed collections
    • Tuples can be used by functions that return more than one value

Questions and Answers

  1. Do we really need lists?
    • Yes. Any program that works with large or arbitrarily sized data needs collections to store and manipulate it meaningfully
  2. Do we really need tuples?
    • No, technically we could just use lists instead. They are useful, though, because they enforce properties that lists don’t, such as immutability, which is valuable in some cases
  3. How does the list actually work?
    • When a list is created, the program reserves memory to hold its items, often with room for a few more
    • The list also keeps track of the number of items it currently stores
    • Appending an item uses up part of the reserved memory
    • If the list doesn’t have enough room, more memory is allocated to it
    • When a list item is accessed, the list checks that the index is within range
      • If it isn’t, an exception (IndexError) is raised
      • Otherwise, the item is found and returned
  4. Why are tuples called tuples?
    • In mathematics, a tuple is an ordered collection of elements; Python adopted the terminology
  5. Should the sales program use a list to store the sales figures or a tuple?
    • It depends on the operations we want to perform
    • Once we have the sales figures, none of our operations need to change them (except sorting)
      • Sorting could be implemented by creating a new, sorted tuple
      • Using a tuple would then protect the figures from accidental changes
      • However, this makes the code more complicated
    • If we later wanted to add a function to edit the sales data, a list would allow a cleaner implementation
      • Rather than rebuilding a tuple each time a figure changes
  6. Can functions return lists instead of tuples?
    • Yes, they can
    • However, the values a function returns are usually not meant to be changed by the caller
      • So a tuple is a natural choice
  7. Will my program run faster if I use tuples to store all the data in it?
    • Potentially; tuples can be slightly faster to create than lists
    • It depends on what the program does: if you’re changing a lot of data, the cost of constantly recreating tuples might be greater than the cost of creating and modifying a list
    • In most programs the speed difference will hardly be noticeable
  8. Does the with construction stop objects from throwing exceptions?
    • No, with is designed to ensure that even if an exception is thrown, the managed resource is released correctly
    • with still passes the exception on to the surrounding code
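A small sketch of question 8’s answer: reading a file in which one line is not a number. The temporary file and its contents are made up for illustration. The ValueError from int() still reaches the surrounding try/except, yet by the time the handler runs, with has already closed the file.

```python
import os
import tempfile

# Make a small temporary file to read (made-up contents for this sketch)
file_descriptor, path = tempfile.mkstemp(text=True)
os.close(file_descriptor)
with open(path, "w") as output_file:
    output_file.write("50\n54\nnot a number\n")

total = 0
exception_reached_caller = False
try:
    with open(path) as input_file:
        for line in input_file:
            total = total + int(line.strip())   # int() raises ValueError on the third line
except ValueError as error:
    exception_reached_caller = True           # with passed the exception on
    print("Caught:", error)

print("Is the file closed?", input_file.closed)   # True: with closed it before passing the error on
os.remove(path)
```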