Introduction to Data Structures

What are Data Structures?

Data structures provide different ways to organize and store multiple pieces of data in your program. You can think of them as different types of containers - each with a different way of accessing information and suited for specific purposes.

E.g.

scores = [85, 92, 78, 96] # list
languages = ("English", "Spanish", "German", "Chinese", "Arabic") # tuple
translations = {"Dog": "Hund", "Cat": "Katze", "House": "Haus"} # dictionary
  • scores uses a list (in Python represented by square brackets []) - modifiable sequence of values
  • languages uses a tuple (in Python represented by round brackets ()) - fixed sequence of values
  • translations uses a dictionary (in Python represented by curly braces {}) - mappings of values (“Dog”, “Cat”, “House”) to keys (“Hund”, “Katze”, “Haus”)

Four important data structures

Python provides four main built-in data structures, each with different characteristics:

1. Lists [] - Ordered and Changeable

Lists store items in order and allow you to change them after creation. Each item can be accessed by an index representing the order in which it was inserted. In Python, the index starts from 0 (i.e. the first item in the list is accessed with index 0 and the second is accessed with index 1).

Creating lists

Square brackets [] are used to create lists: [values separated by commas]

fruits_list = ["apple", "grape", "cherry", "orange", "orange", "orange", "melon"]

Accessing data with index numbers

Lists (and tuples) allow you to access individual items or groups of items using their position.

  • Square brackets are used to access with the index: a_list[index]
  • Indexing starts at 0 (first item is [0], second is [1], etc.)
  • Negative indices count from the end of the list (-1 is last, -2 is second-to-last, etc.)
  • A colon is used to specify a range of indices (slicing, see below): a_list[start_index:end_index]
  • In index ranges, end_index is excluded from the range (a_list[0:2] only includes a_list[0] and a_list[1])
  • Leaving start_index or end_index blank means “the rest of the list” since start_index defaults to 0 while end_index defaults to the end of the list

Single item access

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]

print(fruits_list[0])     # First item: apple
print(fruits_list[2])     # Third item: cherry
print(fruits_list[-1])    # Last item: melon
print(fruits_list[-2])    # Second from end: orange
  • Square brackets are used to access with the index, e.g. fruits_list[2] is cherry
  • Indexing starts at 0, e.g. fruits_list[0] is apple
  • Negative indices count from the end of the list, e.g. fruits_list[-1] is melon

Slicing (getting multiple items)

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]

print(fruits_list[1:3])   # Items at index 1 and 2: ["grape", "cherry"]
print(fruits_list[:3])    # First three items (index 0, 1, 2): ["apple", "grape", "cherry"]
print(fruits_list[2:])    # From index 2 to end: ["cherry", "orange", "cherry", "orange", "melon"]
print(fruits_list[-3:])   # Last three items: ["cherry", "orange", "melon"]
print(fruits_list[:-2])   # Everything except last two: ["apple", "grape", "cherry", "orange", "cherry"]
print(fruits_list[::2])   # Every second item: ["apple", "cherry", "cherry", "melon"]
print(fruits_list[::-1])  # List in reverse: ["melon", "orange", "cherry", "orange", "cherry", "grape", "apple"]
  • A colon is used to specify a range of indices, e.g. fruits_list[1:3]
  • In index ranges, end_index is excluded from the range, e.g. fruits[1:3] only includes fruits_list[1] and fruits_list[2], i.e. ["grape", "cherry"]
  • Leaving start_index or end_index blank means “the rest of the list”, e.g. fruits_list[2:] means everything from fruits_list[2], i.e. ["cherry", "orange", "orange", "orange", "melon"]
  • An additional colon can be used to specify a step size, e.g. fruits_list[::2] includes every second element (including the first element), i.e. ["apple", "cherry", "orange", "melon"]

Common slicing patterns:

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Get first N items
first_three = numbers[:3]           # [0, 1, 2]

# Get last N items  
last_three = numbers[-3:]           # [7, 8, 9]

# Get everything except first N
skip_first_two = numbers[2:]        # [2, 3, 4, 5, 6, 7, 8, 9]

# Get everything except last N
skip_last_two = numbers[:-2]        # [0, 1, 2, 3, 4, 5, 6, 7]

# Get middle section
middle = numbers[3:7]               # [3, 4, 5, 6]

# Get every other item (with step)
evens = numbers[::2]                # [0, 2, 4, 6, 8]

# Reverse a list
reversed_list = numbers[::-1]       # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Inserting items

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
fruits_list.append("grape")  # Add to end: ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon", "grape"]
fruits_list.insert("grape", 2) # Insert at position: ["apple", "grape", "grape", "cherry", "orange", "cherry", "orange", "melon", "grape"]
  • The append() function adds a value to the end of the list
  • The insert() function inserts a value at a specfic position in the list: a_list.insert(value_to_insert, index)

Removing Items

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
fruits_list.remove("orange") # Remove specific value
fruits_list.remove("strawberry") # ValueError: list.remove(x): x not in list
print(fruits) # Output: ['apple', 'grape', 'cherry', 'cherry', 'orange', 'melon']

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
last_fruit = fruits_list.pop() # Remove last item, save it in last_fruit
print(last_fruit) # melon
print(fruits_list) # Output: ['apple', 'grape', 'cherry', 'orange', 'cherry', 'orange']
  • The remove() function removes a specific value from the list. If there is more than one instance of the value, the first instance will be removed first, e.g. fruits_list.remove("orange") gives ["apple", "grape", "cherry", "cherry", "orange", "melon"]
  • The pop() function removes the last item and returns it so you can save it to a variable.

Checking size and membership

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
print(len(fruits_list)) # Output: 7
print("apple" in fruits_list) # Output: True
print("strawberry" in fruits_list) # Output: False
  • The len function gets the length of the list, i.e. the number of elements in it
  • The keyword in is used to test for membership, i.e. if an element can be found in the list: a_value in a_list

Use lists when:

  • Order matters
  • You need to change items later
  • You want to allow duplicate values
  • You need to access items by position

2. Tuples () - Ordered but Unchangeable

As in the case of lists, tuples store items in order. But they cannot be changed after creation. Accessing data works very much in the same way as lists.

Creating tuples

Round brackets () are used to create tuples: (values separated by commas)

fruits_tuple = ("apple", "grape", "cherry", "orange", "cherry", "orange", "melon")

Accessing data with index numbers

fruits_tuple = ("apple", "grape", "cherry", "orange", "cherry", "orange", "melon")

print(fruits_tuple[0])      # First item: apple
print(fruits_tuple[2])      # Third item: cherry
print(fruits_tuple[-1])     # Last item: melon
print(fruits_tuple[-2])     # Second from end: orange

print(fruits_tuple[1:3])   # Items at index 1 and 2: ("grape", "cherry")
print(fruits_tuple[:3])    # First three items (index 0, 1, 2): ("apple", "grape", "cherry")
print(fruits_tuple[2:])    # From index 2 to end: ("cherry", "orange", "cherry", "orange", "melon")
print(fruits_tuple[-3:])   # Last three items: ("cherry", "orange", "melon")
print(fruits_tuple[:-2])   # Everything except last two: ("apple", "grape", "cherry", "orange", "cherry")
print(fruits_tuple[::2])   # Every second item: ("apple", "cherry", "cherry", "melon")
print(fruits_tuple[::-1])  # List in reverse: ("melon", "orange", "cherry", "orange", "cherry", "grape", "apple")

Checking size and membership

fruits_tuple = ("apple", "grape", "cherry", "orange", "cherry", "orange", "melon")
print(len(fruits_tuple)) # Output: 7
print("apple" in fruits_tuple) # Output: True
print("strawberry" in fruits_tuple) # Output: False
  • The len function gets the length of the tuple, i.e. the number of elements in it
  • The keyword in is used to test for membership, i.e. if an element can be found in the tuple: a_value in a_tuple

Use tuples when:

  • Order matters but data shouldn’t change
  • You want to ensure data stays the same
  • You need a lightweight, efficient container

3. Sets {} - Unique Items Only

Sets store unique items with no duplicates and no guaranteed order.

Creating sets

Curly braces {} are used to create sets: {values separated by commas}

fruits_set = {"apple", "grape", "cherry", "orange", "melon"}

Accessing data by testing membership

fruits_set = {"apple", "grape", "cherry", "orange", "melon"}

print("apple" in fruits_set)       # Output: True
print("strawberry" in fruits_set)  # Output: False
  • The keyword in is used to test for membership, i.e. if an element can be found in the set: a_value in a_set

Removing duplicates

fruits_set = {"apple", "grape", "cherry", "orange", "cherry", "orange", "melon"} # set declared with duplicate values

print(fruits_set)   # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}

## Removing duplicates from a list step by step
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"] # list with duplicate values
fruits_set = set(fruits_list) # convert to set: {'melon', 'apple', 'orange', 'cherry', 'grape'}
print(fruits_set) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}
deduplicated_fruits_list = list(fruits_set) # This is the same as list(set(fruits_list))
print(dedupliated_frits_list) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}

# Removing duplicates from a list all in one line
print(list(set(fruits_list))) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}
  • Even when a set is declared with duplicate values, it will only have one instance of the value
  • set(a_list) can be used to remove duplicates from lists. If you still wish to use this as a list, you will need to convert it back to a list: list(set(a_list))

Inserting items

fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
fruits_set.add("strawberry") # Add item (no assumptions about order)
print(fruits_set) # Output: {"apple", "grape", "cherry", "orange", "melon", "strawberry"}
  • The add() function adds a value to the set without any assumptions about order (think of it as just throwing something on a pile)

Removing items

fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
my_set.remove("orange")        # Remove specific item
  • The remove() function removes a specific (unique) value from the set

Checking size and membership

fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
print(len(fruits_set)) # Output: 5
print("apple" in fruits_set) # Output: True
print("strawberry" in fruits_set) # Output: False
  • The len function gets the length of the set, i.e. the number of elements in it.
  • The keyword in is used to test for membership, i.e. if an element can be found in the set: a_value in a_set

Use sets when:

  • You need to remove duplicates
  • You want to quickly check if items exist
  • Order doesn’t matter

4. Dictionaries {key: value} - Key-Value Pairs

Dictionaries store information as key-value pairs for fast lookups.

Creating dictionaries

Curly braces {} with colons : are used to create dictionaries: {key:value pairs seaparated by commas}

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}

Accessing data with keys

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}

print(fruits_dict["apple"])     # Output: tree
print(fruits_dict["strawberry"])     # KeyError!

# Safe access with .get() function - returns None
print(fruits_dict.get("apple"))     # Output: tree
print(fruits_dict.get("strawberry"))     # Output: None

# Safe access with .get() function and default
print(fruits_dict.get("apple", "Not found"))  # Output: tree
print(fruits_dict.get("strawberry", "Can't find that fruit"))  # Output: Can't find that fruit
  • Square brackets with a key are used to access values stored in a dictionary: a_dictionary[a_key].
  • The get function allows “safe access” to a value. This means that if the key is not found in the dictionary (and hence can’t be used to access a value), a default value is returned: : a_dictionary.get(a_key, default_value) returns default_value if a_key is not found
  • If you don’t specify a default value with safe access, None is returned: a_dictionary.get(key) returns None
  • Without safe access, an error would be thrown

Remember that the key has to be given precisely (e.g. the letters have to be the correct case) for it to work, so in the example above, fruits["Apple"] would throw an error because the first letter of "Apple" needs to be lower case, i.e. fruits["apple"]

Inserting items and editing values

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
fruits_dict["strawberry"] = "vine"    # Add key-value pair
fruits_dict["strawberry"] = "ground"  # Assign new value
  • To insert an item (key-value pair) into a dictionary, you use the same syntax as you did for accessing a value with the square brackets and key, but assign this to a value: a_dictionary[a_key] = a_value
  • The exact same syntax is used to assign a new value to that key. Each dictionary key is unique (think of it as a unique address to access the value), so the previous value is overwritten

Removing items

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
del fruits_dict["grape"] # Remove key-value pair {"grape": "vine"}
print(fruits_dict) # Output: {"apple": "tree", "cherry": "tree", "orange": "tree", "melon": "vine"}
  • The del command removes a key-value pair from a dictionary by referencing it with the key: del a_dictionary[a_key]

Checking size and membership

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
print(len(fruits_dict))    # Output: 5
print("apple" in fruits_dict.keys()) # Key exists, Output: True
print("apple" in fruits_dict) # Key exists, Output: True
print("strawberry" in fruits_dict) # Key does not exist, Output: False
print("tree" in fruits_dict) # "tree" is a value, not a key, Output: False
print("tree" in fruits_dict.values()) # Value exists, Output: True
  • The keyword in is used to test for membership and when applied to the dictionary, it applies to the keys, i.e. if an element can be found in the set: a_value in a_set

Use dictionaries when:

  • You need to look up information by a specific key
  • You want to associate related pieces of data
  • You need fast access to specific items

Iterating Through Data Structures

Lists and Tuples

E.g.

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
for fruit in fruits_list:
    print(f"Fruit: {fruit}")

Output:

Fruit: apple
Fruit: grape
Fruit: cherry
Fruit: orange
Fruit: cherry
Fruit: orange
Fruit: melon

The enumerate() function returns both the index of the element and the value

fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
# With index
for i, fruit in enumerate(fruits_list):
    print(f"Fruit {i}: {fruit}") 

Output:

Fruit 0: apple
Fruit 1: grape
Fruit 2: cherry
Fruit 3: orange
Fruit 4: cherry
Fruit 5: orange
Fruit 6: melon

Sets

fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
for fruit in fruits_set:
    print(f"Fruit: {fruit}")

Output:

Fruit: melon
Fruit: apple
Fruit: orange
Fruit: cherry
Fruit: grape

Dictionaries

If you use the same syntax as you did for lists and tuples for iterating through a dictionary, you iterate through the keys. In order to iterate through each key-value pair, you need to use .items(), and if you want to iterate through just the values, you use .values().

Iterating over keys

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}

for k in fruits_dict:
    print(f"Fruit: {k}")

#Output:

Fruit: apple
Fruit: grape
Fruit: cherry
Fruit: orange
Fruit: melon

Iterating over values

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}

for v in fruits_dict.values():
    print(f"Category: {v}")

#Output:

Category: tree
Category: vine
Category: tree
Category: tree
Category: vine

Iterating over key-value pairs

fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}

for k, v in fruits_dict.items():
    print(f"{k}: {v}")

Output:

apple: tree
grape: vine
cherry: tree
orange: tree
melon: vine

Nested Data Structures

You can put data structures inside other data structures for complex information. Here are some common examples.

List of dictionaries

{"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
# List of dictionaries
fruit_data = [
    {"name": "apple", "common category": "tree"},
    {"name": "grape", "grade": "vine"},
    {"name": "cherry", "grade": "tree"}
]

Dictionary with lists

# Dictionary with lists
fruits = {
    "apple": ["tree", "pome"],
    "grape": ["vine", "berry"],
    "cherry": ["tree", "drupe"]
}

Dictionary with mixed types

# Dictionary with mixed types
fruit_data = {
    "name": "apple",
    "categories": ["tree", "pome"],
    "locations_grown_globalproportion": {"China": 0.48, "USA": 0.05, "Poland": 0.03}
}

Choosing the Right Data Structure

As a programmer, you will need to make decisions about which data structure to use for a given purpose. Choose based on how you need to access and modify your data.

  • Lists give you flexile ordered collections that allow you to add/remove elements and easily modify individual elements
  • Tuples allow you to impose a fixed structure on the daya and are efficient for ordered collections that shouldn’t change
  • Sets allow you to remove duplicates and test for membership
  • Dictionaries allow you to define relationships between values and quickly look values up

There are often trade-offs between different factors, e.g. efficiency vs readability, fast insertion vs fast retrieval, but here are some broad decision factors. It is also likely that complex, real-world data is likely to involve nested structures involving different data structures so it is important to understand how to exploit the benefits of each of them.

Decision Guide

Need ordered data that changes?List

schedule = ["breakfast", "work", "lunch", "shopping", "call Sam", "dinner", "watch tv"]
shopping_list = ["milk", "bread", "eggs"]

Need ordered/structured data that stays the same?Tuple

rgb_color = (255, 128, 0)
coordinates = (-124, 89)

Need unique items only?Set

unique_colours = {"green", "blue", "red"}
unique_numbers = {3, 7, 9}

Need to look up information by name/key?Dictionary

user_profile = {"username": "alice", "email": "alice@email.com"}
country_aliases = {"Great Britain": "UK", "United Kingdom": "UK", "United States": "USA", "US": "USA"}


Practice Exercises

Solidify your understanding of data structures with these exercises.

Q1: List manipulation

Write a program that creates a list of numbers from 1 to 10, then:

  • Print the first 3 numbers
  • Print the last 3 numbers
  • Print every other number
  • Print the list in reverse

Expected output:

First 3: [1, 2, 3]
Last 3: [8, 9, 10]
Every other: [1, 3, 5, 7, 9]
Reversed: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
Example solution
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(f"First 3: {numbers[:3]}")
print(f"Last 3: {numbers[-3:]}")
print(f"Every other: {numbers[::2]}")
print(f"Reversed: {numbers[::-1]}")

Q2: Dictionary lookup

Create a dictionary that maps country names to their capitals. Ask the user for a country name and print its capital. If the country isn’t in your dictionary, print “Country not found”.

Expected output:

Enter a country: France
The capital of France is Paris

Enter a country: Atlantis
Country not found
Example solution
capitals = {
    "France": "Paris",
    "Japan": "Tokyo",
    "Brazil": "Brasília",
    "Egypt": "Cairo",
    "Australia": "Canberra"
}

country = input("Enter a country: ")

if country in capitals:
    print(f"The capital of {country} is {capitals[country]}")
else:
    print("Country not found")

# Alternative using .get()
capital = capitals.get(country)
if capital:
    print(f"The capital of {country} is {capital}")
else:
    print("Country not found")

Q3: Remove duplicates

Write a program that takes a list with duplicate items and removes all duplicates, then prints both the original list and the deduplicated list.

Expected output:

Original: ['France', 'Spain', 'France', 'Spain', 'Brazil', 'Egypt', 'France']
Unique items: ['France', 'Spain', 'Brazil', 'Egypt']
Number of unique items: 4
Example solution
countries = ['France', 'Spain', 'France', 'Spain', 'Brazil', 'Egypt', 'France']

print(f"Original: {countries}")

# Remove duplicates using set
unique_countries = list(set(countries))

print(f"Unique countries: {unique_countries}")
print(f"Number of unique countries: {len(unique_countries)}")

Q4: Counting items

Write a program that counts how many times each fruit appears in a list and stores the counts in a dictionary.

Expected output:

Country counts: {'France': 3, 'Spain': 2, 'Brazil': 1, 'Egypt': 1}
Most common country: France (3 times)
Example solution
countries = ['France', 'Spain', 'France', 'Brazil', 'Spain', 'Egypt', 'France']

# Count occurrences
country_counts = {}
for country in countries:
    if country in country_counts:
        country_counts[country] = country_counts[country] + 1
    else:
        country_counts[country] = 1

print(f"Country counts: {fruit_counts}")

# Find most common
max_count = 0
most_common = ""
for country, count in country_counts.items():
    if count > max_count:
        max_count = count
        most_common = country

print(f"Most common country: {most_common} ({max_count} times)")

Q5: Nested data structures

Create a dictionary where each student’s name maps to a dictionary containing their grades in different subjects. Then calculate and print each student’s average grade.

Expected output:

Alice's average: 88.33
Bob's average: 85.00
Charlie's average: 91.67
Example solution
students = {
    "Alice": {"Math": 85, "English": 92, "Science": 88},
    "Bob": {"Math": 78, "English": 88, "Science": 89},
    "Charlie": {"Math": 95, "English": 90, "Science": 90}
}

for student, grades in students.items():
    total = 0
    count = 0
    
    for subject, grade in grades.items():
        total = total + grade
        count = count + 1
    
    average = total / count
    print(f"{student}'s average: {average:.2f}")

Alternative solution using built-in functions (sum() and len()):

students = {
    "Alice": {"Math": 85, "English": 92, "Science": 88},
    "Bob": {"Math": 78, "English": 88, "Science": 89},
    "Charlie": {"Math": 95, "English": 90, "Science": 90}
}

for student, grades in students.items():
    average = sum(grades.values()) / len(grades)
    print(f"{student}'s average: {average:.2f}")