Introduction to Data Structures
What are Data Structures?
Data structures provide different ways to organize and store multiple pieces of data in your program. You can think of them as different types of containers - each with a different way of accessing information and suited for specific purposes.
E.g.
scores = [85, 92, 78, 96] # list
languages = ("English", "Spanish", "German", "Chinese", "Arabic") # tuple
translations = {"Dog": "Hund", "Cat": "Katze", "House": "Haus"} # dictionary
scoresuses a list (in Python represented by square brackets[]) - modifiable sequence of valueslanguagesuses a tuple (in Python represented by round brackets()) - fixed sequence of valuestranslationsuses a dictionary (in Python represented by curly braces{}) - mappings of values (“Dog”, “Cat”, “House”) to keys (“Hund”, “Katze”, “Haus”)
Four important data structures
Python provides four main built-in data structures, each with different characteristics:
1. Lists [] - Ordered and Changeable
Lists store items in order and allow you to change them after creation. Each item can be accessed by an index representing the order in which it was inserted. In Python, the index starts from 0 (i.e. the first item in the list is accessed with index 0 and the second is accessed with index 1).
Creating lists
Square brackets [] are used to create lists: [values separated by commas]
fruits_list = ["apple", "grape", "cherry", "orange", "orange", "orange", "melon"]
Accessing data with index numbers
Lists (and tuples) allow you to access individual items or groups of items using their position.
- Square brackets are used to access with the index:
a_list[index] - Indexing starts at
0(first item is[0], second is[1], etc.) - Negative indices count from the end of the list (
-1is last,-2is second-to-last, etc.) - A colon is used to specify a range of indices (slicing, see below):
a_list[start_index:end_index] - In index ranges,
end_indexis excluded from the range (a_list[0:2]only includesa_list[0]anda_list[1]) - Leaving
start_indexorend_indexblank means “the rest of the list” sincestart_indexdefaults to0whileend_indexdefaults to the end of the list
Single item access
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
print(fruits_list[0]) # First item: apple
print(fruits_list[2]) # Third item: cherry
print(fruits_list[-1]) # Last item: melon
print(fruits_list[-2]) # Second from end: orange
- Square brackets are used to access with the index, e.g.
fruits_list[2]ischerry - Indexing starts at
0, e.g.fruits_list[0]isapple - Negative indices count from the end of the list, e.g.
fruits_list[-1]ismelon
Slicing (getting multiple items)
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
print(fruits_list[1:3]) # Items at index 1 and 2: ["grape", "cherry"]
print(fruits_list[:3]) # First three items (index 0, 1, 2): ["apple", "grape", "cherry"]
print(fruits_list[2:]) # From index 2 to end: ["cherry", "orange", "cherry", "orange", "melon"]
print(fruits_list[-3:]) # Last three items: ["cherry", "orange", "melon"]
print(fruits_list[:-2]) # Everything except last two: ["apple", "grape", "cherry", "orange", "cherry"]
print(fruits_list[::2]) # Every second item: ["apple", "cherry", "cherry", "melon"]
print(fruits_list[::-1]) # List in reverse: ["melon", "orange", "cherry", "orange", "cherry", "grape", "apple"]
- A colon is used to specify a range of indices, e.g.
fruits_list[1:3] - In index ranges,
end_indexis excluded from the range, e.g.fruits[1:3]only includesfruits_list[1]andfruits_list[2], i.e.["grape", "cherry"] - Leaving
start_indexorend_indexblank means “the rest of the list”, e.g.fruits_list[2:]means everything fromfruits_list[2], i.e.["cherry", "orange", "orange", "orange", "melon"] - An additional colon can be used to specify a step size, e.g.
fruits_list[::2]includes every second element (including the first element), i.e.["apple", "cherry", "orange", "melon"]
Common slicing patterns:
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Get first N items
first_three = numbers[:3] # [0, 1, 2]
# Get last N items
last_three = numbers[-3:] # [7, 8, 9]
# Get everything except first N
skip_first_two = numbers[2:] # [2, 3, 4, 5, 6, 7, 8, 9]
# Get everything except last N
skip_last_two = numbers[:-2] # [0, 1, 2, 3, 4, 5, 6, 7]
# Get middle section
middle = numbers[3:7] # [3, 4, 5, 6]
# Get every other item (with step)
evens = numbers[::2] # [0, 2, 4, 6, 8]
# Reverse a list
reversed_list = numbers[::-1] # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Inserting items
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
fruits_list.append("grape") # Add to end: ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon", "grape"]
fruits_list.insert("grape", 2) # Insert at position: ["apple", "grape", "grape", "cherry", "orange", "cherry", "orange", "melon", "grape"]
- The
append()function adds a value to the end of the list - The
insert()function inserts a value at a specfic position in the list:a_list.insert(value_to_insert, index)
Removing Items
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
fruits_list.remove("orange") # Remove specific value
fruits_list.remove("strawberry") # ValueError: list.remove(x): x not in list
print(fruits) # Output: ['apple', 'grape', 'cherry', 'cherry', 'orange', 'melon']
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
last_fruit = fruits_list.pop() # Remove last item, save it in last_fruit
print(last_fruit) # melon
print(fruits_list) # Output: ['apple', 'grape', 'cherry', 'orange', 'cherry', 'orange']
- The
remove()function removes a specific value from the list. If there is more than one instance of the value, the first instance will be removed first, e.g.fruits_list.remove("orange")gives["apple", "grape", "cherry", "cherry", "orange", "melon"] - The
pop()function removes the last item and returns it so you can save it to a variable.
Checking size and membership
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
print(len(fruits_list)) # Output: 7
print("apple" in fruits_list) # Output: True
print("strawberry" in fruits_list) # Output: False
- The
lenfunction gets the length of the list, i.e. the number of elements in it - The keyword
inis used to test for membership, i.e. if an element can be found in the list:a_value in a_list
Use lists when:
- Order matters
- You need to change items later
- You want to allow duplicate values
- You need to access items by position
2. Tuples () - Ordered but Unchangeable
As in the case of lists, tuples store items in order. But they cannot be changed after creation. Accessing data works very much in the same way as lists.
Creating tuples
Round brackets () are used to create tuples: (values separated by commas)
fruits_tuple = ("apple", "grape", "cherry", "orange", "cherry", "orange", "melon")
Accessing data with index numbers
fruits_tuple = ("apple", "grape", "cherry", "orange", "cherry", "orange", "melon")
print(fruits_tuple[0]) # First item: apple
print(fruits_tuple[2]) # Third item: cherry
print(fruits_tuple[-1]) # Last item: melon
print(fruits_tuple[-2]) # Second from end: orange
print(fruits_tuple[1:3]) # Items at index 1 and 2: ("grape", "cherry")
print(fruits_tuple[:3]) # First three items (index 0, 1, 2): ("apple", "grape", "cherry")
print(fruits_tuple[2:]) # From index 2 to end: ("cherry", "orange", "cherry", "orange", "melon")
print(fruits_tuple[-3:]) # Last three items: ("cherry", "orange", "melon")
print(fruits_tuple[:-2]) # Everything except last two: ("apple", "grape", "cherry", "orange", "cherry")
print(fruits_tuple[::2]) # Every second item: ("apple", "cherry", "cherry", "melon")
print(fruits_tuple[::-1]) # List in reverse: ("melon", "orange", "cherry", "orange", "cherry", "grape", "apple")
Checking size and membership
fruits_tuple = ("apple", "grape", "cherry", "orange", "cherry", "orange", "melon")
print(len(fruits_tuple)) # Output: 7
print("apple" in fruits_tuple) # Output: True
print("strawberry" in fruits_tuple) # Output: False
- The
lenfunction gets the length of the tuple, i.e. the number of elements in it - The keyword
inis used to test for membership, i.e. if an element can be found in the tuple:a_value in a_tuple
Use tuples when:
- Order matters but data shouldn’t change
- You want to ensure data stays the same
- You need a lightweight, efficient container
3. Sets {} - Unique Items Only
Sets store unique items with no duplicates and no guaranteed order.
Creating sets
Curly braces {} are used to create sets: {values separated by commas}
fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
Accessing data by testing membership
fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
print("apple" in fruits_set) # Output: True
print("strawberry" in fruits_set) # Output: False
- The keyword
inis used to test for membership, i.e. if an element can be found in the set:a_value in a_set
Removing duplicates
fruits_set = {"apple", "grape", "cherry", "orange", "cherry", "orange", "melon"} # set declared with duplicate values
print(fruits_set) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}
## Removing duplicates from a list step by step
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"] # list with duplicate values
fruits_set = set(fruits_list) # convert to set: {'melon', 'apple', 'orange', 'cherry', 'grape'}
print(fruits_set) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}
deduplicated_fruits_list = list(fruits_set) # This is the same as list(set(fruits_list))
print(dedupliated_frits_list) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}
# Removing duplicates from a list all in one line
print(list(set(fruits_list))) # Output: {'melon', 'apple', 'orange', 'cherry', 'grape'}
- Even when a set is declared with duplicate values, it will only have one instance of the value
set(a_list)can be used to remove duplicates from lists. If you still wish to use this as a list, you will need to convert it back to a list:list(set(a_list))
Inserting items
fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
fruits_set.add("strawberry") # Add item (no assumptions about order)
print(fruits_set) # Output: {"apple", "grape", "cherry", "orange", "melon", "strawberry"}
- The
add()function adds a value to the set without any assumptions about order (think of it as just throwing something on a pile)
Removing items
fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
my_set.remove("orange") # Remove specific item
- The
remove()function removes a specific (unique) value from the set
Checking size and membership
fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
print(len(fruits_set)) # Output: 5
print("apple" in fruits_set) # Output: True
print("strawberry" in fruits_set) # Output: False
- The
lenfunction gets the length of the set, i.e. the number of elements in it. - The keyword
inis used to test for membership, i.e. if an element can be found in the set:a_value in a_set
Use sets when:
- You need to remove duplicates
- You want to quickly check if items exist
- Order doesn’t matter
4. Dictionaries {key: value} - Key-Value Pairs
Dictionaries store information as key-value pairs for fast lookups.
Creating dictionaries
Curly braces {} with colons : are used to create dictionaries: {key:value pairs seaparated by commas}
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
Accessing data with keys
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
print(fruits_dict["apple"]) # Output: tree
print(fruits_dict["strawberry"]) # KeyError!
# Safe access with .get() function - returns None
print(fruits_dict.get("apple")) # Output: tree
print(fruits_dict.get("strawberry")) # Output: None
# Safe access with .get() function and default
print(fruits_dict.get("apple", "Not found")) # Output: tree
print(fruits_dict.get("strawberry", "Can't find that fruit")) # Output: Can't find that fruit
- Square brackets with a key are used to access values stored in a dictionary:
a_dictionary[a_key]. - The
getfunction allows “safe access” to a value. This means that if the key is not found in the dictionary (and hence can’t be used to access a value), a default value is returned: :a_dictionary.get(a_key, default_value)returnsdefault_valueifa_keyis not found - If you don’t specify a default value with safe access,
Noneis returned:a_dictionary.get(key)returnsNone - Without safe access, an error would be thrown
Remember that the key has to be given precisely (e.g. the letters have to be the correct case) for it to work, so in the example above, fruits["Apple"] would throw an error because the first letter of "Apple" needs to be lower case, i.e. fruits["apple"]
Inserting items and editing values
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
fruits_dict["strawberry"] = "vine" # Add key-value pair
fruits_dict["strawberry"] = "ground" # Assign new value
- To insert an item (key-value pair) into a dictionary, you use the same syntax as you did for accessing a value with the square brackets and key, but assign this to a value:
a_dictionary[a_key] = a_value - The exact same syntax is used to assign a new value to that key. Each dictionary key is unique (think of it as a unique address to access the value), so the previous value is overwritten
Removing items
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
del fruits_dict["grape"] # Remove key-value pair {"grape": "vine"}
print(fruits_dict) # Output: {"apple": "tree", "cherry": "tree", "orange": "tree", "melon": "vine"}
- The
delcommand removes a key-value pair from a dictionary by referencing it with the key:del a_dictionary[a_key]
Checking size and membership
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
print(len(fruits_dict)) # Output: 5
print("apple" in fruits_dict.keys()) # Key exists, Output: True
print("apple" in fruits_dict) # Key exists, Output: True
print("strawberry" in fruits_dict) # Key does not exist, Output: False
print("tree" in fruits_dict) # "tree" is a value, not a key, Output: False
print("tree" in fruits_dict.values()) # Value exists, Output: True
- The keyword
inis used to test for membership and when applied to the dictionary, it applies to the keys, i.e. if an element can be found in the set:a_value in a_set
Use dictionaries when:
- You need to look up information by a specific key
- You want to associate related pieces of data
- You need fast access to specific items
Iterating Through Data Structures
Lists and Tuples
E.g.
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
for fruit in fruits_list:
print(f"Fruit: {fruit}")
Output:
Fruit: apple
Fruit: grape
Fruit: cherry
Fruit: orange
Fruit: cherry
Fruit: orange
Fruit: melon
The enumerate() function returns both the index of the element and the value
fruits_list = ["apple", "grape", "cherry", "orange", "cherry", "orange", "melon"]
# With index
for i, fruit in enumerate(fruits_list):
print(f"Fruit {i}: {fruit}")
Output:
Fruit 0: apple
Fruit 1: grape
Fruit 2: cherry
Fruit 3: orange
Fruit 4: cherry
Fruit 5: orange
Fruit 6: melon
Sets
fruits_set = {"apple", "grape", "cherry", "orange", "melon"}
for fruit in fruits_set:
print(f"Fruit: {fruit}")
Output:
Fruit: melon
Fruit: apple
Fruit: orange
Fruit: cherry
Fruit: grape
Dictionaries
If you use the same syntax as you did for lists and tuples for iterating through a dictionary, you iterate through the keys. In order to iterate through each key-value pair, you need to use .items(), and if you want to iterate through just the values, you use .values().
Iterating over keys
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
for k in fruits_dict:
print(f"Fruit: {k}")
#Output:
Fruit: apple
Fruit: grape
Fruit: cherry
Fruit: orange
Fruit: melon
Iterating over values
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
for v in fruits_dict.values():
print(f"Category: {v}")
#Output:
Category: tree
Category: vine
Category: tree
Category: tree
Category: vine
Iterating over key-value pairs
fruits_dict = {"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
for k, v in fruits_dict.items():
print(f"{k}: {v}")
Output:
apple: tree
grape: vine
cherry: tree
orange: tree
melon: vine
Nested Data Structures
You can put data structures inside other data structures for complex information. Here are some common examples.
List of dictionaries
{"apple": "tree", "grape": "vine", "cherry": "tree", "orange": "tree", "melon": "vine"}
# List of dictionaries
fruit_data = [
{"name": "apple", "common category": "tree"},
{"name": "grape", "grade": "vine"},
{"name": "cherry", "grade": "tree"}
]
Dictionary with lists
# Dictionary with lists
fruits = {
"apple": ["tree", "pome"],
"grape": ["vine", "berry"],
"cherry": ["tree", "drupe"]
}
Dictionary with mixed types
# Dictionary with mixed types
fruit_data = {
"name": "apple",
"categories": ["tree", "pome"],
"locations_grown_globalproportion": {"China": 0.48, "USA": 0.05, "Poland": 0.03}
}
Choosing the Right Data Structure
As a programmer, you will need to make decisions about which data structure to use for a given purpose. Choose based on how you need to access and modify your data.
- Lists give you flexile ordered collections that allow you to add/remove elements and easily modify individual elements
- Tuples allow you to impose a fixed structure on the daya and are efficient for ordered collections that shouldn’t change
- Sets allow you to remove duplicates and test for membership
- Dictionaries allow you to define relationships between values and quickly look values up
There are often trade-offs between different factors, e.g. efficiency vs readability, fast insertion vs fast retrieval, but here are some broad decision factors. It is also likely that complex, real-world data is likely to involve nested structures involving different data structures so it is important to understand how to exploit the benefits of each of them.
Decision Guide
Need ordered data that changes? → List
schedule = ["breakfast", "work", "lunch", "shopping", "call Sam", "dinner", "watch tv"]
shopping_list = ["milk", "bread", "eggs"]
Need ordered/structured data that stays the same? → Tuple
rgb_color = (255, 128, 0)
coordinates = (-124, 89)
Need unique items only? → Set
unique_colours = {"green", "blue", "red"}
unique_numbers = {3, 7, 9}
Need to look up information by name/key? → Dictionary
user_profile = {"username": "alice", "email": "alice@email.com"}
country_aliases = {"Great Britain": "UK", "United Kingdom": "UK", "United States": "USA", "US": "USA"}
Practice Exercises
Solidify your understanding of data structures with these exercises.
Q1: List manipulation
Write a program that creates a list of numbers from 1 to 10, then:
- Print the first 3 numbers
- Print the last 3 numbers
- Print every other number
- Print the list in reverse
Expected output:
First 3: [1, 2, 3]
Last 3: [8, 9, 10]
Every other: [1, 3, 5, 7, 9]
Reversed: [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
Example solution
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(f"First 3: {numbers[:3]}")
print(f"Last 3: {numbers[-3:]}")
print(f"Every other: {numbers[::2]}")
print(f"Reversed: {numbers[::-1]}")
Q2: Dictionary lookup
Create a dictionary that maps country names to their capitals. Ask the user for a country name and print its capital. If the country isn’t in your dictionary, print “Country not found”.
Expected output:
Enter a country: France
The capital of France is Paris
Enter a country: Atlantis
Country not found
Example solution
capitals = {
"France": "Paris",
"Japan": "Tokyo",
"Brazil": "Brasília",
"Egypt": "Cairo",
"Australia": "Canberra"
}
country = input("Enter a country: ")
if country in capitals:
print(f"The capital of {country} is {capitals[country]}")
else:
print("Country not found")
# Alternative using .get()
capital = capitals.get(country)
if capital:
print(f"The capital of {country} is {capital}")
else:
print("Country not found")
Q3: Remove duplicates
Write a program that takes a list with duplicate items and removes all duplicates, then prints both the original list and the deduplicated list.
Expected output:
Original: ['France', 'Spain', 'France', 'Spain', 'Brazil', 'Egypt', 'France']
Unique items: ['France', 'Spain', 'Brazil', 'Egypt']
Number of unique items: 4
Example solution
countries = ['France', 'Spain', 'France', 'Spain', 'Brazil', 'Egypt', 'France']
print(f"Original: {countries}")
# Remove duplicates using set
unique_countries = list(set(countries))
print(f"Unique countries: {unique_countries}")
print(f"Number of unique countries: {len(unique_countries)}")
Q4: Counting items
Write a program that counts how many times each fruit appears in a list and stores the counts in a dictionary.
Expected output:
Country counts: {'France': 3, 'Spain': 2, 'Brazil': 1, 'Egypt': 1}
Most common country: France (3 times)
Example solution
countries = ['France', 'Spain', 'France', 'Brazil', 'Spain', 'Egypt', 'France']
# Count occurrences
country_counts = {}
for country in countries:
if country in country_counts:
country_counts[country] = country_counts[country] + 1
else:
country_counts[country] = 1
print(f"Country counts: {fruit_counts}")
# Find most common
max_count = 0
most_common = ""
for country, count in country_counts.items():
if count > max_count:
max_count = count
most_common = country
print(f"Most common country: {most_common} ({max_count} times)")
Q5: Nested data structures
Create a dictionary where each student’s name maps to a dictionary containing their grades in different subjects. Then calculate and print each student’s average grade.
Expected output:
Alice's average: 88.33
Bob's average: 85.00
Charlie's average: 91.67
Example solution
students = {
"Alice": {"Math": 85, "English": 92, "Science": 88},
"Bob": {"Math": 78, "English": 88, "Science": 89},
"Charlie": {"Math": 95, "English": 90, "Science": 90}
}
for student, grades in students.items():
total = 0
count = 0
for subject, grade in grades.items():
total = total + grade
count = count + 1
average = total / count
print(f"{student}'s average: {average:.2f}")
Alternative solution using built-in functions (sum() and len()):
students = {
"Alice": {"Math": 85, "English": 92, "Science": 88},
"Bob": {"Math": 78, "English": 88, "Science": 89},
"Charlie": {"Math": 95, "English": 90, "Science": 90}
}
for student, grades in students.items():
average = sum(grades.values()) / len(grades)
print(f"{student}'s average: {average:.2f}")
