Introduction to Working with Files and JSON

In real-world programs, you often need to save and load data so that you can pick up where the program left off the last time it ran. This is called persistence. There are various ways of storing data, from simple text files or spreadsheets to highly structured databases, where the data in each column has to adhere to strict formatting constraints.

Reading and writing text files

A text file is a simple file that stores data as plain text. You can use Python to read from and write to text files using the built-in open() function.

Reading from a Text File

Read in the text in one block

with open("a_text_file.txt", "r") as file:
    content = file.read()
    print(content)
  • with open("a_text_file.txt", "r") as file: - Opens the file for reading.
    • "a_text_file.txt" is the file name
    • "r" means “read mode”
    • with automatically closes the file when the block ends (the file object acts as a context manager)

Read in the text line by line

with open("a_text_file.txt", "r") as file:
    for line in file:
        print(line.strip())
  • with open("a_text_file.txt", "r") as file: - Opens the file for reading.
    • "a_text_file.txt" is the file name
    • "r" means “read mode”
    • with automatically closes the file when the block ends (the file object acts as a context manager)
  • .strip() removes leading and trailing whitespace, including the newline at the end of each line.

Handling spreadsheet-style data files with a separator, e.g. CSV files

data_dictionary = {}
with open("data/comma_separated_file.txt", "r") as file:
    for line in file:
        # Remove whitespace and split by comma
        line = line.strip()
        if line:  # Skip empty lines
            # Assumes exactly three columns; any other count raises ValueError
            key_column, value_column, other_column = line.split(",")
            data_dictionary[key_column] = value_column
  • with open("data/comma_separated_file.txt", "r") as file: - Opens the file for reading.
    • "data/comma_separated_file.txt" is the file path
    • "r" means “read mode”
    • with automatically closes the file when the block ends (the file object acts as a context manager)
  • for line in file: - Reads each line from the file one by one
  • line.strip() - Removes whitespace (spaces, newlines) from both ends of the line
  • line.split(",") - Splits the line into the columns at the separator you have chosen (in this case a comma)
    • "dog,mammal,animal" becomes ["dog", "mammal", "animal"]
    • key_column, value_column, other_column = line.split(",") - Unpacks the three parts into variables
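
Splitting on every comma breaks down when a value itself contains a comma. Python's built-in csv module is a common alternative that understands quoted fields. The sketch below is not part of the original example: it writes a small sample file to a temporary folder (the file contents are made up for illustration) and then builds the same kind of dictionary.

```python
import csv
import tempfile
from pathlib import Path

# Create a small sample file in a temporary folder (illustrative data only)
sample_path = Path(tempfile.mkdtemp()) / "comma_separated_file.txt"
sample_path.write_text(
    'dog,mammal,animal\n"shark, great white",fish,animal\n\n',
    encoding="utf-8",
)

data_dictionary = {}
with open(sample_path, "r", newline="", encoding="utf-8") as file:
    for row in csv.reader(file):
        if len(row) != 3:  # Skip empty or malformed lines
            continue
        key_column, value_column, other_column = row
        data_dictionary[key_column] = value_column

print(data_dictionary)
# {'dog': 'mammal', 'shark, great white': 'fish'}
```

Note that the quoted value "shark, great white" stays in one column, which line.split(",") would have cut in two.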

Writing to a Text File

append mode:

Write an entire block of text (a_string):

with open("a_text_file.txt", "a") as file:
    file.write(a_string)
  • "a" means “append mode”
    • Adds to the end of the file without writing over the existing content
    • If the file doesn’t exist, it is created
  • file.write(a_string)
    • Writes a_string

Write a line of text:

with open("a_text_file.txt", "a") as file:
    file.write(f"{a_string}\n")
  • "a" means “append mode”
    • Adds to the end of the file without writing over the existing content
    • If the file doesn’t exist, it is created
  • file.write(f"{a_string}\n")
    • Writes a_string
    • \n adds a new line.

write mode:

with open("flashcards.txt", "w") as file:
    file.write(a_string)
  • "w" means “write mode”
    • Overwrites the file.
    • If the file doesn’t exist, it is created
  • file.write(a_string)
    • Writes a_string
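
The difference between the two modes is easiest to see side by side. This is a minimal sketch that works in a temporary folder (the path is only for illustration) so that no real file is touched.

```python
import tempfile
from pathlib import Path

# Work in a temporary folder so no real files are touched (illustrative path)
path = Path(tempfile.mkdtemp()) / "a_text_file.txt"

with open(path, "w") as file:   # "w" creates the file
    file.write("first\n")

with open(path, "a") as file:   # "a" adds to the end
    file.write("second\n")

with open(path, "r") as file:
    after_append = file.read()

with open(path, "w") as file:   # "w" replaces everything already in the file
    file.write("third\n")

with open(path, "r") as file:
    after_write = file.read()

print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
```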

File context managers and the with statement

It is generally considered best practice to use a context manager when working with files in Python, since it reduces the likelihood that you will leave files open when the program terminates.

The Problem with Manual File Handling

You could open and close files manually like this:

# ❌ Not recommended - easy to forget to close!
file = open("flashcards.txt", "r")
data = file.read()
file.close()  # What if an error happens before this line?

Problems:

  • If an error occurs before file.close(), the file stays open
  • You might forget to call .close()
  • Open files consume system resources
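
Without with, the usual way to guarantee that a file gets closed is a try/finally block. The sketch below shows that pattern for comparison; it uses a temporary stand-in for flashcards.txt with made-up contents.

```python
import tempfile
from pathlib import Path

# Temporary stand-in for flashcards.txt (illustrative contents)
path = Path(tempfile.mkdtemp()) / "flashcards.txt"
path.write_text("Dog,Hund\n")

file = open(path, "r")
try:
    data = file.read()
finally:
    # Runs whether or not an error occurred above
    file.close()

print(file.closed)  # True
```

This works, but it is more to type and easier to get wrong than the with statement.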

The Solution: Context Managers with with

The with statement automatically handles opening and closing files, e.g.

# With a context manager, the file is automatically closed
with open("a_file.txt", "r") as file:
    # 1. Opens the file
    # 2. Assigns it to 'file'
    # 3. Runs your code inside the block
    content = file.read()
    # File is open and available here
    # Content is read into the program's working memory
    print(content)  # ✅ Works - file is open

# File is automatically closed here, even if errors occurred
# Even though the file is now closed, the content is still available in the program's memory
print(content)  # ✅ Still works! The data was copied to the variable

The pattern:

with open(filename, mode) as variable_name:
    # Do something with the file
    # File is open and available here
    
# File is automatically closed here
# Any variables into which the data from the file was read 
# will still continue to hold the data

Why use with?

  • Automatic cleanup - File closes even if errors occur
  • Cleaner code - No need to remember .close()
  • Best practice - Used by professional Python developers
  • Resource efficient - Prevents file handle leaks
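
The automatic cleanup can be checked with the closed attribute of the file object. In this sketch an error is raised on purpose inside the with block; the file name and contents are made up for illustration.

```python
import tempfile
from pathlib import Path

# Temporary file whose content cannot be converted to a number (illustrative)
path = Path(tempfile.mkdtemp()) / "a_file.txt"
path.write_text("not a number\n")

try:
    with open(path, "r") as file:
        number = int(file.read())  # Raises ValueError inside the with block
except ValueError:
    print("Could not convert the content to a number")

# The with statement closed the file before the error reached the except block
print(file.closed)  # True
```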

Example: Reading with Context Manager

# Read entire file content
with open("a_file.txt", "r") as file:
    content = file.read()
    print(content)

# File is already closed here - safe!

Example: Writing with Context Manager

# Write to a file
with open("a_file.txt", "w") as file:
    file.write("Dog\n")
    file.write("Cat\n")

# File is saved and closed automatically

Multiple Files at Once

You can even open multiple files in one with statement:

with open("input.txt", "r") as input_file, open("output.txt", "w") as output_file:
    for line in input_file:
        output_file.write(line)

# Both files automatically closed

Context Managers Beyond Files

The with statement isn’t just for files. It is used for any resource that needs cleanup, for example when connecting to a database or a website.

# Database connection ("database" is a placeholder for a database library)
with database.connect() as connection:
    connection.execute("SELECT * FROM users")

# Network connection to a URL (requires the third-party requests package)
with requests.get(url) as response:
    data = response.json()
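
To see what a context manager does behind the scenes, here is a minimal sketch built with contextlib.contextmanager from the standard library. The name managed_resource and the events list are invented for illustration; a real database or network library would do its own setup and cleanup in the same two places.

```python
from contextlib import contextmanager

events = []

@contextmanager
def managed_resource(name):
    events.append(f"open {name}")       # Setup: runs when the with block starts
    try:
        yield name                      # The value bound by "as"
    finally:
        events.append(f"close {name}")  # Cleanup: runs even if an error occurred

with managed_resource("connection") as resource:
    events.append(f"use {resource}")

print(events)
# ['open connection', 'use connection', 'close connection']
```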

What is JSON?

JSON (JavaScript Object Notation) is a standard text format used for data exchange between apps and websites. It stores structured data in a dictionary-like layout so that it can be read reliably by computers.

E.g.

{
  "Dog": "Hund",
  "Cat": "Katze"
}

For many purposes, this makes it easier to work with than plain text files, since you can immediately get dictionary-like structures out without having to write lots of code to process the special characters (e.g. {, }) and separators (e.g. ,) yourself.


Reading and writing JSON Files

Python has a built-in json module for working with JSON files.

dump() and load() JSON file operations

The json module provides two main functions for file operations:

json.dump(data, file)

  • Writes Python data (dictionaries, lists) to a JSON file
  • Converts Python objects → JSON format → saves to file

json.load(file)

  • Reads JSON data from a file and converts it to Python data
  • Reads from file → converts JSON format → returns Python objects

Writing Data to a JSON File

import json

flashcards = {"Dog": "Hund", "Cat": "Katze"}
with open("flashcards.json", "w") as file:
    json.dump(flashcards, file)
    # Takes the flashcards dictionary and writes it to the file as JSON

Reading Data from a JSON File

import json

with open("flashcards.json", "r") as file:
    flashcards = json.load(file)
print(flashcards)

Adding to a JSON File

import json

# 1. Load existing data
try:
    with open("data.json", "r") as file:
        data_list = json.load(file)
except FileNotFoundError:
    data_list = []

# 2. Modify in memory
new_item = {"Dog": "Hund"}  # Example item; replace with your own data
data_list.append(new_item)

# 3. Save everything back
with open("data.json", "w") as file:
    json.dump(data_list, file, indent=2)

dumps() and loads() JSON string operations (not for files)

As well as the dump() and load() file operations, the json module provides two functions, dumps() and loads() (note the s), for converting between JSON strings and Python data:

json.dumps(data)

  • Converts Python data to a JSON string (the ‘s’ stands for ‘string’)

json.loads(string)

  • Converts a JSON string to Python data
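
A short round trip shows both functions at work. This sketch reuses the flashcards dictionary from the file examples and involves no file at all.

```python
import json

flashcards = {"Dog": "Hund", "Cat": "Katze"}

json_string = json.dumps(flashcards)   # Python dictionary -> JSON string
print(json_string)                     # {"Dog": "Hund", "Cat": "Katze"}
print(type(json_string))               # <class 'str'>

restored = json.loads(json_string)     # JSON string -> Python dictionary
print(restored == flashcards)          # True
```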

Nested JSON Structures

JSON can contain nested structures, such as dictionaries within dictionaries, lists within dictionaries, or any combination of these. Nested structures enable you to organize complex data hierarchically.

Simple vs. Nested JSON

Simple (flat) structure:

{
  "Dog": "Hund",
  "Cat": "Katze"
}

In this simple dictionary structure, the keys are the English animal names ("Dog", "Cat").

Nested structure:

{
  "Dog": {
    "translation": "Hund",
    "category": "Animals"
  },
  "Cat": {
    "translation": "Katze",
    "category": "Animals"
  }
}

This nested structure consists of a dictionary within a dictionary. The outer dictionary has the English animal name ("Dog", "Cat") as the keys, while the inner dictionaries have the kind of information you are storing about the animal ("translation", "category").

Mixing Single Values and Lists

A common pattern is to have some fields as single values and others as lists:

{
  "Dog": {
    "translations": ["Hund", "Köter"],
    "category": "Animals",
    "examples": [
      "Der Hund bellt",
      "Ich habe einen Hund"
    ]
  },
  "Cat": {
    "translations": ["Katze", "Mieze"],
    "category": "Animals",
    "examples": [
      "Die Katze miaut",
      "Ich habe eine Katze"
    ]
  }
}
  • "category" is a single value (string)
  • "translations" and "examples" are lists (multiple values)

Accessing Nested Data

import json

with open("a_file.json", "r") as file:
    data = json.load(file)

# Access single value
category = data["Dog"]["category"]
print(category)  # Animals

# Access list
translations = data["Dog"]["translations"]
print(translations)  # ['Hund', 'Köter']

# Access first item in list
first_translation = data["Dog"]["translations"][0]
print(first_translation)  # Hund

# Loop through list
for example in data["Dog"]["examples"]:
    print(example)
# Output:
# Der Hund bellt
# Ich habe einen Hund
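
Two further access patterns are often useful with nested data: looping over every entry with .items(), and reading an optional key with .get(). The sketch below uses a small in-memory copy of the structure (without the "examples" lists) so that it runs without a file.

```python
# A small in-memory copy of the nested structure, without the "examples" lists
data = {
    "Dog": {"translations": ["Hund", "Köter"], "category": "Animals"},
    "Cat": {"translations": ["Katze", "Mieze"], "category": "Animals"},
}

# Loop through every entry: word is the outer key, details the inner dictionary
for word, details in data.items():
    print(f"{word}: {', '.join(details['translations'])}")
# Output:
# Dog: Hund, Köter
# Cat: Katze, Mieze

# .get() returns a default instead of raising KeyError when a key is missing
examples = data["Dog"].get("examples", [])
print(examples)  # []
```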

Creating Nested Structures

import json

# Build nested structure
data = {}

data["Dog"] = {
    "translations": ["Hund", "Köter"],
    "category": "Animals",
    "examples": [
        "Der Hund bellt",
        "Ich habe einen Hund"
    ]
}

data["Cat"] = {
    "translations": ["Katze", "Mieze"],
    "category": "Animals",
    "examples": [
        "Die Katze miaut",
        "Ich habe eine Katze"
    ]
}

# Save to file
with open("a_file.json", "w") as file:
    json.dump(data, file, indent=2)

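Result in file (the lists are shown on one line here for brevity; with indent=2, json.dump() writes each list item on its own line and, by default, writes ö as the escape \u00f6):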

{
  "Dog": {
    "translations": ["Hund", "Köter"],
    "category": "Animals",
    "examples": [
      "Der Hund bellt",
      "Ich habe einen Hund"
    ]
  },
  "Cat": {
    "translations": ["Katze", "Mieze"],
    "category": "Animals",
    "examples": [
      "Die Katze miaut",
      "Ich habe eine Katze"
    ]
  }
}
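
If you want characters such as ö to appear in the file exactly as typed, json.dump() accepts ensure_ascii=False. The sketch below compares the default behaviour with that option. It writes to a temporary folder, and the explicit encoding="utf-8" is a suggestion rather than something the original example uses.

```python
import json
import tempfile
from pathlib import Path

data = {"Dog": {"translations": ["Hund", "Köter"]}}
path = Path(tempfile.mkdtemp()) / "a_file.json"  # Illustrative temporary path

# Default behaviour: non-ASCII characters are written as \uXXXX escapes
escaped = json.dumps(data)
print(escaped)  # {"Dog": {"translations": ["Hund", "K\u00f6ter"]}}

# ensure_ascii=False keeps "ö" as-is; encoding="utf-8" makes the file encoding explicit
with open(path, "w", encoding="utf-8") as file:
    json.dump(data, file, indent=2, ensure_ascii=False)

with open(path, "r", encoding="utf-8") as file:
    text = file.read()

print(text)
```

Either form loads back to the same Python data with json.load(), so the choice only affects how readable the file is for humans.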

List of Dictionaries

Another common pattern is a list where each item is a dictionary:

[
  {
    "event_id": 1,
    "date": "2024-10-31",
    "severity": 8,
    "staff_involved": ["Alexa", "Han", "Zoe"]
  },
  {
    "event_id": 2,
    "date": "2024-11-01",
    "severity": 9,
    "staff_involved": ["Sandra", "Ashish"]
  }
]

Accessing data:

import json

with open("a_file.json", "r") as file:
    data = json.load(file)

# Access first event
first_event = data[0]
print(first_event["severity"])  # 8

# Loop through all events
for event in data:
    print(f"Event {event['event_id']}: Severity {event['severity']}")
# Output:
# Event 1: Severity 8
# Event 2: Severity 9

# Access list within dictionary
staff = data[0]["staff_involved"]
print(staff)  # ['Alexa', 'Han', 'Zoe']
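
Once the data is loaded, a list of dictionaries can be filtered and summarised with ordinary Python. This sketch uses an in-memory copy of the two events; the severity threshold of 9 is an arbitrary example.

```python
events = [
    {"event_id": 1, "date": "2024-10-31", "severity": 8,
     "staff_involved": ["Alexa", "Han", "Zoe"]},
    {"event_id": 2, "date": "2024-11-01", "severity": 9,
     "staff_involved": ["Sandra", "Ashish"]},
]

# Keep only the events at or above a chosen severity (threshold is an arbitrary example)
severe_events = [event for event in events if event["severity"] >= 9]
print([event["event_id"] for event in severe_events])  # [2]

# Collect every staff member across all events, without duplicates
all_staff = sorted({name for event in events for name in event["staff_involved"]})
print(all_staff)  # ['Alexa', 'Ashish', 'Han', 'Sandra', 'Zoe']
```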

Error Handling with Files

Reading and writing files is a risky business. Sometimes files might not exist yet, or the data might be corrupted.

Use try and except to handle these situations gracefully:

import json
filename = "a_file.json"
try:
    with open(filename, "r") as file:
        content = json.load(file)
except FileNotFoundError:
    print(f"No file called {filename} found.")
except json.JSONDecodeError:
    print(f"File {filename} does not contain valid JSON.")
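
In practice you usually want the program to carry on with a sensible fallback after such an error. One way to do that is to wrap the pattern in a small helper function. The function name load_json_or_default and its default parameter are invented for this sketch, and the two test files live in a temporary folder.

```python
import json
import tempfile
from pathlib import Path

def load_json_or_default(filename, default):
    """Return the parsed JSON content, or default if the file is missing or invalid."""
    try:
        with open(filename, "r", encoding="utf-8") as file:
            return json.load(file)
    except FileNotFoundError:
        print(f"No file called {filename} found.")
    except json.JSONDecodeError:
        print(f"File {filename} does not contain valid JSON.")
    return default

folder = Path(tempfile.mkdtemp())

# Case 1: the file does not exist
missing = load_json_or_default(folder / "missing.json", default={})

# Case 2: the file exists but its content is not valid JSON
broken_path = folder / "broken.json"
broken_path.write_text("{not valid json", encoding="utf-8")
broken = load_json_or_default(broken_path, default={})

print(missing, broken)  # {} {}
```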