Python Data Structures


Previously, we covered the basic Python concepts required to begin writing legal Python code. In this Jupyter notebook, we will introduce additional fundamental concepts that form the basis of many Python programs. These concepts include the built-in Python data structures: string, tuple, list, and dictionary.


Python Data Structures

Python provides built-in support for a number of useful data structures, including the string, tuple, list, and dictionary:

String: A sequence of zero or more characters that are enclosed within either a pair of single quote characters, ', or a pair of double quote characters, ". A Python string is an instance of class str.

'A string containing many characters'

Tuple: An ordered sequence of zero or more values that are usually enclosed in parentheses, ( and ). The different values in a tuple are separated by commas; the commas are required, while the parentheses can often be omitted. A Python tuple is an instance of class tuple.

(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

List: An ordered sequence of zero or more values that are enclosed in square brackets, [ and ]. The different values in the list are separated by commas. A Python list is an instance of class list.

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Dictionary: A collection of key-value pairs that are enclosed in curly braces, { and }. Values are looked up by key rather than by position (since Python 3.7, a dictionary also preserves the order in which its keys were inserted). A key is separated from its corresponding value by a colon : and the different key-value entries in the collection are separated by commas. A Python dictionary is an instance of class dict.

d = {'name': "Alexander", 'age': 30, 'location': (102.1, 32.1)}


Of these four data structures, the list and the dictionary are mutable, which means the values stored in them can be changed in place. The string and the tuple, on the other hand, are immutable: their contents cannot be changed once created, so any modification requires a new object, created either explicitly by the programmer or implicitly by the Python interpreter.

All four data structures can be displayed with the built-in print function, which converts its arguments to strings and writes them to STDOUT (in this course, generally the space in the notebook immediately following the Code cell that calls print):
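The difference between mutable and immutable structures can be sketched as follows. The names values, point, and word are illustrative only and do not appear elsewhere in this notebook.

```python
# A list is mutable: an element can be replaced in place.
values = [1, 2, 3]
values[0] = 100

# A tuple is immutable: assigning to an element raises a TypeError.
point = (1, 2, 3)
try:
    point[0] = 100
    tuple_changed = True
except TypeError:
    tuple_changed = False

# A string is also immutable: replace() returns a new string
# and leaves the original untouched.
word = 'cat'
new_word = word.replace('c', 'b')

print(values)          # [100, 2, 3]
print(tuple_changed)   # False
print(word, new_word)  # cat bat
```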

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(data)

The dictionary is a special data structure that maps keys to values, so values are accessed by key rather than by position. The other three structures are ordered sequences, so individual elements can be accessed by specifying an index position within square brackets, [], with the caveat that Python is a zero-indexed language. Negative index values count backwards from the end of the sequence. Thus, given an ordered sequence data, the following accesses are legal:

  • data[0]: access the first value
  • data[1]: access the second value
  • data[-1]: access the last value
  • data[-2]: access the second to last value
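As a brief sketch, the same index notation works on a string, a tuple, and a list. The variable names below are illustrative, and the IndexError check is included only to show what happens when an index falls outside the sequence.

```python
text = 'Python'
point = (10, 20, 30)
data = [1, 2, 3, 4, 5]

# Index 0 is the first element; index -1 is the last element.
first_items = (text[0], point[0], data[0])
last_items = (text[-1], point[-1], data[-1])

# An index beyond the end of the sequence raises an IndexError.
try:
    data[5]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first_items)   # ('P', 10, 1)
print(last_items)    # ('n', 30, 5)
print(out_of_range)  # True
```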

Slicing

Beyond single-value access, Python supports a rich set of techniques, known as slicing, for extracting multiple values from an ordered sequence. Given an ordered sequence data, the basic format is data[start:end:stride], where start is the index of the first value to include, end is the index at which to stop (the value at end itself is not included), and stride is the step between successive index values. If start or end is omitted, the slice extends to the beginning or the end of the sequence, respectively, and the default stride is one. A negative value for start or end is interpreted relative to the end of the sequence. These concepts are demonstrated in the following code block:

In [1]:
# Edit the start/end/stride values to learn slicing

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(data[0])
print(data[2:-2])
print(data[:-3:2])
1
[3, 4, 5, 6, 7, 8]
[1, 3, 5, 7]
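A few additional slices, offered as a sketch, show the default values and a negative stride. The variable names are illustrative.

```python
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

first_three = data[:3]      # start omitted: begin at the first value
last_three = data[-3:]      # end omitted: run through the last value
middle = data[2:5]          # the value at index 5 is not included
every_other = data[::2]     # stride of two
reversed_copy = data[::-1]  # a negative stride walks backwards

print(first_three)    # [1, 2, 3]
print(last_three)     # [8, 9, 10]
print(middle)         # [3, 4, 5]
print(every_other)    # [1, 3, 5, 7, 9]
print(reversed_copy)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# Slicing works the same way on strings and tuples.
print('Python'[1:4])  # yth
```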

Common Sequence Operations

In addition to slicing, the string, tuple, and list support several common sequence operations. Given a value v, an integer n, and two sequences s and t of the same type:

Operation       Description
v in s          True if v is in the sequence s, otherwise False
v not in s      False if v is in the sequence s, otherwise True
s + t           concatenation of s and t
s * n or n * s  n shallow copies of s concatenated
len(s)          the number of elements in the sequence s
min(s)          the smallest element in the sequence s
max(s)          the largest element in the sequence s
s.count(v)      number of times v appears in s

These operations are demonstrated below on the data list:

In [2]:
1 in data
Out[2]:
True
In [3]:
0 not in data
Out[3]:
True
In [4]:
data * 2
Out[4]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
In [5]:
data + [11, 12, 13, 14]
Out[5]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
In [6]:
print(len(data), min(data), max(data))
10 1 10
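The same operations also apply to strings and tuples. The following sketch uses illustrative names and ends with a small demonstration of what "shallow copies" means for the * operator.

```python
text = 'banana'
point = (1, 2, 2, 3)

has_nan = 'nan' in text    # for strings, in tests for a sub-string
a_count = text.count('a')  # number of 'a' characters
twos = point.count(2)      # number of times 2 appears in the tuple
longer = point + (4, 5)    # concatenation creates a new tuple
repeated = 'ab' * 3        # repetition creates a new string
smallest, largest = min(text), max(text)

print(has_nan, a_count, twos)  # True 3 2
print(longer)                  # (1, 2, 2, 3, 4, 5)
print(repeated)                # ababab
print(smallest, largest)       # a n

# Repetition makes shallow copies: all three entries below refer
# to the same inner list, so one append shows up in every entry.
rows = [[0]] * 3
rows[0].append(1)
print(rows)                    # [[0, 1], [0, 1], [0, 1]]
```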

Strings

In Python, a string is a sequence of zero or more characters that are enclosed within either a pair of single quote characters, ', or a pair of double quote characters, ". While it might seem confusing to have two such similar representations, it can be quite handy, as this allows strings to be written that include either single quotes or double quote characters by simply using the opposite type to enclose the string itself:

"A string with a contraction such as don't or a possessive like Python's string."
'A string with a quote, "Four score and seven years ago ..."'

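A string can span multiple lines by using either three single or three double quote characters to enclose the sequence of characters. Python has no dedicated multi-line comment syntax, but triple-quoted strings are often used for that purpose. Adjacent string literals, separated only by whitespace, are silently combined into a single string by the Python interpreter: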

data = 'First Name ''Last Name '" Email Address"
print(data)

First Name Last Name  Email Address
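As a short sketch with illustrative names, a triple-quoted string keeps its line breaks, while adjacent literals inside parentheses offer a way to split a long single-line string across several source lines.

```python
# A triple-quoted string keeps the newline characters it contains.
message = """Dear reader,
this string spans
three lines."""
newline_count = message.count('\n')

# Adjacent literals are joined even when they sit on different
# source lines, as long as they are inside parentheses.
title = ('Python '
         'Data '
         'Structures')

print(newline_count)  # 2
print(title)          # Python Data Structures
```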

A special string in Python is known as a raw string, where backslashes are not handled in a special way when processing the string. This is useful when constructing regular expressions or other strings that might include formatting information, like plot axes that use LaTeX formatting. To create a raw string, simply prefix the string with a lower-case 'r':

data = r'Text with a \ that is left unprocessed.'
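A minimal sketch of the difference, using illustrative variable names: in a regular string the sequence \n is a single newline character, while in a raw string it remains a backslash followed by the letter n.

```python
escaped = 'a\nb'  # \n is processed into one newline character
raw = r'a\nb'     # the backslash and the n are kept as two characters

print(len(escaped))    # 3
print(len(raw))        # 4
print(raw == 'a\\nb')  # True
```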

The str class defines a large number of methods that can be used to process string data, including methods that test whether a string contains only alphabetical, numerical, or alphanumeric characters, and methods that convert a string to all lowercase or all uppercase characters. A full description of the string methods is available from the online Python Documentation or by using help(str) at a Python prompt or in a Jupyter Notebook cell.

Some of the more useful Python string methods are detailed below.
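The following sketch tries a few of these tests and conversions on sample values chosen for illustration. Each method returns a new value and leaves the original string unchanged.

```python
word = 'Python'

print(word.isalpha())         # True: only alphabetical characters
print('2017'.isdigit())       # True: only digit characters
print('Python3'.isalnum())    # True: only letters and digits
print('Python 3'.isalnum())   # False: the space is neither

lower_word = word.lower()
upper_word = word.upper()
print(lower_word, upper_word)  # python PYTHON
print(word)                    # Python
```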

format:

The format method is used to create a new formatted string from a template string and substitution text. The classic example is a form letter, where specific fields are replaced by new data for every letter. The format method is an alternative to the older % string formatting operator, which remains available. In its basic form, the template string includes pairs of curly braces, {}, that mark the replacement locations, and the format method takes arguments that supply the replacement text. For example,

'Hello {}, you are visitor #{}!'.format('Alexander', 23)

will return

'Hello Alexander, you are visitor #23!'

Alternatively, the curly braces can enclose a number that selects, by position, the matching argument of the format method. For example, the previous example could also be written as 'Hello {0}, you are visitor #{1}!'.format('Alexander', 23), or equivalently as 'Hello {1}, you are visitor #{0}!'.format(23, 'Alexander').

Since Python 3.6, the language also supports f-strings, where a variable name (or any other expression) is specified within the curly braces, allowing direct replacement. The string formatting codes are still supported, but now follow the variable name, as opposed to the ordinal number. For example, the previous example can be written by using an f-string:

name = 'Alexander'
number = 23

f'Hello {name}, you are visitor #{number}!'
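As a sketch of how formatting codes follow the variable name in an f-string (Python 3.6 or later is assumed), the example below formats a few illustrative values and shows the equivalent format method call for comparison.

```python
name = 'Alexander'
price = 3.14159
count = 7

two_decimals = f'{price:.2f}'           # fixed point, two decimal places
zero_padded = f'{count:03d}'            # integer padded with zeros to width 3
right_aligned = f'{name:>12}'           # right-aligned in a field of width 12
with_format = '{0:.2f}'.format(price)   # the same code after an ordinal number

print(two_decimals)   # 3.14
print(zero_padded)    # 007
print(right_aligned)  #    Alexander
print(with_format)    # 3.14
```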

find:

The find method locates the first occurrence of a sub-string in the full string, and returns the index position of this first occurrence. For example,

"The brown dog jumped over the quick fox!".find("he")

returns 1. If the sub-string does not occur in the string, find returns -1.
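The sketch below shows two further behaviors of find: an optional second argument gives the index at which the search starts, and a sub-string that is not present yields -1. The variable names are illustrative.

```python
sentence = "The brown dog jumped over the quick fox!"

first = sentence.find("he")              # first occurrence, inside "The"
second = sentence.find("he", first + 1)  # resume the search after it
missing = sentence.find("cat")           # not present in the sentence

print(first)    # 1
print(second)   # 27
print(missing)  # -1
```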

split:

The split method is very powerful, and will tokenize a string into a list of substrings based on a separator passed as an argument; by default, the string is split on runs of whitespace characters. For example,

"The brown dog jumped over the quick fox!".split()

returns

['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']
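A brief sketch of split with an explicit separator and with a limit on the number of splits. The sample record is made up for illustration.

```python
record = 'Alexander,30,Urbana'

fields = record.split(',')         # split at every comma
name, rest = record.split(',', 1)  # split only at the first comma

print(fields)  # ['Alexander', '30', 'Urbana']
print(name)    # Alexander
print(rest)    # 30,Urbana

# With no argument, runs of whitespace count as a single separator;
# with an explicit ' ', every individual space splits the string.
print('a  b'.split())     # ['a', 'b']
print('a  b'.split(' '))  # ['a', '', 'b']
```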

strip:

The strip method removes the characters specified in its argument from the beginning and end of a string. By default, whitespace characters are removed. Two variants of this method, lstrip and rstrip, remove only leading or only trailing characters, respectively.

"    Some text surrounded by white space characters    ".strip()

returns

'Some text surrounded by white space characters'
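The variants and the optional argument can be sketched as follows, using illustrative strings.

```python
padded = '   value   '

left_stripped = padded.lstrip()   # leading whitespace removed
right_stripped = padded.rstrip()  # trailing whitespace removed
starred = '***title***'.strip('*')  # remove a specific character instead

print(repr(left_stripped))   # 'value   '
print(repr(right_stripped))  # '   value'
print(starred)               # title
```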

join:

While strings can be combined by using the + operator, this approach is slow for many additions since each addition requires the construction of a new string to hold the combined result. A more efficient approach is to use the join method, which can quickly combine multiple strings that are contained in an iterable object such as a list or tuple. The string you use to call the join method provides the glue text between each item in the iterable. For example, the following expression creates a new string from a list of strings, with the items separated by a comma and a single space character:

data = ['1', '2', '3', '4', '5', '6', '7', '8', '9']

", ".join(data)

returns

'1, 2, 3, 4, 5, 6, 7, 8, 9'
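One detail worth a sketch: join expects every item to already be a string, so a list of numbers must be converted first. The names below are illustrative, and the conversion uses a generator expression, a construct not otherwise covered in this notebook.

```python
numbers = [1, 2, 3]

# Joining non-string items raises a TypeError.
try:
    ', '.join(numbers)
    joined_directly = True
except TypeError:
    joined_directly = False

# Convert each item with str() before joining.
as_text = ', '.join(str(n) for n in numbers)

# join accepts any iterable of strings, including a tuple.
date = '-'.join(('2017', '01', '31'))

print(joined_directly)  # False
print(as_text)          # 1, 2, 3
print(date)             # 2017-01-31
```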

The following code block demonstrates several string operations that you can test, change, and execute.


In [7]:
text = ['The', 'brown', 'dog', 'jumped', 'over', 'the', 'quick', 'fox!']

newtext = " ".join(text)

print(newtext)
The brown dog jumped over the quick fox!

Student Exercise

In the empty Code cell below, first create a new string called 'mystring' that contains at least 12 characters (e.g., mystring = 'This is a demo string.'). Next, write two separate calls to the print function that display the third through sixth characters and the last character of mystring.


In [ ]:
 

Another useful built-in function is the input function, which can be used to obtain information from the user. The input function takes a string argument that is displayed on STDOUT, and then reads characters from STDIN until a newline character is encountered. This is demonstrated in the following Code cell, where STDIN reads from the keyboard and STDOUT writes to the notebook immediately following the respective Code cell.


In [8]:
name = input("Enter your Name: ")

print("Welcome {0}".format(name))
Enter your Name: Alexander
Welcome Alexander

List

A list is a mutable sequence that can hold homogeneous data, such as [1, 2, 3, 4, 5], or heterogeneous data, such as [1, '2', 'Three', (4, 5)]. A list can be created in several different ways:

  1. []: An empty list
  2. [1]: A single valued list
  3. [1, 2, 3]: Comma-separated items
  4. list(): using the list class constructor

Since a list is mutable, a list can be changed by adding elements, removing elements, or simply changing existing elements in place. Lists are very powerful data structures and are used extensively in many Python programs. The following table presents some of the more commonly used list operations:

Operation  Description                                   Example
append     add an element to the end of the list         data.append(11)
insert     insert an element before the specified index  data.insert(4, '4')
del        delete the value at the specified index       del data[4]
remove     remove the first element equal to the value   data.remove(11)
clear      remove all elements in the list               data.clear()
sort       sort the list in place                        data.sort()
reverse    reverse the list in place                     data.reverse()
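The operations in the table can be walked through in order on a small list. This is only a sketch; the starting values are chosen for illustration, and the comments show the list after each step.

```python
data = [3, 1, 2]

data.append(11)      # [3, 1, 2, 11]
data.insert(1, '4')  # [3, '4', 1, 2, 11]
del data[1]          # [3, 1, 2, 11]
data.remove(11)      # [3, 1, 2]
data.sort()          # [1, 2, 3]
data.reverse()       # [3, 2, 1]

before_clear = list(data)  # keep a copy so the result can be shown
data.clear()               # []

print(before_clear)  # [3, 2, 1]
print(data)          # []
```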

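Assigning a list to a new variable does not copy the list. Instead, both variables refer to the same underlying list, and any change made through one is visible through the other. For example, after this set of operations:

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = data
d[0] = -1

data[0] now contains the value -1. To obtain a separate copy of the list, use the slice notation without any values. This creates a shallow copy: a new list whose elements refer to the same objects as the original, which is sufficient when the elements are immutable values such as numbers or strings. For example, after this set of operations:

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
d = data[:]
d[0] = -1

data[0] retains the original value of 1. A deep copy, which also copies any nested mutable objects, can be made with the deepcopy function from the standard copy module.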
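The distinction between a second reference to a list, a copy of only the outer list, and a copy that also duplicates nested objects matters once a list contains other mutable objects. The sketch below uses a nested list and the deepcopy function from the standard copy module; the variable names are illustrative.

```python
import copy

nested = [[1, 2], [3, 4]]

alias = nested                # no copy: a second name for the same list
shallow = nested[:]           # new outer list, shared inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

shallow.append([5, 6])  # only the copy's outer list grows
shallow[0][0] = -1      # the inner list is shared, so nested changes too
deep[1][1] = -99        # the deep copy is fully independent

print(alias is nested)  # True
print(nested)           # [[-1, 2], [3, 4]]
print(shallow)          # [[-1, 2], [3, 4], [5, 6]]
print(deep)             # [[1, 2], [3, -99]]
```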


In [9]:
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(data)

data.reverse()
print(data)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
In [10]:
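# Now we compare a copy of a list with a second reference to the same list

# First a copy made with slice notation; changing d leaves data untouched
d = data[:]
d[1] = -1

print(data)

# Now a plain assignment, which makes d another name for the same list
d = data
d[1] = -1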

print(data)
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
[10, -1, 8, 7, 6, 5, 4, 3, 2, 1]

Student Exercise

In the empty Code cell below, first create a new list called 'mylist' that holds at least 10 values. Next, write three separate calls to the print function that display the third element, the sixth element, and the last element of mylist.


In [ ]:
 

Dictionary

A dictionary is a collection of values that are referenced by corresponding keys. Unlike a string, tuple, or list, a dictionary is not a sequence: values are retrieved by key, not by index position. The key-value pairs are separated by commas, while each value is separated from its key by a colon ':' character. A dictionary can be created in several different ways:

  1. {}: An empty dictionary
  2. {'1': 1}: A single key-value dictionary
  3. {'1': 1, '2': "two", '3': (1, 2, 3)}: Comma-separated key-value pairs
  4. dict(): using the dict class constructor

Since a dictionary is mutable, a dictionary can be changed by adding key-value pairs, by removing key-value pairs, or by simply changing existing values in place. Dictionaries can be very useful data structures, and Python provides a number of useful operations and methods for working with dictionaries, as listed in the following table:

Operation    Description
k in d       True if the key k is in the dictionary d, otherwise False
k not in d   False if the key k is in the dictionary d, otherwise True
del d[k]     Deletes the key-value pair identified by the key k
d.keys()     Returns the keys from the dictionary d
d.values()   Returns the values from the dictionary d
d.items()    Returns the key-value pairs from the dictionary d
d.clear()    Removes all entries from the dictionary d
d.copy()     Returns a shallow copy of the dictionary d
len(d)       Returns the number of entries in the dictionary d

The following code block presents a simple dictionary, along with several operations that demonstrate these functions.


In [11]:
d = {'1': 1, '2': "two", '3': (1, 2, 3)}

print(d)
print(len(d))

print('1' in d)
print('4' not in d)

a = d.copy()

del a['1']

print(d)
{'1': 1, '2': 'two', '3': (1, 2, 3)}
3
True
True
{'1': 1, '2': 'two', '3': (1, 2, 3)}
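The remaining operations, along with adding and changing entries, can be sketched as follows. The sample dictionary is illustrative, and the order of the printed keys assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
d = {'name': 'Alexander', 'age': 30}

d['location'] = (102.1, 32.1)  # a new key adds an entry
d['age'] = 31                  # an existing key changes its value in place

keys = list(d.keys())
values = list(d.values())
items = list(d.items())

print(keys)    # ['name', 'age', 'location']
print(values)  # ['Alexander', 31, (102.1, 32.1)]
print(items[0])  # ('name', 'Alexander')

# Membership tests look at keys, not values.
print('age' in d)  # True
print(31 in d)     # False

d.clear()
print(len(d))      # 0
```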

Student Exercise

In the empty Code cell below, first create a new dictionary called 'mydict' that contains at least five key-value pairs (e.g., mydict = {'one' : 1, 'two' : 2, 'three' : 3, 'four' : 4, 'five' : 5}). Next, write two separate calls to the print function that display the value of the second key (e.g., two) and the value of the last key (e.g., five) from the mydict dictionary.


In [ ]:
 

Tuple

A tuple is an immutable sequence that can hold homogeneous data, (1, 2, 3, 4, 5) or heterogeneous data, (1, '2', 'Three', (4, 5)). A tuple can be created in several different ways:

  1. (): An empty tuple
  2. 1, or (1, ): A single valued tuple
  3. 1, 2, 3 or (1, 2, 3): Comma-separated items
  4. tuple(): using the tuple class constructor

Note the requirement for the trailing comma to create a single-valued tuple; without it, the Python interpreter treats the expression as a value enclosed in parentheses. For example, (1) is an integer. Any change to a tuple requires the creation of a new tuple.
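The effect of the trailing comma can be checked directly with the built-in type function. The names in this sketch are illustrative.

```python
single = (1,)         # trailing comma: a one-element tuple
not_a_tuple = (1)     # just the integer 1 in parentheses
bare = 1, 2, 3        # the parentheses are optional
from_list = tuple([1, 2, 3])  # the constructor accepts any iterable

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int
print(bare == from_list)           # True

# "Changing" a tuple means building a new one.
extended = single + (2,)
print(single, extended)            # (1,) (1, 2)
```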

Tuples are commonly used to pass information to and from functions, and allow for the assignment of multiple data values simultaneously via unpacking:


In [12]:
# First we create a three element tuple, and then display the tuple
point = (12, 32, 9)
print('point = {0}'.format(point))

x, y, z = point

# Below we use an f-string to display the point values

print(f'x = {x}, y = {y}, z = {z}')
point = (12, 32, 9)
x = 12, y = 32, z = 9
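Two further uses of unpacking, offered as a sketch with illustrative names: swapping two variables without a temporary, and receiving two results from the built-in divmod function, which returns a tuple.

```python
a, b = 1, 2
a, b = b, a  # the right-hand side is packed into a tuple, then unpacked

# divmod returns the quotient and the remainder as a two-element tuple.
quotient, remainder = divmod(17, 5)

print(a, b)                 # 2 1
print(quotient, remainder)  # 3 2
```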

In the previous Code cell, we first print the tuple by using the format method, which is a powerful technique for building strings from dynamic data. The second call to the print function uses an f-string, where we directly embed the value of a variable within the string by (1) preceding the string with an 'f' character, and (2) enclosing the variable to be inserted into the string in curly braces { }.


Other Data Structures

Python also supports a number of other data structures, including the range, set, frozenset, and the collections module container data types. Of these, the range type is frequently used in for loops, described later, to simplify iteration through a sequence data structure. The other data structures are beyond the scope of this lesson.
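As a small preview of the range type, the sketch below converts a few ranges to lists so that their contents can be displayed. Like a slice, a range stops before its end value; the variable names are illustrative.

```python
first_five = list(range(5))        # start defaults to 0
evens = list(range(0, 10, 2))      # start, end, stride
countdown = list(range(3, 0, -1))  # a negative stride counts down

print(first_five)  # [0, 1, 2, 3, 4]
print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [3, 2, 1]
```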


Ancillary Information

The following links are to additional documentation that you might find helpful in learning this material. Reading these web-accessible documents is completely optional.

  1. The official Python documentation for strings, lists, dictionaries, and tuples.
  2. The book A Byte of Python introduces these data structures.
  3. A discussion on the native data types mentioned in this notebook from the book, Dive into Python.
  4. The book Think Python includes a discussion on these data structures.

© 2017: Robert J. Brunner at the University of Illinois.

This notebook is released under the Creative Commons license CC BY-NC-SA 4.0. Any reproduction, adaptation, distribution, dissemination or making available of this notebook for commercial use is not allowed unless authorized in writing by the copyright holder.