This post covers comprehensions and slicing in Python. Comprehensions are a big part of why programmers love Python: they let us filter the elements of a collection and transform them at the same time in one concise expression, where other languages (such as Java) may need five or six lines of code to do the same thing.

// These are notes taken from my lecture at university

The following is what we’re aiming to write in a single expression: a simple for loop that appends the result of an expression to a list whenever a condition is met.

result = []
for val in collection:
    if condition:
        result.append(expression)

List Comprehensions

What we do instead is intuitive and easy to understand, even at a quick glance. The single expression has the following basic form.

[expression for value in collection if condition]

Let’s look at an example to get a better understanding. Let’s say that we have a list of strings, we want to filter out the strings that have a length of two or less, and we want to convert the remaining strings to uppercase.

strings = ['a','as','bat','car','dove','python']

# and we're looking to get the following result from this list.

result = ['BAT','CAR','DOVE','PYTHON']

If we were to implement this using a for-loop then the code would be as follows.

strings = ['a','as','bat','car','dove','python']
result = []

for string in strings:
    if len(string) > 2:
        result.append(string.upper())
print(result)

When using list comprehension we can achieve the exact same result by doing the following.

strings = ['a','as','bat','car','dove','python']
result = [s.upper() for s in strings if len(s) > 2]

If we weren’t looking to apply a transformation to each element (changing to uppercase) in the list we could just write the following.

strings = ['a','as','bat','car','dove','python']
result = [s for s in strings if len(s) > 2]

# this would initialise "result" as ['bat','car','dove','python']

We could also transform every element in the list to uppercase without the conditional part (checking to see if the string is longer than 2 characters) by doing the following.

strings = ['a','as','bat','car','dove','python']
result = [s.upper() for s in strings]

# this would initialise "result" as ['A','AS','BAT','CAR','DOVE','PYTHON']

Essentially we’re writing a for loop in a very succinct manner using list comprehension.
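As a quick check of that claim, the sketch below builds the same list both ways and compares the results. The names `words`, `loop_result` and `comp_result` are only illustrative.

```python
words = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Long form: explicit loop with a filter and a transformation
loop_result = []
for w in words:
    if len(w) > 2:
        loop_result.append(w.upper())

# Short form: the same filter and transformation in one expression
comp_result = [w.upper() for w in words if len(w) > 2]

print(loop_result == comp_result)  # True
print(comp_result)                 # ['BAT', 'CAR', 'DOVE', 'PYTHON']
```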

Set Comprehensions

If we wanted to do the same for a set (instead of a list) then the syntax is exactly the same, only this time we use curly brackets (instead of square brackets).

{expression for value in collection if condition}

For this we’ll look at a different example. Let’s say that we have a list of strings and we want to create a set containing the length of each string in the collection. As we saw in the previous post, Data Structures and Loops in Python, sets can only contain unique elements, so duplicate lengths will be ignored.

strings = ['a','as','bat','car','dove','python']

# and we're looking to get the following result from this list.

result = {1, 2, 3, 4, 6}

The first thing to notice about this example is that we’re creating a set from a Python list, so the result of a comprehension doesn’t need to be the same data type as the collection it iterates over. The implementation is as follows.

result = {len(s) for s in strings}
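The optional condition from the general form works in a set comprehension too. A minimal sketch, where the name `long_lengths` is just an illustration:

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# Lengths of the strings longer than two characters. 'bat' and 'car'
# both have length 3, so 3 is only stored once.
long_lengths = {len(s) for s in strings if len(s) > 2}

print(long_lengths == {3, 4, 6})  # True
print(len(long_lengths))          # 3
```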

Dict Comprehensions

A dict comprehension is similar to the two previous examples, only this time we need to specify both the key and the value expression, as follows.

{key_expression : value_expression for value in collection if condition}

Let’s look at a new example to see how we can use a dict comprehension. Let’s say that we want a dict that maps each element of a list to its index position.

strings = ['a','as','bat','car','dove','python']

# and we're looking to get the following result from this list.

mapping = {'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

This works as a lookup table. If we often need to know the index of a certain value in a list, storing the index of each element up front saves us from iterating through the collection every time we want to find an element.

strings = ['a','as','bat','car','dove','python']
mapping = {value : index for index, value in enumerate(strings)}

The built-in enumerate() function yields one tuple per element of a collection, each holding the index position and the value, in that order.
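To see what enumerate() hands to the comprehension, the sketch below prints the first few pairs and then compares a dict lookup with `list.index()`, which gives the same answer but searches the list on each call. The name `pairs` is only illustrative.

```python
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

# enumerate() yields (index, value) pairs one at a time
pairs = list(enumerate(strings))
print(pairs[:3])  # [(0, 'a'), (1, 'as'), (2, 'bat')]

# Build the lookup table once, then each lookup is a single dict access
mapping = {value: index for index, value in enumerate(strings)}
print(mapping['car'])        # 3

# list.index() returns the same position but scans the list every time
print(strings.index('car'))  # 3
```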

Nested Comprehensions

This is useful when dealing with a collection of tuples or a collection of collections. Nested comprehensions can become quite complicated depending on the number of dimensions we are working with.

Let’s explore an example to understand this kind of comprehension a bit better. Let’s say that we wanted to create a list of all the names that contain two or more e’s in them from a list that contains some names.

all_names = [['John', 'Emily', 'Michael', 'Mary', 'Stephen'],
             ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

We are dealing with a list of lists, so we need to loop through the outer list and then, for each inner list, count the number of e’s in each name. Using a slightly longer method, that looks like the following.

names_of_interest = []
for names in all_names:
    # names is a list
    has_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(has_es)

# names_of_interest = ['Stephen']

If we wanted to use comprehensions in a better way we could achieve the same result with the following.

[name for names in all_names for name in names if name.count('e') >= 2]

# returns ['Stephen']
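The order of the for clauses in a nested comprehension matches the order of the equivalent nested loops, outermost first. A sketch that writes both out side by side (the names `with_loops` and `with_comprehension` are illustrative):

```python
all_names = [['John', 'Emily', 'Michael', 'Mary', 'Stephen'],
             ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

# Fully written out: the outer loop comes first, then the inner loop,
# then the condition
with_loops = []
for names in all_names:
    for name in names:
        if name.count('e') >= 2:
            with_loops.append(name)

# The comprehension lists its for clauses in that same order
with_comprehension = [name for names in all_names
                      for name in names
                      if name.count('e') >= 2]

print(with_loops)                        # ['Stephen']
print(with_loops == with_comprehension)  # True
```

Note that `count('e')` is case sensitive, so the capital E in 'Emily' is not counted.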

If we were dealing with three dimensions, i.e. a list of lists of lists, then we would probably start to write out for loops, as anything more than two dimensions makes it harder to see what is going on. There’s always a balance between being succinct and being able to easily understand what is going on with our code.

As another example, say that we had a list of values stored in tuples and we wanted to “flatten out” the data into one single list of integers.

some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
# flattened = [1, 2, 3, 4, 5, 6, 7, 8, 9]

To achieve this we could use the following.

flattened = [value for tup in some_tuples for value in tup]
# returns [1, 2, 3, 4, 5, 6, 7, 8, 9]
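For one level of nesting, the standard library also offers `itertools.chain.from_iterable`. This is an alternative to the nested comprehension rather than part of the lecture notes, and the sketch below only checks that both give the same list.

```python
from itertools import chain

some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Nested comprehension: outer loop over the tuples, inner loop over their values
flattened = [value for tup in some_tuples for value in tup]

# Standard-library alternative that chains the inner sequences together
chained = list(chain.from_iterable(some_tuples))

print(flattened)             # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(flattened == chained)  # True
```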

Slicing

Slicing is an easy way to select a section (a slice) of a sequence type such as a list, tuple or string. The basic notation is as follows.

sequence[start:stop]
# the element at the stop index is not included

Let’s say we wanted a section of a list of numbers. We could do the following.

list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list[1:5] # instead of specifying an index, specify a range
# returns [1, 2, 3, 4] and doesn't include the element at index 5
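Two properties of a slice are worth checking for ourselves: its length is `stop - start`, and it is a new list rather than a view onto the original. A small sketch, using the illustrative names `numbers` and `section`:

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

section = numbers[1:5]
print(section)       # [1, 2, 3, 4]
print(len(section))  # 4, i.e. stop - start

# The slice is a new list, so changing it leaves the original untouched
section[0] = 100
print(numbers[1])    # 1
```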

We can also omit the start or stop value to retrieve all the elements before, or all the elements from, a certain index position.

list[:5] # specify only the stop value for the elements before it
# returns [0, 1, 2, 3, 4] and doesn't include the element at index 5
# same as saying list[0:5]

list[3:] # specify only the start value for the elements from it onwards
# returns [3, 4, 5, 6, 7, 8, 9] all elements after (and including) index 3
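Because the stop index is excluded, splitting a list at one index with the two open-ended forms loses nothing and duplicates nothing. The sketch below checks that, and also shows that omitting both values copies the whole list. The names `head`, `tail` and `whole_copy` are illustrative.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

head = numbers[:5]   # everything before index 5
tail = numbers[5:]   # everything from index 5 onwards

# Splitting at the same index and joining again gives back the original
print(head + tail == numbers)  # True

# Omitting both values copies the whole list
whole_copy = numbers[:]
print(whole_copy == numbers)   # True
print(whole_copy is numbers)   # False
```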

We can also use negative values to slice a list. A negative value is counted from the back of the sequence instead of from the front.

list[-4:] 
# returns the last 4 elements of the list [6, 7, 8, 9]

list[:-1] # -1 refers to the last element, and the stop value is excluded
# drops the last element of the list [0, 1, 2, 3, 4, 5, 6, 7, 8]

list[-6:-2]
# returns from 6th from the end up to (but not including) 2nd from the end [4, 5, 6, 7]
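One way to reason about negative values is that an index `i` below zero stands for position `len(sequence) + i`. The sketch below checks each of the slices above against its positive equivalent (the names `numbers` and `n` are illustrative):

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(numbers)

# A negative index i refers to position len(sequence) + i
print(numbers[-4:] == numbers[n - 4:])         # True
print(numbers[:-1] == numbers[:n - 1])         # True
print(numbers[-6:-2] == numbers[n - 6:n - 2])  # True
print(numbers[-6:-2])                          # [4, 5, 6, 7]
```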

Say we wanted to get the last element of a list. The difference between Python and Java can be seen in the following.

// Java
int[] list = {1, 2, 3, 4};
int last = list[list.length - 1];
# Python
list = [1, 2, 3, 4]
last = list[-1]

Python is a lot neater and more succinct.

Final thoughts

Strings are also sliceable.

s = 'HELLO'
s[2:4]
# returns 'LL'

An invalid index, such as one past the end of a list, raises an IndexError, although a slice that runs past the end is simply cut short. Sets and dicts are not sliceable.
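The sketch below tries each of these cases. The flag names `index_failed` and `set_failed` are made up for the example; they just record whether the expected exception was raised.

```python
s = 'HELLO'
numbers = [0, 1, 2, 3, 4]

# Indexing past the end raises IndexError
index_failed = False
try:
    numbers[10]
except IndexError:
    index_failed = True
print(index_failed)   # True

# Slicing past the end does not raise; the slice is cut off at the end
print(numbers[3:10])  # [3, 4]
print(s[2:100])       # LLO

# Sets have no positions, so trying to slice one raises TypeError
unique = {1, 2, 3}
set_failed = False
try:
    unique[0:2]
except TypeError:
    set_failed = True
print(set_failed)     # True
```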
