This week's lab will introduce strings and lists in Python and some simple Unix commands.
Software tools needed: web browser and Python IDLE programming environment.
See Lab 1 for details on using Python, Gradescope, and Blackboard.
A sequence of characters (letters, numbers, symbols) is called a string. To mark where a string starts and ends, we use quotes, either single or double, as long as they match (a fancier way to say this: quotes delimit the string). For example, for our first program, we wrote:
print("Hello, World!")
The string, or sequence of characters, Hello, World!, was printed to the screen. We can also store strings to be used again in the program. For example,
greeting = "Hello, World!"
print(greeting)
creates a location in memory that can be accessed by typing the name (or identifier) that we chose: greeting. The variable greeting stores the string Hello, World!. While the quotes are not stored, we will often write "Hello, World!" to make it clearer where the string begins and ends. When we execute the code above, it will create the variable, greeting, and then print out the message:
Hello, World!
We can use the variable any number of times. For example, if we wanted to print the message twice, we could use:
greeting = "Hello, World!"
print(greeting)
print(greeting)
Since strings are used everywhere, there are many built-in functions for strings.
For historical reasons, many computer languages, including Python, start counting at 0 instead of 1. When you use the find() command on the string "Hello, World!", the first character is at position 0, the next character is at 1, and so on.
The find() command gives the location of "ll", which is 2 if you count the first character as 0.
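As a minimal sketch of a few built-in string functions applied to the greeting from earlier (the variable names length, position, missing, shouting, and count_l are illustrative choices, not part of the lab):

```python
greeting = "Hello, World!"

length = len(greeting)           # number of characters, including the comma, space, and '!'
position = greeting.find("ll")   # index where "ll" first appears, counting from 0
missing = greeting.find("xyz")   # find() gives -1 when the substring is not present
shouting = greeting.upper()      # a new string with every letter in upper case
count_l = greeting.count("l")    # how many times the letter 'l' occurs

print(length)
print(position)
print(missing)
print(shouting)
print(count_l)
```

Note that upper() does not change greeting itself; it returns a new string.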
Last week, we used the print() function to write messages to the user of our program. This week, we introduce input() to get information from the user. Here's the basic format:
aString = input("Put a message here to show user: ")
where the string "Put a message..." is replaced by the prompt you would like the user to see, and aString by the name of the string you are using in your program.
Let's write a program that combines asking the user for input with the string commands from the beginning of the lab. The program will:
To start, open IDLE and start a new file window. Add a comment (a line that begins with '#') that includes your name and a short description of what the program does.
Save your file as you go, and then run it. Try different messages to make sure it works with different inputs. When it works, see the Programming Problem List.
You can also loop through strings. Try running the code below:
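A minimal sketch of looping through a string, assuming a short example string (the names message and ch are illustrative): the for loop visits the characters one at a time, from left to right.

```python
message = "Hello"

# The loop variable ch holds one character of the string on each pass.
for ch in message:
    print(ch)
```

Each character is printed on its own line.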
Each character has a number assigned to it. When you write a character, it is converted to its number, and that number is what gets stored. Python uses the standard Unicode encoding (which extends the popular ASCII encoding to new symbols and alphabets). For example, ord('a') gives the Unicode number for the character a, which is 97.
Let's look at the Unicode of the characters in our string:
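One way this might look, as a sketch that assumes the string is the greeting used earlier in the lab:

```python
greeting = "Hello, World!"

# Print each character next to its Unicode number.
for ch in greeting:
    print(ch, ord(ch))
```

Upper-case and lower-case letters have different numbers, and the space and punctuation marks have numbers of their own.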
Modify the program to:
To go in the other direction, there's a function chr() which takes a number and returns the corresponding character. For example, chr(97) returns 'a'. Let's look at the characters with Unicode numbers from 65 to 69:
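A short sketch of that loop (the name number is an illustrative choice). Since range() stops just before its second value, we write 70 to include 69:

```python
# range(65, 70) produces 65, 66, 67, 68, 69.
for number in range(65, 70):
    print(number, chr(number))
```

This prints the numbers 65 through 69 next to the letters A through E.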
The range() function has several different options:
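As a brief illustration, range() can take one, two, or three numbers. Wrapping each call in list() is only a convenience here, so that the numbers can be printed all at once:

```python
print(list(range(5)))         # stop only: starts at 0, stops before 5
print(list(range(2, 6)))      # start and stop: starts at 2, stops before 6
print(list(range(0, 10, 3)))  # start, stop, and step: counts up by 3
print(list(range(5, 0, -1)))  # a negative step counts down
```

In every form, the stop value itself is never included.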
You can loop through a list the same way as you loop through a string.
The split() function breaks up a string into a list of substrings, throwing away the character, or delimiter, on which the function splits.
Then you can loop through the list just like you loop through a string to access the individual substrings.
For example, given the string
"FridaysSaturdaysSundays"
split('s') will break up the string into a list of days:
["Friday","Saturday","Sunday",""]
throwing away the 's' characters on which it splits.
Notice that it produces an empty string at the end because of the 's' character at the end of "Sundays".
Try running the code below:
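A minimal sketch, assuming the string is "FridaysSaturdaysSundays" (which is consistent with the list shown above) and using the illustrative names days and day_list:

```python
days = "FridaysSaturdaysSundays"

# Split on the lower-case 's'; the capital 'S' characters are left alone.
day_list = days.split("s")
print(day_list)

# Loop through the list just as we looped through a string.
for day in day_list:
    print(day)
```

The last item in the list is an empty string, so the final print() produces a blank line.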
Modify the code to remove the 's' at the end of 'Sundays', then run it again and inspect the list. What changed?
Let's apply what we just learned to some questions from biology. DNA is a molecule that contains instructions for the cell. We can represent it as a string of four characters: 'A', 'C', 'G', and 'T', corresponding to the four nucleotides that are the building blocks for the sequences. For example,
insulin = "AGCCCTCCAGGACAGGCTGCATCAGAAGAGGCCATCAAGCAGGTCTGTTCCAAGGGCCTTTGCGTCAGGTGGGCTCAGGATTCCAGGGTGGCTGGACCCCAGGCCCCAGCTCTGCAGCAGGGAGGACGTGGCTGGGCTCGTGAAGCATGTGGGGGTGAGCCCAGGGGCCCCAAGGCAGGGCACCTGGCCTTCAGCCTGCCTCAGCCCTGC"
is the start of the DNA sequence for insulin in humans.
We have the tools to compute how long the sequence is, as well as its GC-content (the fraction of the sequence that is C or G), which is correlated with the stability of the molecule:
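A sketch of one way to do this with len() and count(). To keep the numbers easy to check by hand, it uses only the first 16 letters of the insulin sequence; the names dna, num_c, num_g, and gc_content are illustrative:

```python
# First 16 letters of the insulin sequence above (shortened for illustration).
dna = "AGCCCTCCAGGACAGG"

length = len(dna)         # total number of nucleotides
num_c = dna.count("C")    # how many C's
num_g = dna.count("G")    # how many G's

# GC-content is the fraction of the sequence that is C or G.
gc_content = (num_c + num_g) / length

print("Length:", length)
print("GC-content:", gc_content)
```

Replacing the short string with the full insulin variable should give the length and GC-content of the whole sequence.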
There's another way we can loop through strings, using the index of each character. Let's assume that we have:
greeting = "Hello, World!"
Before, we printed out the whole string with:
print(greeting)
If we wanted to print out only the first letter, we could write:
print(greeting[0])
where the number between the square brackets is the index of the character, in this case 0, the very first character of the string.
Try guessing what the following code does and then running it:
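As a sketch of the index-based approach (an illustrative loop, using the same greeting as above), range(len(greeting)) produces every valid index, from 0 up to one less than the length:

```python
greeting = "Hello, World!"

# i takes the values 0, 1, 2, ... up to the last index of the string.
for i in range(len(greeting)):
    print(i, greeting[i])
```

Changing the arguments to range() changes which characters are printed and in what order.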
The terminal is a program that allows users to type commands (separate from Python) that we can use to communicate with the operating system. It's often called the command line interface or shell. It predates the graphical interface that is now available and is incredibly useful for automating tasks and working on remote servers. We will slowly introduce shell commands over the course of the semester and eventually incorporate them into some programming assignments.
We have already used the shell in the previous lab. When you typed:
$ idle
you were asking the operating system to launch the program idle3. The laptop was configured to first look in folders containing applications, then to check your home folder, and then to check the current folder you are using to find an application by that name.
Let's create a new directory, or folder, to hold your images (using the shell or command line). Where it says thomasH below, replace with your name. It is easier if you avoid spaces in the names:
$ pwd
(The $ represents the prompt at the command line; there is no need to type it.)
This tells you the path, or location, of your current working directory. To see what is in the folder, you request a 'listing':
$ ls
Next, make the new directory:
$ mkdir thomasH
(Replace thomasH with your name to create a directory for you!)
Now, when we list the current directory, we should see a new item (namely, the directory with your name):
$ ls
To move into the new directory, use cd:
$ cd thomasH
If we ask for a listing (ls), we will now see the contents of your directory.
In the next lab, we will explain how to copy and move files between directories.
If you finish the lab early, now is a great time to get a head start on the programming problems due early next week. There are instructors to help you, and you already have Python up and running. The Programming Problem List has problem descriptions, suggested reading, and due dates next to each problem.