Today's lab focuses on mapping data with pandas, numpy, and folium, on finding and fixing errors in programs, and on regular expressions at the command line.

Software tools needed: web browser and Python IDLE programming environment with the pandas, numpy, and folium packages installed.

Quizzes

See the quiz page for details of the work due this week.

Using Python, Gradescope, and Blackboard

See Lab 1 for details on using Python, Gradescope, and Blackboard.

World Maps

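We are going to use Python to make maps. Let's start by mapping cities of the world. Locations in the world are usually given by their latitude and longitude. Latitude runs from -90 (the South Pole) to 90 (the North Pole), and longitude runs from -180 to 180 (0 is the prime meridian through Greenwich).

We will use those ranges as our coordinates: (-90, -180) to (90, 180). Let's map some cities: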

In the trinket below, map the following cities (GIS coordinates are rounded to the nearest whole number):

Folium

Folium is a Python package that uses the JavaScript library Leaflet.js to make interactive maps. Instead of popping up a matplotlib window, folium creates an .html file that you can open (and view interactively) in a browser. After the program runs, open the corresponding .html file in a web browser to see your map.

(Folium is installed on the lab machines. To check whether your home machine has it, type at the Python prompt:

import folium

If you get an error, go to the terminal and install it:

pip install folium

The package will then download and install.)
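If you'd rather check all three lab packages at once, a short script can ask Python whether each one can be found, using the standard library's importlib.util.find_spec(), which returns None for a package that isn't installed. A minimal sketch; the function name check_packages is our own:

```python
# Check which of the lab's packages can be imported, without actually importing them.
import importlib.util

def check_packages(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, found in check_packages(["pandas", "numpy", "folium"]).items():
        status = "installed" if found else "missing: run  pip install " + name
        print(name, "->", status)
```

Any package reported as missing can be installed with pip from the terminal, as above.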

Our First Map

To make a map in folium, the process is:

  1. Write a program in Python, using folium.
  2. Run the program, which outputs an .html file.
  3. Open the .html file in a browser.

Here's our first program:

#Import the folium package for making maps
import folium

#Create a map centered at (0,0), zoomed out to show a wide area:
mapWorld = folium.Map(location=[0, 0],zoom_start=3)

#Save the map:
mapWorld.save(outfile='tempMap.html')

Save this file and run it. It will create a file called tempMap.html. Open your favorite browser (Chrome, Firefox, IE, etc.) and choose "Open File" and then choose the file tempMap.html. You should see a map of the world.
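Step 3 can also be done from Python itself. A minimal sketch using the standard library's pathlib and webbrowser modules; the helper names map_url and open_map are hypothetical, and whether a browser window actually opens depends on your system's settings:

```python
# Open a saved map in the default browser from Python instead of using "Open File".
from pathlib import Path
import webbrowser

def map_url(filename):
    """Build a file:// URL for an .html file, resolved to an absolute path."""
    return Path(filename).resolve().as_uri()

def open_map(filename):
    """Ask the default browser to open the map; returns False if it could not."""
    return webbrowser.open(map_url(filename))

# Usage, after mapWorld.save(outfile='tempMap.html'):
#   open_map('tempMap.html')
```

Resolving the path first matters because as_uri() only accepts absolute paths.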

Map of New York City

Let's make another map, focused on New York City. To do that, when we set up the map object, we need to set the location to New York City and then increase the zoom level:

import folium

mapCUNY = folium.Map(location=[40.75, -74.125], zoom_start=10)

Let's add in a marker for Hunter College:

folium.Marker(location = [40.768731, -73.964915], popup = "Hunter College").add_to(mapCUNY)

and create the .html file:

mapCUNY.save(outfile='nycMap.html')
Save your commands to a file and run it in IDLE. Your program will create an HTML file called nycMap.html. Open it in your favorite browser to make sure it shows a map of NYC with a marker for Hunter College. When you have a running program, see the Programming Problem List.
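Folium expects each location as [latitude, longitude], so a quick range check can catch a typo before it turns into a marker in the wrong place. A minimal sketch; valid_location is a hypothetical helper, not part of folium:

```python
# Sanity-check a [latitude, longitude] pair before handing it to folium.
def valid_location(location):
    """Return True if location is [lat, lon] with lat in [-90, 90] and lon in [-180, 180]."""
    lat, lon = location
    return -90 <= lat <= 90 and -180 <= lon <= 180

print(valid_location([40.768731, -73.964915]))   # Hunter College: True
print(valid_location([40.768731, -273.964915]))  # longitude typo: False
```

The check cannot catch swapped coordinates when both values happen to fall inside the ranges, as New York's do: [-73.964915, 40.768731] passes, but lands in the ocean near Antarctica.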

Plotting from Files

We can combine the mapping of folium with the tools we have used for CSV files.

Let's make an interactive map of the CUNY campuses. We can download a CSV file from data.ny.gov:

(Export as a .csv file and save it in the same directory as your programs.) Open the file to make sure you have all the rows (there should be 23, one per campus) and to check whether the column headings are in the first row (they are, so there's no need to skip rows when reading in the file).

Let's use Pandas to read in the file. We will need to import pandas and folium:

import folium
import pandas as pd

To read in the CSV file, we'll use pandas' CSV reader. We'll print out the campus locations to make sure that all were read in:

cuny = pd.read_csv('cunyLocations.csv')
print(cuny["Campus"])
Note: we saved our CSV file as cunyLocations.csv. If you saved it under a different name, change the argument to read_csv() to the name of your file.
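Counting rows by eye works for 23 campuses, but a program can also check that the file was read in as expected. A minimal sketch, assuming the column names Campus, Latitude, and Longitude used in this lab; check_campus_frame is a hypothetical helper, and the one-row io.StringIO text stands in for the real file so the example runs on its own:

```python
# Check that a campus file was read in completely before mapping it.
import io
import pandas as pd

REQUIRED = ["Campus", "Latitude", "Longitude"]

def check_campus_frame(df, expected_rows):
    """Raise ValueError if a required column is missing or the row count is off."""
    missing = [col for col in REQUIRED if col not in df.columns]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    if len(df) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {len(df)}")
    return True

# A one-row stand-in for cunyLocations.csv, using Hunter College's coordinates.
sample = io.StringIO("Campus,Latitude,Longitude\nHunter College,40.768731,-73.964915\n")
cuny = pd.read_csv(sample)
print(check_campus_frame(cuny, expected_rows=1))  # True
```

With the real file, you would call check_campus_frame(pd.read_csv('cunyLocations.csv'), expected_rows=23).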

Next, let's set up a map, centered on Hunter College:

mapCUNY = folium.Map(location=[40.768731, -73.964915])

We need to add a marker for each campus. We're going to iterate through the rows of the DataFrame to create the markers:

for index,row in cuny.iterrows():
    lat = row["Latitude"]
    lon = row["Longitude"]
    name = row["Campus"]
    newMarker = folium.Marker([lat, lon], popup=name)
    newMarker.add_to(mapCUNY)
For each row in the file, we find the latitude, longitude, and name of the campus, and use those to create a new marker which we add to our map. We repeat for each row, until we have made markers for all 23 campuses in the file.
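If a row had a blank latitude or longitude, pandas would read it in as NaN, which folium cannot place on a map. A minimal sketch that first drops such rows and then collects (latitude, longitude, name) triples; it uses itertuples() instead of iterrows(), a swap that lets us write row.Latitude, and the helper marker_specs and the two-row frame are hypothetical:

```python
# Turn each DataFrame row into (latitude, longitude, name), skipping rows with missing coordinates.
import pandas as pd

def marker_specs(df):
    """Return a list of (lat, lon, name) tuples for rows with both coordinates present."""
    complete = df.dropna(subset=["Latitude", "Longitude"])
    return [(row.Latitude, row.Longitude, row.Campus)
            for row in complete.itertuples(index=False)]

# Hypothetical two-row frame: the second row is missing its latitude on purpose.
frame = pd.DataFrame({
    "Campus": ["Hunter College", "Campus With No Latitude"],
    "Latitude": [40.768731, None],
    "Longitude": [-73.964915, -73.9],
})
print(marker_specs(frame))
```

Each triple can then become a marker exactly as in the loop above: for lat, lon, name in marker_specs(cuny): folium.Marker([lat, lon], popup=name).add_to(mapCUNY).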

Lastly, let's save our map:

mapCUNY.save(outfile='cunyLocations.html')

To view your map, open a browser. From the browser, open the file: cunyLocations.html.
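Centering on Hunter College works, but campuses in other boroughs may start off-screen. One option, our suggestion rather than part of the lab, is to compute the rectangle that contains every campus and pass it to folium's Map.fit_bounds() before saving. A sketch; campus_bounds is a hypothetical helper, and the two-point frame is made up for illustration:

```python
# Compute a bounding box around all campuses so the map can show every marker at once.
import pandas as pd

def campus_bounds(df):
    """Return [[south, west], [north, east]] from the Latitude/Longitude columns."""
    return [[df["Latitude"].min(), df["Longitude"].min()],
            [df["Latitude"].max(), df["Longitude"].max()]]

# Hypothetical frame: two points in Manhattan.
frame = pd.DataFrame({"Latitude": [40.768731, 40.75],
                      "Longitude": [-73.964915, -74.125]})
bounds = campus_bounds(frame)
print(bounds)
# With folium, after adding the markers and before saving:
#   mapCUNY.fit_bounds(bounds)
```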

Finding Errors

Finding and fixing errors in your programs is a very useful skill. Let's look at a program with lots of errors and work through how to identify and fix each one. If you cloned the repo above, you will have a copy of errors.py on your computer (you can also download it from the webpage). When loaded into IDLE, it does not run:
# errors.py is based on dateconvert2.py from Chapter 5 of the Zelle textbook
#     Converts day month and year numbers into two date formats

def main()
    # get the day month and year
    day, month year = eval(input("Please enter day, month, and year numbers: ")

    date1 = str(month)"/"+str(day)+"/"+str(year)

    months = ["January", "February", "March", "April",
              "May", "June", "July", "August",
              "September", "October", "November", "December"]
    monthStr = months[month+1]
    date2 = monthStr+" " + str(day) + ",  + str(year)

    print("The date is" date1, "or", date2+".")

main()
Instead, a dialog box pops up and says "invalid syntax":


The red line indicates where the interpreter has found an error. Can you tell what it is? Syntax is another word for grammar, so the line is most likely missing "punctuation" or has a misspelling of some sort. We have spelled def correctly and have matching parentheses, so what else is missing?

The answer: after the closing parenthesis of a function definition, a colon is required. Add it in:

    def main():
and try to run the program again.
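The dialog box reports the first syntax error that Python's compiler finds. You can run the same check from a program with the built-in compile(), which raises SyntaxError carrying the line number (lineno) and message (msg). A sketch; find_syntax_error is a hypothetical helper, and the exact message text varies between Python versions (newer ones may say "expected ':'" where older ones say "invalid syntax"):

```python
# Python's own syntax check, triggered from code instead of IDLE's Run command.
def find_syntax_error(source, filename="errors.py"):
    """Return (line number, message) for the first syntax error, or None if the code compiles."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return err.lineno, err.msg
    return None

broken = "def main()\n    print('hi')\n"
fixed = "def main():\n    print('hi')\n"
print(find_syntax_error(broken))  # line 1; the message wording depends on the Python version
print(find_syntax_error(fixed))   # None
```

Like IDLE, compile() stops at the first error, which is why errors.py has to be fixed and rerun one problem at a time.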

Again, we get a dialog box:



Instead of the whole line being highlighted, only the word year is. The Python interpreter was not expecting year at that point and reports a grammatical mistake. Since year itself is just a variable name, the mistake must come before it. Do you see it?

The answer: a list of variables needs commas between the names to separate one from the next. Add the comma in:

    day, month, year = ...
and try to run the program again.

Once more we get a dialog box:


It has highlighted the first item on the line, date1. That is a variable name and looks fine. So, as above, let's look before the highlighted error to see if there's a problem. The line above it is:

    day, month, year = eval(input("Please enter day, month, and year numbers: ")
It did not highlight this line, so the problem is probably at its very end, where Python was still expecting something when it moved on to the next line. Do you see it?

The answer: we are missing a closing parenthesis. The line has two left parentheses but only one right parenthesis. Add the right parenthesis in:

    ... and year numbers: "))
and try to run the program again.
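Counting parentheses by hand is exactly what went wrong here, so it can help to let Python count them. A simplified sketch: unclosed_parens is a hypothetical helper that skips anything inside quotes, but it does not handle backslash escapes or triple-quoted strings:

```python
# Count unmatched parentheses on one line, ignoring any inside quoted strings.
def unclosed_parens(line):
    """Return how many '(' are still open at the end of the line (negative if too many ')')."""
    depth = 0
    quote = None  # the quote character of the string we are inside, if any
    for ch in line:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
    return depth

line = 'day, month, year = eval(input("Please enter day, month, and year numbers: ")'
print(unclosed_parens(line))  # 1
```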

Again, we get a dialog box:


The interpreter does not understand the second " on the line. Why? What is this line doing? It's constructing a string and storing it in the variable date1. How do you build a string out of smaller strings?

The answer: to join smaller strings together (called concatenation), we use the plus sign (+). The line is missing a plus sign right before the "/". Add the plus sign in:

    date1 = str(month) + "/" ...
and try to run the program again.
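A short sketch of the corrected concatenation on the sample date used later in this walkthrough, plus the different error you get if a number is joined without str():

```python
# Building the first date format by concatenation: every piece must be a string.
month, day, year = 12, 31, 2014
date1 = str(month) + "/" + str(day) + "/" + str(year)
print(date1)  # 12/31/2014

# Leaving out str() is a different error: Python will not add an int to a str.
try:
    wrong = month + "/" + str(day)
except TypeError as err:
    print("TypeError:", err)
```

An f-string, f"{month}/{day}/{year}", builds the same text without the explicit str() calls.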

Again, we get a dialog box, but this one has a different message:


EOL stands for "end of line", so the message says that the end of the line was reached before the string was finished. (Python 3.10 and later word this error as "unterminated string literal".) How can you fix this?

The answer: end the string with a quotation mark. The line is missing a quotation mark at the very end. Add the quotation mark:

    ...+ ",  + str(year)"
and try to run the program again.

Our familiar dialog box returns:


We have seen this type of error before. How do you fix it?

The answer: a list of arguments needs commas between them to separate one from the next. Add the comma in:

    ... date is", date1 ...
and try to run the program again.

It runs! Now let's make sure it works. Type in at the prompt:

Please enter day, month, and year numbers: 31, 12, 2014
Uh oh, instead of output, we get the following messages:
Traceback (most recent call last):
  File "/Users/stjohn/public_html/teaching/cmp/cmp230/f14/errors.py", line 18, in <module>
    main()
  File "/Users/stjohn/public_html/teaching/cmp/cmp230/f14/errors.py", line 13, in main
    monthStr = months[month+1]
IndexError: list index out of range

When you see messages like this, go to the very last line:

IndexError: list index out of range
It says that the index for our list is out of range. An index is the position of the item in the list that we're accessing. For example, months[1] has index 1 and gives us February. Valid indices for a list run from 0 to one less than the length of the list. For months, which has 12 items, that is [0,1,2,...,11]. What went wrong when we entered 12 for our month?

The answer: we used month+1 = 12 + 1 = 13 as the index:

    monthStr = months[month+1]
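which is out of range. What do we want instead? Month numbers run from 1 to 12, but the indices run from 0 to 11, so instead of adding 1, we should subtract 1 (December, month 12, is at index 11). Change it in the program: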
    monthStr = months[month-1]
and try to run the program again.
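The month-to-index conversion can be checked on its own. A minimal sketch; month_name is a hypothetical helper that also rejects numbers outside 1 to 12, because month 0 would otherwise silently give months[-1], which Python reads as the last item, "December":

```python
# Month numbers run 1..12 but list indices run 0..11, so subtract 1 to convert.
months = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

def month_name(month):
    """Return the name for a month number from 1 to 12."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12, got " + str(month))
    return months[month - 1]

print(month_name(1), month_name(12))  # January December
print(len(months))                    # 12, so the largest valid index is 11
```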

It still runs, but does it work? Let's try the same input again:

Please enter day, month, and year numbers: 31, 12, 2014
The date is 12/31/2014 or December 31,  + str(year).

Something odd is happening at the end: str(year) does not look right. Let's look at the line that builds date2:

date2 = monthStr+" " + str(day) + ",  + str(year)"
The interpreter is treating ",  + str(year)" as a string (instead of evaluating str(year)), so we must have put the quotation mark in the wrong place before. Let's move it to just after the comma:
date2 = monthStr+" " + str(day) + ","  + str(year)
and try to run the program again.

Success! But try a few other inputs, just to make sure. It is always good to try cases that are near the "boundary" of what's allowed, since those are the places we are most likely to make mistakes:

Please enter day, month, and year numbers: 1,1,2015
The date is 1/1/2015 or January 1,2015.

Please enter day, month, and year numbers: 1, 2, 2003
The date is 2/1/2003 or February 1,2003.

Please enter day, month, and year numbers: 4, 7, 1976
The date is 7/4/1976 or July 4,1976.

We have removed all the errors, and the program now runs correctly!
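The walkthrough fixes errors.py one line at a time but never shows the finished program. A minimal sketch of a corrected version follows, with two changes that are our own assumptions rather than part of the original: the date-building lines move into a helper, format_dates, so the boundary cases above can be checked without retyping them, and eval() is swapped for splitting the input on commas, since eval() will run whatever the user types:

```python
# errors.py with all of the fixes from this walkthrough applied.
# Assumption: input is parsed with split(",") instead of eval().

MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

def format_dates(day, month, year):
    """Return the two date strings, e.g. ("12/31/2014", "December 31,2014")."""
    date1 = str(month) + "/" + str(day) + "/" + str(year)
    date2 = MONTHS[month - 1] + " " + str(day) + "," + str(year)
    return date1, date2

def main():
    # get the day, month, and year; int() ignores the spaces around each number
    text = input("Please enter day, month, and year numbers: ")
    day, month, year = [int(part) for part in text.split(",")]
    date1, date2 = format_dates(day, month, year)
    print("The date is", date1, "or", date2 + ".")

# As in the original errors.py, call main() to run the program interactively:
# main()
```

Typing 31, 12, 2014 at the prompt prints the same output as before: The date is 12/31/2014 or December 31,2014.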

More on the Command Line Interface: Regular Expressions

Often it's useful to find strings that match a pattern instead of spelling out the exact names of things. For example, if you wanted to find all the Python programs (.py files) in your directory, you could list all the files and then look through them visually:

$ ls
But an easier way is to print only the files that end in .py. To do that, you can use the "wildcard" (*) operator:
$ ls *.py
The *.py says match any file that starts with any number (including 0) of characters and ends with .py.

You can use the wildcard operator in multiple places. For example, if your directory contained:

$ ls -l 
-rw-r--r--@ 1 stjohn  staff      5308 Mar 21 14:38 quizzes.html
-rw-r--r--  1 stjohn  staff     54013 Apr 20 18:57 zoneDist.csv
-rw-r--r--@ 1 stjohn  staff      1519 Apr 22 15:14 zoneMap.py
-rw-r--r--  1 stjohn  staff  16455174 Mar 20 19:02 zoning2.html
-rw-r--r--  1 stjohn  staff  17343896 Mar 20 18:58 zoningIDS.json
Then the following command:
$ ls z*
prints all files that start with z and are followed by any number of characters.

Similarly,

$ ls *zz*
lists all files whose names start with any number of characters, followed by zz, and end with any number of characters. The only file that contains this "pattern" of zz is quizzes.html.
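Python's standard library understands the same shell-style wildcards. A sketch using fnmatch.fnmatchcase() on the file names from the listing above; the helper matching is hypothetical, and glob.glob("*.py") would instead look at the actual files in the current directory:

```python
# The same wildcard patterns, matched in Python with the standard fnmatch module.
from fnmatch import fnmatchcase

files = ["quizzes.html", "zoneDist.csv", "zoneMap.py", "zoning2.html", "zoningIDS.json"]

def matching(pattern, names):
    """Return the names that match a shell-style wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]

print(matching("*.py", files))  # ['zoneMap.py']
print(matching("z*", files))    # ['zoneDist.csv', 'zoneMap.py', 'zoning2.html', 'zoningIDS.json']
print(matching("*zz*", files))  # ['quizzes.html']
```

fnmatchcase() is used rather than fnmatch() so that matching is case-sensitive on every operating system, as it is in ls on Unix.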

Wildcards like * are one simple way of writing patterns (matching file names this way is often called "globbing"). There are more powerful ways to write down patterns that fall into the category of "regular expressions". If you're interested in finding more complex patterns, Unix has a command for "global regular expression print", called grep. We will not cover it this term, but for an overview see the grep wiki page.
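Regular expressions use a different notation from shell wildcards: . matches any single character and * means "zero or more of the preceding item", so the shell's *.py corresponds roughly to .*\.py. A sketch with Python's built-in re module on the same file names, including two patterns that a single wildcard can't express; at the command line, ls | grep '[0-9]' would do roughly what the first one does:

```python
# A regular expression can describe patterns that a single * wildcard cannot.
import re

files = ["quizzes.html", "zoneDist.csv", "zoneMap.py", "zoning2.html", "zoningIDS.json"]

# \d matches any single digit, so this keeps names with a number anywhere in them.
has_digit = [name for name in files if re.search(r"\d", name)]
print(has_digit)  # ['zoning2.html']

# \. is a literal dot, (html|json) means "html or json", and $ anchors the match at the end.
web_files = [name for name in files if re.search(r"\.(html|json)$", name)]
print(web_files)  # ['quizzes.html', 'zoning2.html', 'zoningIDS.json']

# The shell's *.py written as a regular expression: .* means "any characters".
py_files = [name for name in files if re.fullmatch(r".*\.py", name)]
print(py_files)  # ['zoneMap.py']
```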

What's Next?

If you finish the lab early, now is a great time to get a head start on the programming problems. There are instructors to help you, and you already have Python up and running. The Programming Problem List has problem descriptions, suggested reading, and due dates next to each problem.