Today's lab will focus on mapping data, using pandas, numpy, and folium, and regular expressions at the command line.
Software tools needed: web browser and Python IDLE programming environment with the pandas, numpy, and folium packages installed.
See Lab 1 for details on using Python, Gradescope, and Blackboard.
We are going to use Python to make maps. Let's start by mapping cities of the world. Locations in the world are usually indicated by their latitude and longitude.
In the trinket below, map the following cities (GIS coordinates are rounded to the nearest whole number):
Folium is a Python package that uses the JavaScript Leaflet.js library to make interactive maps. Instead of popping up a matplotlib window, folium creates an .html file that you can open (and view interactively) in a web browser once the program has run.
(Folium is installed on the lab machines. To check whether your home machine has it, type at the Python prompt:
import folium
If you get an error, go to the terminal and install it:
pip install folium
and the package will download and install.)
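The same check can also be scripted rather than typed at the prompt. The sketch below uses the standard library's importlib.util.find_spec, which looks a package up without importing it. The helper name is_installed is ours, chosen only for illustration.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

# Check the three packages this lab relies on.
for name in ["folium", "pandas", "numpy"]:
    if is_installed(name):
        print(name, "is installed")
    else:
        print(name, "is missing; try: pip install", name)
```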
To make a map in folium, the process is:
Here's our first program:
#Import the folium package for making maps
import folium

#Create a map, centered (0,0), and zoomed out a bit:
mapWorld = folium.Map(location=[0, 0],zoom_start=3)

#Save the map:
mapWorld.save(outfile='tempMap.html')
Save this file and run it. It will create a file called tempMap.html. Open your favorite browser (Chrome, Firefox, IE, etc.) and choose "Open File" and then choose the file tempMap.html. You should see a map of the world.
Let's make another map, focused on New York City. To do that, when we set up the map object, we need to reset the location to New York City and increase the zoom level:
import folium
mapCUNY = folium.Map(location=[40.75, -74.125], zoom_start=10)
Let's add in a marker for Hunter College:
folium.Marker(location = [40.768731, -73.964915], popup = "Hunter College").add_to(mapCUNY)
and create the .html file:
mapCUNY.save(outfile='nycMap.html')
Save your commands to a file and run via IDLE. Your program will create an HTML file called nycMap.html. Open it in your favorite browser to make sure it creates a map of NYC with a marker for Hunter College. When you have a running program, see the Programming Problem List.
Let's make an interactive map of the CUNY campuses. We can download a CSV file from data.ny.gov:
(Export as a .csv file and save in the same directory as your programs.) Open the file to make sure you have all the lines (should be 23) and to check if the column headings occur in the first row (they do, so no need to skip rows when reading in the file).
Let's use pandas to read in the file. We will need to import pandas and folium:
import folium
import pandas as pd
To read in the CSV file, we'll use pandas' CSV reader. We'll print out the campus locations to make sure that all were read in:
cuny = pd.read_csv('cunyLocations.csv')
print(cuny["Campus"])
Note: we saved our CSV file to cunyLocations.csv. If you saved it to a different name, change the input parameter for read_csv() to the name of your file.
Next, let's set up a map, centered on Hunter College:
mapCUNY = folium.Map(location=[40.768731, -73.964915])
We need to add markers for each campus. We're going to iterate through the rows of the dataframe to create the markers:
for index,row in cuny.iterrows():
    lat = row["Latitude"]
    lon = row["Longitude"]
    name = row["Campus"]
    newMarker = folium.Marker([lat, lon], popup=name)
    newMarker.add_to(mapCUNY)
For each row in the file, we find the latitude, longitude, and name of the campus, and use those to create a new marker which we add to our map. We repeat for each row, until we have made markers for all 23 campuses in the file.
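To see the row-by-row idea in isolation, here is a minimal sketch that collects the values each marker needs without drawing a map. It swaps in the standard library's csv.DictReader for pandas' iterrows() so that it runs with no extra packages, and it reads from a small in-memory stand-in for cunyLocations.csv. Only the Hunter College coordinates come from this lab; the second row is a made-up placeholder.

```python
import csv
import io

# Stand-in for cunyLocations.csv, using the same three column headings.
# The "Example Campus" row is a made-up placeholder, not real data.
sample_csv = io.StringIO(
    "Campus,Latitude,Longitude\n"
    "Hunter College,40.768731,-73.964915\n"
    "Example Campus,40.75,-74.125\n"
)

markers = []
for row in csv.DictReader(sample_csv):
    # Each row is a dictionary keyed by the column headings in the first line.
    lat = float(row["Latitude"])
    lon = float(row["Longitude"])
    name = row["Campus"]
    markers.append((lat, lon, name))

# Each tuple holds what one marker needs: a location and a popup label.
for lat, lon, name in markers:
    print(name, "at", [lat, lon])
```

Printing the values first is a quick way to confirm that the column headings in your file match the ones the loop expects.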
Lastly, let's save our map:
mapCUNY.save(outfile='cunyLocations.html')
To view your map, open a browser. From the browser, open the file: cunyLocations.html.
# errors.py is based on dateconvert2.py from Chapter 5 of the Zelle textbook
# Converts day month and year numbers into two date formats

def main()
    # get the day month and year
    day, month year = eval(input("Please enter day, month, and year numbers: ")

    date1 = str(month)"/"+str(day)+"/"+str(year)

    months = ["January", "February", "March", "April",
              "May", "June", "July", "August",
              "September", "October", "November", "December"]
    monthStr = months[month+1]
    date2 = monthStr+" " + str(day) + ", + str(year)

    print("The date is" date1, "or", date2+".")

main()
When we try to run it, instead of output, a dialog box pops up and says "invalid syntax":
The red line indicates where the interpreter has found an error. Can you tell what it is? Syntax is another word for grammar, so it is most likely missing "punctuation" or a misspelling of some sort. We have spelled def correctly and have the right number of parentheses, so what else is missing?
The answer is that a colon is required after the parentheses of a function definition. Add that in:
def main():
and try to run the program again.
Again, we get a dialog box:
Instead of the whole line being highlighted, only the word year is. The Python interpreter was not expecting year and says there is a grammatical mistake. Since year does not include any grammatical constructs, we need to look just before the highlighted word to see where the error is.
Do you see it?
The answer is that lists of variables need commas in between them to distinguish one from the next. Add the comma in:
day, month, year = ...
and try to run the program again.
Once more we get a dialog box:
It has highlighted the first item, date1 on the line. That is a name and looks fine. So, as above, let's look before the highlighted error to see if there's a problem. The line above it is:
day, month, year = eval(input("Please enter day, month, and year numbers: ")
It did not highlight this line, so the problem must be at the end. Do you see it?
The answer is that we are missing a closing parenthesis. The line has two left parentheses but only one right parenthesis. Add the right parenthesis in:
... and year numbers: "))
and try to run the program again.
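When a line is long, counting parentheses by eye is error-prone. As a rough sketch, Python's str.count can do the tally for the line in question. This check also counts parentheses that appear inside quoted strings, so treat it as a hint rather than a proof.

```python
# The offending line from errors.py, stored as a string so we can inspect it.
line = 'day, month, year = eval(input("Please enter day, month, and year numbers: ")'

opens = line.count("(")
closes = line.count(")")
print("left:", opens, "right:", closes)

if opens != closes:
    print("Unbalanced by", abs(opens - closes))
```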
Again, we get a dialog box:
The interpreter does not understand the second " on the line. Why? What is this line doing? It's constructing a string and storing it in the variable date1. How do you build a string out of smaller strings?
The answer is that to put smaller strings together (called concatenation), we need to use the plus sign (+). The line is missing a plus sign right before the quotes. Add the plus sign in:
date1 = str(month) + "/" ...
and try to run the program again.
Again, we get a dialog box, but this one has a different message:
EOL means "End of the line", so, the message says that the end of the line was reached before you finished defining the string. How can you fix this?
The answer is to end the string, using quotation marks. The line is missing a quotation mark at the very end. Add the quotation mark:
...+ ", + str(year)"
and try to run the program again.
Our familiar dialog box returns:
We have seen this type of error before. How do you fix it?
The answer is that lists of arguments need commas in between them to distinguish one from the next. Add the comma in:
... date is", date1 ...
and try to run the program again.
It runs! Now let's make sure it works. Type in at the prompt:
Please enter day, month, and year numbers: 31, 12, 2014
Uh oh, instead of output, we get the following messages:
Traceback (most recent call last):
  File "/Users/stjohn/public_html/teaching/cmp/cmp230/f14/errors.py", line 18, in <module>
    main()
  File "/Users/stjohn/public_html/teaching/cmp/cmp230/f14/errors.py", line 13, in main
    monthStr = months[month+1]
IndexError: list index out of range
When you see messages like this, go to the very last line:
IndexError: list index out of range
It says that the index for our list is out of range. An index is the position of the item in the list that we're accessing. For example, months[1] has index 1 and will give us February. The range of the index for a list is 0 to one less than the length of the list. In the case of months, the valid indices are 0, 1, 2, ..., 11. What went wrong when we entered 12 for our month?
The answer is we used month+1 = 12 + 1 = 13 as the index:
monthStr = months[month+1]
which is out of range. What do we want instead? Instead of adding 1, we should subtract 1. Change it in the program:
monthStr = months[month-1]
and try to run the program again.
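The index arithmetic can be checked on its own before rerunning the whole program. A minimal sketch, using the same months list and the month we typed in:

```python
months = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

month = 12  # the value we typed in for December

# Valid indices run from 0 to len(months) - 1.
print("Valid indices: 0 to", len(months) - 1)

# Subtracting 1 maps month numbers 1..12 onto indices 0..11.
print(months[month - 1])

# Adding 1 walks off the end of the list.
try:
    print(months[month + 1])
    out_of_range = False
except IndexError as err:
    out_of_range = True
    print("IndexError:", err)

# Careful: a negative index counts from the end instead of failing,
# so months[-1] is also "December".
print(months[-1])
```

The last line is worth remembering: entering 0 for the month would not raise an error with month-1, it would quietly pick December.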
It still runs, but does it work? Let's try the same input again:
Please enter day, month, and year numbers: 31, 12, 2014
The date is 12/31/2014 or December 31, + str(year).
Something odd is happening at the end: + str(year) does not look right. Let's look at the line that builds date2:
date2 = monthStr+" " + str(day) + ", + str(year)"
The interpreter is treating ", + str(year)" as a string (instead of evaluating str(year)), so we must have put the quotation mark in the wrong place before. Let's move it:
date2 = monthStr+" " + str(day) + "," + str(year)
and try to run the program again.
Success! But try a few other inputs, just to make sure. It is always good to try cases that are near the "boundary" of what's allowed, since those are the places we are most likely to make mistakes:
Please enter day, month, and year numbers: 1,1,2015
The date is 1/1/2015 or January 1,2015.
Please enter day, month, and year numbers: 1, 2, 2003
The date is 2/1/2003 or February 1,2003.
Please enter day, month, and year numbers: 4, 7, 1976
The date is 7/4/1976 or July 4,1976.
We have removed all the errors, and the program now runs correctly!
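Retyping inputs at the prompt gets tedious. One way to repeat these checks, sketched below, is to move the string-building lines into a function and call it on several dates at once. The function name format_dates is ours and is not part of errors.py; the body is the corrected code from the steps above.

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

def format_dates(day, month, year):
    """Return the two date strings built by the corrected program."""
    date1 = str(month) + "/" + str(day) + "/" + str(year)
    monthStr = MONTHS[month - 1]
    date2 = monthStr + " " + str(day) + "," + str(year)
    return date1, date2

# Boundary cases (first and last month) plus one date in the middle.
for day, month, year in [(1, 1, 2015), (31, 12, 2014), (4, 7, 1976)]:
    date1, date2 = format_dates(day, month, year)
    print("The date is", date1, "or", date2 + ".")
```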
Often it's useful to find strings that match a pattern instead of spelling out the exact names of things. For example, if you wanted to know all the Python programs in your directory that are .py files, you could list out all the files and then look through visually to find them:
$ ls
But an easier way would be to specify that only the files that end in .py be printed. To do that, you can use the "wildcard" or * operator:
$ ls *.py
The *.py says match any file that starts with any number (including 0) of characters and ends with .py.
You can use the wildcard operator in multiple places. For example, if your directory contained:
$ ls -l
-rw-r--r--@ 1 stjohn  staff      5308 Mar 21 14:38 quizzes.html
-rw-r--r--  1 stjohn  staff     54013 Apr 20 18:57 zoneDist.csv
-rw-r--r--@ 1 stjohn  staff      1519 Apr 22 15:14 zoneMap.py
-rw-r--r--  1 stjohn  staff  16455174 Mar 20 19:02 zoning2.html
-rw-r--r--  1 stjohn  staff  17343896 Mar 20 18:58 zoningIDS.json
Then the following command:
$ ls z*
prints all files that start with z and are followed by any number of characters.
Similarly,
$ ls *zz*
would print all files that start with any number of characters, followed by zz, and end with any number of characters. The only file that contains this "pattern" of zz is quizzes.html.
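If you would like to experiment with these patterns without creating files, Python's standard fnmatch module applies the same style of wildcard matching to plain strings. The sketch below tries the three patterns from this section on the file names in the listing above. The helper name matching is ours, for illustration only.

```python
from fnmatch import fnmatchcase

# The file names from the directory listing above.
files = ["quizzes.html", "zoneDist.csv", "zoneMap.py",
         "zoning2.html", "zoningIDS.json"]

def matching(pattern):
    """Return the file names that match a wildcard pattern."""
    return [name for name in files if fnmatchcase(name, pattern)]

print(matching("*.py"))   # names ending in .py
print(matching("z*"))     # names starting with z
print(matching("*zz*"))   # names containing zz anywhere
```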
There are additional ways to write down patterns that fall into the category of "regular expressions". If you're interested in finding more complex patterns, Unix has a built-in command for this called grep (short for "global regular expression print"). We will not cover it this term, but for an overview see the grep wiki page.
If you finish the lab early, now is a great time to get a head start on the programming problems due early next week. There are instructors to help you, and you already have Python up and running. The Programming Problem List has problem descriptions, suggested reading, and due dates next to each problem.