# Introduction to Python

Note

This tutorial assumes that you are familiar with at least one high-level language like R or Matlab. If you don’t have this background, you may want to first read the first three sections of the Python tutorial: Whetting Your Appetite, Using the Python Interpreter, and An Informal Introduction to Python.

It also assumes you have Python (and IPython) installed on your computer. If you have Python installed but not IPython, you can install it by typing:

```
$ pip install ipython
```

For additional help with installation, please see the IPython installation page.

## Introduction

Useful written references and tutorials:

Some introductory video lectures:

While working through this tutorial, you should type the example code snippets at an interactive Python terminal. I recommend using either the IPython shell or the IPython notebook. To start an IPython shell, type the following at a BASH prompt:

```
$ ipython
```


To start an IPython notebook, type

```
$ ipython notebook
```

### Objects

Everything is an object in Python. Roughly, this means that it can be tagged with a variable and passed as an argument to a function. Often it also means that it has attributes and methods. Certain objects in Python are mutable (e.g., lists, dictionaries, sets), while other objects are immutable (e.g., numbers, strings, tuples). Many objects can be composite (e.g., a list of dictionaries or a dictionary of lists, tuples, and strings).

### Variables

Variables are not their values in Python (think “I am not my name, I am the person named XXX”). You can think of variables as tags on objects. In particular, a variable can be bound to an object of one type and then reassigned to an object of another type without error.

### Modules, files, packages, import

While you will often explore things from an interactive Python prompt, you will also save your code in files, both for reuse and to document what you’ve done. You can use Python code saved in a plain-text file from a Python prompt, or from other files, by importing it. Imports typically go at the top of a file (if you are working at a prompt, you just need to import a module before you use its functionality). Here are some examples of importing:

```
import math
from math import cos
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
```

### Style

Adopting standard coding conventions is good practice. The first link above is the official “Style Guide for Python Code”, usually referred to as PEP8 (PEP is an acronym for Python Enhancement Proposal). There are a couple of tools that can help you conform to the standard. The pep8 package (since renamed pycodestyle) provides a command-line tool that checks your code against many of the PEP8 conventions. Similarly, autopep8 automatically reformats your code so that it conforms to PEP8. I have used both a little, and they seem to work fairly well. The last two links discuss the NumPy docstring [1] standard.
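To make the NumPy docstring standard concrete, here is a minimal sketch of a function documented in that style. The function `scale` is hypothetical (it is not part of NumPy); the section headings (Parameters, Returns, Examples) follow the NumPy convention, and the `>>>` lines in the Examples section double as doctests (see footnote [1]).

```python
import doctest


def scale(values, factor=2):
    """
    Multiply every element of a sequence by a constant.

    Parameters
    ----------
    values : list of float
        Input numbers.
    factor : float, optional
        Multiplier applied to each element (default 2).

    Returns
    -------
    scaled : list of float
        A new list; `values` itself is left unchanged.

    Examples
    --------
    >>> scale([1, 2, 3])
    [2, 4, 6]
    >>> scale([1.5], factor=4)
    [6.0]
    """
    return [v * factor for v in values]


if __name__ == "__main__":
    # Execute the >>> examples in the docstring; only failures are printed.
    doctest.run_docstring_examples(scale, {"scale": scale})
```

Typing `scale?` in IPython after defining this function displays the docstring in the same layout as the NumPy docstrings shown next.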
Let’s briefly see how you might benefit from NumPy docstrings in practice.

```
In [1]: import numpy as np

In [2]: np.ndim?
Type:        function
String form: <function ndim at 0x7fcabd864938>
File:        /usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py
Definition:  np.ndim(a)
Docstring:
Return the number of dimensions of an array.

Parameters
----------
a : array_like
    Input array.  If it is not already an ndarray, a conversion is
    attempted.

Returns
-------
number_of_dimensions : int
    The number of dimensions in a.  Scalars are zero-dimensional.

See Also
--------
ndarray.ndim : equivalent method
shape : dimensions of array
ndarray.shape : dimensions of array

Examples
--------
>>> np.ndim([[1,2,3],[4,5,6]])
2
>>> np.ndim(np.array([[1,2,3],[4,5,6]]))
2
>>> np.ndim(1)
0
```

Exercises

• What happens if you type np.ndim?? (i.e., use two question marks)?
• Type np.tril? at an IPython prompt. What does np.tril do?
• Type np.ndarray? at an IPython prompt. Briefly skim the docstring. ndarray is the basic data structure provided by NumPy. We will examine it in much more detail in the next chapter.
• Type np. followed by the <Tab> key at an IPython prompt. Choose two or three of the completions and use ? to view their docstrings. In particular, pay attention to the examples provided near the end of the docstring and see whether you can figure out how you might use this functionality. Use ?? on them as well.

Note

Python 2 vs. 3

Python 3 is a newer version of Python that is not backward compatible with Python 2. We will use Python 2, but Python 3 is the future. Due to the large installed codebase of Python 2, the transition will take years. If you are writing new Python code at this point, require Python 2.7 as the minimum supported version. You should also import the following functionality from the __future__ module.
```
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
```

While we will be using Python 2 in this tutorial, in the near future you may consider using the future package. [2] The idea is that by using this package and adding a few imports to the top of your Python modules, you can write “predominantly standard, idiomatic Python 3 code that then runs similarly on Python 2.6/2.7 and Python 3.3+.”

## Data Structures

### Numbers

Python has integers, floats, and complex numbers with the usual operations (beware: division).

```
In [1]: 2/3
Out[1]: 0

In [2]: from __future__ import division

In [3]: 2/3
Out[3]: 0.6666666666666666

In [4]: x = 1.1

In [5]: x.
x.as_integer_ratio  x.hex         x.real
x.conjugate         x.imag
x.fromhex           x.is_integer

In [5]: x * 2
Out[5]: 2.2

In [6]: x**2
Out[6]: 1.2100000000000002

In [7]: 100000**10
Out[7]: 100000000000000000000000000000000000000000000000000L

In [8]: 100000**100
Out[8]:
10000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000L

In [9]: cos(0)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-edaadd132e03> in <module>()
----> 1 cos(0)

NameError: name 'cos' is not defined

In [10]: import math

In [11]: math.cos(0)
Out[11]: 1.0

In [12]: math.cos(math.pi)
Out[12]: -1.0

In [13]: (type(1), type(1.1), type(1+2j))
Out[13]: (int, float, complex)
```

The last input above creates a composite object called a tuple, which we will discuss below. At an IPython prompt, use type? to see what type does.
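On Python 3, `/` always performs true division (the behavior that `from __future__ import division` gives Python 2), while `//` performs floor division. The following minimal sketch assumes Python 3, so it omits the `__future__` import; note also that Python 3 prints large integers without the trailing `L` seen above.

```python
# Python 3 semantics (same as Python 2 with `from __future__ import division`).

true_q = 2 / 3        # true division always produces a float
floor_q = 7 // 2      # floor division: 3
neg_floor = -7 // 2   # floors toward negative infinity: -4, not -3
q, r = divmod(7, 2)   # quotient and remainder together: (3, 1)

# Integers have arbitrary precision, so large powers are computed exactly.
big = 100000 ** 100   # 10**500

print(true_q, floor_q, neg_floor, (q, r), len(str(big)))
# prints: 0.6666666666666666 3 -4 (3, 1) 501
```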
The math package in the standard library includes many additional numerical operations.

```
In [14]: math.
math.acos       math.degrees    math.fsum       math.pi
math.acosh      math.e          math.gamma      math.pow
math.asin       math.erf        math.hypot      math.radians
math.asinh      math.erfc       math.isinf      math.sin
math.atan       math.exp        math.isnan      math.sinh
math.atan2      math.expm1      math.ldexp      math.sqrt
math.atanh      math.fabs       math.lgamma     math.tan
math.ceil       math.factorial  math.log        math.tanh
math.copysign   math.floor      math.log10      math.trunc
math.cos        math.fmod       math.log1p
math.cosh       math.frexp      math.modf
```

Exercises

Using the section on “Built-in Types” from the official “The Python Standard Library” reference (follow the first link at the top of this section), figure out how to compute:

1. $$3 \le 4$$,
2. $$3 \mod 4$$,
3. $$|-4|$$,
4. $$\left( \left\lceil \frac{3}{4} \right\rceil \times 4 \right)^3 \mod 2$$, and
5. $$\sqrt{-1}$$.

Questions

1. How do you get the list of completions for x.?
2. What is the difference between the old and new behavior of division?
3. Read the “Truth Value Testing” and “Boolean Operations” subsections at the top of the “Built-in Types” section of the Library reference. How does this compare to how R handles things?

### Strings

Strings are immutable sequences of (zero or more) characters.

Sequences

Unlike numbers, Python strings are container objects. Specifically, they are sequences. Python has several sequence types, including strings, tuples, and lists. Sequence types share some common functionality, which we can demonstrate with strings.

• Indexing

To see how indexing works in Python, let’s use the string containing the digits 0 through 9.

```
In [1]: import string

In [2]: string.digits
Out[2]: '0123456789'

In [3]: string.digits[1]
Out[3]: '1'

In [4]: string.digits[-1]
Out[4]: '9'
```

Note that indexing starts at 0 (unlike R and Fortran, but like C). Also, negative integers index from the end of the sequence. You can find the length of a sequence using the len() function.
• Slicing

Slicing allows you to select a subset of a string (or any sequence) by specifying start and stop indices as well as a step, using the start:stop:step notation inside square brackets. The element at the start index is included; the element at the stop index is not.

```
In [5]: string.digits[1::2]
Out[5]: '13579'

In [6]: string.digits[9::-1]
Out[6]: '9876543210'
```

• Subsequence testing

```
In [7]: '23' in string.digits
Out[7]: True

In [16]: '25' not in string.digits
Out[16]: True
```

String methods

```
In [1]: string1 = "my string"

In [2]: string1.
string1.capitalize  string1.islower     string1.rpartition
string1.center      string1.isspace     string1.rsplit
string1.count       string1.istitle     string1.rstrip
string1.decode      string1.isupper     string1.split
string1.encode      string1.join        string1.splitlines
string1.endswith    string1.ljust       string1.startswith
string1.expandtabs  string1.lower       string1.strip
string1.find        string1.lstrip      string1.swapcase
string1.format      string1.partition   string1.title
string1.index       string1.replace     string1.translate
string1.isalnum     string1.rfind       string1.upper
string1.isalpha     string1.rindex      string1.zfill
string1.isdigit     string1.rjust

In [2]: string1.upper()
Out[2]: 'MY STRING'

In [3]: string1.upper?
Type:        builtin_function_or_method
String form: <built-in method upper of str object at 0x7fa136f8ced0>
Docstring:
S.upper() -> string

Return a copy of the string S converted to uppercase.

In [4]: string1 + " is your string."
Out[4]: 'my string is your string.'

In [5]: "*"*10
Out[5]: '**********'

In [6]: string1[3:]
Out[6]: 'string'

In [7]: string1[3:4]
Out[7]: 's'

In [8]: string1[4::2]
Out[8]: 'tig'

In [9]: string1[3:5] = 'ts'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-d7a58dc91703> in <module>()
----> 1 string1[3:5] = 'ts'

TypeError: 'str' object does not support item assignment

In [10]: string1.__
string1.__add__           string1.__len__
string1.__class__         string1.__lt__
string1.__contains__      string1.__mod__
string1.__delattr__       string1.__mul__
string1.__doc__           string1.__ne__
string1.__eq__            string1.__new__
string1.__format__        string1.__reduce__
string1.__ge__            string1.__reduce_ex__
string1.__getattribute__  string1.__repr__
string1.__getitem__       string1.__rmod__
string1.__getnewargs__    string1.__rmul__
string1.__getslice__      string1.__setattr__
string1.__gt__            string1.__sizeof__
string1.__hash__          string1.__str__
string1.__init__          string1.__subclasshook__
```

Exercises

At an interactive Python prompt, type x = "The ant wants what all ants want.". Using string indexing, slicing, subsequence testing, and methods, solve the following:

1. Convert the string to all lowercase letters (don’t change x).
2. Count the number of occurrences of the substring ant.
3. Create a list of the words occurring in x. Make sure to remove punctuation and convert all words to lowercase.
4. Using only string methods on x, create the following string: The chicken wants what all chickens want.
5. Using indexing and the + operator, create the following string: The tna wants what all ants want.
6. Do the same thing except using a string method instead.

Questions

1. How do the string methods split and rsplit differ? [Hint: use ? to view the methods’ docstrings.]
2. What happens when you multiply a string by a number? How does this relate to the string method __mul__? [Hint: look at the docstring.]
3. How does the len() function know how to find the length of a sequence?
4. How do the in and not in operators work?

### Tuples

Tuples are immutable sequences of (zero or more) objects. Functions in Python often return tuples.

```
In [1]: x = 1; y = 2

In [2]: xy = (x, y)

In [3]: xy
Out[3]: (1, 2)

In [4]: xy[1]
Out[4]: 2

In [5]: xy[1] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-b22951f8a33e> in <module>()
----> 1 xy[1] = 3

TypeError: 'tuple' object does not support item assignment

In [6]: (x, y)
Out[6]: (1, 2)

In [7]: x, y
Out[7]: (1, 2)
```

Exercises

1. Note that x, y and (x, y) both print the same string. To see why, assign them to variables and check their types.
2. Create the variables x=5 and y=6. Now swap their values. (How would you do this in R?)

### Lists

Lists are mutable sequences of (zero or more) objects.

```
In [1]: dice = [1, 2, 3, 4, 5, 6]

In [2]: dice[1::2]
Out[2]: [2, 4, 6]

In [3]: dice[1::2] = dice[::2]

In [4]: dice
Out[4]: [1, 1, 3, 3, 5, 5]

In [5]: dice*2
Out[5]: [1, 1, 3, 3, 5, 5, 1, 1, 3, 3, 5, 5]

In [6]: dice+dice[::-1]
Out[6]: [1, 1, 3, 3, 5, 5, 5, 5, 3, 3, 1, 1]

In [7]: 1 in dice
Out[7]: True
```

Exercises

1. Create a list of numbers. Reverse the order of the items in the list using slicing. Now reverse the order of the items using a list method. How does using the method differ from slicing? Do you think tuples have a method to reverse the order of their items? Why or why not? Check whether you are correct.
2. Using a list method, sort your numbers. Create a list of strings and sort it. Put your list of numbers and strings together in one list and sort it. What happened?

### Dictionaries

Dictionaries are mutable, unordered collections of key-value pairs.
```
In [99]: students = {"Jarrod Millman": [10, 11, 9],
   ....:             "Thomas Kluyver": [11, 9, 10],
   ....:             "Stefan van der Walt": [12, 9, 9]}

In [100]: students
Out[100]:
{'Jarrod Millman': [10, 11, 9],
 'Stefan van der Walt': [12, 9, 9],
 'Thomas Kluyver': [11, 9, 10]}

In [102]: students.keys()
Out[102]: ['Thomas Kluyver', 'Stefan van der Walt', 'Jarrod Millman']

In [103]: students["Jarrod Millman"]
Out[103]: [10, 11, 9]

In [104]: students["Jarrod Millman"][1]
Out[104]: 11
```

### Sets

Sets are mutable, unordered collections of unique elements (the built-in frozenset is the immutable variant).

```
In [1]: x = {1, 2, 4, 1, 4}

In [2]: x
Out[2]: {1, 2, 4}

In [3]: x.
x.add                  x.issubset
x.clear                x.issuperset
x.copy                 x.pop
x.difference           x.remove
x.difference_update    x.symmetric_difference
x.discard              x.symmetric_difference_update
x.intersection         x.union
x.intersection_update  x.update
x.isdisjoint
```

### And more

```
In [1]: import collections

In [2]: collections.
collections.Callable        collections.MutableSequence
collections.Container       collections.MutableSet
collections.Counter         collections.OrderedDict
collections.Hashable        collections.Sequence
collections.ItemsView       collections.Set
collections.Iterable        collections.Sized
collections.Iterator        collections.ValuesView
collections.KeysView        collections.defaultdict
collections.Mapping         collections.deque
collections.MappingView     collections.namedtuple
collections.MutableMapping
```

## Built-in functions

Python has several built-in functions (you can find a full list using the link above). We’ve already used a few (e.g., len(), type(), print()). Here are a few more that you will find useful.

### zip

```
In [108]: zip([1, 2], ["a", "b"])
Out[108]: [(1, 'a'), (2, 'b')]
```

### enumerate

```
In [109]: enumerate(["a", "b"])
Out[109]: <enumerate at 0x7f5e3e018640>

In [110]: list(enumerate(["a", "b"]))
Out[110]: [(0, 'a'), (1, 'b')]
```

Question

• What do the built-in functions abs(), all(), any(), dict(), dir(), id(), list(), and set() do? Make sure to use ? at the IPython prompt as well as the documentation in the official Python Standard Library reference (use the above link).

## Control flow

### If-then-else

```
In [44]: x = 2

In [45]: if x < 2:
   ....:     print("Yes")
   ....: else:
   ....:     print("No")
   ....:
No
```

### For-loops (and list comprehension)

```
In [49]: for x in [1,2,3,4]:
   ....:     print(x)
   ....:
1
2
3
4

In [50]: for x in [1,2,3,4]:
   ....:     print(x, end="")
   ....:
1234
```

Building up a list piece by piece is a common task, which can easily be done in a for-loop. List comprehensions provide a compact syntax for this task.

```
In [64]: x = [1, 2, 3, 4]

In [65]: zip(x, x[::-1])
Out[65]: [(1, 4), (2, 3), (3, 2), (4, 1)]

In [66]: [y for y in zip(x, x[::-1]) if y[0] > y[1]]
Out[66]: [(3, 2), (4, 1)]
```

Exercises

• Write a for-loop that produces [(3, 2), (4, 1)] from x. How does it compare to the list comprehension above?
• Use print? to see what the end argument to the print function does. Are there any additional arguments to print()? If so, try using them.
• Find the section on the range() function in the Python tutorial. Rewrite the two for-loops above using it rather than explicitly constructing the list of numbers.
• See what [1, 2, 3] + 3 returns. Try to explain what happened and why. In R, when you add a scalar to a vector, the result is element-wise addition.

```
> 3 + c(1,2,3)
[1] 4 5 6
```

Use a list comprehension to perform element-wise addition of a scalar to a list of scalars.
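The loop and comprehension forms are interchangeable. As a minimal sketch with made-up data (the names `words`, `pairs_loop`, and `pairs_comp` are illustrative only), both of the following build the same filtered list; the comprehension reads as `[expression for item in iterable if condition]`.

```python
words = ["apple", "fig", "banana", "kiwi"]

# Explicit for-loop: start with an empty list and append one item at a time.
pairs_loop = []
for w in words:
    if len(w) > 3:                  # keep only words longer than 3 characters
        pairs_loop.append((w, len(w)))

# Equivalent list comprehension: the same expression, loop, and filter in one line.
pairs_comp = [(w, len(w)) for w in words if len(w) > 3]

print(pairs_loop == pairs_comp)     # True
print(pairs_comp)                   # [('apple', 5), ('banana', 6), ('kiwi', 4)]
```

The comprehension avoids the separate initialization and `append` calls, which is why it is usually preferred for simple transformations; a plain loop remains clearer when the body needs several statements.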
### Functions

```
In [105]: def add(x, y):
   .....:     return x+y
   .....:

In [106]: add(2, 3)
Out[106]: 5

In [105]: def add(x, y=1):
   .....:     return x+y
   .....:

In [106]: add(3)
Out[106]: 4
```

The second definition gives y a default value of 1, which is used whenever that argument is omitted.

## Classes

```
In [224]: class Rectangle(object):
   .....:     def __init__(self, height, width):
   .....:         self.height = height
   .....:         self.width = width
   .....:     def __repr__(self):
   .....:         return "{0} by {1}".format(self.height, self.width)
   .....:     def area(self):
   .....:         return self.height*self.width
   .....:

In [225]: x = Rectangle(10,5)

In [228]: x
Out[228]: 10 by 5

In [229]: x.area()
Out[229]: 50
```

## Data formats

### CSV

The Python standard library provides a package for reading and writing CSV files. This is a somewhat low-level library, so in practice you will often use the CSV functionality in NumPy, SciPy, or Pandas.

### JSON

The json package in the standard library, however, is much more useful.

```
In [182]: import json

In [183]: x = {"name": "Jarrod", "department": "Biostatistics"}

In [186]: with open("tmp.json", "w") as outfile:
   .....:     json.dump(x, outfile)
   .....:

In [187]: cat tmp.json
{"department": "Biostatistics", "name": "Jarrod"}

In [192]: with open("tmp.json") as infile:
   .....:     y = json.load(infile)
   .....:

In [193]: y
Out[193]: {u'department': u'Biostatistics', u'name': u'Jarrod'}
```

Note that cat is not a Python statement. IPython is clever enough to guess that you want it to call out to the underlying operating system.

Exercise

• One of the nice things about the JSON format is that it is structured enough to be easy for a machine to parse, yet simple enough to be easy for humans to read. By default, json.dump writes everything out without line breaks. For readability, use json.dump? to figure out how to pretty-print the text as well as sort it alphabetically by key.

### HTML

We will use Thomas Kluyver’s web scraping example notebook for this section. You can view a rendered version of it here.
To get an interactive version of it, you can do the following from your BASH prompt:

```
$ git clone https://github.com/dlab-berkeley/python-fundamentals.git
$ cd python-fundamentals/cheat-sheets/
$ ipython notebook Web-Scraping.ipynb
```
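The notebook’s own approach is not reproduced here. As a dependency-free illustration of the kind of task web scraping involves, the following minimal sketch uses the standard library’s html.parser module to pull link targets out of an HTML string. It assumes Python 3 (Python 2 named this module HTMLParser), and the class name `LinkCollector` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Called for each opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


sample = """
<html><body>
  <p>See <a href="https://www.python.org/">Python</a> and
  <a href="https://ipython.org/">IPython</a>.</p>
</body></html>
"""

parser = LinkCollector()
parser.feed(sample)
print(parser.links)
```

For real pages you would first download the HTML (e.g., with urllib.request); dedicated third-party parsers are usually more forgiving of messy markup.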


## Standard library

Python provides a wealth of functionality in its huge standard library. We’ve already seen several of its packages (e.g., math, csv, json). If you need some functionality, the standard library is one of the first places to look.

Here are a couple of packages that you may find useful.

### os

```
In [147]: import os

In [148]: os.getcwd()
Out[148]: '/home/jarrod'

In [149]: pwd
Out[149]: u'/home/jarrod'
```
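Beyond getcwd(), much of os is about building and inspecting paths. The following minimal sketch shows a few commonly used calls; the directory name `data` and file name `notes.txt` are hypothetical, and everything is created inside a temporary directory (via the tempfile and shutil modules) so nothing in your working directory is touched.

```python
import os
import shutil
import tempfile

# Work inside a fresh temporary directory.
base = tempfile.mkdtemp()

# os.path.join builds paths with the right separator for the platform.
data_dir = os.path.join(base, "data")
os.makedirs(data_dir)  # creates data_dir (and any missing parents)

# Create an empty file inside the new directory.
with open(os.path.join(data_dir, "notes.txt"), "w"):
    pass

listing = os.listdir(data_dir)              # names of the entries in the directory
exists = os.path.isdir(data_dir)            # True: the directory now exists
root, ext = os.path.splitext("notes.txt")   # split off the file extension

print(listing, exists, root, ext)

# Remove the whole temporary tree.
shutil.rmtree(base)
```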


Exercise

• Use os? and dir(os) to explore the os package.

### re

The re package provides support for regular expressions.
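A minimal sketch of the most frequently used re functions (findall, search, and sub). The sample text and phone numbers are made up for illustration; raw strings (r"...") keep backslashes in the patterns from being interpreted by Python itself.

```python
import re

text = "Jarrod: 555-1234, Thomas: 555-9876"

# findall: every non-overlapping match, returned as a list of strings.
phones = re.findall(r"\d{3}-\d{4}", text)

# With capturing groups, findall returns one tuple of groups per match.
pairs = dict(re.findall(r"(\w+): (\d{3}-\d{4})", text))

# search: the first match anywhere in the string, or None if there is none.
m = re.search(r"(\w+): (\d{3}-\d{4})", text)
first = m.groups() if m else None

# sub: replace every match; \b anchors the pattern at a word boundary.
masked = re.sub(r"\d{4}\b", "XXXX", text)

print(phones)   # ['555-1234', '555-9876']
print(pairs)    # {'Jarrod': '555-1234', 'Thomas': '555-9876'}
print(first)    # ('Jarrod', '555-1234')
print(masked)   # Jarrod: 555-XXXX, Thomas: 555-XXXX
```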

[1] Docstrings are an important part of a Python program: A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object. All modules should normally have docstrings, and all functions and classes exported by a module should also have docstrings. Public methods (including the __init__ constructor) should also have docstrings. Docstrings also allow for the use of doctests. The doctest module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown.
[3] You will probably need to explore the data interactively from an IPython prompt and, in tandem, write your script.