UPDATE 8/26/2014: Thanks to Daan Debie, you no longer have to do any of the below if you're using the pelican-bootstrap3 theme. Simply include the liquid_tags.notebook plugin in your Pelican plugin list and you're good to go!
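For reference, the newer setup is just a few lines in pelicanconf.py. Here's a minimal sketch (the setting names come from the pelican-plugins documentation; the paths are assumptions about your project layout):

PLUGIN_PATH = 'plugins'             # wherever you cloned pelican-plugins (a PLUGIN_PATHS list on newer Pelican)
PLUGINS = ['liquid_tags.notebook']  # enables the {% notebook ... %} tag
NOTEBOOK_DIR = 'notebooks'          # directory under content/ holding your .ipynb files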


In this post, I'm (selfishly) testing the integration between pelican and IPython notebooks.

I've been meaning to do this for a while, since I have quite a few R and Python tutorials on a variety of subjects written up as IPython notebooks, my distribution platform of choice.

I was having trouble getting the pelican-bootstrap3 theme to play nicely with the liquid-tags Pelican plugin. Hat tip to Kyle Cranmer, who posted his solution: chop the plugin-generated _nb_header.html down into a _nb_header_minimal.html and point EXTRA_HEADER in your pelicanconf.py at that file instead.
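Roughly, the workaround boils down to a single line in pelicanconf.py. A sketch, assuming you've already carved the trimmed-down _nb_header_minimal.html out of the _nb_header.html that the plugin generates (Python 2 here, to match the rest of this post):

# Point EXTRA_HEADER at the minimal header instead of the full plugin-generated one.
EXTRA_HEADER = open('_nb_header_minimal.html').read().decode('utf-8')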

It's crazy to see someone else with a similar workflow! I only discovered his work because I was having the worst time trying to get Pelican to play nicely with notebooks.

His hack does the job, and the notebooks are now much better styled than they would be without the additional CSS, but it makes me a little leery since it feels like something that will break with an update. Oh well; that's what you get when you build on open-source projects.

Below the line is an example of an IPython notebook embedded in a Pelican article. It's as simple as creating a notebook and referring to it with {% notebook demonstration.ipynb %} in the article source.


Simple examples of IPython notebook features

I wanted to provide some quick examples of how easily the IPython notebook lets people write analysis code and share it on the web. This is a simple demonstration notebook that incorporates examples from around the web (with links back to the original source of each piece of example code).

But this barely scratches the surface of what these tools offer. Primarily, this notebook was created to demonstrate how easily a static site generator like Pelican can incorporate notebooks [relatively] seamlessly into a page.

In [1]:
total = 2 + 2
total
Out[1]:
4

Some $\LaTeX$ using IPython magic (more examples can be found here)

In [2]:
%%latex
\begin{align}
a^2 + b^2 = c^2
\end{align}

A matplotlib demo (taken directly from the source)

In [3]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 3*np.pi, 500)
plt.plot(x, np.sin(x**2))
plt.title('A simple chirp');

A ggplot example courtesy of $\hat{y}$hat

In [4]:
from ggplot import *

ggplot(mtcars, aes('mpg', 'qsec')) + \
  geom_point(colour='steelblue') + \
  scale_x_continuous(breaks=[10,20,30],  \
                     labels=["horrible", "ok", "awesome"])
Out[4]:
<ggplot: (288873745)>

Pandas

In [5]:
import pandas
import pandas.io.data
from pandas import Series, DataFrame
pandas.set_option('display.notebook_repr_html', True)  # Allows for pretty HTML output of DataFrames

labels = ['a', 'b', 'c', 'd', 'e']
Series([1, 2, 3, 4, 5], index=labels)
Out[5]:
a    1
b    2
c    3
d    4
e    5
dtype: int64
In [6]:
from IPython.display import display
from IPython.display import HTML
DataFrame.from_items([('A', [1, 2, 3]), ('B', [4, 5, 6])])
Out[6]:
   A  B
0  1  4
1  2  5
2  3  6

3 rows × 2 columns

Statsmodels (simple OLS example taken from the source).

In [7]:
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.sandbox.regression.predstd import wls_prediction_std

np.random.seed(9876789)

# Artificial data
nsample = 100
x = np.linspace(0, 10, 100)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
e = np.random.normal(size=nsample)

# Add intercept and generate the response
X = sm.add_constant(X)
y = np.dot(X, beta) + e

# Fit and summary
model = sm.OLS(y, X)
results = model.fit()
print results.summary()
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 4.020e+06
Date:                Sat, 19 Apr 2014   Prob (F-statistic):          2.83e-239
Time:                        15:40:38   Log-Likelihood:                -146.51
No. Observations:                 100   AIC:                             299.0
Df Residuals:                      97   BIC:                             306.8
Df Model:                           2                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          1.3423      0.313      4.292      0.000         0.722     1.963
x1            -0.0402      0.145     -0.278      0.781        -0.327     0.247
x2            10.0103      0.014    715.745      0.000         9.982    10.038
==============================================================================
Omnibus:                        2.042   Durbin-Watson:                   2.274
Prob(Omnibus):                  0.360   Jarque-Bera (JB):                1.875
Skew:                           0.234   Prob(JB):                        0.392
Kurtosis:                       2.519   Cond. No.                         144.
==============================================================================

In [8]:
# Object types?
import pprint
types = [
    type(model),
    type(results)
]
pprint.pprint(types)
[<class 'statsmodels.regression.linear_model.OLS'>,
 <class 'statsmodels.regression.linear_model.RegressionResultsWrapper'>]

In [9]:
print "What properties and methods does the statsmodels OLS object have?"
pprint.pprint(dir(model))
What properties and methods does the statsmodels OLS object have?
['__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_data_attr',
 'data',
 'df_model',
 'df_resid',
 'endog',
 'endog_names',
 'exog',
 'exog_names',
 'fit',
 'from_formula',
 'hessian',
 'information',
 'initialize',
 'k_constant',
 'loglike',
 'nobs',
 'normalized_cov_params',
 'pinv_wexog',
 'predict',
 'rank',
 'score',
 'weights',
 'wendog',
 'wexog',
 'whiten']

In [10]:
print "What properties and methods does the statsmodels OLS fitted model have?"
pprint.pprint(dir(results))
What properties and methods does the statsmodels OLS fitted model have?
['HC0_se',
 'HC1_se',
 'HC2_se',
 'HC3_se',
 '_HC0_se',
 '_HC1_se',
 '_HC2_se',
 '_HC3_se',
 '_HCCM',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_cache',
 '_data_attr',
 'aic',
 'bic',
 'bse',
 'centered_tss',
 'compare_f_test',
 'compare_lr_test',
 'conf_int',
 'conf_int_el',
 'cov_params',
 'df_model',
 'df_resid',
 'diagn',
 'el_test',
 'ess',
 'f_pvalue',
 'f_test',
 'fittedvalues',
 'fvalue',
 'get_influence',
 'initialize',
 'k_constant',
 'llf',
 'load',
 'model',
 'mse_model',
 'mse_resid',
 'mse_total',
 'nobs',
 'norm_resid',
 'normalized_cov_params',
 'outlier_test',
 'params',
 'predict',
 'pvalues',
 'remove_data',
 'resid',
 'rsquared',
 'rsquared_adj',
 'save',
 'scale',
 'ssr',
 'summary',
 'summary2',
 't_test',
 'tvalues',
 'uncentered_tss',
 'wresid']

In [11]:
# What modules are loaded into the local namespace?
import sys
print [key for key in locals().keys()
       if isinstance(locals()[key], type(sys)) and not key.startswith('__')]
['pprint', 'scales', 'themes', 'sys', 'sm', 'scale_reverse', '_sh', 'pandas', 'geoms', 'utils', 'scale_facet', 'exampledata', 'np', 'plt', 'components']

Final note

For reproducibility, I'm using pkg_resources to reveal the version numbers of imported modules (since all of these were installed from PyPI). This is a pretty heavy-handed way to do it; if you're using virtualenv and pip (and you are, right?), it's generally considered better practice to pip freeze a requirements.txt file rather than manually print out each version with pkg_resources.
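If you'd rather stay inside Python, here's a minimal sketch (not part of the original notebook) that writes out roughly what pip freeze gives you, by walking pkg_resources' working_set:

import pkg_resources

# Dump every installed distribution as name==version, one per line,
# roughly the format that `pip freeze > requirements.txt` produces.
with open('requirements.txt', 'w') as f:
    for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
        f.write('%s==%s\n' % (dist.project_name, dist.version))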

For the sanity of other readers, folks, please reveal the version numbers of the software you use somewhere in your documentation! Not enough people do this. It's an incredible waste of time to walk through someone's example code and fail repeatedly because you have different versions installed, and it discourages people from following your work.

In [12]:
import pkg_resources
print 'Pandas = ' + pkg_resources.get_distribution("pandas").version
print 'Matplotlib = ' + pkg_resources.get_distribution("matplotlib").version
print 'Scipy = ' + pkg_resources.get_distribution("scipy").version
print 'Numpy = ' + pkg_resources.get_distribution("numpy").version
print 'ggplot = ' + pkg_resources.get_distribution("ggplot").version
print 'Statsmodels = ' + pkg_resources.get_distribution("statsmodels").version
print 'IPython = ' + pkg_resources.get_distribution("ipython").version
Pandas = 0.13.1
Matplotlib = 1.3.1
Scipy = 0.13.3
Numpy = 1.8.1
ggplot = 0.5.2
Statsmodels = 0.5.0
IPython = 2.0.0

