{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Scipy: A fundamental library for scientific computing\n", "\n", "https://docs.scipy.org/doc/scipy/reference/tutorial/index.html" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:08:34.137734Z", "start_time": "2018-10-16T20:08:34.063161Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import scipy as sp\n", "\n", "np.info(sp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:14:11.047530Z", "start_time": "2018-10-16T20:14:11.040757Z" } }, "outputs": [], "source": [ "import scipy.optimize\n", "np.info(sp.optimize)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:15:31.232529Z", "start_time": "2018-10-16T20:15:31.210324Z" }, "scrolled": true }, "outputs": [], "source": [ "import scipy.integrate\n", "np.info(sp.integrate)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:17:34.729513Z", "start_time": "2018-10-16T20:17:34.726318Z" } }, "outputs": [], "source": [ "import scipy.special\n", "np.info(sp.special)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:19:00.719807Z", "start_time": "2018-10-16T20:19:00.716583Z" } }, "outputs": [], "source": [ "import scipy.linalg\n", "np.info(sp.linalg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Integración.\n", "\n", "El paquete scipy.integrate provee muchas tecnicas de integracion, incluyendo integración de EDOs. \n", "\n", " Methods for Integrating Functions given function object.\n", "\n", " quad -- General purpose integration.\n", "\n", " dblquad -- General purpose double integration.\n", "\n", " tplquad -- General purpose triple integration.\n", "\n", " fixed_quad -- Integrate func(x) using Gaussian quadrature of order n.\n", "\n", " quadrature -- Integrate with given tolerance using Gaussian quadrature.\n", "\n", " romberg -- Integrate func using Romberg integration.\n", "\n", " Methods for Integrating Functions given fixed samples.\n", "\n", " trapz -- Use trapezoidal rule to compute integral from samples.\n", "\n", " cumtrapz -- Use trapezoidal rule to cumulatively compute integral.\n", "\n", " simps -- Use Simpson's rule to compute integral from samples.\n", "\n", " romb -- Use Romberg Integration to compute integral from\n", " (2**k + 1) evenly-spaced samples.\n", "\n", " See the special module's orthogonal polynomials (special) for Gaussian\n", " quadrature roots and weights for other weighting factors and regions.\n", "\n", " Interface to numerical integrators of ODE systems.\n", "\n", " odeint -- General integration of ordinary differential equations.\n", "\n", " ode -- Integrate ODE using VODE and ZVODE routines." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "La función 'quad' integra funciones de una variable entre dos puntos. por ejemplo: " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:22:51.039728Z", "start_time": "2018-10-16T20:22:51.035347Z" } }, "outputs": [], "source": [ "# Importamos\n", "import scipy as sp\n", "import scipy.integrate as integrate\n", "import scipy.special as special" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:23:09.123341Z", "start_time": "2018-10-16T20:23:09.094674Z" } }, "outputs": [], "source": [ "special.beta?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:22:57.008056Z", "start_time": "2018-10-16T20:22:56.986716Z" } }, "outputs": [], "source": [ "special.beta(1, 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:23:50.475843Z", "start_time": "2018-10-16T20:23:50.470054Z" } }, "outputs": [], "source": [ "sp.integrate.quad(lambda x: sp.special.beta(1, x), 2, 4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:25:13.665430Z", "start_time": "2018-10-16T20:25:13.660546Z" } }, "outputs": [], "source": [ "# Ejemplo, integrar la función de bessel.\n", "result, error = integrate.quad(lambda x: special.jv(2.5, x), 0, 4.5)\n", "result, error" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:25:34.403969Z", "start_time": "2018-10-16T20:25:34.400330Z" } }, "outputs": [], "source": [ "from numpy import sqrt, sin, cos, pi\n", "I = sqrt(2 / pi) * (18.0 / 27 * sqrt(2) * cos(4.5) - 4.0 / 27 * sqrt(2)\n", " * sin(4.5) + sqrt(2 * pi) * special.fresnel(3 / sqrt(pi))[0])\n", "print(abs(result - I))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "'quad' recibe tres argumentos, el primero de ellos es una función, y los dos siguientes son los límites de integración.\n", "Retorna una tupla, cuyo primer elemento estima el valor de la integral, y el segundo es una cota superior para el error." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cuando la función recibe parámetros adicionales, se pueden proveer en el argumento 'args' \n", "\n", "por ejemplo, para calcular\n", "\n", "$I(a,b) = \\int_0^1 (ax^2 + b) dx$\n", "\n", "puede ser evaluada de la siguiente forma:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:26:53.318587Z", "start_time": "2018-10-16T20:26:53.314329Z" } }, "outputs": [], "source": [ "from scipy.integrate import quad\n", "\n", "\n", "def integrand(x, a, b):\n", " return a * x**2 + b\n", "\n", "\n", "a = 2\n", "b = 1\n", "I = quad(integrand, 0, 1, args=(a, b))\n", "I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Integración multiple\n", "\n", "Los mecanismos de integración doble y tripe están en las funciones 'dblquad' y 'tplquad'. Estas funciones reciben la función a integrar y cuatro, o seis, argumentos respectivamente. Los límites de todas las integrales internas necesitan ser definidas como funciones.\n", "\n", "Un ejemplo se presenta a continuación para integrar\n", "\n", "$I = \\int_{y=0}^{1/2} \\int_{x=0}^{1-2y} (xy) dxdy$\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:30:10.800002Z", "start_time": "2018-10-16T20:30:10.795850Z" } }, "outputs": [], "source": [ "from scipy.integrate import dblquad\n", "\n", "area = dblquad(lambda x, y: x * y, 0, 0.5, lambda x: 0, lambda y: 1 - 2 * y)\n", "area" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimización\n", "\n", "https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html\n", "El paquete scipy.optimize provee algoritmos comunmente usados en optimización, entre ellas:\n", "- Minimización con o sin restricciones para funciones escalares (métodos BFGS, Newton conjugate gradient, Nelder-Mead Simplex, etc)\n", "- Rutinas de fuerza bruta\n", "- Mínimos cuadrados y algoritmos de ajuste de curvas\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "En el ejemplo que se muestra a continuación se minimiza la función de Rosenbrok de dos variables.\n", "El algoritmo usado puede ser 'nelder-mead' o bien 'powell'. Lo bueno de estos métodos es que sólo se necesita evaluar la función y no usa evaluación del gradiente. (es por ello que es más lenta)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:32:30.739871Z", "start_time": "2018-10-16T20:32:30.733973Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy.optimize import minimize" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:34:20.397320Z", "start_time": "2018-10-16T20:34:20.394573Z" }, "scrolled": true }, "outputs": [], "source": [ "def rosen(x):\n", " return sum(100.0 * (x[1:] - x[:-1]**2.0)**2.0 + (1 - x[:-1])**2.0)\n", "\n", "x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:35:11.101849Z", "start_time": "2018-10-16T20:35:11.082278Z" } }, "outputs": [], "source": [ "res = minimize(rosen, x0, options={'maxiter': 100, 'disp': True})\n", "print(res)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:39:53.178454Z", "start_time": "2018-10-16T20:39:53.139657Z" } }, "outputs": [], "source": [ "res = minimize(rosen, x0, method='powell', options={'xtol': 1e-8, 'disp': True})\n", "print(res)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:40:51.317886Z", "start_time": "2018-10-16T20:40:51.313456Z" } }, "outputs": [], "source": [ "rosen(res.x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:41:28.276519Z", "start_time": "2018-10-16T20:41:28.252762Z" } }, "outputs": [], "source": [ "res = minimize(rosen, x0, method='nelder-mead', options={'xtol': 1e-8, 'disp': True})\n", "print(res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para obtener un mejor desempeño (converger en menos iteraciones al óptimo), se implementará un método para optimizar que usa el gradiente de la función, por ello se define la función rosen_der, que corresponde al gradiente.\n", "\n", "Luego se llama al algoritmo 'BFGS', el cual recibe como argumento el gradiente de la función en 'jac'.\n", "\n", "En caso de no recibir explicitamente el jacobiano, el método lo obtiene numéricamente por diferencias finitas.\n", "\n", "Para minimizaciones de problemas más grandes, se puede usar el método de Newton-CG, el cual necesita el Hessiano. No se explicita aquí." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:45:56.723791Z", "start_time": "2018-10-16T20:45:56.719202Z" } }, "outputs": [], "source": [ "def rosen_der(x):\n", " xm = x[1:-1]\n", " xm_m1 = x[:-2]\n", " xm_p1 = x[2:]\n", " der = np.zeros_like(x)\n", " der[1:-1] = 200*(xm-xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1-xm)\n", " der[0] = -400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0])\n", " der[-1] = 200*(x[-1]-x[-2]**2)\n", " return der" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:46:12.724637Z", "start_time": "2018-10-16T20:46:12.707587Z" } }, "outputs": [], "source": [ "res = minimize(rosen, x0, method='CG', jac=rosen_der, options={'disp': True})\n", "print(res)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:46:13.091324Z", "start_time": "2018-10-16T20:46:13.077772Z" }, "scrolled": true }, "outputs": [], "source": [ "res = minimize(rosen, x0, method='BFGS', jac=rosen_der, options={'disp': True})\n", "print(res)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:46:13.481448Z", "start_time": "2018-10-16T20:46:13.468271Z" } }, "outputs": [], "source": [ "res = minimize(rosen, x0, method='L-BFGS-B', jac=rosen_der, options={'disp': True})\n", "print(res)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:46:13.942418Z", "start_time": "2018-10-16T20:46:13.925485Z" } }, "outputs": [], "source": [ "res = minimize(rosen, x0, method='Newton-CG', jac=rosen_der, options={'disp': True})\n", "print(res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Minimización con restricciones de funciones escalares multivariables \n", "\n", "La función minimize también posee algoritmos de optimización con restricciones. El algoritmo Sequential Least Squares Programming se considerará acá, el cual permite trabajar con minimización con restricciones de la forma:\n", "\n", "\n", "\\begin{align}\n", "\\text{min} \\quad & f(x)\\\\\n", "\\text{subject to } & C_i(x) = 0, \\ i=1,...MEQ,\\\\\n", " & C_j(x) \\geq 0,\\ j=MEQ+1,...,M\\\\ \n", " & x_L \\leq x \\leq x_U, \n", "\\end{align}\n", " \n", "Por ejemplo, consideremos el problema de maximizar la funcion $f(x, y) = 2 x y + 2 x - x^2 - 2 y^2$\n", "\n", "sujeta a $x^3 - y = 0,\\ y-1 \\geq 0$.\n", "\n", "Se presenta la función y su derivada:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:48:56.328713Z", "start_time": "2018-10-16T20:48:56.322905Z" } }, "outputs": [], "source": [ "def func(x, sign=1.0):\n", " \"\"\" Objective function \"\"\"\n", " return sign * (2 * x[0] * x[1] + 2 * x[0] - x[0]**2 - 2 * x[1]**2)\n", "\n", "\n", "def func_deriv(x, sign=1.0):\n", " \"\"\" Derivative of objective function \"\"\"\n", " dfdx0 = sign * (-2 * x[0] + 2 * x[1] + 2)\n", " dfdx1 = sign * (2 * x[0] - 4 * x[1])\n", " return np.array([dfdx0, dfdx1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "El parámetro signo se introduce para multiplicar por -1 para minimizar o maximizar la función.\n", "\n", "Las restricciones se definen como una secuencia de diccionarios, con keys: 'type', 'fun', y 'jac'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:51:12.305810Z", "start_time": "2018-10-16T20:51:12.302115Z" } }, "outputs": [], "source": [ "cons = ({'type': 'eq',\n", " 'fun' : lambda x: np.array([x[0]**3 - x[1]]),\n", " 'jac' : lambda x: np.array([3.0*(x[0]**2.0), -1.0])},\n", " {'type': 'ineq',\n", " 'fun' : lambda x: np.array([x[1] - 1]),\n", " 'jac' : lambda x: np.array([0.0, 1.0])})" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:50:55.250830Z", "start_time": "2018-10-16T20:50:55.246188Z" } }, "outputs": [], "source": [ "# minimización sin restriccion\n", "res = minimize(func, [-1.0,1.0], args=(-1.0,), jac=func_deriv, method='SLSQP', options={'disp': True})\n", "\n", "print(res.x)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:51:13.826524Z", "start_time": "2018-10-16T20:51:13.820264Z" } }, "outputs": [], "source": [ "# minimizacion con restriccion\n", "res = minimize(func, [-1.0,1.0], args=(-1.0,), jac=func_deriv, constraints=cons, method='SLSQP', options={'disp': True})\n", "\n", "print(res.x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Minimización de minimos cuadrados\n", "\n", "Scipy es capaz de resolver problemas no lineales de minimoza cuadrados sujetos a restricciones de la forma\n", "\n", "\\begin{align}\n", "\\text{min } \\frac{1}{2} \\sum_{1=1}^{m} \\rho (f_i(x)^2)\\\\\n", "\\text{subject to } l_b \\leq x \\leq u_b\n", "\\end{align}\n", "donde $f_i$ son funciones suaves de $R^N$ a $R$. El proposito de $\\rho$ es reducir la infuencia de outliers.\n", "\n", "Todos los métodos específicos de minimos cuadrados utilzian la matriz Jacobiana de $f$, y es muy recomendado computar esta matriz y pasarsela 'least_squares', de otra forma será estimado numericamente.\n", "\n", "Se considerará el 'Analysis of an Enzyme Reaction'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:54:09.730555Z", "start_time": "2018-10-16T20:54:09.644532Z" } }, "outputs": [], "source": [ "from scipy.optimize import least_squares\n", "\n", "\n", "def model(x, u):\n", " return x[0] * (u ** 2 + x[1] * u) / (u ** 2 + x[2] * u + x[3])\n", "\n", "\n", "def fun(x, u, y):\n", " return model(x, u) - y\n", "\n", "\n", "def jac(x, u, y):\n", " J = np.empty((u.size, x.size))\n", " den = u ** 2 + x[2] * u + x[3]\n", " num = u ** 2 + x[1] * u\n", " J[:, 0] = num / den\n", " J[:, 1] = x[0] * u / den\n", " J[:, 2] = -x[0] * num * u / den ** 2\n", " J[:, 3] = -x[0] * num / den ** 2\n", " return J\n", "\n", "\n", "u = np.array([4.0, 2.0, 1.0, 5.0e-1, 2.5e-1, 1.67e-1,\n", " 1.25e-1, 1.0e-1, 8.33e-2, 7.14e-2, 6.25e-2])\n", "y = np.array([1.957e-1, 1.947e-1, 1.735e-1, 1.6e-1, 8.44e-2,\n", " 6.27e-2, 4.56e-2, 3.42e-2, 3.23e-2, 2.35e-2, 2.46e-2])\n", "x0 = np.array([2.5, 3.9, 4.15, 3.9])\n", "res = least_squares(fun, x0, jac=jac, bounds=(0, 100), args=(u, y), verbose=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:54:45.105354Z", "start_time": "2018-10-16T20:54:45.101776Z" } }, "outputs": [], "source": [ "res.x" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-10-16T20:55:05.523917Z", "start_time": "2018-10-16T20:55:05.365748Z" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.rcParams['figure.figsize'] = (15,10)\n", "\n", "u_test = np.linspace(0, 5)\n", "y_test = model(res.x, u_test)\n", "plt.plot(u, y, 'o', markersize=4, label='data')\n", "plt.plot(u_test, y_test, label='fitted model')\n", "plt.xlabel(\"u\")\n", "plt.ylabel(\"y\")\n", "plt.legend(loc='lower right')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpolación\n", "\n", "Hay muchas facilidades para interpolación en Scipy, para datos en 1,2 o más dimensiones.\n", "- Una clase que interpola en 1D es 'interp1d', el cual ofrece muchos métodos de interpolación.\n", "- La función 'griddata' ofrece una simple interfaz de interpolación para N dimeiosnes.\n", "- Splines cubicas para interpolar en 1 ó 2 dimensiones." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:54:26.243985Z", "start_time": "2017-08-23T23:54:26.200529Z" } }, "outputs": [], "source": [ "# Interpolación en 1D\n", "from scipy.interpolate import interp1d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "La función 'interp1d' es un método conveniente para crear una función basada en datos fijos que puede ser evaluada en cualquier punto al interior del dominio dada los datos usando interpolación lineal.\n", "\n", "El siguiente ejemplo demuestra su uso, para el caso de interpolación lineal y spline cúbico:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:54:32.982402Z", "start_time": "2017-08-23T23:54:32.713023Z" } }, "outputs": [], "source": [ "x = np.linspace(0, 10, num=11, endpoint=True)\n", "y = np.cos(-x**2 / 9.0)\n", "plt.scatter(x, y)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:56:00.617075Z", "start_time": "2017-08-23T23:56:00.282359Z" } }, "outputs": [], "source": [ "f = interp1d(x, y) # linear\n", "f2 = interp1d(x, y, kind='cubic') # quadratic cubic nearest\n", "xx = np.linspace(0, 10, num=1000, endpoint=True)\n", "\n", "plt.plot(x, y, '*r', xx, f(xx), 'g', xx, f2(xx), 'b')\n", "# plt.plot(xx,f(xx))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpolación multivariada\n", "\n", "Para el caso de tener datos multidimensionales, por ejemplo para una función subyacente f(x,y) en la que sólo se conoce su valor en puntos (x_i, y_i) que no forman una grilla regular.\n", "\n", "Supongamos que queremos interpolar la función 2D en una grilla [0,1]x[0,1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:57:00.228616Z", "start_time": "2017-08-23T23:57:00.223022Z" } }, "outputs": [], "source": [ "def func(x, y):\n", " return x * (1 - x) * np.cos(4 * np.pi * x) * np.sin(4 * np.pi * y**2)**2\n", "\n", "\n", "grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "En la que se conocen su valor en 1000 puntos" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:57:01.916516Z", "start_time": "2017-08-23T23:57:01.911796Z" } }, "outputs": [], "source": [ "points = np.random.rand(1000, 2)\n", "values = func(points[:,0], points[:,1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Para interpolar se usa la función griddata, y se usan diferentes métodos" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:57:17.774744Z", "start_time": "2017-08-23T23:57:17.709166Z" } }, "outputs": [], "source": [ "from scipy.interpolate import griddata\n", "grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest')\n", "grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')\n", "grid_z2 = griddata(points, values, (grid_x, grid_y), method='cubic')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Y se grafican " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:57:48.659123Z", "start_time": "2017-08-23T23:57:48.179493Z" } }, "outputs": [], "source": [ "plt.subplot(221)\n", "plt.imshow(func(grid_x, grid_y).T, extent=(0, 1, 0, 1), origin='lower')\n", "#plt.plot(points[:, 0], points[:, 1], 'k.', ms=1)\n", "plt.title('Original')\n", "\n", "plt.subplot(222)\n", "plt.imshow(grid_z0.T, extent=(0, 1, 0, 1), origin='lower')\n", "plt.title('Nearest')\n", "\n", "plt.subplot(223)\n", "plt.imshow(grid_z1.T, extent=(0, 1, 0, 1), origin='lower')\n", "plt.title('Linear')\n", "\n", "plt.subplot(224)\n", "plt.imshow(grid_z2.T, extent=(0, 1, 0, 1), origin='lower')\n", "plt.title('Cubic')\n", "\n", "plt.gcf().set_size_inches(6, 6)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Interpolación Spline\n", "\n", "La interpolación spline requiere dos pasos escenciales: 1. una representación spline de la curva es computada y 2. la spline se evalúa en los puntos deseados. Para encontrar la representación spline, hay dos formas distintas de hacerlo: directa y paramétrica.\n", "El método directo encuentra la representación spline de una curva en un espacio plano dos-dimensional usando la función 'splrep'. \n", "\n", "Para curvas en espacios N-dimensionales, la función 'spl\n", "prep' permite definir la curva paramétricamente.\n", "\n", "El parámetro 's' es usado para especificar el grado de suavidad a considerar durante el ajuste de la curva. Si no se desea suavidad, el valor 's=0' debe ser especificado.\n", "\n", "Para evaluar la spline se usa 'splev', y sus derivadas se evalúan con 'splev' y 'spalde'.\n", "\n", "A continuación se muestra un uso." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:58:00.864127Z", "start_time": "2017-08-23T23:58:00.861084Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import interpolate" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:58:20.912110Z", "start_time": "2017-08-23T23:58:20.598484Z" } }, "outputs": [], "source": [ "x = np.arange(0, 2 * np.pi + np.pi / 4, 2 * np.pi / 8)\n", "y = np.sin(x)\n", "tck = interpolate.splrep(x, y, s=0)\n", "xnew = np.arange(0, 2 * np.pi, np.pi / 50)\n", "ynew = interpolate.splev(xnew, tck, der=0)\n", "# help(interpolate.splev0)\n", "\n", "plt.figure()\n", "plt.plot(x, y, 'x', xnew, ynew, xnew, np.sin(xnew), x, y, 'b')\n", "plt.legend(['Linear', 'Cubic Spline', 'True'])\n", "plt.axis([-0.05, 6.33, -1.05, 1.05])\n", "plt.title('Cubic-spline interpolation')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Y la derivada de la curva:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-23T23:59:27.929590Z", "start_time": "2017-08-23T23:59:27.621531Z" } }, "outputs": [], "source": [ "yder = interpolate.splev(xnew, tck, der=1)\n", "plt.figure()\n", "plt.plot(xnew, yder, xnew, np.cos(xnew), '--')\n", "plt.legend(['Cubic Spline', 'True'])\n", "plt.axis([-0.05, 6.33, -1.05, 1.05])\n", "plt.title('Derivative estimation from spline')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "La integral de la spline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:00:02.501535Z", "start_time": "2017-08-24T00:00:02.495282Z" } }, "outputs": [], "source": [ "def integ(x, tck, constant=-1):\n", " x = np.atleast_1d(x)\n", " out = np.zeros(x.shape, dtype=x.dtype)\n", " for n in range(len(out)):\n", " out[n] = interpolate.splint(0, x[n], tck)\n", " out += constant\n", " return out" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:00:03.348189Z", "start_time": "2017-08-24T00:00:03.006952Z" } }, "outputs": [], "source": [ "yint = integ(xnew, tck)\n", "plt.figure()\n", "plt.plot(xnew, yint, xnew, -np.cos(xnew), '--')\n", "plt.legend(['Cubic Spline', 'True'])\n", "plt.axis([-0.05, 6.33, -1.05, 1.05])\n", "plt.title('Integral estimation from spline')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Y las raices de la spline" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:00:35.172646Z", "start_time": "2017-08-24T00:00:35.164720Z" } }, "outputs": [], "source": [ "x = np.linspace(-np.pi / 4, 2. * np.pi + np.pi / 4, 21)\n", "y = np.sin(x)\n", "tck = interpolate.splrep(x, y, s=0)\n", "interpolate.sproot(tck)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Spline paramétrica" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:00:37.882386Z", "start_time": "2017-08-24T00:00:37.557028Z" } }, "outputs": [], "source": [ "t = np.arange(0, 1.1, .1)\n", "x = np.sin(2 * np.pi * t)\n", "y = np.cos(2 * np.pi * t)\n", "tck, u = interpolate.splprep([x, y], s=0)\n", "unew = np.arange(0, 1.01, 0.01)\n", "out = interpolate.splev(unew, tck)\n", "plt.figure()\n", "plt.plot(x, y, 'x', out[0], out[1], np.sin(\n", " 2 * np.pi * unew), np.cos(2 * np.pi * unew), x, y, 'b')\n", "plt.legend(['Linear', 'Cubic Spline', 'True'])\n", "plt.axis([-1.05, 1.05, -1.05, 1.05])\n", "plt.title('Spline of parametrically-defined curve')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Representación de spline en dos dimensiones\n", "\n", "Para un ajuste suave de una spline a una superficie en dos dimensiones, se usa la función 'bisplrep'.\n", "Para evaluar la spline dos-dimensional y sus derivadas parciales, se usa la función 'bisplev'.\n", "\n", "Se define una función sobre una grilla poco densa y la graficamos" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:00:48.266958Z", "start_time": "2017-08-24T00:00:47.905110Z" } }, "outputs": [], "source": [ "x, y = np.mgrid[-1:1:20j, -1:1:20j]\n", "z = (x + y) * np.exp(-6.0 * (x * x + y * y))\n", "plt.figure()\n", "plt.pcolor(x, y, z)\n", "plt.colorbar()\n", "plt.title(\"Sparsely sampled function.\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Y se interpola sobre una grilla de 70x70, y se grafica" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:00:53.783063Z", "start_time": "2017-08-24T00:00:53.254649Z" } }, "outputs": [], "source": [ "xnew, ynew = np.mgrid[-1:1:70j, -1:1:70j]\n", "tck = interpolate.bisplrep(x, y, z, s=0)\n", "znew = interpolate.bisplev(xnew[:, 0], ynew[0, :], tck)\n", "plt.figure()\n", "plt.pcolor(xnew, ynew, znew)\n", "plt.colorbar()\n", "plt.title(\"Interpolated function.\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Usando funciones de base radial para suavizado/interpolación\n", "\n", "Las funciones de base radial pueden ser usadas para suavizado/interpolación de datos dispersados en N-dimensiones, pero debe ser usada con precaución para extrapolación fuera del área de datos observados.\n", "\n", "En el siguiente ejemplo se muestra cómo interpolar datos dispersos en 2D." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:24.006437Z", "start_time": "2017-08-24T00:01:24.000084Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy.interpolate import Rbf\n", "import matplotlib.pyplot as plt\n", "from matplotlib import cm\n", "# 2-d tests - setup scattered data\n", "x = np.random.rand(100) * 4.0 - 2.0\n", "y = np.random.rand(100) * 4.0 - 2.0\n", "z = x * np.exp(-x**2 - y**2)\n", "ti = np.linspace(-2.0, 2.0, 100)\n", "XI, YI = np.meshgrid(ti, ti)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:25.441341Z", "start_time": "2017-08-24T00:01:24.757195Z" } }, "outputs": [], "source": [ "# use RBF\n", "rbf = Rbf(x, y, z, epsilon=2)\n", "ZI = rbf(XI, YI)\n", "# plot the result\n", "plt.subplot(1, 1, 1)\n", "plt.pcolor(XI, YI, ZI, cmap=cm.jet)\n", "plt.scatter(x, y, 100, z, cmap=cm.jet)\n", "plt.title('RBF interpolation - multiquadrics')\n", "plt.xlim(-2, 2)\n", "plt.ylim(-2, 2)\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Algebra Lineal\n", "\n", "'Scipy.linalg' contiene todas las funciones de 'Numpy.linalg', pero es más avanzada.\n", "\n", "Las clases que representan matrices y operaciones básicas matriciales son parte de 'numpy'. Por conveniencia se resumen las diferencias entre 'numpy.matrix' y 'numpy.ndarray'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "'numpy.matrix' es una clase de matrices que tiene una mejor interfaz de operatoria que 'numpy.ndarray' para matrices. Esta clase tiene una sintaxis similar a la de Matlab, la multiplicación matricial por default es '*', posee los operadores inversa ('.I') y la traspuesta es ('.T').\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:39.266701Z", "start_time": "2017-08-24T00:01:39.259905Z" } }, "outputs": [], "source": [ "# Ejemplo de numpy.matrix\n", "import numpy as np\n", "A = np.mat('[1 2; 3 4]')\n", "A\n", "A.I\n", "b = np.mat('[5 6]')\n", "b.T\n", "A * b.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A pesar de su conveniencia, el uso de la clase 'numpy.matrix' no se recomienda, desde que no tiene nada que no se pueda lograr con arreglos 2-D de 'numpy.ndarray', y puede llevar a confusiones de cuál es la clase que se está usando, por ejemplo, el código antes mencionado puede ser escrito:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:41.355332Z", "start_time": "2017-08-24T00:01:41.346580Z" } }, "outputs": [], "source": [ "A = np.array([[1, 2], [3, 4]])\n", "AA = np.floor(100 * np.random.rand(100, 100))\n", "b = np.array([[5, 6]])\n", "b.T\n", "A * b # NO la multiplicación matricial\n", "A.dot(b.T) # Sí multiplicación matricial\n", "# A.dot(b) #no importa por multiplicación trasponer o no.\n", "b2 = np.array([5, 6])\n", "A.dot(b2.T)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rutinas básicas\n", "\n", "### Encontrando la inversa" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:43.918868Z", "start_time": "2017-08-24T00:01:43.909594Z" }, "scrolled": false }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import linalg\n", "\n", "A = np.array([[1, 3, 5], [2, 5, 1], [2, 3, 8]])\n", "print(A)\n", "print(linalg.inv(A))\n", "\n", "print(A.dot(linalg.inv(A))) # double check" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resolviendo un sistema lineal\n", "\n", "De la forma:\n", "\n", "\\begin{align}\n", "x + 3y + 5z = 10\\\\ \n", "2x + 5y + z = 8\\\\ \n", "2x + 3y + 8z = 3\n", "\\end{align}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:47.614560Z", "start_time": "2017-08-24T00:01:47.598667Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1, 2], [3, 4]])\n", "print(A)\n", "\n", "b = np.array([[5], [6]])\n", "print(b)\n", "\n", "linalg.inv(A).dot(b) # slow\n", "\n", "A.dot(linalg.inv(A).dot(b)) - b # check\n", "\n", "np.linalg.solve(A, b) # fast\n", "\n", "A.dot(np.linalg.solve(A, b)) - b # check" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Encontrando el determinante" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:50.283535Z", "start_time": "2017-08-24T00:01:50.277626Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1, 2], [3, 4]])\n", "linalg.det(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculando normas\n", "\n", "Es posible calcular distintas normas, se presentan algunas normas matriciales a continuación" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:52.030777Z", "start_time": "2017-08-24T00:01:52.020461Z" } }, "outputs": [], "source": [ "import numpy as np\n", "from scipy import linalg\n", "A = np.array([[1, 2], [3, 4]])\n", "A\n", "linalg.norm(A)\n", "\n", "linalg.norm(A, 'fro') # frobenius norm is the default\n", "\n", "linalg.norm(A, 1) # L1 norm (max column sum)\n", "\n", "linalg.norm(A, -1)\n", "\n", "linalg.norm(A, np.inf) # L inf norm (max row sum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problema de los mínimos cuadrados \n", "\n", "El problema lineal de los minimos cuadrados aparece en muchas ramas de las matemáticas. EN este problema un conjunto de coeficientes lineales se desean encontrar para poder ajustar el modelo a los datos.\n", "EN particular, se supone que los datos $y_i$ se relacionan con los datos $x_i$ por un conjunto de coeficientes $c_j$ y una función modelo $f_j(x_i)$ de la forma:\n", "\n", "\\begin{align}\n", "y_i = \\sum_{j} c_j f_j(x_i) + \\epsilon_i\n", "\\end{align}\n", "\n", "donde $\\epsilon_i$ representa incertezas en los datos. La estrategia de los mínimos cuadrados es escoger coeficientes $c_j$ que minimicen el error cuadrático, es decir\n", "\n", "\\begin{align}\n", "J(c) = \\sum_{i} \\left| y_i - \\sum_j c_j f_j (x_i) \\right| ^2\n", "\\end{align}\n", "\n", "Definiendo $\\{ A\\}_{ij} = f_j (x_i)$, notemos que el problema se puede escribir $y = Ac + \\epsilon$, y el problema posee solución cuando $A^H A$ es invertible y entonces\n", "\n", "\\begin{align}\n", "c = \\left( A^H A \\right)^{-1} A^H y = A^{\\dagger} y\n", "\\end{align}\n", "\n", "\n", "El comando 'linalg.lstsq' resolverá el problema de los mínimos cuadrados para $c$ dado $A$ e $y$. Para calcular $A^{\\dagger}$ se puede usar las funciones 'linalg.pinv' o 'linalg.pinv2'.\n", "\n", "El siguiente ejemplo muestra el uso de las funciones antes mencionadas para el modelo\n", "\n", "\\begin{align}\n", "y_i = c_1 e^{-x_i} + c_2 x_i\n", "\\end{align}\n", "con $x_i=0.1$, $i = 1,...,10$, $c_1 =5, c_2 = 4$. Se añade ruido a $y_i$ y los coeficientes $c_1, c_2$ son estimados usando minimos cuadrados lineal" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:54.251316Z", "start_time": "2017-08-24T00:01:54.246247Z" } }, "outputs": [], "source": [ "# Least Square Method\n", "\n", "import numpy as np\n", "from scipy import linalg\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:55.912676Z", "start_time": "2017-08-24T00:01:55.906946Z" } }, "outputs": [], "source": [ "c1, c2 = 5.0, 2.0\n", "i = np.r_[1:11]\n", "xi = 0.1 * i\n", "yi = c1 * np.exp(-xi) + c2 * xi\n", "zi = yi + 0.05 * np.max(yi) * np.random.rand(len(yi))\n", "print(yi)\n", "print(zi)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:57.453183Z", "start_time": "2017-08-24T00:01:57.444724Z" } }, "outputs": [], "source": [ "A = np.c_[np.exp(-xi)[:, np.newaxis], xi[:, np.newaxis]]\n", "print(A)\n", "c, resid, rank, sigma = linalg.lstsq(A, zi)\n", "\n", "print(c, resid, rank, sigma)\n", "print(c)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:01:58.387787Z", "start_time": "2017-08-24T00:01:57.985895Z" } }, "outputs": [], "source": [ "xi2 = np.r_[0.1:1.0:100j]\n", "yi2 = c[0] * np.exp(-xi2) + c[1] * xi2\n", "plt.plot(xi, zi, 'x', xi2, yi2)\n", "plt.axis([0, 1.1, 3.0, 5.5])\n", "plt.xlabel('$x_i$')\n", "plt.title('Data fitting with linalg.lstsq')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Estadística\n", "\n", "En esta sección se discuten algunos aspectos de 'scipy.stats'. \n", "\n", "## Variables Aleatorias\n", "\n", "Hay dos clases de distribución que tienen implementadas variables aleatorias continuas y discretas. \n", "Todas las funciones estadísticas están alocadas en el subpaquete $\\text{scipy.stats}$ y una lista de tales funciones se puede obtener vía 'info(stats)'.\n", "\n", "Se importa el paquete de la siguiente forma:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:03:19.765003Z", "start_time": "2017-08-24T00:03:19.758348Z" } }, "outputs": [], "source": [ "from scipy import stats\n", "from scipy.stats import norm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Los métodos más usados para variables aleatorias continuas son:\n", "\n", " rvs: Random Variates\n", " pdf: Probability Density Function\n", " cdf: Cumulative Distribution Function\n", " sf: Survival Function (1-CDF)\n", " ppf: Percent Point Function (Inverse of CDF)\n", " isf: Inverse Survival Function (Inverse of SF)\n", " stats: Return mean, variance, (Fisher’s) skew, or (Fisher’s) kurtosis\n", " moment: non-central moments of the distribution\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:03:23.139062Z", "start_time": "2017-08-24T00:03:23.132330Z" } }, "outputs": [], "source": [ "# Por ejemplo, una variable aleatoria normal evaluada en ciertos puntos:\n", "norm.cdf(np.array([-1., 0, 1]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:03:28.067230Z", "start_time": "2017-08-24T00:03:28.061840Z" } }, "outputs": [], "source": [ "# metodos generalmente útiles:\n", "norm.mean(), norm.std(), norm.var()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:03:44.581969Z", "start_time": "2017-08-24T00:03:44.575695Z" } }, "outputs": [], "source": [ "# Para encontrar la media de una distribución se puede usar la 'percent\n", "# point function 'ppf'', la cual es la inversa de 'cdf'\n", "norm.ppf(0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kernel Density Estimation\n", "\n", "Una tarea común en estadística es estimar la función de densidad de probabilidad (PDF) de una vatiable aleatoria a partir de un conjunto de muestra de datos. Esta tarea se denomina estimación de densidad (density estimation). La forma mejor conocida para hacer esto es un histograma, el cual es una herramienta útil de visualización, pero no usa los datos disponibles muy eficientemente. La estimación de la densidad de nucleo (kernel density estimation) es una herramienta más eficiente para la misma tarea.\n", "\n", "El 'gauusian_kde' estimador puede ser usado para estimar la PDF de una univariada, así como una multivariada. Funciona mejor si los datos son unimodales." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Estimación univariada\n", "\n", "Se empieza con una cantidad mínima de datos para ver cómo 'gaussian_kde' funciona, y las diferentes opciones para ancho de banda hacen. Los datos sampleados desde PDF se muestran como guiones azules al fondo de la figura" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:03:46.708564Z", "start_time": "2017-08-24T00:03:46.377907Z" } }, "outputs": [], "source": [ "from scipy import stats\n", "import matplotlib.pyplot as plt\n", "\n", "x1 = np.array([-7, -5, 1, 4, 5], dtype=np.float)\n", "kde1 = stats.gaussian_kde(x1)\n", "kde2 = stats.gaussian_kde(x1, bw_method='silverman')\n", "\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111)\n", "\n", "ax.plot(x1, np.zeros(x1.shape), 'b+', ms=20) # rug plot\n", "x_eval = np.linspace(-10, 10, num=200)\n", "ax.plot(x_eval, kde1(x_eval), 'k-', label=\"Scott's Rule\")\n", "ax.plot(x_eval, kde2(x_eval), 'r-', label=\"Silverman's Rule\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Se ve que hay una pequeña diferencia entre 'Scott's Rule' y 'Silverman's Rule', y el ancho de banda escogido con una cantidad limitada de datos es probablemente muy ancha. Se puede definir una propia función de anchi de banda para obtener un resultado menos suavizado" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:03:49.196915Z", "start_time": "2017-08-24T00:03:49.191051Z" } }, "outputs": [], "source": [ "def my_kde_bandwidth(obj, fac=1. / 5):\n", " \"\"\"We use Scott's Rule, multiplied by a constant factor.\"\"\"\n", " return np.power(obj.n, -1. / (obj.d + 4)) * fac" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:10.047009Z", "start_time": "2017-08-24T00:04:09.753408Z" } }, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111)\n", "\n", "ax.plot(x1, np.zeros(x1.shape), 'b+', ms=20) # rug plot\n", "kde3 = stats.gaussian_kde(x1, bw_method=my_kde_bandwidth)\n", "ax.plot(x_eval, kde3(x_eval), 'g-', label=\"With smaller BW\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Se ve que si el ancho de banda es muy angosto, se obtiene que el estimado de la función de densidad de probabilidad (PDF) es simplemente la suma de Gaussianas alrededor de cada uno de los puntos.\n", "\n", "Tomemos ahora un ejemplo más realista y veamos la diferencia entre las dos elecciones de ancho de banda. Estas reglas son conocidas que funcionan bien para (o cercano a) distribuciones normales, pero incluso para distribuciones unimodales que son un poco fuertemente no-normales funcionan razonablemente bien.\n", "\n", "Como una distribución no-normal tomaremos una T-Student distribución con 5 grados de libertad." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:11.948775Z", "start_time": "2017-08-24T00:04:11.596110Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import stats\n", "\n", "\n", "np.random.seed(12456)\n", "x1 = np.random.normal(size=200) # random data, normal distribution\n", "xs = np.linspace(x1.min() - 1, x1.max() + 1, 200)\n", "\n", "kde1 = stats.gaussian_kde(x1)\n", "kde2 = stats.gaussian_kde(x1, bw_method='silverman')\n", "\n", "fig = plt.figure(figsize=(8, 6))\n", "\n", "ax1 = fig.add_subplot(211)\n", "ax1.plot(x1, np.zeros(x1.shape), 'b+', ms=12) # rug plot\n", "ax1.plot(xs, kde1(xs), 'k-', label=\"Scott's Rule\")\n", "ax1.plot(xs, kde2(xs), 'b-', label=\"Silverman's Rule\")\n", "ax1.plot(xs, stats.norm.pdf(xs), 'r--', label=\"True PDF\")\n", "\n", "ax1.set_xlabel('x')\n", "ax1.set_ylabel('Density')\n", "ax1.set_title(\"Normal (top) and Student's T$_{df=5}$ (bottom) distributions\")\n", "ax1.legend(loc=1)\n", "\n", "x2 = stats.t.rvs(5, size=200) # random data, T distribution\n", "xs = np.linspace(x2.min() - 1, x2.max() + 1, 200)\n", "\n", "kde3 = stats.gaussian_kde(x2)\n", "kde4 = stats.gaussian_kde(x2, bw_method='silverman')\n", "\n", "ax2 = fig.add_subplot(212)\n", "ax2.plot(x2, np.zeros(x2.shape), 'b+', ms=12) # rug plot\n", "ax2.plot(xs, kde3(xs), 'k-', label=\"Scott's Rule\")\n", "ax2.plot(xs, kde4(xs), 'b-', label=\"Silverman's Rule\")\n", "ax2.plot(xs, stats.t.pdf(xs, 5), 'r--', label=\"True PDF\")\n", "\n", "ax2.set_xlabel('x')\n", "ax2.set_ylabel('Density')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tomemos ahora una distrubución bimodal con una característica gaussiana ancha y una delgada. Esperamos que esto sea más dificil de aproximar, debido a las diferentes anchos de banda requeridos para resolver precisamente cada característica.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:35.911235Z", "start_time": "2017-08-24T00:04:35.908148Z" } }, "outputs": [], "source": [ "from functools import partial" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:36.279127Z", "start_time": "2017-08-24T00:04:36.274537Z" } }, "outputs": [], "source": [ "loc1, scale1, size1 = (-2, 1, 175)\n", "loc2, scale2, size2 = (2, 0.2, 50)\n", "x2 = np.concatenate([np.random.normal(loc=loc1, scale=scale1, size=size1),\n", " np.random.normal(loc=loc2, scale=scale2, size=size2)])" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:36.646282Z", "start_time": "2017-08-24T00:04:36.643426Z" } }, "outputs": [], "source": [ "x_eval = np.linspace(x2.min() - 1, x2.max() + 1, 500)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:37.062770Z", "start_time": "2017-08-24T00:04:36.966246Z" } }, "outputs": [], "source": [ "kde = stats.gaussian_kde(x2)\n", "kde2 = stats.gaussian_kde(x2, bw_method='silverman')\n", "kde3 = stats.gaussian_kde(x2, bw_method=partial(my_kde_bandwidth, fac=0.2))\n", "kde4 = stats.gaussian_kde(x2, bw_method=partial(my_kde_bandwidth, fac=0.5))\n", "\n", "pdf = stats.norm.pdf\n", "bimodal_pdf = pdf(x_eval, loc=loc1, scale=scale1) * float(size1) / x2.size + \\\n", " pdf(x_eval, loc=loc2, scale=scale2) * float(size2) / x2.size\n", "\n", "fig = plt.figure(figsize=(8, 6))\n", "ax = fig.add_subplot(111)\n", "\n", "ax.plot(x2, np.zeros(x2.shape), 'b+', ms=12)\n", "ax.plot(x_eval, kde(x_eval), 'k-', label=\"Scott's Rule\")\n", "ax.plot(x_eval, kde2(x_eval), 'b-', label=\"Silverman's Rule\")\n", "ax.plot(x_eval, kde3(x_eval), 'g-', label=\"Scott * 0.2\")\n", "ax.plot(x_eval, kde4(x_eval), 'c-', label=\"Scott * 0.5\")\n", "ax.plot(x_eval, bimodal_pdf, 'r--', label=\"Actual PDF\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:04:37.654877Z", "start_time": "2017-08-24T00:04:37.501705Z" } }, "outputs": [], "source": [ "ax.set_xlim([x_eval.min(), x_eval.max()])\n", "ax.legend(loc=2)\n", "ax.set_xlabel('x')\n", "ax.set_ylabel('Density')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Como era de esperar, el KDE no es tan cercano como la verdadera PDF como se podría esperar debido a las diferentes caracteristicas de tamaño de las dos componentes de distribución bimodal.\n", "\n", "Reduciendo a la mitad el ancho de banda defecto ('Scott*0.5') se puede lograr algo mejor, mientras que usando un factor 5 veces más pequeño que el ancho de banda por defecto no suaviza demasiado. Lo que se necesita en este caso es un ancho de banda no uniforme (adaptativo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Usando la función 'gaussian_kde' se puede operar estimación multivariada al igual que la univariada. Se muestra a continuación un caso bivariado. En primer lugar se generan datos aleatorios con un modelo en que dos variables están correlacionadas." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:07.838261Z", "start_time": "2017-08-24T00:05:07.819501Z" } }, "outputs": [], "source": [ "def measure(n):\n", " \"\"\"Measurement model, return two coupled measurements.\"\"\"\n", " m1 = np.random.normal(size=n)\n", " m2 = np.random.normal(scale=0.5, size=n)\n", " return m1 + m2, m1 - m2\n", "\n", "\n", "m1, m2 = measure(2000)\n", "xmin = m1.min()\n", "xmax = m1.max()\n", "ymin = m2.min()\n", "ymax = m2.max()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:08.958870Z", "start_time": "2017-08-24T00:05:08.429425Z" } }, "outputs": [], "source": [ "X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]\n", "positions = np.vstack([X.ravel(), Y.ravel()])\n", "values = np.vstack([m1, m2])\n", "kernel = stats.gaussian_kde(values)\n", "Z = np.reshape(kernel.evaluate(positions).T, X.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:09.158120Z", "start_time": "2017-08-24T00:05:08.960257Z" }, "scrolled": true }, "outputs": [], "source": [ "fig = plt.figure(figsize=(8, 6))\n", "ax = fig.add_subplot(111)\n", "\n", "ax.imshow(np.rot90(Z), cmap=plt.cm.gist_earth_r,\n", " extent=[xmin, xmax, ymin, ymax])\n", "ax.plot(m1, m2, 'k.', markersize=2)\n", "\n", "ax.set_xlim([xmin, xmax])\n", "ax.set_ylim([ymin, ymax])\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:09.688505Z", "start_time": "2017-08-24T00:05:09.355823Z" } }, "outputs": [], "source": [ "xx = np.r_[-3: 3: 1000j] #np.linspace(-3, 3, 1000)\n", "plt.plot(xx, norm.cdf(xx))\n", "plt.show()\n", "norm.cdf(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:10.511241Z", "start_time": "2017-08-24T00:05:10.052868Z" }, "scrolled": true }, "outputs": [], "source": [ "np.histogram(xx,10)\n", "plt.hist(norm.rvs(size=100000),100)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:17.806687Z", "start_time": "2017-08-24T00:05:17.796127Z" } }, "outputs": [], "source": [ "norm.stats(loc = 3, scale = 4, moments = 'mv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:18.528739Z", "start_time": "2017-08-24T00:05:18.045492Z" } }, "outputs": [], "source": [ "plt.hist(norm.rvs(size=100000, loc = 3, scale = 1000),100)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:19.091058Z", "start_time": "2017-08-24T00:05:19.084559Z" } }, "outputs": [], "source": [ "from scipy.stats import uniform\n", "uniform.cdf([0, 1, 2, 3, 4, 4], loc = 0.1)#, scale = 400)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:21.587693Z", "start_time": "2017-08-24T00:05:21.575349Z" } }, "outputs": [], "source": [ "from scipy.stats import gamma\n", "gamma.numargs\n", "gamma.shapes\n", "\n", "gamma(1, scale=2.).stats(moments=\"mv\")\n", "\n", "gamma(a=1, scale=2.).stats(moments=\"mv\")\n", "\n", "rv = gamma(1, scale=2.)\n", "\n", "rv.mean(), rv.std()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:31.934817Z", "start_time": "2017-08-24T00:05:29.236957Z" }, "scrolled": true }, "outputs": [], "source": [ "rgh = stats.gausshyper.rvs(0.5, 2, 2, 2, size=100)\n", "print(rgh)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:35.559347Z", "start_time": "2017-08-24T00:05:35.552941Z" } }, "outputs": [], "source": [ "from scipy import stats\n", "class deterministic_gen(stats.rv_continuous):\n", " def _cdf(self, x):\n", " return np.where(x < 0, 0., 1.)\n", " def _stats(self):\n", " return 8., 0., 0., 8." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:36.010089Z", "start_time": "2017-08-24T00:05:36.002567Z" } }, "outputs": [], "source": [ "deterministic = deterministic_gen(name=\"deterministic\")\n", "deterministic.cdf(np.arange(-3, 3, 0.5))\n", "deterministic.pdf(np.arange(-3, 3, 0.5))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:36.481456Z", "start_time": "2017-08-24T00:05:36.477458Z" } }, "outputs": [], "source": [ "help(stats.ttest_ind)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2017-08-24T00:05:36.898896Z", "start_time": "2017-08-24T00:05:36.893199Z" } }, "outputs": [], "source": [ "rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)\n", "rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)\n", "stats.ttest_ind(rvs1, rvs2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }