Speeding up code with Numba and NumPy

Most of the time, when people finish a Python project, they sit down and wonder:

Sure, it works… BUT… can it run faster?

Well, the good news is that there are many paths you can go down to speed up Python code. Most performance gains usually come from using efficient data structures and simplifying processes. Then comes the hard part:

For true speed you will inevitably go down the C road (Cython or pure C/C++), with or without multi-processing/multi-threading on top. But if you are doing numerical calculations, there is one option that can give substantial performance gains (1-2 orders of magnitude) while you keep writing code that does not diverge from Python's syntax and is therefore easy to read, understand and maintain.

This is where the symbiotic relationship between NumPy and Numba comes into play. By rewriting the computation-heavy core functions of your code as NumPy implementations and adding Numba's @jit decorator, you get almost instant performance gains! How so? Let us see some examples:

from numba import jit
import numpy as np
import pandas as pd
import time

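# Placeholders: set your_local_path, your_price_col and n (the EMA span) for your own data
df = pd.read_csv(your_local_path)
prices_np = df[your_price_col].to_numpy()

# Slow pandas exponential moving average
# (adjust=False selects the recursive formula that ewm_numba uses; pandas defaults to adjust=True)
ema = df[your_price_col].ewm(span=n, adjust=False).mean()

# Slow pandas percentage returns
pct_returns = df[your_price_col].pct_change()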

# FAST exponential moving average (recursive form, as in pandas ewm(adjust=False))
@jit(nopython=True, cache=True)
def ewm_numba(data, span):
    alpha = 2 / (span + 1)

    # Float output so that integer price arrays are not truncated
    ema = np.empty(len(data), dtype=np.float64)
    ema[0] = data[0]

    # Each value blends the current price with the previous EMA
    for i in range(1, len(data)):
        ema[i] = alpha * data[i] + (1 - alpha) * ema[i-1]

    return ema

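# FAST percentage returns
@jit(nopython=True, cache=True)
def pct_change(series):
    # The first element has no previous value; it stays 0.0 here (pandas returns NaN)
    pct_returns = np.zeros(len(series))

    for i in range(len(series)-1):
        # Change relative to the previous value, as in pandas pct_change
        pct_returns[i+1] = (series[i+1] - series[i]) / series[i]

    return pct_returns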
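
Before timing anything, it is worth checking that a fast version computes what you expect. The sketch below is a plain-Python reference for the two calculations (the recursive EMA and the return relative to the previous value), run on three made-up prices that are easy to verify by hand. The names ema_reference, pct_change_reference and sample are illustrative and not part of Numba or pandas.

```python
def ema_reference(prices, span):
    """Recursive EMA: ema[i] = alpha * x[i] + (1 - alpha) * ema[i-1]."""
    alpha = 2 / (span + 1)
    ema = [float(prices[0])]  # the first EMA value is the first price
    for price in prices[1:]:
        ema.append(alpha * price + (1 - alpha) * ema[-1])
    return ema


def pct_change_reference(prices):
    """Change relative to the previous value; the first entry is set to 0.0."""
    returns = [0.0]
    for prev, curr in zip(prices, prices[1:]):
        returns.append((curr - prev) / prev)
    return returns


# Made-up prices, chosen so the results can be checked by hand
sample = [100.0, 110.0, 99.0]

ema_check = ema_reference(sample, span=3)  # span=3 gives alpha = 0.5
pct_check = pct_change_reference(sample)

print(ema_check)  # [100.0, 105.0, 102.0]
print(pct_check)  # roughly [0.0, 0.1, -0.1]
```

If Numba is installed, the JIT-compiled functions can be compared against these values with np.allclose on the same input converted to a NumPy array. Keep in mind that pandas returns NaN, not 0.0, for the first percentage return.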

Now you can time each of these functions (the pandas versions and the Numba versions) in a loop like so and compare the results:

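# Warm-up call: the first call to a @jit function includes compilation time
pct_change(prices_np)

# Start the timer (perf_counter is a monotonic, high-resolution clock)
start_time = time.perf_counter()

# Code to be timed
for _ in range(1000):
    pct_change(prices_np)

# Stop the timer
end_time = time.perf_counter()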

# Calculate the elapsed time
elapsed_time = end_time - start_time

print("Elapsed time:", elapsed_time, "seconds")
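
To avoid repeating the timing boilerplate for every function, the loop can be wrapped in a small helper. This is a minimal sketch, not a rigorous benchmarking tool: benchmark and sum_of_squares are hypothetical names, and the stand-in workload is only there so the example runs without Numba or a data file. The warm-up call matters for @jit functions because their first call triggers compilation.

```python
import time


def benchmark(func, *args, repeats=1000, warmup=1):
    """Return the average seconds per call of func(*args), excluding warm-up calls."""
    for _ in range(warmup):
        func(*args)  # for a @jit function, compilation happens on the first call

    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


# Stand-in workload so the sketch runs on its own
def sum_of_squares(n):
    return sum(i * i for i in range(n))


avg_seconds = benchmark(sum_of_squares, 1000, repeats=200)
print(f"Average time per call: {avg_seconds:.2e} seconds")
```

With the functions from this article, the calls would look like benchmark(pct_change, prices_np) for the Numba version and benchmark(lambda: df[your_price_col].pct_change()) for the pandas version.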