Python Generator Functions for massive Performance Improvements with Lists




This video covers Python generator functions, both the built-in ones and how to write your own. The purpose of a generator is to iterate over a sequence of values as efficiently as possible. To do this, each item is generated lazily, at the moment it is needed, and thrown away after use, so the full sequence never has to sit in memory at once.
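
As a rough illustration of the lazy behaviour described above, here is a minimal sketch. The function name `count_up_to` and the value of `n` are made up for this example and are not taken from the video.

```python
import sys

def count_up_to(n):
    """Yield 0, 1, ..., n - 1 one at a time instead of building a list."""
    current = 0
    while current < n:
        yield current
        current += 1

n = 100000
as_list = list(range(n))   # every element exists in memory at once
as_gen = count_up_to(n)    # nothing is computed until something iterates over it

list_total = sum(as_list)
gen_total = sum(as_gen)    # values are produced on demand, one per step

# The generator object stays small no matter how large n is;
# the list grows with the number of elements it holds.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_total == gen_total, gen_bytes < list_bytes)
```

The saving shown here is in memory, not necessarily in speed. Whether the generator version is also faster depends on the workload and the Python version.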

Sentdex.com
Facebook.com/sentdex
Twitter.com/sentdex

11 thoughts on “Python Generator Functions for massive Performance Improvements with Lists”

  1. Great video, but why does it work better?

    So sum(addNumGen(1000000)) will wait until all 1000000 calls to the function are made before continuing, so how (or where) is Python doing the optimization? If anything, the one call to addNums seems like it would be faster. So what's happening under the hood to make it faster?

  2. Note to people watching this video in 2015.

    In Python 3, range() returns a memory-efficient iterable, not a list as in 2.x. It can handle any large number range you throw at it, and do so at seemingly instant speeds, and with near-zero memory used.

    In Python 2, range() is very slow when used with large numbers and will eat your memory like an all-you-can-eat buffet.

    For more information, give the following pages a look:
    https://wiki.python.org/moin/Python2orPython3
    http://stackoverflow.com/questions/30081275/why-is-1000000000000000-in-range1000000000000001-so-fast-in-python-3
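
The Python 3 behaviour described in this comment can be checked with a short sketch like the one below. It assumes CPython 3; the variable names are illustrative only.

```python
import sys

big = range(10**15)   # Python 3: a lazy sequence object, not a list

# For integers, membership is worked out arithmetically from
# start/stop/step rather than by scanning every element.
found = (10**15 - 1) in big
missing = 10**15 in big

# Indexing also works without materialising any elements.
last = big[-1]

# The range object itself is tiny; its size does not depend on its length.
big_size = sys.getsizeof(big)

print(found, missing, last, big_size)
```

Running the same lines under Python 2 would try to build a list with 10**15 elements, which is the behaviour the comment warns about.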

  3. I would really, really love an updated course on generators (or some sort of advanced iterator). Also, side note, very interested in a course on data streams (like socket) and how to effectively keep accepting data from some stream while your program still functions.

    (I'm trying to perfect a twitch bot, and these would both help me)

    Thanks, H! You're the best Python teacher I've had, along with @nedbat on Twitter.

  4. Hm, just sum(range(n)) seems to outperform sum(generator(n)) here.
    Also, avoid function calls in loop conditions. The len(allN) is being reevaluated on every pass through the loop, even though its result won't change. Calling it once and storing the length gives a 25-43% time drop.

    While generators have their place, I'm not sure it's in summing numbers. They are better suited to generating numbers from an algorithm in a controlled manner without guessing how many you need (think of an infinite sequence of numbers like Fibonacci).

    # script
    from time import time

    def addNums(n):
        allN = range(n)  # a full list in Python 2, a lazy range object in Python 3
        x = 0
        curNum = 0
        while x < len(allN):  # note len(allN) is always just n, but is re-evaluated every pass
            curNum += x
            x += 1
        return curNum
        
    def smartAddNums(n):
        allN = range(n)
        x = 0
        curNum = 0
        length = len(allN)  # still just n, but now len is only called once!
        while x < length:
            curNum += x
            x += 1
        return curNum

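    def simpleAdd(n):
        # Sums n down to 1 (not 0 to n-1 like the others), so its result is larger by n.
        total = 0  # renamed from "sum" to avoid shadowing the built-in
        while n > 0:
            total += n
            n -= 1
        return total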

    def sumNums(n):
        import sys
        if sys.version_info.major == 2:
            return sum(xrange(n))  # slight advantage? maybe?
        else:
            return sum(range(n))

    def genNums(n):
        curNum = 0
        while curNum < n:
            yield curNum  # yield is a keyword, so no parentheses are needed
            curNum += 1

    amt = 10000000

    print('starting addNums')
    start = time()
    x = addNums(amt)
    print('addNums time:',time()-start,'\n')

    print('starting smartAddNums')
    start = time()
    x = smartAddNums(amt)
    print('smartAddNums time:',time()-start,'\n')

    print('starting simpleAdd')
    start = time()
    y = simpleAdd(amt)
    print('simpleAdd time:',time()-start,'\n')

    print('starting sumNums')
    start = time()
    z = sumNums(amt)
    print('sumNums time:',time()-start,'\n')

    print('starting sum(genNums)')
    start = time()
    a = sum(genNums(amt))
    print('genNums time:',time()-start,'\n')

    # my results on a 1.5 GHz quad-core AMD with 6 GB RAM
    [Path Was Here]>C:\Python27\python.exe gen_bench.py
    ('addNums time:', 6.4649999141693115, '\n')
    ('smartAddNums time:', 4.881999969482422, '\n')
    ('simpleAdd time:', 4.533999919891357, '\n')
    ('sumNums time:', 2.3479998111724854, '\n')
    ('genNums time:', 4.565999984741211, '\n')

    [Path Was Here]>C:\Python33\python.exe gen_bench.py
    addNums time: 6.533373117446899
    smartAddNums time: 3.725213050842285
    simpleAdd time: 3.689210891723633
    sumNums time: 1.3920798301696777
    genNums time: 4.394252061843872
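
The infinite-sequence use case mentioned in this comment (Fibonacci) might look something like the sketch below. The function name `fibonacci` and the cut-off values are chosen for illustration and do not come from the commenter's script.

```python
from itertools import islice

def fibonacci():
    """Yield Fibonacci numbers forever: 0, 1, 1, 2, 3, 5, ..."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take a fixed number of values from the endless stream.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Or keep pulling values until a condition is met, without
# knowing in advance how many that will be.
under_100 = []
for value in fibonacci():
    if value >= 100:
        break
    under_100.append(value)
print(under_100)
```

The caller decides how much of the sequence to consume, so the generator never has to guess a length up front.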

  5. In the first example you realized that reassigning x was skewing the data from your time results. However, in your second example you did the same thing with startT and curNum variables. Is the difference negligible in this situation?

  6. Hello, your video is very good, but I want to know about command-line option parsing with Twitter, I mean how we would use it. If you could send some code, that would be good for me. Thanks, Jiban
