Ugly Python Hacks For Beautiful People: Part 1

https://dev.to/rly

I sometimes tweet out small Python tricks under the “Ugly Python Hacks For Beautiful People” label. I thought I’d collect some of them into a blog post.

These range from great, useful tricks that aren’t mentioned that much in the docs, to ugly answers to questions nobody ever asked that you probably won’t ever need to use in practice. I’ll go through them in arbitrary order of usefulness - not coincidentally, the most useful tricks are probably the least obscure!

1. Unpacking inside displays

This isn’t exactly obscure or a “hack”, but the Python docs don’t mention it very explicitly. Most Python devs know about unpacking arguments in function calls and assignments:

>>> def foo(a, b, *rest):
...     print(f"a: {a}")
...     print(f"b: {b}")
...     print(f"rest: {rest}")
>>> l = [1, 2, 3, 4, 5]
>>> foo(*l)
a: 1
b: 2
rest: (3, 4, 5)
>>> a, *b, c = range(4)
>>> a
0
>>> b
[1, 2]
>>> c
3

With Python 3.5 implementing PEP-448, you can now use the unpacking syntax in tuple, list, set, and dictionary displays. You no longer need the old hack with dict(old_dict, **new_dict) or chaining .update() calls; you can just do this:

>>> d = {"a": 1, "b": 2, "c": 3}
>>> e = {"c": 0, "d": 1, "e": 2}
>>> f = {**d, **e}
>>> f
{'a': 1, 'b': 2, 'c': 0, 'd': 1, 'e': 2}

Since building a collection with unpacking compiles to a single bytecode instruction, it’s also by far the fastest way to merge dicts!

2. Using _names to avoid polluting the namespace on from foo import *

This one is a lesson from the standard library’s playbook!

With foo.py with the following contents:

# foo.py
import os


def cwd():
  return os.getcwd()

It turns out that, since os is declared at the top-level scope (effectively the same as if we did os = __import__("os"), or, for that matter, os = "virtually anything"), Python treats it the same as any other name, and imports it into the global scope. This can, sometimes, go wrong:

>>> os = "Linux"
>>> from foo import *
>>> cwd()
'/home/fox'
>>> os
<module 'os' from '/usr/lib/python3.8/os.py'>

Uh-oh! This is also why it’s often discouraged to import *. However, when you write libraries that other people may use, it’s not really up to you to judge them on what they do in their code - and so, the standard library offers this neat little solution:

# foo.py
import os as _os

def cwd():
  return _os.getcwd()

Quoting from the docs on import statements:

If __all__ is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ('_').

With this trick, the code now behaves as we want:

>>> os = "Linux"
>>> from foo import *
>>> cwd()
'/home/fox'
>>> os
'Linux'
>>> _os
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_os' is not defined

3. Class bodies are a scope like any other

Class definitions don’t have a special meaning in Python syntax. This is a well-known, but oft-underutilized fact! They can contain any of the same Python statements that you can put at the outer scope of a file.

# foo.py

class Foo:
    import sys

    if sys.version_info < (3, 7):
        @staticmethod
        def py_38(self):
            return False
    else:
        @staticmethod
        def py_38(self):
            return True

It does exactly what you’d expect:

Python 3.8.1 (default, Jan 22 2020, 06:38:00)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from foo import Foo
>>> Foo.py_38()
True
>>> Foo.sys
<module 'sys' (built-in)>

The documentation has this to say about the practice:

Function defined outside the class

def f1(self, x, y):
    return min(x, x+y)

class C:
    f = f1

    def g(self):
        return 'hello world'

    h = g

Now f, g and h are all attributes of class C that refer to function objects, and consequently they are all methods of instances of C — h being exactly equivalent to g. Note that this practice usually only serves to confuse the reader of a program.

While this is generally true, and methods defined outside the class make the code harder to read, conditional class variables or even method definitions often come in really handy. It’s also sometimes useful to pull in names from an outer scope into the class namespace to provide a nicer API for your class’s users.

4. Module __getattr__ compatibility for Python versions older than 3.7

Python 3.7 brought us, among other wonderful features, PEP-562 - module __getattr__ and __dir__. This means we can have dynamic module members! However, the code won’t work on older Python versions, and it often makes sense to support not just 3.6, but even 3.5:

# foo.py

def __getattr__(name):
    return name

If we run it under 3.7 or newer, all works well:

Python 3.8.1 (default, Jan 22 2020, 06:38:00)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from foo import bar
>>> bar
'bar'

On older Pythons, however:

Python 3.6.10 (default, Jan 21 2020, 12:42:23)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from foo import bar
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ImportError: cannot import name 'bar' from 'foo'

To work around this, we can use this neat little trick:

# foo.py
import sys as _sys


def __getattr__(name):
    return name


# compatibility hack
if _sys.version_info < (3, 7):
    class _ModuleWrapper:
        def __getattr__(self, item):
            return __getattr__(item)

    _sys.modules[__name__] = _ModuleWrapper()

The idea for this comes from PEP-562 itself, since it mentions ` sys.modules` briefly at the very end:

To use a module global with triggering __getattr__ (for example if one wants to use a lazy loaded submodule) one can access it as:

sys.modules[__name__].some_global

By replacing sys.modules[__name__] with a proxy object (in this case, our _ModuleWrapper), we effectively substitute that object for our actual module. Logically, then, since it’s an instance of a class that defines __getattr__, attribute lookups will work as we want them to:

Python 3.6.10 (default, Jan 21 2020, 12:42:23)
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from foo import bar
>>> bar
'bar'

5. A better NewType

Python’s typing library is great, but it’s far from fulfilling its “zero runtime cost” promise. With Python’s function dispatch performance hit, despite NewType and cast effectively being no-ops, it’s still not great to have to use them in performance-critical code.

I have a love-hate relationship with [NewType][]. It’s invaluable when working with types that are identical in code, but are not interchangeable semantically - a good example is bitmasks (ints where each bit is treated as a boolean flag) and normal integers; they’re both represented as ints, but it doesn’t make sense to eg. add and subtract normal integers to and from bitmasks.

Here’s where a problem arises:

# foo.py
from typing import NewType

Bitmask = NewType('Bitmask', int)

def do_stuff_with_bitmask(bitmask: Bitmask):
    ...

bitmask = Bitmask(0b0010)
do_stuff_with_bitmask(bitmask | 0b0001)

See the problem yet? If not, here’s what happens when we run mypy:

$ mypy foo.py
foo.py:9: error: Argument 1 to "do_stuff_with_bitmsk" has incompatible type 
"int"; expected "Bitmask"
Found 1 error in 1 file (checked 1 source file)``

Oops! NewType doesn’t change the return values of its wrapped type’s methods. It makes sense (and it would be very problematic if it did otherwise), but in this case, it’s making our life significantly harder. Bitmasks are generally used for performance or interaction with legacy code; neither is a situation where we want to slow down our code with explicitly casting the result of every operation to Bitmask again!

Luckily, there does exist a perfect solution to this problem. mypy understands declarations and definitions, but it doesn’t understand del - and that works in our favor.

Instead of using a NewType, we can define a class inheriting from the type we want to wrap, declare the signatures of its methods… and then get rid of them, not to bog down the great performance of the builtin primitive types with having to go through our Python code! Here’s how the trick works:

class Bitmask(int):
    def __lshift__(self, other) -> 'Bitmask': ...
    del __lshift__

    def __rshift__(self, other) -> 'Bitmask': ...
    del __rshift__

    def __and__(self, other) -> 'Bitmask': ...
    del __and__

    def __xor__(self, other) -> 'Bitmask': ...
    del __xor__

    def __or__(self, other) -> 'Bitmask': ...
    del

Mypy reads our method signatures, and so knows that eg. a Bitmask bitwise-ORed with anything else still returns a Bitmask. However, immediately after defining our magic methods, we delete them from the scope, so that effectively, at class definition time, the body of Bitmask is exactly equivalent to:

class Bitmask(int):
  pass

This way, we don’t have to hurt our performance by wrapping the underlying int methods with Python code that does nothing except call the int methods, but very slowly - and we don’t have to slow down our performance heavy code by wrapping the result of every operation on Bitmasks in Bitmask(...).

The other solution is directly annotating the Bitmask methods:

class Bitmask(int):
    __lshift__: Callable[['Bitmask', int], 'Bitmask']
    __rshift__: Callable[['Bitmask', int], 'Bitmask']
    __and__:    Callable[['Bitmask', int], 'Bitmask']
    __xor__:    Callable[['Bitmask', int], 'Bitmask']
    __or__:     Callable[['Bitmask', int], 'Bitmask']

I find the del trick slightly more readable, just because of how low the signal-to-noise ratio of Callable is. It still makes sense to remember both of these solutions, just in case mypy starts being able to understand del in the future.

6. Finding where a function was called from

Do you know what’s the easiest and most reliable way, when reading through a Python codebase, to tell that you’re about to step into some real arcane shit? [inspect][] gets imported.

inspect is an incredibly powerful library. So powerful, in fact, that reaching for it is almost always ill-advised. Nevertheless, here’s one of the most common situations I find where inspect is invaluable. Beware, here be dragons.

Let’s say you’re writing a function that needs to log some useful information. “Actually”, you think to yourself, “it would be pretty handy if I knew where the function was called from”. inspect comes to the rescue!

# foo.py
import inspect


def foo():
    # Get outer FrameInfo; ie. one level back in the call stack
    caller_frameinfo = inspect.stack()[1]
    caller_name = caller_frameinfo.function
    print(f"Caller name: {caller_name}")
    # This will fail if the caller's name isn't in the caller's scope. This can
    # happen eg. with lambdas. You'd probably want to wrap this in a `try:`
    # block, or skip this completely and work with the name, rather than a
    # reference to the actual caller. I'm including this here just for
    # completeness' sake.
    caller = caller_frameinfo.frame.f_globals[caller_name]
    print(f"Caller: {caller}")

def bar():
  foo()

bar()

Now, running the file does exactly what it says on the tin:

$ python foo.py
Caller name: bar
Caller: <function bar at 0x10f19e160>

7. Nesting f-strings

This one is particularly ugly. I also struggle to find a valid usecase for it, except maybe some crazy code generation. The ugly hack here is this: you can actually nest f-strings! They just can’t contain backslash-escapes, but that can be worked around trivially:

# foo.py

newline = "\n"

print(f"""{
    newline.join(f'''{
        " ".join(f"{y}{x}" for x in range(4))
    }''' for y in range(4))
}""")

Ever wanted your Python to look like Perl? Well, now it can! Here’s what running that code snippet results in:

$ python foo.py
00 01 02 03
10 11 12 13
20 21 22 23
30 31 32 33

Neat! Ultimately useless, but neat.

Honorable mentions:

The walrus operator did not qualify for this list, since it’s far from being obscure - in fact, it was one of the most talked-about additions to Python in recent versions! It’s definitely a trick you want to keep in your book, though.