PySpark timezone offset from ISO 8601 without UDF

Background

I was quite disappointed to learn that Spark 3.5.3’s only timezone-capable timestamp type always stores timestamps converted to UTC, and always displays these timestamps converted to the session-global timezone.

Admittedly, the situation is equally bad with PostgreSQL’s TIMESTAMPTZ type, leading to designs where the original timezone offset must be stored in a separate column.

Having a timezone-aware timestamp type that could store the input timezone natively would have been much more useful.
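To make the problem concrete, here is a minimal plain-Python sketch (not PySpark, and not the approach from the full post) of the "separate offset column" design: the instant is normalised to UTC, and the original offset is kept alongside it. The function name `split_timestamp` and the sample timestamp are made up for illustration.

```python
from datetime import datetime, timezone


def split_timestamp(iso_string):
    """Return (UTC instant, original offset in seconds) for an ISO 8601 string.

    Hypothetical helper: it mimics what a UTC-normalising timestamp column
    plus a separate offset column would hold.
    """
    parsed = datetime.fromisoformat(iso_string)
    offset = parsed.utcoffset()
    if offset is None:
        raise ValueError("timestamp has no timezone offset")
    # Converting to UTC keeps the instant but discards the input offset,
    # which is why the offset has to be returned separately.
    utc_instant = parsed.astimezone(timezone.utc)
    return utc_instant, int(offset.total_seconds())


utc_instant, offset_seconds = split_timestamp("2024-10-01T12:30:00+02:00")
print(utc_instant.isoformat(), offset_seconds)
```

Once the timestamp has been converted to UTC, the `+02:00` from the input can only be recovered from the second value.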

Use direnv for uv with out-of-source virtual environments

The new Python package and project manager uv is in fact amazing.

I say that because it’s really fast, but more importantly because this single tool does a whole lot: installing Python binaries, installing and running packages in self-contained environments like pipx does, and managing virtual environments.

However, I’ve been avoiding it so far due to one flaw: uv defaults to installing its virtual environment and all dependencies into the .venv sub-directory of your project, almost exactly like the notorious node_modules.

Configure Thunderbird 128 e2e encryption with GnuPG

It took me longer than I would have liked to set up the latest Thunderbird 128 (Supernova!) to use my existing GnuPG-based encryption setup, in large part because TB defaults to its own, more straightforward built-in key management. I’m going to publish the recipe here to hopefully save you some time.

All the details, at various levels of obviousness, can be found on this Mozilla wiki page, but here I’m going to make the whole sequence more obvious.

Light-weight setup of LF console file manager with image, source code and archive previews

lf, or “list files”, is a single-binary file manager, inspired by the ranger file manager, but written in Go. Using this tool, you can navigate really quickly, build up a mental model of the filesystem layout and make modifications with ease.

Out of the box, this is super useful on remote machines or even Docker containers where you don’t have access to your normal full configuration, in my case Emacs with dired.

AI screenshot renamer with ollama LLaVA, GPT-4o and macOS OCR

Last week Microsoft had to deal with some criticism after announcing “Recall”, a new feature available only on their new Copilot+ AI-enabled laptops. It takes regular screenshots as the computer is used and uses on-device models to generate descriptions of these images. The descriptions can be stored in a database (SQLite of course) and later searched, so that a user can effectively go back in time to find almost anything. Microsoft has stated that everything happens and is stored on-device, and that the whole feature can be easily disabled.

Performance comparison of six different LTTB (visual downsampling for timeseries data) algorithm implementations for Python

LTTB, or Largest-Triangle-Three-Buckets, is a fantastic little algorithm that you can use for the visual downsampling of timeseries data.

Let’s say your user is viewing a line chart of some timeseries data. The time period they have selected contains 50000 points, but their display is only 4K, so they have a maximum of 3840 pixels available horizontally. With LTTB, we can automatically select 3840 or fewer of those 50000 points so that the resulting line graph is visually very similar to what they would see if they were to try to render all 50000 points.
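For a feel of how this works, here is a minimal pure-Python sketch of LTTB. It is not one of the six implementations compared in the post, and the function name `lttb` and its signature are my own. It assumes the points are `(x, y)` tuples sorted by `x`. It always keeps the first and last points, splits the rest into buckets, and picks from each bucket the point that forms the largest triangle with the previously selected point and the average of the next bucket.

```python
def lttb(points, n_out):
    """Downsample a list of (x, y) tuples to n_out points (sketch only)."""
    n = len(points)
    if n_out >= n or n_out < 3:
        return list(points)  # nothing to do, or too few output points

    every = (n - 2) / (n_out - 2)  # bucket width, excluding the end points
    sampled = [points[0]]          # the first point is always kept
    a = 0                          # index of the previously selected point

    for i in range(n_out - 2):
        # Average of the next bucket acts as the third triangle corner.
        nxt_start = int((i + 1) * every) + 1
        nxt_end = min(int((i + 2) * every) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)

        # Candidates in the current bucket.
        start = int(i * every) + 1
        end = int((i + 1) * every) + 1

        ax, ay = points[a]
        best, best_area = start, -1.0
        for j in range(start, end):
            px, py = points[j]
            # Area of the triangle (previous point, candidate, next average).
            area = abs((ax - avg_x) * (py - ay) - (ax - px) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area

        sampled.append(points[best])
        a = best

    sampled.append(points[-1])     # the last point is always kept
    return sampled


# Flat line with a single spike: the spike should survive downsampling.
data = [(i, 100.0 if i == 50 else 0.0) for i in range(100)]
reduced = lttb(data, 10)
print(len(reduced), (50, 100.0) in reduced)
```

The nested loop is easy to read but slow for large inputs, which is why vectorised or compiled implementations are worth comparing.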