PySpark timezone offset from ISO 8601 without UDF
Background
I was quite disappointed to learn that Spark 3.5.3’s only timezone-capable timestamp type always stores timestamps converted to UTC, and always displays these timestamps converted to the session-global timezone.
Admittedly, the situation is equally bad with PostgreSQL’s TIMESTAMPTZ type, leading to designs where the original timezone offset must be stored in a separate column.
Having a timezone-aware timestamp type that could store the input timezone natively would have been much more useful.
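
To make the problem concrete, here is a minimal sketch in plain Python (standard library only, not Spark) of what UTC normalization does to an ISO 8601 input, and of the "separate offset column" workaround mentioned above. The function names `split_instant_and_offset` and `restore_local` are hypothetical and exist only for this illustration. The sketch assumes the input string carries an explicit numeric offset.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def split_instant_and_offset(iso_text: str) -> Tuple[datetime, int]:
    """Split an ISO 8601 string into (UTC instant, original offset in minutes).

    The UTC instant is what a UTC-normalizing timestamp type keeps;
    the offset is the part that would otherwise be lost.
    """
    parsed = datetime.fromisoformat(iso_text)
    offset = parsed.utcoffset()
    if offset is None:
        raise ValueError("input has no UTC offset: " + iso_text)
    offset_minutes = int(offset.total_seconds() // 60)
    return parsed.astimezone(timezone.utc), offset_minutes


def restore_local(utc_instant: datetime, offset_minutes: int) -> datetime:
    """Rebuild the original wall-clock time from the two stored values."""
    original_zone = timezone(timedelta(minutes=offset_minutes))
    return utc_instant.astimezone(original_zone)


utc_instant, offset_minutes = split_instant_and_offset("2024-03-10T08:30:00+05:30")
print(utc_instant.isoformat())   # the normalized instant; the +05:30 is gone
print(offset_minutes)            # the value a separate offset column would hold
print(restore_local(utc_instant, offset_minutes).isoformat())
```

Only the pair of values together round-trips to the original input, which is why the offset ends up in its own column.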