PySpark timezone offset from ISO 8601 without UDF
Background
While upgrading some of our data pipelines with Azure Databricks at Stone Three, I was quite disappointed to learn that the only timezone-capable timestamp type in Spark 3.5.3 always stores timestamps converted to UTC and always displays them converted to the session-wide timezone.
Admittedly, the situation is equally bad with PostgreSQL’s TIMESTAMPTZ type, leading to designs where the original timezone offset must be stored in a separate column.
A timezone-aware timestamp type that could store the input timezone natively, and that would always display with its full timezone information, would have been much more useful. In our case, we need to know not only the exact instant at which one of our vision sensors generated a measurement, but also what the local time was for that specific measurement.
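To make the problem concrete, here is a minimal sketch in plain Python (no Spark involved) of what is lost when a timestamp is normalised to UTC, and how a separately stored offset lets you recover the local time. The sample timestamp and the variable names are made up for illustration; the standard library `datetime` module only mimics the storage behaviour described above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sensor reading taken at 14:30 local time in a UTC+02:00 zone.
raw = "2024-11-05T14:30:00+02:00"

parsed = datetime.fromisoformat(raw)

# What a UTC-normalising timestamp type keeps: the instant only.
as_utc = parsed.astimezone(timezone.utc)

# What has to be kept alongside it: the original offset, here in seconds.
offset_seconds = int(parsed.utcoffset().total_seconds())

# Rebuild the local wall-clock time from the two stored values.
restored = as_utc.astimezone(timezone(timedelta(seconds=offset_seconds)))

print(as_utc.isoformat())    # 2024-11-05T12:30:00+00:00
print(offset_seconds)        # 7200
print(restored.isoformat())  # 2024-11-05T14:30:00+02:00
```

The UTC value alone pins down the instant, but nothing in it says the sensor's clock read 14:30. Only the pair of instant and offset answers both questions.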