Data structure: Datetime, Date, Timezone¶
Overview¶
datetime is just a revesed work for a composite data structure to storage time.
flowchart LR
library[`datetime` library] --> datetime
library --> date
library --> timedelta
library --> timezone The element of datetime is various
The structure data type with datetime included
Detail Concept¶
It better if has the element
datetime.date,
datetime.datetime,
a str of any format recognized by dateutil.parser.parse(),
or a float or int representing a POSIX timestamp.
The current epoch translates to Date 01/23/2024 @ 3:08am UTC 2024-01-23T03:08:46+00:00 ISO 8601 Tue, 23 Jan 2024 03:08:46 +0000 RFC 822, 1036, 1123, 2822 Tuesday, 23-Jan-24 03:08:46 UTC RFC 2822 2024-01-23T03:08:46+00:00 RFC 3339 What is the unix time stamp? The unix time stamp is a way to track time as a running total of seconds. This count starts at the Unix Epoch on January 1st, 1970 at UTC. Therefore, the unix time stamp is merely the number of seconds between a particular date and the Unix Epoch. It should also be pointed out (thanks to the comments from visitors to this site) that this point in time technically does not change no matter where you are located on the globe. This is very useful to computer systems for tracking and sorting dated information in dynamic and distributed applications both online and client side.
Human Readable Time Seconds 1 Hour 3600 Seconds 1 Day 86400 Seconds 1 Week 604800 Seconds 1 Month (30.44 days) 2629743 Seconds 1 Year (365.24 days) 31556926 Seconds What happens on January 19, 2038? On this date the Unix Time Stamp will cease to work due to a 32-bit overflow. Before this moment millions of applications will need to either adopt a new convention for time stamps or be migrated to 64-bit systems which will buy the time stamp a "bit" more time.
Common Task¶
Transfer between type¶
Type hint in coding¶
The libraries working¶
Other data handle datetime¶
Standard in the Data World¶
Detail Some Concept of library¶
The default library is datetime
Problem when working with datetime¶
How to create a new
The patterns
https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements How to create a new
The patterns
https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements
Working datetime on Excel¶
Overview¶
The 1900 and 1904 date systems The ‘date system’ of an XLSX file determines how dates and times in the single number representation are interpreted. XLSX files always use one of two possible date systems:
In the 1900 date system (the default), the reference date (with number 1) is 1900-01-01. In the 1904 date system, the reference date (with number 0) is 1904-01-01. Complications arise not only from the different start numbers of the reference dates, but also from the fact that the 1900 date system has a built-in (but wrong) assumption that the year 1900 had been a leap year. Excel deliberately refuses to recognize and display dates before the reference date correctly, in order to discourage people from storing historical data.
More information on this issue is available from Microsoft: https://docs.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year In workbooks using the 1900 date system, openpyxl behaves the same as Excel when translating between the worksheets’ date/time numbers and Python datetimes in January and February 1900. The only exception is 29 February 1900, which cannot be represented as a Python datetime object since it is not a valid date.
You can get the date system of a workbook like this:
import openpyxl
wb = openpyxl.Workbook()
if wb.epoch == openpyxl.utils.datetime.CALENDAR_WINDOWS_1900:
print("This workbook is using the 1900 date system.")
This workbook is using the 1900 date system.
Reference¶
https://openpyxl.readthedocs.io/en/stable/datetime.html
- Document for datetime in https://developer.mozilla.org/en-US/docs/Web/HTML/Date_and_time_formats#examples
Troubleshooting¶
Error¶
Python fromtimestamp OSError
https://stackoverflow.com/questions/37494983/python-fromtimestamp-oserror
>>> from datetime import datetime
>>> from datetime import datetime
>>> datetime.fromtimestamp(1722768765523)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> datetime.fromtimestamp()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: fromtimestamp() missing required argument 'timestamp' (pos 1)
>>> datetime.fromisoformat(str(1722768765523))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Invalid isoformat string: '1722768765523'
>>> timestamp = 1722768765523
>>> datetime.fromtimestamp(timestamp)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
>>> datetime.fromtimestamp(timestamp)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument
f you get this error and you're not using an obviously wrong timestamp, check your units.
fromtimestamp expects a timestamp in seconds, whereas it's quite common to get timetstamps in milliseconds (e.g. I found this when trying to parse a timestamp produced from Moment.js in a calendar widget).
Take the timestamp 1523443804214 - it's 11th April 2018, about 15 minutes before I made this post. According to Epoch Converter, no problem, but note: "Assuming that this timestamp is in milliseconds:".
In Python this returns an OSError:
In [15]: datetime.fromtimestamp(1523443804214.0)¶
OSError Traceback (most recent call last)
In [17]: datetime.fromtimestamp(1523443804.214) Out[17]: datetime.datetime(2018, 4, 11, 11, 50, 4, 214000) the result is what we expect.
https://stackoverflow.com/a/49773492
Reference¶
https://docs.rs/chrono/latest/chrono/format/strftime/index.html
https://www.unixtimestamp.com/
Control timestamp¶
BigQuery Doesn't Store Timezones Alongside Timestamps
bookmark_border Problem User loads data with time values into BigQuery. These values are converted to timestamps and stored, however, the information about the timezone they came from is lost. Environment BigQuery dataset with loaded time data. Solution Storing timezone information is not best practice. The application should store all timestamps as a single TZ (ideally UTC) and present them as needed.
If the user still wants to preserve the information about the timezone, or any additional information related to time data, they need to store it separately, for example, in a new column of their dataset.
Cause The timestamps are being stored as long integers (basically the timestamp is a number of milliseconds passed since 01 January 1970 (UTC), so it does not depend on timezone). When user loads a timestamp to BigQuery, it is automatically converted to a long integer using timezone provided (or assuming UTC if no timezone provided).
https://cloud.google.com/knowledge/kb/bigquery-doesn-t-store-timezones-alongside-timestamps-000004468
The origin of flow component¶
s1[Source Asia/HoChiMInh] -- convert timestamp UTC --> lake[Datalake using only UTC] -- parse back --> target
Python control¶
Create the timestamp to UTC
Using native
At the lake, it can be load back and transfer to timestampe
at the element componenbt