Headstart
Overview
A space to get a head start with polars.
Components
Data types and structures
The building blocks of polars.
```mermaid
flowchart TB
    subgraph structure[Structure]
        direction TB
        dataframe
        series
    end
    subgraph type[Type]
        direction LR
        data_types
        internal_data_types
    end
    dataframe -- includes --> series -- contains --> data_types -- implement --> internal_data_types
```
The core data structures provided by polars are series and dataframes[1]:
- A series is a one-dimensional, homogeneous data structure.
- A dataframe is a two-dimensional, heterogeneous data structure made up of uniquely named series.
Note
The term "homogeneous" means that all elements in a series have the same data type. "Heterogeneous" means that the series within a dataframe may each have a different data type.
polars provides the following data types, grouped into these categories:
Category | Types |
---|---|
Numeric | signed integers, unsigned integers, floating point numbers, and decimals |
Nested data types | lists, structs, and arrays |
Temporal | dates, datetimes, times, and time deltas |
Miscellaneous | strings, binary data, Booleans, categoricals, enums, and objects |
All types support missing values, represented by the special value null.
Internally, polars uses the Apache Arrow columnar format to lay out its data in memory.
The data types in detail:
Type | Details |
---|---|
Boolean | Boolean type that is efficiently bit-packed |
Int8, Int16, Int32, Int64 | Varying-precision signed integer types |
UInt8, UInt16, UInt32, UInt64 | Varying-precision unsigned integer types |
Float32, Float64 | Varying-precision signed floating point numbers |
Decimal | Decimal 128-bit type with optional precision and non-negative scale |
String | Variable-length UTF-8 encoded string data, typically human-readable |
Binary | Stores arbitrary, variable-length raw binary data |
Date | Represents a calendar date |
Time | Represents a time of day |
Datetime | Represents a calendar date and time of day |
Duration | Represents a time duration |
Array | Arrays with a known, fixed shape per series; akin to NumPy arrays |
List | Homogeneous 1D container with variable length |
Object | Wraps arbitrary Python objects |
Categorical | Efficient encoding of string data where the categories are inferred at runtime |
Enum | Efficient ordered encoding of a set of predetermined string categories |
Struct | Composite product type that can store multiple fields |
Null | Represents null values |
[1] The polars documentation's introduction to data types and structures.