
Headstart

Overview

A place to get a head start with polars

Components

Data types and structures

The composite elements of polars

flowchart TB
  subgraph structure[Structure]
    direction TB
    dataframe
    series
  end
  subgraph type[Type]
    direction LR
    data_types
    internal_data_types
  end

  dataframe -- include --> series -- contain --> data_types -- implement --> internal_data_types

The core data structures provided by polars are series and dataframes [1]:

  • A series is a 1-dimensional homogeneous data structure.

  • A dataframe is a 2-dimensional heterogeneous data structure that contains uniquely named series.

Note

The term "homogeneous" means that all elements in a series have the same data type.

polars provides data types that fall into the following categories:

| Category | Types |
| --- | --- |
| Numeric | signed integers, unsigned integers, floating point numbers, and decimals |
| Nested | lists, structs, and arrays |
| Temporal | dates, datetimes, times, and time deltas |
| Miscellaneous | strings, binary data, Booleans, categoricals, enums, and objects |

All types support missing values represented by the special value null.

Internally, polars lays out its data according to the Arrow Columnar Format.

The data types in detail:

| Type | Details |
| --- | --- |
| Boolean | Boolean type that is bit packed efficiently |
| Int8, Int16, Int32, Int64 | Varying-precision signed integer types |
| UInt8, UInt16, UInt32, UInt64 | Varying-precision unsigned integer types |
| Float32, Float64 | Varying-precision signed floating point numbers |
| Decimal | Decimal 128-bit type with optional precision and non-negative scale |
| String | Variable length UTF-8 encoded string data, typically human-readable |
| Binary | Stores arbitrary, varying length raw binary data |
| Date | Represents a calendar date |
| Time | Represents a time of day |
| Datetime | Represents a calendar date and time of day |
| Duration | Represents a time duration |
| Array | Arrays with a known, fixed shape per series; akin to numpy arrays |
| List | Homogeneous 1D container with variable length |
| Object | Wraps arbitrary Python objects |
| Categorical | Efficient encoding of string data where the categories are inferred at runtime |
| Enum | Efficient ordered encoding of a set of predetermined string categories |
| Struct | Composite product type that can store multiple fields |
| Null | Represents null values |

  1. The polars introduction to data types and structures