SEC Filing Metadata
Overview
FinnHub has published SEC filing metadata covering 1994 to 2020 as a dataset on Kaggle.
Prerequisites
Tool | Description | Access |
---|---|---|
Kaggle CLI | Command-line tool for interacting with Kaggle, implemented in Python | Official GitHub |
fsspec | Filesystem interfaces for Python | Documentation |
Method
[1] Declare the environment
Declare the requirements for the selected phase, production or development-only,
then install them to set up the environment.
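As a quick sanity check after set-up, the sketch below reports which of the tools used in this guide are installed. The distribution names `kaggle`, `fsspec`, and `polars` are assumed PyPI names; the original requirements files are not reproduced here, and `installed_versions` is a hypothetical helper.

```python
from importlib import metadata

# Assumed PyPI distribution names for the tools used in this guide.
REQUIRED = ["kaggle", "fsspec", "polars"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'not installed'}")
```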
[2] Download the dataset
Use the download command copied from the Kaggle UI, adding the path and unzip options.
While it runs, the CLI shows the progress: the total downloaded so far (in MB) and the download rate.
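The download step can also be driven from Python. The following is a minimal sketch that builds the Kaggle CLI call with the path (`-p`) and `--unzip` options; the dataset slug and destination folder are placeholders, not the real values, and `build_download_command` and `download` are hypothetical helpers.

```python
import subprocess


def build_download_command(dataset, dest):
    """Build the Kaggle CLI call that downloads and unzips a dataset."""
    return [
        "kaggle", "datasets", "download",
        "-d", dataset,  # dataset slug, copied from the Kaggle UI
        "-p", dest,     # destination folder
        "--unzip",      # extract the archive after the download
    ]


def download(dataset, dest):
    """Run the command; needs the Kaggle CLI and API credentials."""
    subprocess.run(build_download_command(dataset, dest), check=True)


if __name__ == "__main__":
    # "<owner>/<dataset>" and "data/sec" are placeholders.
    print(" ".join(build_download_command("<owner>/<dataset>", "data/sec")))
```

Calling `download(...)` with the real slug performs the actual transfer; the demo above only prints the command.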
[3] Use fsspec
to interact with the download folder, and polars to read the multiple files
*) Declare the dependencies to work with
a) Get the CSV file paths in the download folder
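One way to collect the paths is a glob on the fsspec local filesystem. This is a sketch under assumptions: `list_csv_paths` is a hypothetical helper, and the demo runs on a temporary folder rather than the real download folder.

```python
import os
import tempfile

import fsspec


def list_csv_paths(folder):
    """Return the sorted paths of all CSV files directly inside `folder`."""
    fs = fsspec.filesystem("file")  # local filesystem implementation
    return sorted(fs.glob(f"{folder}/*.csv"))


if __name__ == "__main__":
    # Demo on a throwaway folder; replace it with the real download folder.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("a.csv", "b.csv", "notes.txt"):
            open(os.path.join(folder, name), "w").close()
        print([os.path.basename(p) for p in list_csv_paths(folder)])
```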
b) Read each file with read_csv
and merge the results with concat
polars offers three main concat strategies:
flowchart LR
polars_method[Polars Concat Method]
polars_method --> vertical[vertical] --> vd[applies multiple vstack operations]
polars_method --> diagonal[diagonal] --> dd[finds a union between the column schemas and fills missing column values with null]
polars_method --> horizontal[horizontal] --> hd[stacks Series from DataFrames horizontally and fills with nulls if the lengths do not match]
In total, the 115 files yield 21M rows of 8 columns.
c) Take a basic look at the DataFrame
d) Data
Response Attributes:
Attribute | Description |
---|---|
acceptedDate | Accepted date, formatted as %Y-%m-%d %H:%M:%S |
accessNumber | Access number |
cik | CIK |
filedDate | Filed date, formatted as %Y-%m-%d %H:%M:%S |
filingUrl | Filing's URL |
form | Form type |
reportUrl | Report's URL |
symbol | Symbol |
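The date format given for acceptedDate and filedDate maps directly onto Python's `strptime`. A minimal sketch follows; `parse_filing_date` is a hypothetical helper and the sample value is made up.

```python
from datetime import datetime

# Format string taken from the attribute descriptions above.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_filing_date(text):
    """Parse an acceptedDate or filedDate string into a datetime."""
    return datetime.strptime(text, DATE_FORMAT)


# The sample value is made up for illustration.
accepted = parse_filing_date("2020-01-31 16:05:09")
print(accepted.year, accepted.month, accepted.day)
```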
e) Answer some questions
Total