Data sources¶
Data source instance¶
A data source instance is a template that helps normalize and consolidate data from heterogeneous datasets. Data source templates make it easy to implement standardized processes and data models that can be shared between projects.
By defining data sources, you build a data catalog that references all your datasets in a standardized way. Data source instances can be shared across projects, removing the need to define normalization and data acquisition processes for data consolidated with the same pipeline.
All datasets consolidated from the same data source instance share the same table in the feature store and can be analyzed with TranzAI no-code analytics components.
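As a rough illustration of how two differently shaped datasets can end up in one shared table, the sketch below models a data source instance as a small Python class. This is not the TranzAI API: the `DataSourceInstance` class, the `consolidate` method, the `dataset_id` column, and the sample column names are all hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass, field


@dataclass
class DataSourceInstance:
    """Hypothetical stand-in for a data source instance (not the TranzAI API)."""
    name: str
    schema: dict                                 # normalized column -> Python type
    table: list = field(default_factory=list)    # stands in for the shared feature store table

    def consolidate(self, dataset_id, rows, column_map):
        """Normalize raw rows with the shared schema and append them to the table."""
        for row in rows:
            normalized = {
                column: cast(row[column_map[column]])
                for column, cast in self.schema.items()
            }
            normalized["dataset_id"] = dataset_id  # keep track of the originating dataset
            self.table.append(normalized)


source = DataSourceInstance("temperature_readings", {"sensor": str, "temp_c": float})

# Two heterogeneous datasets, each with its own raw column names.
source.consolidate("plant_a", [{"Sensor": "s1", "Temp": "20.5"}],
                   {"sensor": "Sensor", "temp_c": "Temp"})
source.consolidate("plant_b", [{"id": "s9", "t": "18"}],
                   {"sensor": "id", "temp_c": "t"})
```

After both calls, `source.table` holds two rows with identical columns, which is what lets a single analysis run over every dataset consolidated from the same instance.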
Data lineage¶
Because data source instances are managed and mapped to the master data, raw data consolidated in the feature store automatically benefit from TranzAI data lineage.
The metadata of a data source instance stores all the initial information about the raw data and the transformation process applied to it.
The semantic data lineage is automatically built from the work done at the ontology and master data levels.
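To make the idea of lineage metadata more concrete, here is a minimal sketch of a record that keeps the origin of the raw data together with the ordered list of transformations applied to it. The `LineageRecord` class and its field names are assumptions made for illustration; the actual metadata layout used by TranzAI is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical lineage metadata for one data source instance."""
    data_source: str
    raw_origin: str                          # where the raw data came from
    raw_columns: list                        # column names as found in the raw data
    transformations: list = field(default_factory=list)

    def record(self, step):
        """Append a transformation step, preserving the order of application."""
        self.transformations.append(step)

    def describe(self):
        """Return a one-line, human-readable lineage summary."""
        steps = " -> ".join(self.transformations) or "no transformation"
        return f"{self.raw_origin} -> {steps} -> {self.data_source}"


lineage = LineageRecord("temperature_readings", "plant_a.csv", ["Sensor", "Temp"])
lineage.record("rename columns")
lineage.record("cast Temp to float")
summary = lineage.describe()
```

Keeping the steps in an ordered list is a deliberate choice in this sketch: the order in which transformations are applied is part of the lineage.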
Data source inference¶
You can always create a data source instance by defining its schema manually. However, the TranzAI platform provides several inference components that make creating a new data source faster and less error-prone.
Data source inference from file¶
Use the inference form to describe the file used for inference. The TranzAI platform then saves the column names, infers the data types, and creates the data source automatically. You can edit the data source manually afterwards to optimize the data types and reduce memory consumption during data acquisition and normalization (download our cheat sheet on data types optimization for Parquet).
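The kind of inference described above can be approximated with the Python standard library. The sketch below is not the TranzAI inference component: the function names `infer_type`, `infer_schema`, and `narrow_int_type`, the type labels, and the sample CSV are all assumptions. It reads the column names from a CSV header, picks the first type that fits every value, and then shows how an integer column might be narrowed by hand to save memory.

```python
import csv
import io


def infer_type(values):
    """Return the first type label (int64, float64, string) that fits every value."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "string"
    for label, caster in (("int64", int), ("float64", float)):
        try:
            for value in present:
                caster(value)
            return label
        except ValueError:
            continue
    return "string"


def infer_schema(csv_text):
    """Map each column name found in the header to an inferred type label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {name: infer_type([row[name] for row in rows]) for name in reader.fieldnames}


def narrow_int_type(values):
    """Pick the smallest signed integer width that holds every value."""
    numbers = [int(v) for v in values]
    low, high = min(numbers), max(numbers)
    for label, bits in (("int8", 8), ("int16", 16), ("int32", 32)):
        if -(2 ** (bits - 1)) <= low and high <= 2 ** (bits - 1) - 1:
            return label
    return "int64"


sample = "sensor_id,temperature,label\n1,20.5,ok\n2,21.0,warn\n"
schema = infer_schema(sample)
sensor_id_type = narrow_int_type(["1", "2"])
```

Inference defaults to wide types because it only sees a sample of the data; narrowing a column is safe only when you know the full value range, which is why it is left as a manual step here.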
Data source inference from API¶
If your raw data are available through an API, you can define a data acquisition process that maps the schema of the source API to a feature store table representing your normalized data source.
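A schema mapping of this kind can be pictured as a dictionary from API field paths to normalized columns. The following sketch is illustrative only: the `FIELD_MAP` layout, the dotted-path convention, the helper names, and the sample payload are assumptions, not the format of a TranzAI data acquisition process.

```python
import json

# Hypothetical mapping: API field path -> (normalized column name, type cast).
FIELD_MAP = {
    "id": ("device_id", int),
    "reading.tempC": ("temperature_c", float),
    "reading.ts": ("measured_at", str),
}


def get_path(record, dotted_path):
    """Follow a dotted path such as 'reading.tempC' through nested dictionaries."""
    value = record
    for key in dotted_path.split("."):
        value = value[key]
    return value


def normalize(record):
    """Turn one API record into one row of the normalized table."""
    return {
        column: cast(get_path(record, path))
        for path, (column, cast) in FIELD_MAP.items()
    }


# Sample payload standing in for an API response body.
payload = json.loads(
    '[{"id": "7", "reading": {"tempC": 21.5, "ts": "2024-01-01T00:00:00Z"}}]'
)
rows = [normalize(record) for record in payload]
```

Declaring the mapping as data rather than code keeps the acquisition logic generic: supporting another API with the same target table only requires a different mapping.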
Data source inference from table¶
If you have access to a remote database, you can infer the schema of the data source from the metadata of the source table.
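Reading a schema from table metadata can be sketched with SQLite, which ships with Python. This is an assumption-laden example rather than the TranzAI component: it uses an in-memory database in place of a remote one, the `orders` table and the `infer_schema_from_table` helper are hypothetical, and other database engines expose their metadata differently (for example through `information_schema`).

```python
import sqlite3


def infer_schema_from_table(conn, table):
    """Return {column name: declared type} read from SQLite table metadata."""
    if not table.isidentifier():
        # PRAGMA statements cannot take bound parameters, so validate the name.
        raise ValueError(f"unsafe table name: {table!r}")
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {name: declared_type for _cid, name, declared_type, *_rest in rows}


# In-memory database standing in for the remote source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, customer TEXT)"
)
schema = infer_schema_from_table(conn, "orders")
conn.close()
```

Only metadata is queried, so no rows of the source table need to be read to build the schema.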