Data Model Features

The bippLang data model supports various features that allow users to define reusable data structures in an easy but efficient manner.

Autogenerated datasets from Data source

The default dataset definition in bippLang is autogenerated whenever tables from a datasource are added to the project. Definitions for the original tables and columns are already available. For example, the definition for a table stock_goods from a MySQL database warehouse_operations would automatically be available in the data model as follows. The default dataset name will be same as the project name and may be changed to an appropriate name.

project Operations
  dataset default
    table stock_goods
  table stock_goods
    data_source MySQLDoc
    sql `warehouse_operations`.`stock_goods`
    //column definitions would follow here

This can subsequently be edited, to make it more suitable for generating reports.

Column Types

The data type of columns in the source database is mapped to its equivalent bippLang data type when generating a model. The mapping between different SQL data types and bippLang data types is as follows.

bippLang Data Type SQL Data Types
Integer TinyInt, SmallInt, Int, Int2, Int4, Integer, BigInt,UInt8, UInt16, UInt32, UInt64, Int8, Int16, Int32, Int64
Float Decimal, Double, Float, Float32, Float64, Numeric, Real
DateTime Time, TimeStamp, Date, DateTime
Bool Bool, Boolean
String Char, Varchar, Character, String, Text, FixedString, Any other data type not defined above.

Derived Tables

Derived tables can be defined as subsets or extensions of tables available from the data source. These consist of the SQL statement that would be used to retrieve the data from the data source and definitions of the columns retrieved. Following example illustrates the definition for a derived table. It consists of just two columns from the original table but with date-format and aggregation operations applied on them.

table shipping_ledger_view
    data_source MySQLDoc
    sql """ SELECT DATE_FORMAT(ShipDate, "%m, %Y") AS monthAndYear, 
            DATE_FORMAT(ShipDate, "%b, %Y") AS monthAndYearName, 
            Count(*) AS num_anomalies 
            FROM `warehouse_operations`.`shipping_ledger` WHERE ShipDate IS NOT NULL GROUP BY 1, 2  """
    //column definitions for retrieved columns would follow here

Dynamic Tables

Dynamic tables are not based on any table from the data source. They may be included by the data analyst to provide additional metadata to the model. Following is an example of a dynamic table.

  table indicator_names
    data_source MySQLDoc
    sql """ (select 'SE Indicator' as indicator_name) 
    union all (select 'DE Indicator' as indicator_name)
    union all (select 'Overall Performance Indicator' as indicator_name) """
    column indicator_name
      sql indicator_name
      type string    

Table Joins

As seen in the project structure described previously, joins allow you to define the relationship between any two or more tables that are included in the data model. In case data is being retrieved from these two tables, the defined join would be used when fetching the data. The following example shows the data model definition with included tables and joins.

project Operations
  dataset OpsDS
    table efficiency_data shipping_statement_view
    join shipping_statement_view join efficiency_data
        on shipping_statement_view.WeekYear = efficiency_data.WeekYear

When visualizing data in the data viewer, columns may be selected from the tables efficiency_data and shipping_statement_view. In this situation, bipp will automatically generate the appropriate SQL using the JOIN definition given above.

Derived columns

Columns may be used as-is from the data source or may be defined as operations on existing columns. These operations could be defined using one or many other columns and include any SQL operation supported on the database. Following example shows one direct and one derived column.

column ArrivalTime  //user defined column name. May or may not be the same as DB column name.
  sql ArrivalTime  //DB column name
  type string
column ArrivalDate
  sql """ STR_TO_DATE(ArrivalTime,'%m/%e/%Y %k:%i') """ //SQL Expression
  type datetime

bippLang intelligently evaluates the given SQL expression as atomic or non-atomic. In the first example above, sql ArrivalTime, the SQL is atomic and bippLang interprets it to be a column in the backend database (“db namespace”).

The SQL in the second example for ArrivalDate is non-atomic and bippLang interprets that the expression refers to the users namespace. So ArrivalTime in the expression in the second example refers to the ArrivalTime in the first example which is in the users namespace.

bippLang also allows users to create constant columns or columns which have the same value in all rows. Due to the above interpretation for atomic and non-atomic sql, for a constant column with sql defined as sql 42, bippLang would search for a column named 42 in the database which does not exist. A workaround for this would be to use the expression sql (42) instead of sql 42.

Parameter Based Columns

bippLang table definitions may include parameters with allowed values. The actual value of the parameter would be entered by the user while visualizing the dataset. Additionally columns based on these parameters may be defined such that the column value will be calculated based on the selected value for the parameter. The following code shows the definition for a parameter FromYear and a column Age that depends on this parameter. Parameters are defined at the same level as the column.

Parameter Definition

parameter FromYear
    allowed_value 2000
    allowed_value 2010
    allowed_value 2020
    default_value 2020

Column Definition

column Age
    sql " (case when ($FromYear - Year > 20) then 'Old' else 'Recent' end) "
    type string

Column Properties

Column definitions may be enhanced by using the following formatters which are usually required in reports or charts.


This can be used to provide an optional label to the column and may include spaces. Following is an example of a column with label specified.

column SLIndicator
    label SLIndicator in %


This can be used to indicate how the column data should be formatted. bippLang number formatting is based on Excel number formats. Custom formatting of numbers may be achieved using the following format specification.

"prefix###000.000###,suffix" Here, “#” indicates optional digits and “0”:indicates the must have digits.

For example decimal formatting for currencies may be applied as follows.

column SEIndicator
    sql SEIndicator
    type float
    format $0.00

Using this expression, a column value of 1.122e2 would be displayed as $112.20. Similarly the following format expressions would yield the corresponding results.

Value Format Result
123456789 0.00,, M 123.45 M
1.122e12 $#,##0.00,, M $1,122,000.00 M
123456789 0.00##,, M 123.4567 M
1234.56 $0.00, K $1.23 K

Format may be applied to date data types as well. Date format may be specified using a combination of letters to indicate days, month, year etc. Following are the options available

  • d: Day of the month as digits; no leading zero for single-digit days.
  • dd: Day of the month as digits; leading zero for single-digit days.
  • ddd: Day of the week as a three-letter abbreviation.
  • dddd: Day of the week as its full name.
  • m: Month as digits; no leading zero for single-digit months.
  • mm: Month as digits; leading zero for single-digit months.
  • mmm: Month as a three-letter abbreviation.
  • mmmm: Month as its full name.
  • yy: Year as last two digits; leading zero for years less than 10.
  • yyyy: Year represented by four digits.
  • h: Hours; no leading zero for single-digit hours (12-hour clock).
  • hh: Hours; leading zero for single-digit hours (12-hour clock).
  • H: Hours; no leading zero for single-digit hours (24-hour clock).
  • HH: Hours; leading zero for single-digit hours (24-hour clock).
  • M: Minutes; no leading zero for single-digit minutes.
  • MM: Minutes; leading zero for single-digit minutes.
  • s: Seconds; no leading zero for single-digit seconds.
  • ss: Seconds; leading zero for single-digit seconds.
  • l: Milliseconds; gives 3 digits.
  • L: Milliseconds; gives 2 digits.
  • t: Lowercase, single-character time marker string: a or p.
  • tt: Lowercase, two-character time marker string: am or pm.
  • T: Uppercase, single-character time marker string: A or P.
  • TT: Uppercase, two-character time marker string: AM or PM.

Following are some examples for using the date format

Value Format Result
2020-04-06T19:05:02Z ddd mmm dd yyyy HH:MM:ssM Mon Apr 06 2020 19:05:02
2020-04-06T19:05:02Z mmmm d, yyyy April 6, 2020
2020-04-06T19:05:02Z HH:MM:ss 19:05:02
2020-04-06T19:05:02.456Z HH:MM:ss L 19:05:02 46

Color Format

This property may be used to format the data cell when data is presented in the data viewer. In its most basic form, the property can be used to apply color based on the value of data in the cell. It provides multiple advanced options to define the format as follows.

  • Format conditions may be based on a fixed value or range of possible values of data.
  • The color to be applied may also be specified as a single color or a range of hex codes for different colors.
  • The color is applied to the text by default but may be applied to the background instead.
  • The value of the data itself may be treated as a color to be applied.
  • This property can be also used to pass a fixed text value that should be used to fill the cell.

Following are the sub-functions or attributes used to define the color_format

  1. val: This function is used to define the data values to be compared. Function argument may be a fixed value or range of values. The “-” sign may be used at one end if the range does not have an upper or lower limit. Following are a few examples for defining the val function.

    • val(1000) : Condition satisfied if data cell value = 1000
    • val(1001 to 5000) : Condition satisfied if data cell value is between 1001 and 5000
    • val(100 to -) : Condition satisfied if data cell value is greater than 99
    • val(- to 600) : Condition satisfied if data cell value is less than 601
  2. color: This function is used to define the color or range of colors to be used for formatting. When specified as a range, the color to be applied is calculated based on how far the actual value of data is from the minimum value specified in the val function. For example, if color format is defined as

     color_format """val(100 to 500) color(#00FF00 to #0000FF)"""

    In this case, if the data value is 450, the color chosen will be closer in shade to #0000FF than it would be to #00FF00

  3. background: By default the color identified by the color function will be applied to the cell text. In order to apply it to the cell background instead of the text, the background attribute may be specified.

  4. datacolor: If this attribute is specified, bippLang ignores the val and color functions and assumes that the value in the data cell itself is a color value and should be applied to the cell.

  5. transpose: This function is used to define a text value that should be written into the cell if the criteria specified in the val function is satisfied. It may be used to specify certain symbols or constant text that needs to be displayed in the cell.

Following is a simple example of a column definition with color_format specified.

    column SEIndicator
        sql """ SUM(TimeDiff)/COUNT(DISTINCT PL) """
        type float
        color_format """ val(0.0 to 34.0) color(red) """
        color_format """ val(35.0 to 44.0) color(greenyellow) """
        color_format """ val(45.0 to -) color(green) """  

Following examples combine the functions and attributes described above to achieve the results as described.

color_format definition Meaning
"”” val(100) color(green) background "”” If value is 100, the background of the containing data cell is green
"”” val(10001 to 20000) color(#FF0000) transpose(⬤) "”” If value is between 10001 to 20000, change the text in the cell to ⬤
"”” datacolor "”” Read the data cell value and apply the corresponding text color to it
"”” transpose(▇) "”” Change all data cell values to ▇


This may be used to hide specific columns in the data viewer and should be set to true to hide columns. The hidden columns will not be available for selection in the data viewer.

column ArrivalTime
    sql ArrivalTime
    type string
    hidden true

Default Aggregate

By default integer columns are aggregated in the viewer as a sum of values. This aggregate may be changed to other functions like count, max, min or avg. Default aggregate may also be disabled in the model or in the viewer. Following example shows a column with default_aggregate of “max”.

column SourceHours
    sql SourceHours
    type int
    default_aggregate max


Datasets often contain hierarchical relationships between data elements. These could be used to represent geographical hierarchy or organizational hierarchy. The drilldown property of columns can be used to indicate this relation between any two columns. By defining such a relation, users can drilldown data for these columns when viewing this dataset.

Consider a table casedata which contains the geographical distribution of disease cases by region, country and state. You want to view the total number of cases at the region level with the ability to drilldown to the number of cases at the country or state level. The drilldown property may be included in the columns Region and Country as follows.

column Region
    sql Region
    type string
    drilldown dl_region
        table casedata
        column Country
column Country
    sql Country
    type string
    drilldown dl_region
        table casedata
        column State
column State
    sql State
    type string
column Cases
    sql Cases
    type int

Support for Multiple Databases

bippLang allows connection to different types of databases or data sources. Following source databases are supported by bippLang.

  • BigQuery
  • BQPub
  • RedShift
  • SQL Server 2017+
  • MySQL 5.7+
  • Impala 5.x+
  • Teradata 15.x+
  • Hive 2.x.x+
  • Postgres 9.x+
  • Snowflake 3.27+
  • Oracle 12c+
  • mongoDB 3.2+
  • MS Excel

Database specific SQL Syntax.

bippLang enforces that, SQL syntax used in queries and operations should be valid for the type of database selected for that project. For example consider the following column definition for a project using MySQL data source.

    column StDate 
        sql """ STR_TO_DATE(StartDate, '%m/%d/%Y') """
        type datetime

Here STR_TO_DATE is a MySQL function used with an appropriate syntax.

Another project that uses a SQL Server data source, would require the following column definition in a similar scenario.

    column StDate 
        sql """ CONVERT(datetime, StartDate, 101) """
        type datetime

Here, Convert is the corresponding MSSQL function and 101 is the format specifier corresponding to %m/%d/%y format used in the MySQL function.

Note: Even if different tables in the model have different data sources specified, ideally they should all reference the same type of database to enable joins between the tables. However, this need not be the case, when Virtualization is turned On. When Virtualization is turned on, all the database types are “virtualized” into a common database called BippDb.