bippLang Reference
Here is a reference of the common keywords used in bippLang. They can be categorized as one of these types:
-
Attribute keywords used in the bippLang data model.
-
bipp specific functions to extend standard SQL.
-
meta functions for working with derived tables. These are similar to macros, where they are based on a parameter and evaluated at run-time before the SQL statement is auto-generated.
The Category is included in each entry.
Verifying Queries
The SQL queries and expressions used in the bippLang model are evaluated only when visualizing data in Sheets. To ensure the queries are correct before, during development use the SQL Runner tool. You can also use the Run a Query option:
-
Open
Datasources.
-
Click
to the right of the Datasource and select Run Query. The Query window opens using the selected Datasource. You can change the Datasource with the dropdown.
-
Write your query.
-
Click Run query. You can click Save in a project if you anticipate re-using the query.
-
When you are finished, click
to close the Query window.
Keyword Reference
allowed_value
Category: Attribute for parameter
Where used: table
Required? Optional
Sub-Options: optional attribute label
Syntax
allowed_value value
Allowed values specify the values a user can choose. For example, if you specify value of True and False, a user can only pick one of those options.
At run-time, a parameter can only hold a single allowed_value.
Example
table t
parameter p
allowed_value True
allowed_value False
label Enter a Value
See Also
default_value, label, parameter
apply_filter
Category: attribute for drilldown
Where used: table
Required? Optional
Syntax
apply_filter value
Enables the filter option for visualizations using drilldown option. If value is true then filter is available with drilldown.
Example
column region_countrycode
sql region
type string
label Region_
drilldown drilldown_name
table hrc_fa_forecast_accuracy
column country_code
apply_filter false
See Also
bipp_cumulative_percent
Category: bipp specific SQL window function
Where used: column
Required? Optional
Syntax
bipp_cumulative_percent(numericExpression)
For every row of the returned result, this is the cumulative sum of the numericExpression from the first row through this row. It is expressed as a percentage of the total sum of the numericExpression over all rows in the result.
Example
column VCCumulativePercent
sql """ bipp_cumulative_percent(VictimCount) """
type float
format auto%
label CumulativeVictimCountPercentage
Sample Output
VictimCount | CumulativeVictimCountPercentage |
---|---|
1 | 6.67% |
2 | 20% |
3 | 40% |
4 | 66.67% |
5 | 100% |
Result is calculated as cumulative sum for column in given order/Total sum of that column.
See Also
bipp_percent, bipp_total, bipp_cumulative_total, Format
bipp_cumulative_total
Category: bipp specific SQL window function
Where used: column
Required? Optional
Syntax
bipp_cumulative_total(numericExpression)
For every row of the returned result, this is the cumulative sum of the numericExpression from the first row through this row.
Example
column VCCTotal
sql """ bipp_cumulative_total(VictimCount) """
type int
label VictimCountTillNow
Sample Output
VictimCount | VictimCountTillNow |
---|---|
1 | 1 |
2 | 3 |
3 | 6 |
4 | 10 |
5 | 15 |
See Also
bipp_percent, bipp_total, bipp_cumulative_percent
$bipp_equals
Category: bipp-specific meta function SQL
Where used: derived table
Required? Optional
Syntax
$bipp_equals(string1, string2)
Creates a Boolean condition in the SQL for a derived table. Typically used in conjunction with $bipp_if.
Evaluates to True if the case-sensitive string1 and string2 are equal.
Example
sql """$bipp_if( $bipp_equal("$kind", "TAPE"),
"{tape_transactions}", "{scissor_transactions}" )"""
Sample Output
The above expression will evaluate to “{scissor_transactions}”.
See Also
$bipp_if
$bipp_if
Category: bipp-specific meta function SQL
Where used: Table SQL
Required? Optional
Creates an if-then-else construct in the SQL for a table. An example is to make SQL conditional on parameter values.
Syntax
$bipp_if(Condition, Then, Else)
Condition: defines the condition to be met.
Then: value to use if Condition is True. Can be a nested $bipp_if condition.
Else: value to use if Condition is False. Can be a nested $bipp_if condition.
Example
sql """$bipp_if( $bipp_equal("$kind", "TAPE"),
"tape_transactions",
"scissor_transactions" )"""
Example using nested bipp_if
sql """$bipp_if( $bipp_equal("$kind", "TAPE"),
"tape_transactions",
$bipp_if ($bipp_equal("$kind", "SCISSOR"),
"scissor_transactions",
"other_table") ))"""
Sample Output
Above expression functions like case statement, and would populate “Other_table”.
See Also
$bipp_join
Category: bipp-specific meta function SQL
Where used: Table SQL
Required? Optional
Creates a new string by concatenating the list of strings enclosed in brackets [ ] separated by the specified delimiterString.
Syntax
$bipp_join([string1, string2,...], delimiterString)
Example
sql """$bipp_join( ["a", "b"], "x") )
Evaluates to: "axb"
Example with $bipp_if, $bipp_equal and $bipp_join
In the Learning bipp tutorials, you created the Tennis Stats dataset and visualizations. This example demonstrates using the $bipp functions to determine wins and losses for any player.
The ResultType parameter is added to the dataset to enable users to specify wins or losses when visualizing the results. The MatchesWon and MatchesLost views get the respective data from the Matches table in the database.
table MatchesLost
data_source TennisData
sql """ SELECT m.tournament, m.Surface, p.fname, p.lname, 'Lost' as Result
FROM players.atp_matches m join players.atp_players p on m.LoserId = p.Id """
//column definitions
table MatchesWon
data_source TennisData
sql """ SELECT m.tournament, m.Surface, p.fname, p.lname, 'Won' as Result
FROM players.atp_matches m join players.atp_players p on m.winnerId = p.Id """
//column definitions
table PlayerMatchResult
data_source TennisData
sql """ $bipp_if(
$bipp_equal(
$bipp_join(
['atp_players.$ResultType', ''],
''
),
'Won'
),
'{MatchesWon -}',
'{MatchesLost -}'
) """
Sample Output
The above expression generates data for match results along with winning player details and does union with match results along with losing player details.
Bipp_join works as a concatenation of values provided, with each string value separated by a delimiter value provided.
See Also
bipp_percent
Category: bipp-specific meta function SQL
Where used: Table SQL
Required? Optional
Syntax
bipp_percent(numericExpression)
For every row of the returned result, this is the value of the numeric expression for the row. It is expressed as a percentage of the sum of the numeric expression over all rows in the returned result.
Example
column VCPercent
sql """ bipp_percent(VictimCount) """
type float
format auto%
label VictimCountPercentage
Sample Output
VictimCount | VictimCountPercentage |
---|---|
1 | 6.67% |
2 | 13.33% |
3 | 20% |
4 | 26.67% |
5 | 33.33% |
See Also
bipp_cumulative_percent, bipp_cumulative_total, bipp_total
bipp_total
Category: bipp-specific meta function SQL
Where used: Table SQL
Required? Optional
Syntax
bipp_total(numericExpression)
For every row of returned result, this will be the sum of the numeric expression over all rows in the returned result. Note that by definition, the value of this will be the same in each row of the returned result.
Example
column VCTotal
sql """ bipp_total(VictimCount) """
type float
label VictimCountTotal
Sample Output
VictimCount | VictimCountTotal |
---|---|
1 | 15 |
2 | 15 |
3 | 15 |
4 | 15 |
5 | 15 |
See Also
bipp_cumulative_percent, bipp_cumulative_total, bipp_percent
color_format
Category: Attribute
Where used: Column Property
Required? Optional
color_format changes the cell color when data is presented in the the sheet. In its most basic form, the property applies color based on the value of data in the cell. There are advanced options you can use to define the format:
-
Format conditions based on a fixed value or range of possible values of data.
-
Color can be specified as a single color or a range of hex codes for different colors.
-
Color can be applied to the text or the background.
-
Can apply color based on the value of a different column.
-
Value of the data may be treated as a color to be applied.
-
Can pass a fixed text value to be used to fill the cell.
Additional functions and attributes used to define color_format:
-
val: Defines the data values to be compared. Function argument may be a fixed value or range of values. The - sign may be used at either end if the range does not have an upper or lower limit. For example:
-
val(1000): Condition satisfied if data cell value = 1000
-
val(1001 to 5000): Condition satisfied if data cell value is between 1001 and 5000
-
val(100 to -): Condition satisfied if data cell value is greater than or equal to 100
-
val(- to 600): Condition satisfied if data cell value is less than or equal to 600
-
-
color: Defines the color or range of colors to use. When specified as a range, the color to be applied is calculated based on how far the actual value of data is from the minimum value specified in the val function. For example:
color_format """val(100 to 500) color(#00FF00 to #0000FF)"""
If the data value is 450, the color used is closer to #0000FF than it is to #00FF00
-
background: Specifies the color to be applied to the cell background. By default, the color identified by color is applied to the cell text.
-
source: If the column specified as the value of this property is present in the data, the value that gets compared against the val() function will be taken from the column.
-
datacolor: Instructs bippLang to ignore the val and color functions and use the value in the data cell itself as the color value to apply.
-
transpose: Defines a text value to write to the cell if the criteria specified by the val function is satisfied. For example, you can specify certain symbols or constant display in the cell.
Example 1
column SEIndicator
sql """ SUM(TimeDiff)/COUNT(DISTINCT PL) """
type float
color_format """ val(0.0 to 34.0) color(red) """
color_format """ val(35.0 to 44.0) color(greenyellow) """
color_format """ val(45.0 to -) color(green) """
These examples combine the functions and attributes for advanced results:
""" val(100) color(green) background """
If the value is 100, the background of the containing data cell is green.
""" val(10001 to 20000) color(#FF0000) transpose(⬤) """
If the value is between 10001 to 20000, change the text in the cell to ⬤.
""" datacolor """
Read the data cell value and apply the corresponding text color.
""" transpose(▇) """
Change all data cell values to ▇.
Example 2
""" val(745) color(green) background source(sum_fare_amount) transpose(⬤) """
Show ⬤ in the cell. If the value of sum_fare_amount is 745 for the row, apply green background to the cell.
column
Category: Attribute
Where used: Table
Required? Optional
Each instance of column corresponds to a data column, either native to the base table or derived from a SQL expression composed of other columns, constants and SQL operators.
The base table itself may correspond to a table in the backend database or a select statement.
These are the additional column attributes:
color_format
default_aggregate
drilldown
format
hidden
join_as
label
order_by
partition_by, partition_by_excluding, partition_by_including
sql
type
Example
column amount
sql sum(invoice_qnty*unit_price)
type float
format $0.00
label ItemTotal
count_distinct
Category: bipp-specific SQL aggregation function
Where used: Table SQL
Required? Optional
Syntax
count_distinct(expression)
Counts the first instance of expression, eliminating the subsequent appearance of the same data. This function is the same as standard SQL: count(distinct expression)
Example
This example counts the distinct number of countries from the retail transactions table:
column number_countries
sql count_distinct(country)
See Also
count_if
Category: bipp-specific SQL aggregation function
Where used: Table SQL
Required? Optional
Syntax
count_if(condition)
Counts the item if the condition is met. ThIs function is the same as standard SQL:
sum(case when *condition* then 1 else 0 end)
Example
column c1
sql count_if(c2 = "XYZ")
Sample Output
Id | c2 |
---|---|
1 | XYZ |
2 | ABC |
3 | ABC |
4 | XYZ |
5 | XYZ |
c1 |
---|
3 |
See Also
data_source
Category: attribute
Where used: Table
Required? Optional
Syntax
data_source identifier
Specifies the data source for the table. The value is the identifier for a defined data source.
Example
table stocks
data_source retail_ds
dataset
Category: attribute
Where used: Table
Required? Required
Syntax
dataset name
Defines a dataset called name to use in your bippLang data model. A dataset is a collection of tables and optional join rules.
Example
This example creates a dataset named retailDataset, containing the tables stocks and transactions:
dataset retailDataset
table stocks transactions
default_aggregate
Category: attribute
Where used: Column
Required? Optional
Syntax
default_aggregate type
Automatically applies the specified aggregate type on a numeric column when the column is selected in the Data Viewer. These are standard SQL aggregate functions. If no aggregate is needed, specify none.
Here are the values for type:
sum: total of the values. This is the default value if type is not specified.
count: number of items found.
max: maximum value found.
min: minimum value found.
avg: average of the values.
none: no aggregate applied.
Example
column customer_id
sql customer_id
type int
default_aggregate none
default_value
Category: attribute
Where used: Parameter
Required? Required
Syntax
default_value value
Sets the default value of a parameter to value. The value must be consistent with the specified type (int, string, float).
Example
table transactions
parameter item
type string
default_value "none selected"
See Also
drilldown
Category: attribute
Where used: Column
Required? Optional
Syntax
drilldown column_name
Datasets often contain hierarchical relationships between data elements. For example, they can represent a geographical hierarchy or an organizational hierarchy.
The drilldown attribute is used to indicate this relationship between two columns. Once the relationship is defined, users can drilldown the data for these columns when visualizing the dataset.
Example
column Region
sql Region
type string
drilldown dl_region
table casedata
column Country
column Country
sql Country
type string
drilldown dl_region
table casedata
column State
column State
sql State
type string
Sample Output
Drilldown will generate data using three columns mentioned in the above expression. It will populate countries associated with respective region and will populate states associated with each country.
filter
Category: attribute
Where used: Table
Required? Optional
Syntax
filter string
If string is specified, any use of the table always includes the filter in its where and/or having clauses.
format
Category: attribute
Where used: Column
Required? Optional
Syntax
format specification
Indicates how the numeric or date column data should be formatted. bippLang number formatting is based on Microsoft Excel number formats.
Custom number formatting can be achieved using the format specification:
prefix###000.000###,suffix
where:
prefix: static information to appear at the beginning of the result, for example: $
# indicates optional digits
0 indicates required digits
. indicates the position of the decimal point
, adds a comma when specified in the ### sequence (#,##0) or divides by 1000 if added at the end of the number string prior to a suffix (0.00,, M)
suffix: static information to appear at the end of the result, for example: M
For example, to specify decimal formatting for currency:
column SEIndicator
sql SEIndicator
type float
format $0.00
Using this expression, a column value of 1.122e2 is displayed as $112.20. Here are some additional examples:
Value | Format | Result |
---|---|---|
123456789 | 0.00,, M | 123.45 M |
1.122e12 | $#,##0.00,, M | $1,122,000.00 M |
123456789 | 0.00##,, M | 123.4567 M |
1234.56 | $0.00, K | $1.23 K |
Custom date format is specified using a combination of letters to indicate days, month, year and time. Here are the options:
d: Day of the month as digits; no leading zero for single-digit
days.
dd: Day of the month as digits; leading zero for single-digit days.
ddd: Day of the week as a three-letter abbreviation.
dddd: Day of the week as its full name.
m: Month as digits; no leading zero for single-digit months.
mm: Month as digits; leading zero for single-digit months.
mmm: Month as a three-letter abbreviation.
mmmm: Month as its full name.
yy: Year as last two digits; leading zero for years less than 10.
yyyy: Year represented by four digits.
h: Hours; no leading zero for single-digit hours (12-hour clock).
hh: Hours; leading zero for single-digit hours (12-hour clock).
H: Hours; no leading zero for single-digit hours (24-hour clock).
HH: Hours; leading zero for single-digit hours (24-hour clock).
M: Minutes; no leading zero for single-digit minutes.
MM: Minutes; leading zero for single-digit minutes.
s: Seconds; no leading zero for single-digit seconds.
ss: Seconds; leading zero for single-digit seconds.
l: Milliseconds; gives 3 digits.
L: Milliseconds; gives 2 digits.
t: Lowercase, single-character time marker string: a or p.
tt: Lowercase, two-character time marker string: am or pm.
T: Uppercase, single-character time marker string: A or P.
TT: Uppercase, two-character time marker string: AM or PM.
Here are some examples:
Value | Format | Result |
---|---|---|
2020-04-06T19:05:02Z | ddd mmm dd yyyy HH:MM:ssM | Mon Apr 06 2020 19:05:02 |
2020-04-06T19:05:02Z | mmmm d, yyyy | April 6, 2020 |
2020-04-06T19:05:02Z | HH:MM:ss | 19:05:02 |
2020-04-06T19:05:02.456Z | HH:MM:ss L | 19:05:02 46 |
The default time zone for the column is set based on the data source timestamp. However, users can visualise the reports, dashboards in their local timezone by using format function like below:
column invoice_date
sql invoice_date
type datetime
format """local"""
Example, when local used from GMT +5:30:
Invoice_Date | Format | Formatted invoice_date |
---|---|---|
2021-10-30T19:05:02Z | local | 2021-10-31T00:35:02 |
In the above example, source timezone of invoice_date is Coordinated Universal Time (UTC) and by assigning “local” to the format keyword, we are converting to the user local time zone, i.e. GMT +5:30.
See Also
hidden
Category: attribute (boolean)
Where used: Column
Required? Optional
Syntax
hidden value
Hides the column in the Data Viewer if value is true. If value is not specified, it defaults to false.
Example
column _invoice_date
sql invoice_date
type string
hidden true
inherits
Category: attribute
Where used: Table
Required? Optional
Syntax
inherits table_name
Applies all of the attributes from the specified table_name to the table you are defining. Any further specifications within your table definition overwrite the specific inherited attribute.
Example
In this example, table t2 inherits all attributes of table t1.
table t2
inherits t1
join
Category: attribute
Where used: Dataset
Required? Optional
Syntax
join table1 joinType1 table2
on sqlExpression1
[then joinType2 table3
on sqlExpression2 ...]
where joinType can be any of the following:
-
inner, inner join, join OR <->
-
left, left join, left outer join OR -->
-
right, right join, right outer join OR <--
-
full, full join, full outer join OR >-<
Defines a SQL JOIN relationship between two or more tables. The auto-generated SQL automatically uses any appropriate JOIN relationships, if defined.
Example 1
project Operations
dataset OpsDS
table efficiency_data shipping_statement_view
join shipping_statement_view join efficiency_data
on shipping_statement_view.WeekYear = efficiency_data.WeekYear
Example 2
project p
dataset ds
table t1 t2 t3
join t1 inner t2
on t1.c1 <= t2.c2
then outer t3
on t1.c1=t3.c3 AND t2.c2=t3.c3
join_as
Category: attribute
Where used: Column
Required? Optional
Syntax
join_as name
This is a shortcut for a simple inner join using the equality (=) operator. If specified, the join_as name for columns implicitly induce a join relationship if, and only if, no explicit join relationships already exist in the corresponding dataset.
Typically, you specify explicit join relationships in a dataset and join_as is not used.
Example 1
The data model contains:
table t1
column c1
join_as x
table t2
column c2
join_as x
The dataset is defined as:
dataset ds
table t1 t2
The two instances join_as (containing the same name x) to product the join relationship:
join t1 inner t2
on t1.c1 = t2.c2
Example 2
In this example, the dataset has an explicit join, so the join_as instances instances are ignored. No new join relationship with t1.c1 or t2.c2 is created.
The dataset is defined as:
dataset ds
table t1 t2 t3
join t1 outer t3
on someSQLexpression
See Also
label
Category: attribute
Where used: Dataset, Table, Column, Parameter, allowed_value
Required? Optional
Syntax
label name
Applies the specified name to the item where it is defined. The name is used in the UI when the item is used. It can be nested table, dataset, parameter, column, or allowed_value. If label is not specified, a label is automatically created from the name or value of the item.
Example
column SLIndicator
label SLIndicator in %
on
Category: attribute
Where used: join, or then nested under a join
Required? Optional
Syntax
on expression
Specifies columns and condition to define the join.
Example
table tape_transactions
sql """SELECT t.invoice_num AS invoice_num,
t.stock_code AS stock_code,
t.unit_price AS unit_price, t.country AS country
FROM `online_retail`.`stocks` AS s
JOIN `online_retail`.`transactions` AS t
ON s.code = t.stock_code
WHERE s.description like '%TAPE%'"""
order_by
Category: attribute
Where used: column
Required? Optional
This is used for sorting a column in data model based on another column.
Example
column account
label Account
sql account
type string
order_by _
table hrc_gp_cnis
column seq
parameter
Category: attribute
Where used: table
Required? Optional
Syntax
parameter Name
type type
default_value DefaultValue
label Label
allowed_value Allowed_Value_1
string_format Format
where
type (required): can be int, float, datetime, string or bool.
default_value (required): is the value if no value is specified.
label (optional): specifies a description label to use in the Data Viewer.
allowed_value (optional): one or more allowed values for user selection.
string_format (optional, used with type string): specifies the format to use for the string.
A parameter behaves like a required filter. A parameter’s value is passed on to the auto-SQL generator engine as a filter like tableName.parameterName = value.
Parameters can be shared across tables. If parameter p is defined in table t, it can be referred to in the SQL for a column in another table as t.p.
Example
table tapes_or_scissors
parameter kind
type string
default_value TAPE
allowed_value TAPE
allowed_value SCISSOR
partition_by, partition_by_excluding, partition_by_including
Category: attribute
Where used: column
Required? Optional
Syntax
column columnName
sql AggregateSQLexpression
partition_by* ListOfColumns
where
partition_by* indicates one of the three options: partition_by, partition_by_excluding, partition_by_including.
ListOfColumns: column list separated by white space. The list of columns can be empty only for partition_by. The list cannot be empty for partition_by_excluding or partition_by_including.
The partition constructs enable bipp to support window functions, also called Analytic Functions. These functions are dynamic and agnostic of the SQL dialect. There are more details and examples in the Dynamic Analytic Functions section.
Example 1
column baf1
sql sum(c1)
partition_by c2, c3
Computes baf1 partitioned by (c2, c3)
Example 2
column baf2
sql sum(c1)
partition_by_excluding c4
Computes baf2 by partitioning on user-selected columns after excluding c4 if c4 was selected by the user.
Example 3
column baf3
sql sum(c1)
partition_by_including c2, c4
Computes baf3 by partitioning on user-selected columns after including c2 and c4 if they are not already selected by the user.
reftable
Category: attribute
Where used: dataset
Required? Optional
Syntax
reftable table_name
Specifies a table to be used for internal reference in the dataset. The table does not appear in the Data Viewer.
Example
dataset DataAnalysis
table PlayerMatchResult
reftable MatchesWon MatchesLost
sql
Category: attribute
Where used: table or column
Required? Optional
Syntax
sql AggregateSQLexpression
The sql statement can be used in both the table and column definitions:
Table: the sql statement should be either the name of a backend database in a format required by the backend, or a valid SELECT statement.
Column: the sql statement should be a valid SQL-fragment that is either the (unqualified) name of a backend column, or a valid SQL expression in the SQL-dialect of the backend. How the statement is handled for a column depends on the specification:
-
If the sql specification is atomic (consists of a single SQL token such as the name of a backend column, string, or numeric/date constant), it is interpreted in the backend namespace.
-
If the sql specification is non-atomic (consists of more than one SQL token), any column names are interpreted in the bippLang data model namespace.
Example
1 column c1
2 sql c1
3 column c2
4 sql c1 + 42
where
c1 in line 2 refers to a column in the backend namespace
c1 in line 4 refers to a column in the bippLang data model
The backend name for the bippLang column c1 also happens to be c1. If the column definition for c1 (starting in line 1) was to be removed from the data model, line 4 would produce an error although there is a backend column called c1.
type
Category: attribute
Where used: column, parameter
Required? Optional
Syntax
type data_type
where
data_type can be: int, float, datetime, string or bool.
Specifies the alphanumeric data type the column contains, or the type of value for a parameter.
Example
column _invoice_date
sql invoice_date
type string
hidden true
column invoice_date
sql cast(_invoice_date as DATE)
type datetime
column unit_price
sql unit_price
type float
format $0.00
column customer_id
sql customer_id
type int
default_aggregate none
See Also
variables, var, value
Category: attribute
Where used: any bippLang property
Required? Optional
Syntax
variables
var name
value value
This is a mechanism for macro-substitution in bippLang.
Variables are defined with var-value pairs under a variables section.
Variables are referred to by their names enclosed in {{ and }}.
Example
table t
variables
var x
value 42
var y
value "fortytwo"
column c1
sql {{x}}
column c2
sql concat({{y}}, "foo")
expands to:
table t
column c1
sql 42
column c2
sql concat("fortytwo", "foo")
Advanced Features
Dynamic Analytic Functions
Analytic Functions (also known as Window Functions or Partitioning Functions) are part of ANSI SQL and widely supported by many dialects, including BigQuery, Microsoft, Oracle, and Snowflake.
An analytic function operates on a group of rows, returning a single result for each row. An aggregate function also operates on a group of rows, returning a single result for the group.
Basic Analytic Function
Here is an example of an analytic function, using BigQuery syntax.
Data Table Internally grouped by ID.
ID | X | Region | Country | M |
---|---|---|---|---|
1 | 10 | NA | USA | 1 |
1 | 20 | NA | USA | 10 |
2 | 30 | NA | USA | 100 |
2 | 40 | NA | USA | 1000 |
2 | 50 | NA | USA | 10000 |
3 | 60 | NA | USA | 100000 |
3 | 70 | NA | USA | 1000000 |
Query
select Region, sum(M) over (partition by ID) as S
Result
The sum aggregation is executed on the rows by ID. The number of rows in the result is the same as the data table.
Region | S |
---|---|
NA | 11 |
NA | 11 |
NA | 11100 |
NA | 11100 |
NA | 11100 |
NA | 1100000 |
NA | 1100000 |
Dynamic Column Specification
Using special constructs in bippLang enables you to dynamically specify the columns in the internal group by.
bippLang Table Definition
table T
column c1
sql ...
partition_by_including c2
column c2
column c3
sql ...
partition_by c4 c5
column c4
column c5
column c6
sql ...
partition_by_excluding c4
Column c1 and c6 have a dynamic list of columns for the partitioning:
-
c1: partition by c2 PLUS all other columns selected from table T
-
c6: partition by all columns selected from table T MINUS c4
Column c3 is the expected partitioning as illustrated in the Basic Analytic Function example, using an aggregate function in the sql expression:
... over (partition_by c4, c5) ...
Constraints
-
If a column has a partition specification, the sql should be an aggregate (measure), though the column is considered a non-aggregate (analytic) column.
-
Columns listed in a partition specification should be non-aggregates (dimensions or analytics).
-
The keyword partition_by can be followed by an empty list -- in which case all rows of the table form a single group in the internal GROUP BY. But dynamic partitions cannot have empty lists.
Dynamic User Selections
bippLang Table Definition
table t1
column c1
column c2
column c3
column m1
sql min(c3)
partition_by_including c2
Here is the implementation when a user selects c1 and sum(m1)
or a visualization:
select t1.c1 as c1, sum(t11.m1) as sum_m1
from t1
inner join (
select c1, c2, min(c3) as m1 from t1 group by c1, c2
) as t11
on t1.c1 = t11.c1 AND t1.c2 = t11.c2
group by c1
Using the inner join construct is advantageous over avoiding the join and using an outer select around the inner select:
-
at least as fast
-
works for partition_by_excluding and partition_by
-
generalizes naturally to simultaneous selection of multiple analytic functions
-
handles cases where multiple tables are joined in the original query
Recursive Analytic Functions
Analytic functions can be recursive, in other words, they can refer to other analytic functions.
bippLang Table Definition
table t1
column c1
column c2
column c3
column m1
sql min(c3)
partition_by_including c2
column m2
sql avg(m1)
partition_by_excluding c4
column c4
In this case, the user selects c1, c4 and sum(m2). The solution is to define a table corresponding to the selection of c1, c4 and m1. bippLang automatically creates the t1_c1_c4_m1 table for you when sum(m2) is selected.
Here is the pseudo-code for created table t1_c1_c4_m1 so you can see the underlying implementation logic:
table t1_c1_c4_m1
sql """select t1.c1 as c1, t1.c4 as c4, sum(t11.m1) as m1
from t1
inner join (
select c1, c2, c4, min(c3) as m1 from t1
group by c1, c2, c4
) as t11
on t1.c1 = t11.c1 AND t1.c2 = t11.c2 AND t1.c4=t11.c4
group by c1"""
column c1
column c4
column m1
column m2
sql avg(m1)
partition_by_excluding c4
Now select c1, c4 and sum(m2) from the new table using the same algorithm as the bippLang Dynamic User Selections example.
Level of Detail (LOD) Expressions
If you use Tableu’s LODs, you will find the bippLang semantics are different. LODs are non-SQL constructs that can behave differently from a mathematical/SQL perspective. bippLang handles LOD expressions completely in SQL. Tableau takes a meta-SQL approach that is not quite a SQL equivalent.
Here is an example using bippLang to perform the same tasks as an LOD in Tableau:
LOD_MIN_M ← { FIXED [ID] : MIN([M]) }
If you are using Tableau, you cannot select Region and LOD_MIN_M, as LODs must be aggregated. For this example, we are assuming that expression would work and return this result:
Region | LOD_MIN_M |
---|---|
NA | 1 |
NA | 1 |
NA | 100 |
NA | 100 |
NA | 100 |
NA | 100000 |
NA | 100000 |
If you select Region and SUM(LOD_MIN_M) in Tableau, the results might not match your expectations:
Region | SUM(LOD_MIN_M) mathematical | SUM(LOD_MIN_M) Tableau |
---|---|---|
NA | 200302 | 100101 |
To simulate this function in bippLang, define a count(*) column, with an identical partition type and partitioning columns:
bippLang Table Definition
table T
column ID
column X
column Region
column Country
column M
column A1
sql MIN(M)
partition_by T
column A2
sql COUNT(*)
partition_by T
Tableau’s behavior for selecting Region and sum(MIN(M over ID)) is simulated by:
select Region, sum(A1/A2) from T. You would truncate or round the resulting floating-point numbers as needed.