MARC Expression Language (MEL)
MARC Expression Language (MEL) is a small DSL for writing boolean predicates over MARC 21 records. The predicates drive searching, faceting, grouping, and display logic.
This topic details the programmatic considerations for MEL, which can be used to write custom Types of Material (TOMs) from scratch. The Type of Material (TOM) Rules topic provides the MEL for our default TOMs and some example custom TOMs. It may be easier to copy and modify the predicate text from an existing TOM than to write new predicate text manually.
Conventions
-
Terminal strings are quoted; regex literals are delimited with slashes (e.g. /pattern/flags).
-
Whitespace between tokens is insignificant: Any amount of whitespace (spaces, tabs, newlines, etc.) may appear between tokens. Multiple whitespace characters are treated the same as a single space. Leading and trailing whitespace in an expression is ignored.
-
TAG alone is shorthand for TAG[*] (any occurrence).
-
The underscore character (_) may be used as a placeholder inside a quantified WHERE clause to refer to "the current occurrence" being tested (for example, ANY 007 WHERE (_ matches /.../)). Use of _ outside a quantified WHERE clause is a syntax error.
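The conventions above can be illustrated with a small tokenizer. The following Python sketch is an illustration only, not the product's lexer: TOKEN_RE and tokenize are hypothetical names, and the token classes are assumptions inferred from the conventions listed in this section. It shows that two expressions differing only in whitespace produce the same token sequence.

```python
import re

# Hypothetical token pattern, inferred from the conventions in this topic.
TOKEN_RE = re.compile(
    r"'[^']*'"                       # quoted terminal string
    r"|/[^/]*/i?"                    # regex literal with optional i flag
    r"|==|>=|<=|!=|[=<>()\[\],]"     # operators and punctuation
    # keyword, placeholder, or field reference such as 007[*]/00 or 753$a
    r"|[A-Za-z0-9_$]+(?:\[(?:\*|[0-9]+)\])?(?:/[0-9]+(?:-[0-9]+)?)?"
)


def tokenize(expr):
    """Split an expression into tokens, discarding all whitespace between them."""
    text = expr.strip()          # leading and trailing whitespace is ignored
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # any run of whitespace is skipped
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


compact = tokenize("LDR/06='a'   and\n\tLDR/07 = 'b'")
spaced = tokenize("LDR/06 = 'a' and LDR/07 = 'b'")
print(compact == spaced)
```

The sketch assumes a regex literal always begins at a token boundary, which is how it tells /pattern/ apart from the slash in a position reference such as 007/00.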
Case Sensitivity
All MEL keywords and operators are parsed case-insensitively, meaning you may write them in upper, lower, or mixed case when authoring expressions. However, the grammar defines canonical forms for documentation and display purposes:
-
Quantifier keywords (ANY, ALL, WHERE, COUNT) are canonically rendered in UPPERCASE.
-
Boolean operators (not, and, or) are canonically rendered in lowercase.
-
Comparison operators (=, ==, in, cin, matches): the textual operators (in, cin, matches) are canonically rendered in lowercase.
-
Relational operators (>, <, >=, <=, !=) are symbols and have no case variations.
Examples showing equivalent expressions:
-
ANY 007 WHERE (007/00 = 'v') (canonical)
-
any 007 where (007/00 = 'v') (also valid)
-
LDR/06 = 'a' and LDR/07 = 'b' (canonical)
-
LDR/06 = 'a' AND LDR/07 = 'b' (also valid)
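The canonical forms described above can be produced mechanically. The following Python sketch is one possible approach under stated assumptions: canonicalize, PIECE_RE, and the two keyword sets are hypothetical names, not part of the product. It rewrites only bare keywords and leaves quoted strings, regex literals, and field references untouched.

```python
import re

UPPER_KEYWORDS = {"ANY", "ALL", "WHERE", "COUNT"}               # canonically UPPERCASE
LOWER_KEYWORDS = {"not", "and", "or", "in", "cin", "matches"}   # canonically lowercase

# Hypothetical splitter: each alternative is tried in order at every position.
PIECE_RE = re.compile(
    r"'[^']*'"        # quoted string: left untouched
    # bare word or field reference such as LDR/06, 007[*]/00, 753$a
    r"|[A-Za-z0-9_$]+(?:\[(?:\*|[0-9]+)\])?(?:/[0-9]+(?:-[0-9]+)?)?"
    r"|/[^/]*/i?"     # regex literal: left untouched
    r"|.",            # any other single character, including whitespace
    re.DOTALL,
)


def canonicalize(expr):
    """Return expr with keywords and textual operators in canonical case."""
    out = []
    for match in PIECE_RE.finditer(expr):
        piece = match.group()
        if piece.upper() in UPPER_KEYWORDS:
            out.append(piece.upper())
        elif piece.lower() in LOWER_KEYWORDS:
            out.append(piece.lower())
        else:
            out.append(piece)
    return "".join(out)


print(canonicalize("any 007 where (007/00 = 'v')"))
print(canonicalize("LDR/06 = 'a' AND LDR/07 = 'b'"))
```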
Grammar (BNF)
<Expr> ::= <OrExpr>
<OrExpr> ::= <AndExpr> ( "or" <AndExpr> )*
<AndExpr> ::= <NotExpr> ( "and" <NotExpr> )*
<NotExpr> ::= "not" <Primary> | <Primary>
<Primary> ::= "(" <Expr> ")" | <Comparison> | <Quantified> | <CountExpr>
<FieldRef> ::= <LeaderRef> | <ControlRef> | <DataRef>
<PlaceholderRef> ::= "_"
<FieldRefOrPlaceholder> ::= <FieldRef> | <PlaceholderRef>
<LeaderRef> ::= "LDR" "/" <TwoDigit>
<ControlRef> ::= <ControlTag> ( "[" <IndexOrStar> "]" )? ( "/" <PosRange> )?
<ControlTag> ::= "00" Digit
<ThreeDigitTag> ::= Digit Digit Digit
<IndexOrStar> ::= "*" | Number
<TwoDigit> ::= Digit Digit
<PosRange> ::= Number | Number "-" Number
<DataRef> ::= <DataTag> "$" <SubfieldCode>
<DataTag> ::= <ThreeDigitTag> ; where the first two digits are not "00"
<SubfieldCode> ::= <Letter> | Digit
<Letter> ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
<Comparison> ::= <FieldRefOrPlaceholder> <CompOp> <Value>
| <FieldRefOrPlaceholder> "matches" <RegexLiteral>
<CompOp> ::= "=" | "==" | "in" | "cin"
<Value> ::= <QuotedString> | <CharList>
<QuotedString> ::= "'" { any-char-except-' } "'"
<CharList> ::= "[" "'" Char "'" ( "," "'" Char "'" )* "]"
<Quantified> ::= ("ANY" | "ALL") <ControlRefWithoutIndex> ( "WHERE" <Comparison> )?
<ControlRefWithoutIndex> ::= <ControlTag> ( "[" "*" "]" )? ( "/" <PosRange> )?
<CountExpr> ::= "COUNT" <ControlRefWithoutIndex> <RelOp> Number
<RelOp> ::= ">" | "<" | ">=" | "<=" | "=" | "!="
<RegexLiteral> ::= "/" { any-not-/ } "/" [ "i" ]
Digit ::= "0" | "1" | ... | "9"
Number ::= Digit { Digit }
Char ::= any single character
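The first productions of the grammar encode operator precedence: not binds tightest, then and, then or. The following Python sketch shows that layering with a recursive-descent evaluator. It is an illustration only: the evaluate function is hypothetical, and comparisons are assumed to have been evaluated already, so each primary is either a Python bool or a parenthesized sub-expression.

```python
def evaluate(tokens):
    """Evaluate a token list such as [True, 'or', False, 'and', False]."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def is_keyword(tok, word):
        # Keywords are matched case-insensitively.
        return isinstance(tok, str) and tok.lower() == word

    def or_expr():                       # <OrExpr> ::= <AndExpr> ( "or" <AndExpr> )*
        value = and_expr()
        while is_keyword(peek(), "or"):
            take()
            rhs = and_expr()
            value = value or rhs
        return value

    def and_expr():                      # <AndExpr> ::= <NotExpr> ( "and" <NotExpr> )*
        value = not_expr()
        while is_keyword(peek(), "and"):
            take()
            rhs = not_expr()
            value = value and rhs
        return value

    def not_expr():                      # <NotExpr> ::= "not" <Primary> | <Primary>
        if is_keyword(peek(), "not"):
            take()
            return not primary()
        return primary()

    def primary():                       # "(" <Expr> ")" or a pre-evaluated bool
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = take()
        if isinstance(tok, bool):
            return tok
        if tok == "(":
            value = or_expr()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return value
        raise ValueError(f"unexpected token {tok!r}")

    result = or_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


# 'and' binds tighter than 'or': True or (False and False)
print(evaluate([True, "or", False, "and", False]))
```

Because <NotExpr> accepts a single "not" followed by a <Primary>, the sketch rejects a doubled "not not x" unless the inner expression is parenthesized.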
When parsing MEL expressions, invalid tags may result in specific exceptions:
-
Invalid control-field tags (for example, tags that do not match <ControlTag> ::= "00" Digit) may cause the parser to throw Polaris.Marc.Runtime.Mel.Exceptions.MelInvalidControlTagException (error code InvalidControlTag).
-
Invalid data-field tags (for example, data tags that start with "00" or are otherwise malformed) may cause the parser to throw Polaris.Marc.Runtime.Mel.Exceptions.MelInvalidDataTagException (error code InvalidDataTag).
Other syntax issues continue to produce a System.FormatException or Polaris.Marc.Runtime.Mel.Exceptions.MelInvalidTagException with a descriptive message.
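The tag rules above can be sketched as a small classifier. This Python example is a sketch under stated assumptions, not the parser's implementation: the two exception classes are Python stand-ins for the .NET exceptions named in this section, classify_field_ref is a hypothetical helper, and the sketch assumes that any reference without a "$" that is not a leader reference is treated as a control-field reference.

```python
import re


class MelInvalidControlTagError(ValueError):
    """Python stand-in for MelInvalidControlTagException (InvalidControlTag)."""


class MelInvalidDataTagError(ValueError):
    """Python stand-in for MelInvalidDataTagException (InvalidDataTag)."""


# Tag, optional [index or *], optional /position or /start-end.
CONTROL_SHAPE = re.compile(r"([^\[/]+)(?:\[(?:\*|[0-9]+)\])?(?:/[0-9]+(?:-[0-9]+)?)?")


def classify_field_ref(ref):
    """Return 'leader', 'control', or 'data', or raise for an invalid reference."""
    if ref.startswith("LDR"):
        if re.fullmatch(r"LDR/[0-9]{2}", ref):
            return "leader"
        raise ValueError(f"malformed leader reference: {ref!r}")
    if "$" in ref:
        tag, code = ref.split("$", 1)
        # A data tag is three digits and must not start with "00".
        if not re.fullmatch(r"[0-9]{3}", tag) or tag.startswith("00"):
            raise MelInvalidDataTagError(f"invalid data tag: {tag!r}")
        if not re.fullmatch(r"[a-z0-9]", code):
            raise ValueError(f"invalid subfield code: {code!r}")
        return "data"
    shape = CONTROL_SHAPE.fullmatch(ref)
    if shape is None:
        raise ValueError(f"malformed field reference: {ref!r}")
    tag = shape.group(1)
    # A control tag is "00" followed by one digit.
    if not re.fullmatch(r"00[0-9]", tag):
        raise MelInvalidControlTagError(f"invalid control tag: {tag!r}")
    return "control"


for ref in ("LDR/06", "007[*]/00", "753$a"):
    print(ref, classify_field_ref(ref))
```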
Data field references (DataRef)
The grammar includes a <DataRef> production to reference MARC data fields and their subfields. A data field reference is written as a three-digit tag immediately followed by a dollar sign ($) and a subfield code. The subfield code may be a lowercase letter (a-z) or a digit (0-9).
Data field references behave the same as other FieldRef productions: they may be used in comparisons (=, ==, in, cin), regex matches expressions, quantified ANY/ALL WHERE clauses, and in COUNT expressions where appropriate. Case-sensitivity and membership semantics described below apply to comparisons against data field values.
Examples:
-
753$a = 'wii'
-
753$a matches /.wii./i
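To make the shape of a data field reference concrete, the following Python sketch splits a reference such as 753$a into its tag and subfield code and collects the matching subfield values from a record. Everything here is an assumption for illustration: the record layout (a list of (tag, subfields) tuples), the sample values, and the helper names parse_data_ref and subfield_values are hypothetical, and the sketch assumes that a comparison succeeds if any matching subfield satisfies it.

```python
import re


def parse_data_ref(ref):
    """Split a reference like '753$a' into ('753', 'a')."""
    match = re.fullmatch(r"([0-9]{3})\$([a-z0-9])", ref)
    if match is None or match.group(1).startswith("00"):
        raise ValueError(f"not a valid data field reference: {ref!r}")
    return match.group(1), match.group(2)


def subfield_values(record, ref):
    """Return every value of the referenced subfield, across all occurrences."""
    tag, code = parse_data_ref(ref)
    return [
        value
        for field_tag, subfields in record
        if field_tag == tag
        for sub_code, value in subfields
        if sub_code == code
    ]


# Made-up sample record: (tag, [(subfield code, value), ...])
record = [
    ("245", [("a", "Example title")]),
    ("753", [("a", "Nintendo Wii"), ("c", "Wii")]),
    ("753", [("a", "Wii")]),
]

values = subfield_values(record, "753$a")
# 753$a = 'wii' under the default case-insensitive equality
is_wii = any(value.casefold() == "wii".casefold() for value in values)
print(values, is_wii)
```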
Comparison semantics (case)
-
By default, equality and membership comparisons are case-insensitive. Implementations SHOULD perform Unicode case-folding when evaluating = and in to preserve existing behaviour.
-
== is a case-sensitive equality operator (compares raw characters exactly, with no case folding).
-
cin is the case-sensitive membership operator (the case-sensitive counterpart to in).
Examples
-
LDR/06 = 'a' # case-insensitive (matches 'A' or 'a')
-
LDR/06 == 'A' # case-sensitive (matches only uppercase 'A')
-
007/00 in ['s','t'] # membership, case-insensitive
-
007/00 cin ['S','T'] # membership, case-sensitive
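The four comparison operators can be summarized in a few lines of Python. This is a minimal sketch, assuming that Python's str.casefold is an acceptable stand-in for the Unicode case-folding mentioned above; mel_compare is a hypothetical helper, not a product API.

```python
def mel_compare(actual, op, expected):
    """Apply one MEL comparison operator to an already-extracted field value."""
    if op == "=":      # case-insensitive equality
        return actual.casefold() == expected.casefold()
    if op == "==":     # case-sensitive equality, raw characters
        return actual == expected
    if op == "in":     # case-insensitive membership
        return actual.casefold() in {item.casefold() for item in expected}
    if op == "cin":    # case-sensitive membership
        return actual in expected
    raise ValueError(f"unknown comparison operator: {op!r}")


print(mel_compare("A", "=", "a"))            # LDR/06 = 'a' against a value of 'A'
print(mel_compare("a", "==", "A"))           # LDR/06 == 'A' against a value of 'a'
print(mel_compare("S", "in", ["s", "t"]))    # 007/00 in ['s','t']
print(mel_compare("s", "cin", ["S", "T"]))   # 007/00 cin ['S','T']
```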
Semantics (short)
-
LDR/06 = 'a' checks the single leader position value (case-insensitive).
-
007[*]/00 = 'v' is true if any 007 occurrence has the character at position 0 equal to 'v'.
-
ANY 007 WHERE 007/00 = 'v' is equivalent to 007[*]/00 = 'v' (the quantifier is optional; 007 alone implies any occurrence).
-
ALL 008 WHERE 008/21 = 'a' requires every 008 occurrence to have the character at position 21 equal to 'a'.
-
COUNT 007 > 0 is true when at least one 007 occurrence exists.
-
007 matches /vd.v/i runs the regex against each 007 occurrence's full text (ANY is implied when no index is given).
-
ANY 007 WHERE (_ matches /^cr/i) is shorthand for ANY 007 WHERE (007 matches /^cr/i); the _ placeholder refers to the current occurrence being tested and allows more concise WHERE clauses.
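The quantifier semantics above can be sketched over a plain list of field occurrences. This Python example is illustrative only: the helper names and the sample field values are made up, positions are assumed to be zero-based, matches is assumed to search anywhere in the occurrence text unless the pattern is anchored, and the result of ALL over zero occurrences (true here, following Python's all) is an assumption this topic does not confirm.

```python
import re


def char_at(occurrence, pos):
    """Return the character at a zero-based position, or '' if the field is too short."""
    return occurrence[pos:pos + 1]


def any_where(occurrences, predicate):
    """ANY: true if at least one occurrence satisfies the predicate."""
    return any(predicate(occ) for occ in occurrences)


def all_where(occurrences, predicate):
    """ALL: true if every occurrence satisfies the predicate."""
    return all(predicate(occ) for occ in occurrences)


# Made-up sample occurrences for one record.
fields_007 = ["vd cvaizq", "cr |||||||||||"]
fields_008 = ["x" * 21 + "a" + "x" * 18]

# 007[*]/00 = 'v'  (same as ANY 007 WHERE 007/00 = 'v')
starts_with_v = any_where(fields_007, lambda occ: char_at(occ, 0).casefold() == "v")

# ALL 008 WHERE 008/21 = 'a'
every_008_a = all_where(fields_008, lambda occ: char_at(occ, 21).casefold() == "a")

# COUNT 007 > 0
has_007 = len(fields_007) > 0

# ANY 007 WHERE (_ matches /^cr/i): the lambda argument plays the role of _
cr_match = any_where(
    fields_007, lambda occ: re.search(r"^cr", occ, re.IGNORECASE) is not None
)

print(starts_with_v, every_008_a, has_007, cr_match)
```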