Search Method

Three different types of searching can be performed. FileBoss supports smart white space matching, in which the white space (spaces and tabs) does not have to be exact for a match to occur. The search pattern can specify a single space or tab, and FileBoss will still match even if the source text contains two or more consecutive spaces or tabs, or a mixture of both.

The type of search to be done is specified by choosing one of the following options:

Exact
Matches the search pattern exactly (no wild cards have special meaning) except for capitalization and white space. Capitalization is controlled by the 'Match capitalization' check box, and all white space is treated alike: spaces, tabs and new lines are considered to be the same. In addition, multiple spaces will match one space and vice versa.

Normal
In normal searching, FileBoss recognizes two wild card characters: the question mark and the asterisk. The question mark (?) matches any one character. The asterisk (*) matches any number of any characters up until the next character in the search pattern. For instance, the search pattern F*ss would match FileBoss. To enter an asterisk or question mark as a literal, i.e. without special meaning, precede it with a backslash.
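The Normal rules above can be approximated in Python by translating a pattern into a regular expression. This is only a sketch of the described behavior, not FileBoss's actual implementation: the function name normal_to_regex and its match_case parameter are hypothetical, and applying the 'Match capitalization' setting to Normal searches is an assumption.

```python
import re

def normal_to_regex(pattern, match_case=False):
    """Translate a Normal-style pattern into a compiled Python regex (sketch)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern) and pattern[i + 1] in "*?":
            # A backslash makes the following * or ? a literal character.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "?":
            parts.append(".")        # any one character
        elif ch == "*":
            parts.append(".*?")      # any characters, up to the next pattern character
        elif ch.isspace():
            # Smart white space: one or more spaces/tabs/new lines match alike.
            while i + 1 < len(pattern) and pattern[i + 1].isspace():
                i += 1
            parts.append(r"\s+")
        else:
            parts.append(re.escape(ch))
        i += 1
    # Assumption: capitalization is ignored unless match_case is requested.
    flags = re.DOTALL | (0 if match_case else re.IGNORECASE)
    return re.compile("".join(parts), flags)

print(normal_to_regex("F*ss").search("FileBoss").group())
print(bool(normal_to_regex("a b").search("a \t  b")))
```

The non-greedy `.*?` is one way to model "up until the next character in the search pattern"; FileBoss may resolve ambiguous cases differently.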
Unix style searching is the third method. See Unix Style Searches for a complete definition of FileBoss's Unix style search implementation.

Match Capitalization
Selecting this option tells FileBoss that searches should be case sensitive. Thus, if this option is selected and "Blue" is searched for, FileBoss will find "Blue" but not "blue" or "bLue". If this option is not selected, FileBoss will match any combination of capital and lowercase letters, no matter what was used in the search string. Note that this option does not affect searches when Unix mode is turned on.

Text to be Searched
Enter the text you want to search. Include examples of words and phrases your search should find and some close variants that your search should not find.

Format Pattern to Find
Enter the pattern you want to search for. How the search is performed using this pattern is determined by the Search Method chosen at the top of this dialog.

Using Regular Expressions

Unix search formats can be used when creating Virtual Folders or when searching for specific files within a Virtual Folder. Unix style search patterns allow very precise and/or complex searches to be specified. The method of forming search patterns in FileBoss is very similar to the standard notation defined for the Unix editor ed.

A Unix search pattern is made up of one or more regular expressions (REs). An RE is a string that specifies a character or group of characters that should be matched. For instance, in the search pattern

  [A-Za-z]* 1994

there are six REs which make up the search string. They are:

  [A-Za-z]*
  a space
  the four digits 1, 9, 9 and 4

These, along with all the other forms of REs recognized by FileBoss, are detailed below. For users who are already familiar with the use of REs in Unix, the end of this section lists the differences between FileBoss's implementation of REs and the common implementations in the Unix utilities awk, ed, grep, lex and regex.
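The pattern [A-Za-z]* 1994 happens to use only notation that Python's re module shares, so the way six REs combine into one search pattern can be tried outside FileBoss. This is an illustrative sketch in Python, not FileBoss itself, and the sample text is made up.

```python
import re

# The six REs from the example, in order.
parts = ["[A-Za-z]*", " ", "1", "9", "9", "4"]
pattern = "".join(parts)   # "[A-Za-z]* 1994"

text = "Filed in March 1994 by the clerk"
match = re.search(pattern, text)
print(match.group())       # → March 1994
```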
Simple Regular Expressions

The simplest form of an RE is a single character or an escaped character (a character preceded by a backslash, such as \.) which matches one character. There are three types of simple REs:

Ordinary Characters
A one-character RE that matches itself. The range of characters is 0-256.

Period (.)
A period is a one-character RE that matches any character.

Backslash (\)
A backslash followed by a special character is an RE that makes the special character into an ordinary character. Thus the RE \. (a backslash followed by a period) will match a period, not a backslash followed by any character.

Character Classes

Character classes are REs which specify a range of characters which can be matched, such as all capital letters or all lower case letters between 'a' and 'm'. Character classes are specified by a string enclosed in square brackets ([]). The characters which make up the enclosed string can specify ranges, single characters and more. Each of these is explained below.

dash (-)
The dash may be used to indicate a range of consecutive ASCII characters. For example, [0-9] is equivalent to [0123456789], [A-Z] is equivalent to all upper case letters and [A-Z0-9] is equivalent to all upper case letters and all digits. Note that the dash loses its special meaning whenever:

  it is the first character after the opening square bracket
  it occurs after the initial circumflex (^)
  it is the last character before the closing square bracket
  it is the first character after a character range

For instance, the RE [0-9-A] would match any digit, a dash or the letter 'A'.

circumflex (^)
If the first character of the string is a circumflex (^), the RE matches any character except those the RE would otherwise match. The ^ has this special meaning only if it occurs first in the string. For example, [^0-9] would match any character which is not a digit. Note that the circumflex affects all the following characters within the square brackets. Thus [^A-Z0-9?] would match any character which is not a capital letter, not a digit and not the question mark character.

Complex Regular Expressions

Complex REs are collections of REs which can be treated as a whole. The most frequent complex REs use parentheses and the asterisk (*) or plus (+) characters to specify grouping and repetition. The following rules may be used to construct REs from other REs:

()
REs enclosed within parentheses are treated as a single RE. For instance, the RE (ab)* specifies zero or more occurrences of 'ab', whereas the same RE without the parentheses, i.e. ab*, specifies the letter 'a' followed by zero or more occurrences of the letter 'b'.

|
REs separated by a vertical bar (|) form an RE that will be matched by strings in the text that match any of the REs that make up the complex RE. (as)|(ax)|(az) will be matched by as, ax or az.

*
An RE followed by an asterisk (*) matches zero or more occurrences of the RE. Note that the * will find the longest match.

  ab(ba)*cb
  Searches for all occurrences of 'ab' followed by zero or more occurrences of 'ba' followed by 'cb'. The strings 'abbacb', 'abbabacb', 'abbabababacb' and 'abcb' would all be treated as matching this RE.

  ab(ba)*
  Searches for all occurrences of 'ab' followed by zero or more occurrences of 'ba'. If more than one sequence of 'ba' follows an 'ab' in the text, the match will be made to the entire sequence.

  (ba)*
  This will always match at the beginning of the string because it specifies zero or more occurrences of 'ba'.

+
An RE followed by a plus (+) is an RE that matches one or more occurrences of the RE. Note that the + will find the longest match. If you want to find the first match, use {1,}; the {} notation is explained below.

  ab(ba)+
  Searches for all occurrences of 'ab' followed by one or more occurrences of 'ba'. If more than one sequence of 'ba' follows an 'ab' in the text, e.g. 'abbababa', the match will be made to the entire sequence.
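The simple REs, character classes, grouping, alternation and repetition described above use notation that Python's re module also accepts, so the examples can be checked with a short script. This is a sketch using Python as a stand-in; FileBoss's own engine may differ in details not covered here, and the helper name found and the sample strings are made up for illustration.

```python
import re

def found(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = re.search(pattern, text)
    return match.group() if match else None

# Simple REs and character classes
print(found(r"\.", "a.b"))               # escaped period matches only "."
print(found(r"[0-9-A]+", "x7-A3y"))      # digits, a dash or 'A' → 7-A3
print(found(r"[^A-Z0-9?]", "Q7?z"))      # first character outside the class → z

# Grouping, alternation and repetition
print(found(r"(ab)*", "ababab"))         # the whole group repeats → ababab
print(found(r"ab*", "abbbab"))           # only 'b' repeats → abbb
print(found(r"(as)|(ax)|(az)", "relax")) # → ax
print(repr(found(r"(ba)*", "xyz")))      # zero occurrences match at the start → ''
print(found(r"ab(ba)+", "abbababa"))     # longest match → abbababa

samples = ["abbacb", "abbabacb", "abbabababacb", "abcb"]
print(all(re.fullmatch(r"ab(ba)*cb", s) for s in samples))  # → True
```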
Note that the only difference between the asterisk and the plus sign is that the asterisk matches 0 or more occurrences and the plus sign matches 1 or more.

Positional Regular Expressions

The positional RE is used to indicate where in a line of text a match must occur. It is indicated by angle brackets (<>) enclosing one or more numbers. Some examples follow:

  <0> is an RE that matches the null string at position 0, the beginning of the string.
  <0,5,10> is an RE that matches the null string at position 0, or the null string at position 5, or the null string at position 10.

~ End Of Line Specification
If the position is preceded by a tilde (~), then the position is measured from the end of the string.

  <~0> matches the null string at the end of the string.
  <~4> matches the null string at position 4 counting from the end of the string.

- Range Specification
If two positions are separated by a dash (-), a range of positions is used.

  <0-5> matches any of the null strings at positions 0 through 5.
  <5-~5> matches any null string from position 5 counting from the beginning to position 5 counting from the end.

In a range specification, the second position specified must not occur before the first position specified. <5-~5> will always fail to match in a string of 9 characters or less, since 5 positions from the beginning occurs after 5 positions from the end. <~0-~5> always fails. <~5-~0> is correct.

Replication Counts

An RE followed by {m}, {m,}, {,n} or {m,n} is an RE that matches a range of occurrences of the RE. The values of 'm' and 'n' must be non-negative integers.

{m} indicates exactly 'm' occurrences of the RE.

{m,n}
If 'm' is less than 'n', then {m,n} indicates at least 'm' occurrences of the RE and no more than 'n' occurrences. In cases where the RE occurs more than the minimum number of times specified by 'm', the match will be made to the shortest sequence.

  {0,1}
  This specifies that the RE must occur 0 or 1 times.
  ab(ba){2,4}
  Given the string 'abbababababa', the match will be made to 'abbaba', i.e. 'ab' followed by two 'ba's.

If 'm' is greater than or equal to 'n', then {m,n} indicates at least 'n' occurrences of the RE and no more than 'm' occurrences. In cases where the RE occurs more than the minimum number of times specified by 'n', the match will be made to the longest sequence, up to and including the maximum number specified by 'm'.

  {1,0}
  This specifies that the RE will occur 1 or 0 times.

  ab(ba){4,2}
  Given the string 'abbababababa', the match will be made to 'abbabababa', i.e. 'ab' followed by four 'ba's.

{m,} is equivalent to {m,infinity}.

{,n} is equivalent to {infinity,n}.

Note that the asterisk and the plus sign are equivalent to {0,} and {1,} respectively.

Assignments

$
An RE followed by $c, where c is a letter, matches whatever the RE alone would match and assigns the matched text to c. (Upper and lower case are equivalent.)

The expression <c>, where c is a letter, is an RE which matches whatever value is assigned to the character c. If no previous assignment has been made, then it matches the null string in any position.

Precedence

The suffix operators *, + and {} have the highest precedence. Concatenation has the next highest precedence. Alternation (|) has the lowest precedence. The order of operation may be modified by grouping with parentheses.
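Positional REs, order-dependent replication counts and assignments have no direct equivalent in most regex engines, so the rules above are sketched here in Python as an approximation. The helper names resolve_positions and count_to_python are hypothetical, not part of FileBoss. Mixing commas and ranges inside one <> and discarding positions that fall outside the string are assumptions. Python's lazy quantifier ({m,n}?) stands in for "shortest sequence", and a backreference stands in for $c followed by <c>.

```python
import re

def resolve_positions(spec, length):
    """Offsets allowed by the body of a positional RE, e.g. '0,5,10' or '5-~5'."""
    def offset(token):
        token = token.strip()
        # A leading tilde measures the position from the end of the string.
        return length - int(token[1:]) if token.startswith("~") else int(token)

    allowed = set()
    for part in spec.split(","):
        if "-" in part:
            first, second = part.split("-", 1)
            start, end = offset(first), offset(second)
            if start <= end:  # the second position must not precede the first
                allowed.update(range(start, end + 1))
        else:
            allowed.add(offset(part))
    # Assumption: positions outside the string can never match.
    return {p for p in allowed if 0 <= p <= length}

def count_to_python(m=None, n=None):
    """Translate {m,n} as described above; None means an omitted (infinite) bound."""
    if m is None and n is None:
        raise ValueError("at least one bound is required")
    inf = float("inf")
    first = inf if m is None else m
    second = inf if n is None else n
    low, high = min(first, second), max(first, second)
    lazy = "?" if first < second else ""   # m < n: shortest match
    upper = "" if high == inf else str(high)
    return "{%d,%s}%s" % (low, upper, lazy)

text = "abbababababa"
shortest = re.search("ab(?:ba)" + count_to_python(2, 4), text).group()
longest = re.search("ab(?:ba)" + count_to_python(4, 2), text).group()
print(shortest, longest)                   # → abbaba abbabababa

# Rough analogue of an assignment followed by a reference: a doubled letter.
doubled = re.search(r"([a-zA-Z])\1", "bookkeeper").group()
print(doubled)                             # → oo

print(resolve_positions("5-~5", 9))        # → set()
```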
Differences Between Unix REGEX REs and FileBoss REs

The syntax of REs in FileBoss is almost a superset of the REs used in the Unix utility REGEX. The additional features offered by FileBoss include the following:

1. Alternation (searching for either one string or a second or a third, etc.) is allowed.
2. Generalized positional checking is allowed (as opposed to only testing for the beginning and ending of the string).
3. Assignment to variables and referencing of variables can be used in the search expression itself. This allows for context searching. For example, [a-zA-Z]$a<a> will match doubled letters.
4. All operations may act on sub expressions by grouping.

FileBoss Versus Other Programs Using Regular Expressions
NS = Not Supported
1) = Supports replication counts for all REs.
Ref: HIDD_FORMAT_MATCH_TEST




