Database normalization with Framework IX

Database normalization with Framework IX

Framework IX adds three new parameters to the @dbasefilter() FRED function. Those parameters add to Framework two new methods for instant access to very large disk based databases.


@dbasefilter("c:\database directory\my_database.dbf",          ; 1st.  the disk-based database file path name
             @and(DATE >= {2012/08/26}, DATE <= {2012/09/29}), ; 2nd.  a FRED logical expression (filter) (DATE is a field name)
             1,                                                ; 3rd.  starting record number to evaluate and load
             1,                                                ; 4rt.  ending record number to evaluate and load
             40,                                               ; 5th.  max number of records to load at a time for browsing in table view
             #FALSE,                                           ; 6th.  show record number in table view in read only mode in first col. 
             #FALSE,                                           ; 7th.  show delete flag in table view as 2nd col.
             ,                                                 ; 8th.  Optional MDX/NDX index expression. Not used with the new parameters
             1,                                                ; 9th.  File access mode. 1 write, 0 - read only.
             ,                                                 ; 10th. Optional big data storage expression. Not used with the 12th parameter
             ,                                                 ; 11h.  Optional big data expression. Not used with the 12th parameter
             "AND,FIRST_NAME,'JOHN',LAST_NAME,'SMITH'"         ; 12th. On the fly scan index for instant access pre-filter (case insensitive) 
                                                               ;         FIRST_NAME and LAST_NAME are field names
            )                                                  ; end of @dbasefilter

The first method controlled by the first two new parameters, the 10th and 11th, is used for handling "big data" instant interactive processing and complex encoding such as DNA analisys with specialized hardware, custom software and consulting and is not the subject of this article.

The second method which works with existing .DBF files is one of the subjects this article covers. It uses the new 12th parameter to provide virtually instant access to very large disk-based databases without the need to maintain indexing, alleviating the need for MDX and NDX index files and an 8th parameter index expression. It does it by creating "on-the-fly" custom memory-based indexes for fields specified in the 12th parameter logical expression so a FRED expression in the 2nd parameter can evaluate only those records that conform to the 12th parameter logical expression.

@dbasefilter 12th parameter consists of a single string expression that provide instant access to dBase files (.DBF) by preloading specified database fields to memory and scanning them for specified data in order to create a subset of the records which the logical FRED expression in the 2nd parameters will then evaluate to determine which of the records will be loaded to the database frame and which records will be discarded. It saves time since FRED evaluate only the data of interest. It completely eliminate the need for NDX or MDX indexes so the 8th parameter index expression can be left empty. Preloading the specified fields to memory is just as fast or faster than indexing and does not required the updating of index file when the database is written to.

"AND,FIRST_NAME,'JOHN',LAST_NAME,'SMITH'"

How to use this new capability? The 12th parameter is a string enclosed in quotes that start with one of two possible keywords: OR or AND. It continues with an unlimited number of pairs, each consist of a field name and the data to look for in it. The referenced fields are loaded to memory and scanned for the specified data, taking into account the OR or AND keyword. Only records that contain the specified data and satisfy the OR or AND condition are made available to 2nd parameter FRED expression for evaluation. Since the default returned value of the 2nd parameter FRED expression is TRUE leaving the 2nd parameter empty will load all the records that satisfy the 12th parameter conditions.

In many cases a 12th parameter can replace the 2nd parameter FRED expression but when a more complex filter require additional evaluations such as date or time periods, or the referencing of data outside the database the 2nd parameter FRED expression can be used to evaluate those records that satisfied the 12th parameter conditions. The 2nd parameter FRED expression may of course reference all the database fields including fields that are not specified in the 12th parameter as well as frames, spreadsheets and other databases, even other disk based databases. It can perform all FRED operations regardless if the evaluated record will end up loaded and appended to the database frame or discarded. The 2nd parameter FRED expression is a program that can be as long as any FRED program and can call other FRED programs. Its final returned value (its result) must be a logical value which is used to determine if the evaluated record will be loaded or discarded. A returned value of #TRUE keeps the record in the database after it was evaluated and #FALSE discards it.

Suppose we want to load a single record that contain the name John Smith in a disk based database (a .DBF file) that have the fields FIRST_NAME and LAST_NAME in it. A 12th parameter will look like that:



     "AND, FIRST_NAME,'JOHN', LAST_NAME,'SMITH'"

A synthetic example of it will look like:

     "CONDITION, FIELD-NAME,'DATA'[, FIELD-NAME, 'DATA' [, '', , '', ...]]"

      \ fields / \     1st      /  \   optional 2nd  /  \   optional    /
      condition   \ field-data /    \   field-data  /    \  field-data /
     (conjunction) \   pair   /      \     pair    /      \    pairs  /
      (OR/AND)
      
The fields condition keyword and first field-data pair are required.
The fields condition apply to all fields. Any number of additional 
field-data pairs are allowed. 
Field names and data are case insensitive.

Note the use of single quotes around data but not around field names. Also note that the data is case insensitive. The expression consists of a single keyword followed by two pairs of field name and data.

Supposed we want to load the same set of records as above but only if they have a date field named DATE containing a date between August 26, 2012 and September 15, 2012. We keep the same above expression in the 12th parameter and put this in the 2nd parameter FRED expression:

@and(DATE >= {2012/08/26}, DATE <= {2012/09/29})

The 2nd parameter FRED expression will evaluate only the records that satisfies the 12th parameter conditions. If it will return FALSE the evaluated record will be discarded. If TRUE the record will be appended to the database frame. Since the 2nd parameter default returned value is TRUE, with no 2nd parameter all the records that satisfies the 12th parameter conditions will be appended to the database frame.

Note that if the number of records that should be loaded is larger than the specified database frame height (the 5th parameter) Framework make it possible to browse through those records by pressing (or preforming using a FRED function or a macro) PgDn on the bottom record or PgUp on top record in the database frame.

If you suspect that the number of record that the 2nd parameter FRED expression will be evaluating is quite high, say for example that a few hundred records belong to John Smith, you can optimize the filter by adding more condition to the 12th parameter. e.g.

"AND,FIRST_NAME,'JOHN',LAST_NAME,'SMITH',DATE,'2012'"

so only to 2012 records will be evaluated, or even better:

"AND,FIRST_NAME,'JOHN',LAST_NAME,'SMITH',DATE,'201208',DATE,'201209'"

This will make sure that FRED will only evaluate the records with August 2012 or September 2012 in the DATE field. Depending on a case by case sometimes it is possible make the 2nd parameter FRED expression unnecessary.

The 2nd parameter FRED expression may include references to any fields in the database, including of course fields that are not referenced in the 12th parameter:

@and(DATE >= {2012/08/26}, DATE <= {2012/09/29},PRODUCT,"FWSDK", SERIAL_NO,"48731")

Note that the 12th parameter does not "evaluate" the fields by value, instead it scans the field content ASCII characters which is faster. If we were to use the 12th parameter to look for product and serial_no the syntax we use would be:

"AND,FIRST_NAME,'JOHN',LAST_NAME,'SMITH',DATE,'201208',DATE,'201209' PRODUCT,'FWSDK4', SERIAL_NO = '48731;"

The single quotes are a new addition to the FRED language syntax which depending on specific functions may signify literal strings that will not be evaluated. In regular strings single quotes are considered a normal text characters but in specific database functions such as SQL interface functions in FW10 single quotes may be used to prevent malicious code injection etc.

And now to the main course: Database normalization. As you are about to see it can be used to provide a considerable beneficial capability, normalization is achieved when database fields and tables that contain data of say, a company, a project or say, an accounting system are structured in related tables linked by unique keys, and do not contain redundant information. Why that is good? Because it prevents functional errors. No, it does not help to prevent errors, it is actually eliminate the possibility of the most common errors in data management such as double billing, duplicate payments, multiple shippings and the loss of records. How is it possible that just following a particular database structure with no additional code can do that? -- read on to find out.

Consider a database system that keeps records of Customers, invoices, and shipping bills. To achieve normalization we store the data in three corresponding tables with no data redundancies. It means that the customer's names are not stored in the invoice and the shipping bill tables. Instead a unique customer ID is given to each customer when a customer record is created. The unique ID number is recorded in the customer table (a separate .DBF database file in this case). It is also entered to the invoices tables to indicate to which customer that invoice record was issued. When we display an invoice we in fact build the invoice form from fields from two separate files, customer DBF file and the invoices DBF file. We can have as many invoices as we need per a single customer or none at all but a customer name is only entered once regardless of how many invoices were created for it.

The same type of relation: "a one to many relation" also describe the relation between the invoice table (the INVOICES.DBF file) and the Shipping Bills tables (the SHIPPING.DBF file). We can have many shipping bills per single invoice. The invoice information is not recorded in the shipping bill database. Instead a unique invoice number that assigned and recorded when a new invoice is created is used to associated a single invoice with possibly many shipping bills.

The unique customer and invoice number are referred to in database terminology as "Keys" but a more appropriate name for them would have been simply "unique id's". As a rule they should appear in any printed or electronic reference to the data as a "customer number", an "invoice number", and a "shipping bill number" to allow the unmistakable identification of stored data and allow or disallow related actions. Code that performs specific actions such as payments and deliveries should check the tables to see if they should be allowed or refused. How well modern software is following that rule is quite questionable.

As I am writing these lines I am looking a check issued by a major university as a payment for one of our products. The check amount and our invoice number, printed on its attached page, are the same as a those associates with a wire transfer which arrived at our bank a few weeks ago. It is a duplicate payment. It was probably issued automatically after the product arrival was reported and the arrival date was entered into the system, even though a wire transfer for the same invoice was already sent. Normalization can effectively be used to prevent the possibility of such operational errors.

Now to the actual implementation of a form that present data from, and allow the updating multiple DBF files. The function @execute can be used to insert and execute formulas in frame as well as in linked databases. We will need to prepare a frame containing the three linked databases with FRED program, possibly in the container or in some frames inside it, that may call a few @inputline-based functions such @inputdate or @inputnumber to gather information from a user, verify it, and build the linked database formulas and execute them.

The customer ID, an invoice number, or a shipping bill number for example can be used to build the appropriate formulas and @execute then linked the databases. Additional code should then be used to verify the related data and allow or refuse related operations. If needed while interacting with a customer, the 12th parameter makes it easy to find a customer number by entering the whole or a part of the customer name. There are no practical limits on the number of linked databases nor on how many linked databases can linked to a single DBF file or on the fields that can be in a linked database frames. The first field name in a linked frame that matched an existing field in the linked DBF file is used as the data source. To save time however ask us for sample programs and do not hesitate to ask for suggestions and for solutions to your specific needs.