Modeling the Business World
Using Procedural/Relational
Updated 1/8/2003
The 3-Graph Business Model
I will discuss how I tend to model and design
business applications from a procedural/relational
standpoint. This is not the only way to
do it, but it has worked well for me and others.
I am not claiming to have invented this process, for
variations of it have existed for decades.
In fact, one can compare it to the buy-low-sell-high (BLSH) approach
to the stock market. Methodologies-Of-The-Day come and go, yet
BLSH continues to be the best strategy despite all the fads and
hypesters. (Warren Buffett uses a variation of BLSH.)
Software engineering schools of thought often battle
over whether to model using data or behavior.
My response is to use both! The more perspectives
you model something from, the more insight you will have.
Although data and behavior are often the primary aspects,
there may be other domain-specific aspects which should
be looked at.
Both the data and behavioral analysis can be conducted
in parallel. Which should come first is often moot, because
access to the sources of domain-specific information,
such as users and managers, is usually limited. Thus, both
data and behavioral questions should be asked when
these people are available for interviews because as
a developer you have very limited chances at second
tries. (Follow-up questions are best kept narrow
in scope and specific.) There is a point at which
the timing may split, as described later.
For the data side, I begin by collecting a list of all
potential pieces of information that a project may
involve. Don't over-concern yourself with grouping them
at first; simply write them all down knowing that they will
be sifted later.
The behavioral side usually involves studying inputs,
outputs, and expectations from the users' perspectives. What
deliverables do they want?
Users familiar with computer usage can often
describe preliminary user interfaces. Often
looking at existing paper
forms gives some ideas about what the user is familiar
with. (An example system is also a very good source
of information about needs.) If you have time, go
through typical, and also oddball, steps, orders,
and/or processes with them. Ask
permission to observe them for a while.
Managers often care most about reports, since reports are
what they see most often out of computers.
(This is changing as
more managers use interactive systems themselves, but printed
reports, or at least static reports, are still commonplace.)
After the initial interviews one can often create
some draft screens and reports to show to the
target users. If you are unable to create
computer versions of the screens, then give
them paper versions.
To prepare the UI drafts, you may have to do your
first significant behavioral analysis in the
form of "screen transitions". There are at
least two ways to do this. The first is to
create transition diagrams that are a kind of a
half UI draft and half flow-chart.
Often these are too crowded to read easily.
Another approach is to have one screen sample
per page, with the "transition list" at
the bottom, under the example screen
image. An example transition list may
resemble:
If 'Delete' is pressed, then go-to page 52.
If a price is double-clicked, then go-to page 39.
If the item description is double-clicked, then
go-to page 23.
Note that the page numbers are arbitrary, and
the phrase "go-to" does not necessarily imply that
GOTO statements are needed. See
Goto's and flow
for more about this.
Regardless of the approach, the result is roughly
the same: a graph (network) of screens or tasks.
(If there is no screen involved, such as with
batch jobs, then we will call this unit a "task".)
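As an illustrative sketch (the screen names and events here are invented for the example, not taken from any real project), such a transition list can be stored as plain data rather than hard-coded branching:

```python
# Hypothetical sketch: the screen/task transition graph kept as data.
# Keys are (current screen, event); values are the next screen.
transitions = {
    ("order_view", "delete_pressed"):     "confirm_delete",
    ("order_view", "price_double_click"): "price_edit",
    ("order_view", "descr_double_click"): "item_detail",
}

def next_screen(current, event):
    """Look up the next screen; stay on the current one if unmapped."""
    return transitions.get((current, event), current)
```

Because the graph is data, adding or rerouting a transition is an edit to the table, not a restructuring of code.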
This step is one place where behavioral analysis proceeds
ahead of data analysis. This is because it is best
to get as much feedback from users as early as possible.
Since users are usually not directly involved in the
data layouts, priority should be given to the UI
(task-orientation) at this point.
Another helpful tool is to collect "scenarios" that
represent typical and tricky transactions or behavior
in the business process.
Scenarios help the analyst and the customer
agree on how things take place or should take place
in the new system by looking at concrete situations.
They both solidify existing mutual
assumptions and may expose differences in others that need to be
reviewed. Scenarios may resemble, "Customer Bob has
two discount coupons. One is on item A, below, that he has ordered,
and the other is a general discount....."
The scenarios can also be used for initial
testing later on.
Let them mull over the samples rather than take just
their initial impressions. (Initial responses are
still important, but not the whole picture.)
They may have more to
say after they compare the samples to actual work they
do throughout the day. Often one doesn't think about
an issue until actually encountering it.
Quantities of Relationships
While the users are reviewing the UI samples, you
can start to plan the data layouts. (It would be
best to wait for all the results of the UI samples
to come back in, but there may not be time for that.)
Deciding on quantities of relationships (QOR)
is one of the
trickiest aspects of this step of modeling. Are there
potentially multiple X's per Y? Are the occurrences
of data item A related to data item B? (In other words,
is the quantity of A somehow related to the quantity of B?)
QOR is probably
the primary determinant of how data items are grouped
into tables.
One of the messiest issues in QOR is the "fewness problem".
An example is multiple ship-to destinations for orders.
If there is only one ship-to per order, then we
can simply put the ship-to information in the Order
table. However, if we want potential multiple destinations
per order, then do we have a fixed number of maximum
destination slots
(or references) in our order table, or make
the number open-ended?
Idealistic software engineering tends to dictate making
the number open-ended. However, this can often
complicate the code, the tables, performance, and the user
interface. That last one is often ignored, but
I find it to be true in general. The extra complexity
will slow down maintenance for the life of the
project. Thus, the costs are not just up front.
If we determine that only 1 out of every 1000
orders ship to more than 2 destinations, then
does it make sense to complicate the system
to handle rare cases? Or, just make it
a business rule that more than 2 destinations must be
split into multiple orders, and simply have 2 ship-to
addresses in the Orders table? (See
discussion on
total cost of ownership versus Meyer's continuity
principle.)
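A rough sketch of the two ship-to designs in a throwaway SQLite session (table and column names are my own illustrations, not a prescribed schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Option 1: a fixed number of ship-to slots in the Orders table itself.
db.execute("""CREATE TABLE Orders_fixed (
    Order_ID INTEGER PRIMARY KEY,
    ShipTo1  TEXT,
    ShipTo2  TEXT)""")

# Option 2: open-ended destinations via a child table.
db.execute("CREATE TABLE Orders_open (Order_ID INTEGER PRIMARY KEY)")
db.execute("""CREATE TABLE Order_ShipTo (
    Order_ID INTEGER REFERENCES Orders_open(Order_ID),
    Dest     TEXT)""")

db.execute("INSERT INTO Orders_open VALUES (1)")
db.executemany("INSERT INTO Order_ShipTo VALUES (1, ?)",
               [("Warehouse A",), ("Store B",), ("Store C",)])
count = db.execute(
    "SELECT COUNT(*) FROM Order_ShipTo WHERE Order_ID = 1").fetchone()[0]
```

Option 2 handles any number of destinations, but every query that needs ship-to information now pays for an extra join; option 1 keeps the common case simple at the cost of a business-rule limit.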
Regardless of your opinion on this controversy,
it is something to keep in mind. Perhaps bring
the tradeoffs up to the client and have them
make the final call.
Analyzing QOR should eventually result in
groupings of data items (fields) into
potential tables. Generally there will be
one table per "noun" in the model. The nouns
will be things like customers, vendors (suppliers),
partners, employees, orders, order items,
supplier proposals, etc. If your draft
tables don't correspond to "clean nouns",
then it is not a reason for alarm, only
a suggestion to review something more
closely. Things like invoices and time-sheets
may have a corresponding "detail" table
for line-items.
(See the IS-A
time-sheet example for more on this.)
Most of the tables should have one
unique ID column to give each occurrence
a unique key that can be referenced
by other tables. For example, each
customer should have a Customer_ID,
each order have an Order_ID, etc.
I find
it safer to put unique ID's in just about every
table because sometimes the UI needs references
to potentially every record, and single ID's
are easier to deal with than combination
keys. An exception may be made if it causes
performance difficulties. Most commercial RDBMS have some
approach to generate unique ID's for every
record. However, their approaches and philosophies
differ greatly.
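For instance, SQLite's auto-generated keys give a minimal sketch of the single-ID-per-record idea (table and column names are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Each customer record gets one auto-generated surrogate key that
# other tables can reference, instead of a combination key.
db.execute("""CREATE TABLE Customer (
    Customer_ID INTEGER PRIMARY KEY AUTOINCREMENT,
    Name        TEXT)""")
db.execute("INSERT INTO Customer (Name) VALUES ('Acme')")
db.execute("INSERT INTO Customer (Name) VALUES ('Bob')")
ids = [r[0] for r in
       db.execute("SELECT Customer_ID FROM Customer ORDER BY 1")]
```

Other vendors reach the same goal differently (sequences, identity columns, GUIDs), which is the "differing philosophies" point above.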
I have seen very complicated business-to-business
commerce systems where
orders were split into supplier shipment requests
that had no definite QOR to destinations.
Some orders were first shipped to the central
distribution center, re-grouped and then shipped
to the customer, and others were shipped
directly to the customer by the suppliers (wholesalers).
Thus, in some cases data from multiple orders
were combined, and in other cases single orders were split
into separate orders to various suppliers. (It was often
the case that an order involved several different
suppliers.) Combine this with the fact that
every customer had a potentially different price
based on contract terms, marketing programs
and quantity discounts and with the fact
that there were tens of thousands of different
products in the catalog. That system required
rocket science and/or tons of patience to
make work.
Sometimes too many levels or relationships
between entities result in poor
application performance. This sometimes
dictates changes in data normalization to improve
key performance areas. A good book on relational
design will usually cover this in more
detail.
Many relational training materials promote the splitting up of
tables more than I often prefer. Often one splits tables or
parts of tables upon spotting a grouping of related fields.
However, the relationships between such fields can be temporary
and/or fuzzy over time. For example, somebody once suggested
splitting parts of an Employee table into 3 "sub-entities"
consisting of Hourly, Salaried, and Commissioned employee
salary calculation information. However, these are not
necessarily mutually-exclusive. Plus, often multiple
attributes affect a given calculation such that lumping
an attribute into one of the 3 categories may get
sticky. Further, labor laws may change.
See sub-type criticism for related
information. My rule of thumb is, when in doubt, don't
split a table. Another way of saying this is to
try to avoid one-to-zero-or-one
and one-to-one relationships.
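As a hedged illustration of the Employee example, one table with a pay-type code and nullable pay fields, rather than three sub-entity tables (all column names here are invented for the sketch):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One Employee table instead of Hourly/Salaried/Commissioned sub-tables.
# Pay fields are simply NULL when not applicable, and a worker can
# have a mixed pay type without any restructuring.
db.execute("""CREATE TABLE Employee (
    Emp_ID        INTEGER PRIMARY KEY,
    Name          TEXT,
    PayType       TEXT,  -- 'H', 'S', 'C', or combinations like 'HC'
    HourlyRate    REAL,
    Salary        REAL,
    CommissionPct REAL)""")
db.execute("INSERT INTO Employee VALUES (1, 'Ann', 'S', NULL, 52000, NULL)")
db.execute("INSERT INTO Employee VALUES (2, 'Raj', 'HC', 18.5, NULL, 3.0)")
row = db.execute(
    "SELECT HourlyRate, CommissionPct FROM Employee WHERE Emp_ID = 2"
).fetchone()
```

If the categories later stop being mutually exclusive (as with Raj above), nothing about the schema has to move.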
Relating Data and Behavior, Or Not
At this stage what we then have is two big
graphs. (Although "graph" is a potentially
misleading term, the alternatives, "web" or
"network", are equally confusing.)
One graph is the screen/task transition
diagram, and the other is our Entity-Relation (ER)
diagram (tables and their potential relations).
Object oriented modeling tends to try to
do two things to our structures. First,
it will try to find hierarchical taxonomies
in our structures. Second, it will try to
combine the data with behavior. Both of these
don't work very well in my opinion. You can
read about the problems with hierarchical
taxonomies in
hierarchies (intro),
IS-A problems,
and
subtype proliferation myth.
In the second OO habit, every operation must
belong to at least one class. The class often
chosen is an entity-based class such as
Customer, Vendor, etc. The problem is that
such an association is often arbitrary. Either
the "noun" of association is fairly likely
to change over time (a dynamic relationship),
or there are multiple candidates for
entity association. You can read more about
this in the
aspects write-up.
Sometimes associating an action or screen with
an entity is indeed helpful. However, not
enough to hardwire the concept into the language,
as OOP attempts to do. Just because an association
is convenient sometimes, does not mean it is
convenient always, or even most of the time.
The relation between any such sets can be fleeting
and/or one-to-many. Thus, the one-to-one OOP
encapsulation is not sufficient without messy compromises, areas
of confusion, and/or constant code rework ("refactoring").
If OOP found a way to place an operation
in a class, but move it to another class
or make it independent when things changed
without having to change all references
to it, then I might not mind optional
classes. However, I have not seen
a satisfactory solution to this
continuity problem.
The actual relationship between nouns (ER) and verbs (tasks) in
our model is yet another graph. ("Noun" and "verb" are
not to be taken too literally here. They are mostly
used as memory aids.)
Thus, our conceptual model is actually 3 graphs:
the screen/action graph, the ER (data) graph,
and the relationships between the nodes of
each. The third graph will be called the "join graph".
The real business world is a graph. It is not a tree,
it is not a star, but a grand graph.
Trees of the same kinds of things are modeled just fine as
a table with a reference to a parent record. Usually
the users maintain these trees and not the programmer.
The programmer only needs to build the UI for table
management by users.
Business examples include product categories,
chain-of-command hierarchies, region-based
rollup hierarchies, and accounting classification codes.
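A minimal sketch of such a user-maintained tree as a self-referencing table, with an invented product-category example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# A product-category tree stored as a table with a parent reference.
db.execute("""CREATE TABLE Category (
    Cat_ID    INTEGER PRIMARY KEY,
    Parent_ID INTEGER REFERENCES Category(Cat_ID),  -- NULL at the root
    Name      TEXT)""")
db.executemany("INSERT INTO Category VALUES (?, ?, ?)",
               [(1, None, 'All'),
                (2, 1, 'Clothing'),
                (3, 2, 'Lab Coat')])

def path(cat_id):
    """Walk parent links up to the root, returning names top-down."""
    names = []
    while cat_id is not None:
        cat_id, name = db.execute(
            "SELECT Parent_ID, Name FROM Category WHERE Cat_ID = ?",
            (cat_id,)).fetchone()
        names.append(name)
    return list(reversed(names))
```

The programmer only supplies the maintenance UI; users add and move categories by editing rows.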
However, sometimes even these are not pure trees. For
example, "lab coat" could be under both "clothing" and
"medical supplies". Dealing with these can add a lot
of complexity to a system.
Since pure graphs are tough to work with, some sort
of unit-management framework needs to be built to
simplify our model. For the sake of discussion,
a "unit" will be something that has a reference-able
name. For example, subroutines, modules, classes, methods,
tables, and fields are named units. (GOTO labels are
another, but since nobody is defending them these
days, I will ignore them for now).
In our model here, the basic named units are
tables, fields, and subroutines (which include
functions).
Software engineering is the art of building
systems that are "easy to maintain." Two primary
ways to do this are to minimize the global side-effects
of changes and to reduce the quantity of units
that have to be changed for any given change.
If we assume graph structures, this is tough to
do. Anything can potentially relate to anything
else. However, we must force some structure on
our system, or else we will go insane. (Or, at
least never finish anything.)
We already have our data model with tables, fields,
and keys; and we have our screen/task diagram.
Now all we need to do is relate the two and
produce code.
I often do this with a "task unit". It is
roughly similar to a "module" of some languages.
A common unit of business is the "task". "I want
the computer to do X," someone may say in a meeting.
I find tasks a much more usable unit than nouns,
often favored by object-oriented design fans.
Tasks often involve multiple things (nouns) temporarily
coming together to perform the task, not just one.
This is true with business tasks and it rings true
for computer tasks also. Getting the job done is
usually more important than who does it.
The 3-graph model concentrates "participant
changes" at the "join" stage, which usually comes
early in the module. An SQL join serves as the
liaison between the task and the outside world.
You can view it as sort of a glorified black box,
with an extra "gray box" layer to serve as
a transition between the dark innards of the black
box and the outside world. It is sort of the
"customer service representative" of the module.
The liaison section serves two basic purposes.
First, it buffers the rest of the module from
schema changes or changes in data sources;
and second, it facilitates the ability of the
module coder/designer to focus just on the task
by itself. You can liken it to the "givens" in
geometric proofs. You don't have to ask where
the givens came from; you can just focus on using
the givens to get a specific task done.
This philosophy is sort of the flip-side of the "nouns
handling themselves" OOP viewpoint. Both the above
"task encapsulation" and OO noun-centric encapsulation strive
for similar goals. However, I find task-centricity more
stable and more natural a fit to business
practices and changes than noun-centricity.
You can find some examples of OO difficulties in this
regard in the invoice example
of the famed Aspects write-up.
It is not always possible to perform all the
joins or lookups up front. Sometimes there are several
steps involved, and the joins needed for
a later stage are not known up front.
Such cases may suggest that a split in
the task may be needed.
An SQL join "flattens" the information, hiding
the source data relationships, and data sources.
Thus the relationships
and even the data schemas can change without impacting most
of the task itself. (Columns can be renamed and even
calculated via a Select
statement.) This allows the task to remain
more oblivious to the outside world. Here is
an example.
*** VARIATION 1 ***
Table: Account (table layout)
--------------
AcctID
FirstNameMI
LastName
hasChecking - Boolean (bit)
Minimum
Balance
etc.
// code example
rs = DB("SELECT * FROM Account WHERE AcctID = " & acctNum)
....
DisplayPair("Checking:", rs.hasChecking, "y") // format as YES/NO
DisplayPair("Minimum:", rs.Minimum, "$") // format as money
*** VARIATION 2 *** (change)
Table: Account
--------------
AcctID
PlanRef - foreign key to PlanID
FirstNameMI
LastName
Balance
etc.
Table: Plan
-----------
PlanID
hasChecking
Minimum
etc.
rs = DB("SELECT * FROM Account, Plan WHERE _
PlanID = PlanRef AND AcctID = " & acctNum)
....
DisplayPair("Checking:", rs.hasChecking, "y")
DisplayPair("Minimum:", rs.Minimum, "$")
This example is based on the Banking Example
page. In the first variation, account attributes are stored
with the account detail. In the second variation, the account
attributes were moved to a separate table. (I am not necessarily
endorsing this change, only saying that it could happen.)
Under this schema change, only the SQL statement need be
changed. Most or all of the logic below remains the
same. Some OOP approaches often need body changes such as:
DisplayPair("Checking:", account.hasChecking, "y")
DisplayPair("Minimum:", account.Minimum, "$")
TO
DisplayPair("Checking:", account.plan.hasChecking, "y")
DisplayPair("Minimum:", account.plan.Minimum, "$")
We kept the function call syntax for the OO version to
make comparing easier, but the actual result would probably
more resemble: account.plan.hasChecking.displayPair(....).
Here, the body code is coupled to the "path" or
source of the information. Sure, OOP can also be
made to flatten or hide the source information,
but this makes it less "OO", and often requires
changing of database "wrappers". In other words,
this model can be implemented in OOP, but it
will not add much to it, and often detracts
from it by requiring more non-contributing
syntax, such as "self" specifiers, etc.
(See also automating joins.)
Somebody pointed out that any other task that uses the Account
table might have to be changed if "hasChecking" is moved,
and that an Account class would not have that problem (at
least for read-only operations; the class should no longer
be able to change that attribute). However, we could use a
view in place of the Account table to keep the same
column in the results without changing the queries; but a
better solution is a view
column instead of view tables to supply a
"virtual column". Unfortunately, most
existing RDBMS don't support view columns for some
odd reason. Perhaps because they can be partially emulated with
triggers.
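Lacking view columns, an ordinary view can approximate the idea. Here is a sketch using SQLite with the banking tables from the example above (the data values are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE Plan (
    PlanID INTEGER PRIMARY KEY, hasChecking INTEGER, Minimum REAL)""")
db.execute("""CREATE TABLE Account (
    AcctID INTEGER PRIMARY KEY, PlanRef INTEGER,
    LastName TEXT, Balance REAL)""")
db.execute("INSERT INTO Plan VALUES (1, 1, 500.0)")
db.execute("INSERT INTO Account VALUES (101, 1, 'Smith', 740.0)")

# A view that re-exposes hasChecking/Minimum as if they were still
# Account columns, so existing queries keep working unchanged.
db.execute("""CREATE VIEW AccountV AS
    SELECT a.AcctID, a.LastName, a.Balance, p.hasChecking, p.Minimum
    FROM Account a JOIN Plan p ON p.PlanID = a.PlanRef""")
row = db.execute(
    "SELECT hasChecking, Minimum FROM AccountV WHERE AcctID = 101"
).fetchone()
```

Tasks that previously selected from Account can select from AccountV and never notice the split.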
Common Behavior
Common behavior among tasks can be shared instead of replicated
across each task. I refer to this as
repetition factoring.
The main purpose is to reduce the number of different
places that have to be changed. It can also simplify the code.
The easiest approach involves making subroutine "utilities"
that can be shared among multiple tasks. Good systems
often have a bunch of these that form a kind of
application-specific sub-language.
Actually, we don't need to show the shared nodes
outside of the Task Plane, since they can be
linked and still be considered in the plane.
They are shown isolated here simply to highlight
the concept. However, perhaps there is a difference
between stand-alone tasks and "utility tasks". More
research is needed on this.
However, sometimes
more complicated "frameworks" need to be made. A framework
may involve multiple tables and require
a lot of skill and practice to
do well. My favorite type of framework
uses a data dictionary table to store
screen and report information, from which
screens and reports are generated. I have
also seen several systems that stored menus and application navigation
information in tables, even in AS/400 RPG applications.
(I didn't write the RPG one.)
Why it Works
Overall, the 3-graph model works fairly well because:
- It provides a semi-standardized way to "buffer"
schema changes from implementation.
- Task grouping allows the programmer
to focus on one task at a time no
matter how large the application gets.
- Task-oriented thinking maps well to
the business domain and change requests.
- It does not tie dispatching and structure to
fragile hierarchical or
mutual-exclusiveness-based taxonomies.
- It tends to isolate views and modeling to
local, as-needed formulas and patterns
instead of global, up-front modeling attempts.
In other words, the "big picture" is isolated
to only what needs to be big.
I do not claim that it is ideal for all circumstances
and projects, but variations of it seem to work well
for the majority of business projects I have worked
on.
An Alternative View of the Same
Our model is generally based on the "traditional"
input-process-output model. In the old days, most
"processes" were "batch jobs". One would gather
the necessary punched cards and/or tape reels
needed by a given process (previously-written and blank) to
serve as the input and output of the process,
get everything ready, and then run the process.
Modern systems have replaced cards and tape with
database management systems and monitors
(screens) for the most part. No matter
how large the system or database gets, the size
of the tasks (processes) remains relatively the
same. This is roughly analogous
to cells in biology. The cells in an elephant
are roughly the same size as the cells in an
ant. The database is roughly analogous to
the blood flow and nervous system.
Generally one should strive to make tasks as
independent as practically possible. In the
pre-GUI days, "menu tasks" were used to
allow menus to dispatch tasks. Modern
techniques include IDE-generated code that
serves as dispatching
modules, and
event tables.
Criticisms of the 3-Graph Model
Too Many Joins?
One criticism targeted at the above 3-graph model
is that the larger the application gets, the more
joins (relations) are needed.
I am not fully sure I agree with this. The number
of joins depends more on the complexity of the
task than of the entire application.
Even so, many joins are sometimes a sign of poor
relational design. If you are bogged down by
too many joins, here are some things to
look into:
- Review your relational design. See if
there are tables with nearly 1-to-1 relationships
that can be combined into one table. Some designers
have a habit of making too many smaller tables.
I am not sure of the psychology/justification behind
such a design philosophy.
- If some information is only needed by
a small percentage of records, then perhaps
this information can be looked up
"as needed" rather than included with the
primary join.
- If there are many read-only operations
on the same or similar joined sets, then you may want to
think about creating a "pre-joined" copy of the
record-set. Perhaps a periodic (timed) process can
create a flattened table for use with such
a task. Many RDBMS come with tools to create
periodic processes for such.
- Is all of the information joined in really
necessary? Sometimes the need for some information
in the primary join is not really a priority and
may be obtained another way, such as a "drill-down"
operation. Although I don't like
to suggest changing the requirements to fit the
application, some requests simply have a
high cost-to-benefit ratio.
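A rough sketch of the "pre-joined copy" idea from the list above, with invented tables; a scheduled job would run the rebuild step periodically:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Customer (Customer_ID INTEGER PRIMARY KEY, Name TEXT)")
db.execute("""CREATE TABLE Orders (
    Order_ID INTEGER PRIMARY KEY, Customer_ID INTEGER, Total REAL)""")
db.execute("INSERT INTO Customer VALUES (1, 'Acme')")
db.execute("INSERT INTO Orders VALUES (10, 1, 99.5)")

# Rebuild a flattened, read-only copy for join-heavy reporting tasks.
# A periodic (timed) process would re-run this to refresh the copy.
db.execute("DROP TABLE IF EXISTS OrderFlat")
db.execute("""CREATE TABLE OrderFlat AS
    SELECT o.Order_ID, o.Total, c.Name AS CustName
    FROM Orders o JOIN Customer c ON c.Customer_ID = o.Customer_ID""")
flat = db.execute(
    "SELECT CustName, Total FROM OrderFlat WHERE Order_ID = 10").fetchone()
```

The read-only tasks then query one wide table instead of repeating the join, trading some staleness for speed.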
Most problems with "fat joins" that I have seen
are due to bad design, relational or
application-wise. However, performance issues
related to cross-referencing information
are not unique to the procedural/relational
paradigm.
Focus On What, Not Who
Lubricating your dispatching to make software change-friendly
OOP's focus on self-handling nouns
is problematic in business programming because which noun(s) affects
something is usually quite dynamic. The "features", or actions of a
business program are relatively stable. A given feature will probably
stay in the software for its life. What will change more often
will be the conditions that trigger a given feature. Thus, in
procedural/relational design you don't wrap the code structure around
who or what triggers the feature. This is instead delegated to Boolean
expressions, either in IF statements, or in SQL queries or table filters.
Thus, changes in who or what is involved in the trigger will often not
require the physical moving and restructuring of code. The philosophy
is to change the links (references) and not move the "nodes" of the
logic network.
We have seen some rough examples of this in the
Aspects article.
The basic pattern you want to
achieve resembles:
sub taskMain(aParams)
sql = [sql expression involving aParams]
rs = openRS(sql, #conn std) // open a result set
while getNext(rs)
ProcessRecord(rs) // loop thru records
end while
close(rs)
end sub
'-------------------
sub ProcessRecord(rs)
if [condition_1] then
[feature_1]
end if
if [condition_2] then
[feature_2]
end if
if [condition_3] then
[feature_3]
end if
if [condition_4] then
[feature_4]
end if
if [condition_5] then
[feature_5]
end if
...etc...
end sub
Sometimes if features are mutually exclusive (this OR that),
then you might have Else, ElseIf, or Case (switch) statements.
OO proponents suggest you
turn these into polymorphic dispatching. However, if the branches
cease being mutually exclusive, then the OOP approach
requires more rework to put things back (moving
them out of separate subclasses). The p/r approach
requires much less code changing if mutual exclusiveness
disappears: you simply turn the ElseIf or Case
statements into self-standing IF statements.
Sometimes you may need to nest the IF statements,
especially when multiple aspects intersect each
other. Some claim that this eventually leads to
messy code (and that OO allegedly fixes it).
Intersecting aspects (dimensions) are not really clean in any
paradigm (known so far).
See the "Structural" document referenced below
for a discussion and tips to reduce
run-away nesting.
The general rule of thumb is try not to make the
position of code depend on the trigger/dispatching
criteria. This allows p/r code to be quite change-friendly.
Some have suggested that this approach resembles rule-based
Expert Systems, where each rule is generally independent. However,
each rule is *not* fully independent here. The sequence is
clearly defined based on position, and often one defines "helper"
variables or routines
near the top or bottom of the routine(s) or module to simplify the syntax
or logic for frequently-repeated statements or patterns.
In essence you create a little sub-language that is specific to the
task at hand.
Free-floating rules cannot easily take advantage of
such context-based helpers.
A variation of the rule is, avoid making the code
structure depend on relationships. Relationships
between entities and tasks change frequently in typical
custom business applications. Try to farm off
relationship management into relational tables, or
at least query expressions. Managed that way, relationships
are easier to change and less disruptive to the code structure.
It can be called "relations via formula" (instead of
physical code structure) in some ways.
Even the transitions
(relationship) from task to task can be stored
in menu tables and/or GUI tables.
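A hypothetical sketch of task-to-task transitions kept as table data (here a simple list of rows) rather than wired into code; all names are invented:

```python
# A "menu table": each row relates a key and label to a task name.
menu = [
    {"key": "1", "label": "Enter Order", "task": "order_entry"},
    {"key": "2", "label": "Run Reports", "task": "reports"},
]

def dispatch(key, tasks):
    """Find the menu row for `key` and invoke the named task."""
    for row in menu:
        if row["key"] == key:
            return tasks[row["task"]]()
    return None  # unknown key: no task dispatched

# Stand-in task implementations for the sketch.
tasks = {"order_entry": lambda: "order screen",
         "reports":     lambda: "report menu"}
result = dispatch("2", tasks)
```

Re-wiring the menu is a row edit; the tasks themselves stay untouched.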
Here is another example of relationship-proofing:
sub calcItemPrice(item)
var result
result = item.listPrice // list price field
// apply a discount
if item.CustomerLevel > 5 then
result = result * (1 - item.discount / 100)
end if
....
return result
end sub
This is an example of price calculations.
This version uses a customer "level" code
to affect the discount. (We are assuming
that "item" is part of a joined recordset,
and perhaps below a traversal loop resembling
the one in the prior example.)
From personal experience I can say that
the "formula(s)" for discount calculations
vary widely over time. The marketing department
is always dreaming up different promotional deals.
Thus, we likely have to change the IF statement,
and possibly the "item.discount" portion,
fairly frequently. However, our "code structure"
stays more or less the same. If we instead
put the original discount calculation with a Customer
class/object, then we might have to move the
discount application code in and
out and all about as different entities affect
the discount. (Over time the OO designer will
probably make Discount a self-standing class,
resembling more or less a procedural
subroutine or module.)
In real systems often multiple
entities will affect price calculations.
Association with a particular noun or entity
would become a schizophrenic endeavor.
We only have to use "and" or "or" clauses
to get multiple influencers.
Thus, we focus on what gets done,
not who affects it, when designing our
code.
Sequence Considerations
Sequence is also part of
our setup. Overall, the code structure
reflects 1) tasks, 2) feature selections, and
3) sequence. If the physical code structure involves
something other than these 3, then take a closer
look.
If by chance the sequence is
dynamic, then perhaps one should try to control
it via a table(s) with a Priority or
Rank field(s). Or, if the sequence only involves
a few sequences, then perhaps something like this:
....
if x.discountFirst then
result = result * (1 - calcDiscount(x))
result = result * (1 - calcSale(x))
else
result = result * (1 - calcSale(x))
result = result * (1 - calcDiscount(x))
end if
....
(Some languages allow "*=" instead of
repeating "result". There may also be
ways to factor the "1 - X" pattern
to avoid repeating it over and over.)
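A sketch of the table-driven alternative, with an invented Rank column controlling the order of application:

```python
# Hypothetical adjustment table: the Rank field controls sequence,
# so re-ordering is a data edit, not a code move.
adjustments = [
    {"name": "sale",     "rank": 2, "pct": 5.0},
    {"name": "discount", "rank": 1, "pct": 10.0},
]

def apply_adjustments(price, rows):
    """Apply each percentage cut in Rank order."""
    for row in sorted(rows, key=lambda r: r["rank"]):
        price *= (1 - row["pct"] / 100)
    return price

# 100 -> 90.0 after the discount, then 85.5 after the sale
```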
Inheritance or Something Else?
There are two examples that frequently come
up as alleged examples of inheritance in
business modeling. One is "types" of
customers or assets, and the other is related to
contact information, such as addresses.
Types of customers include
business, private, and non-profit.
Types of assets include real estate
land versus structures.
Let's look at the customer types
first. In OO thinking there would
probably be a Customer class which
the three subtypes (Business, Private,
and Non-profit classes) would inherit
from.
There is something odd about this
arrangement that one does not immediately
see. The shared portion seems to be
much smaller than the portion that is
different between them. The only shared parts
seem to be description, costs, revenue,
and contact information. (We will
deal with contact information later.)
We basically have 3 sets in the parts
that are the same: description,
financial summaries, and contact info.
This at first seems like enough to
justify inheritance. But it is not nearly
enough fields to be a slam-dunk
victory for inheritance.
Thinking deeper about this, there
are other things which may also
have similar information: fixed assets,
employees, patents, etc. In other words,
regular accounting stuff. Many business
things play an accounting "role".
This may suggest a
role pattern (p/r or OO);
however, roles are sometimes considered to
have the potential of 1-to-many relationships.
We generally don't want that in accounting
because there is a risk of double-counting.
However, keeping the accounting information
in a separate table(s) allows the accounting
system to be swapped with another without
affecting the current non-accounting
systems. Swappability often has the
trade-off of some duplication of information,
such as the description. (Or, a fat interface/translator
to hide differences.)
Actually, sometimes one department will
want a different description for the
same item anyhow. What something is
called internally may be slightly different from
what the accounting department wants to
call it. I don't mean to imply "cooked books";
It is just that accounting terms are sometimes
less meaningful to non-accounting people.
In short, one description often does not fit all.
Contact Information
Some OO fans suggest using inheritance
to give everything with contact information
(address) a shared structure.
Although I agree with a contact shared structure
(unless it has a significant performance
impact), I am not sure inheritance is the
best way to get it. For one, it may require
multiple inheritance if the business entities
need to share other structures. It seems
that "HAS-A" contact is better modeling than
"IS-A" contact.
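In code, the HAS-A arrangement is simply composition. A minimal sketch (class and field names are assumptions for illustration):

```python
class Contact:
    def __init__(self, address, phone):
        self.address = address
        self.phone = phone

class Customer:
    def __init__(self, name, contact):
        self.name = name
        self.contact = contact   # HAS-A: a reference, not a base class

class Vendor:
    def __init__(self, name, contact):
        self.name = name
        self.contact = contact   # same structure shared by reference
```

An entity can share any number of other structures the same way, with no need for multiple inheritance.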
It would be nice if more RDBMS had a
"structure inheritance" feature. That way
a new element can be added to all entities
with contact information. For example, when
fax machines became popular, many screens
had to be changed to have a slot for fax
number.
However, the need for this is relatively
small. New contact fields come along maybe
once every 5 years or so. (Include an
"other contact" field.) This is hardly
a reason to complicate a RDBMS with yet
more newfangled knobs to twiddle. OO fans
seem to have a habit of justifying OO features
using scenarios that are frankly not that
common in business modeling.
In my opinion, there are usually so many other
issues caused by years of less-than-perfect planning that
talk about auto-propagation of a
new fax field is like worrying
about scratched paint on a car with engine problems.
Auto-propagation or schema sharing could indeed
be added to RDBMS's, but I doubt most RDBMS technicians
would be impressed from a practical perspective.
(See feature ranking
for more about this.)
Further, any "grouped change" technology often
runs into messy boundary
issues, where the boundaries of difference(s)
may not be one-to-one with the original grouping.
For example, a new address element may come along
that we want added to some addresses but not
all. The differences may not fit the current
sub-classing pattern.
Slot Happy
Some designers tend to get "slot happy" with
such fields. For example, they will break the
phone number into portions on the screen in
an attempt to force a structure on the input.
I never was a fan of this. First, it tends to
clutter up the structure with sub-pieces. Second,
it makes data entry harder in many cases (depending
on UI framework).
Third, if the structure changes,
it is often more work to redo or remove the sub-pieces.
Sometimes a warning that
the entered item is suspicious is sufficient. Making
it a warning instead of a requirement allows
new or unanticipated formats to still be entered.
I have seen one system that gave odd errors if
somebody put information such as "ask for Bob"
in the "phone extension" field. Under the "warning
approach" (above), the data entry person may simply
get a warning, but the system would not crash nor
prevent them from continuing on with text. Warnings
are often a good choice over outright error stoppages
when you cannot fully predict future changes.
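A minimal sketch of the warning approach, assuming a simple numeric-extension heuristic (the function name and rule are my own, for illustration):

```python
import re

def check_extension(value):
    """Return a warning message for suspicious input, or None if it
    looks normal. The caller may display the warning but must still
    accept and save the value either way."""
    if re.fullmatch(r"\d{1,6}", value):
        return None   # looks like an ordinary numeric extension
    return 'Warning: extension "%s" is unusual; saving it anyway.' % value
```

"ask for Bob" would produce a warning, but never an error stoppage.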
Also, it is best to have an "other contact info"
field as a catch-all for new devices, URL's,
etc., that may come along in the future.
One alleged justification for inheriting
contact information is that all addresses
can be searched easily regardless of which
entity they belong to.
class Contact {...} // parent
class Customer inherit Contact {...}
class Vendor inherit Contact {...}
class Employee inherit Contact {...}
....
function findContacts(criteria) {
  var contacts: array of Contact  // assume filled with all entities
  for c = each(contacts) {  // loop thru each contact
    if c.query(criteria) then {
      print "Found one in " & c.description
    }
  }  // end-for
}  // end-function
Here, inheritance allegedly makes it easier to
find an occurrence of a contact element because
it "automatically" references all contacts.
If one uses a reference (HAS-A) instead of
inheritance, then one may have to use a lot
of backtracking or a case statement or a big
SQL UNION query to obtain such results.
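For comparison, a HAS-A version needs an explicit registry of entities rather than polymorphic dispatch. A sketch in Python (names and data are illustrative; the combined list stands in for the UNION of entity lists):

```python
class Contact:
    def __init__(self, city):
        self.city = city

class Customer:
    def __init__(self, name, contact):
        self.name, self.contact = name, contact

class Vendor:
    def __init__(self, name, contact):
        self.name, self.contact = name, contact

def find_contacts(entities, criteria):
    """Return each entity whose contact satisfies the criteria
    predicate; 'entities' plays the role of the UNION query."""
    return [e for e in entities if criteria(e.contact)]

everybody = [Customer("Acme", Contact("Tulsa")),
             Vendor("Bolts Inc", Contact("Reno"))]
hits = find_contacts(everybody, lambda c: c.city == "Tulsa")
```

The extra work is building the combined list; once built, the search loop is the same shape as the inheritance version.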
Again, a similar pattern appears:
I have not encountered a great need
for querying diverse things for similar elements.
If OO inheritance makes such a task easier, then
it is hardly on the top 50 list of issues that
I wish software engineering and programming languages
would improve. Like I said in the
OOSC2 review, a tool
that helps remove a knee stuck in one's
ear may be impressive. However, if the need
for such a tool happens once in a blue moon,
then acquiring such a tool may not be worth
it. (Rugby players excepted.)
Further, it all may be language and vendor
specific. Under both approaches it is a
matter of a link between contact data and the
structural users of the contact data.
If a tool can fairly easily find all the addresses that
match and then show all referrers to those
matches, then we have achieved our goal.
If the RDBMS has a complete set of ER links
stored, then it may have all the information
it needs to automate such queries, or at least
provide the programmer with a complete list
of tables that reference the Contact table.
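A toy sketch of that idea, using dictionaries to stand in for the catalog and the tables (all table names and rows here are made up):

```python
# A foreign-key registry such as an RDBMS catalog might expose:
er_links = {"Contact": ["Customer", "Vendor", "Employee"]}

tables = {
    "Customer": [{"name": "Acme", "contact_id": 1}],
    "Vendor":   [{"name": "Bolts Inc", "contact_id": 2}],
    "Employee": [{"name": "Pat", "contact_id": 1}],
}

def referrers_of(contact_id):
    """Walk every table known to reference Contact and collect
    the rows that point at the given contact record."""
    hits = []
    for table in er_links["Contact"]:
        for row in tables[table]:
            if row["contact_id"] == contact_id:
                hits.append((table, row["name"]))
    return hits
```

Given complete link metadata, such a query could be generated rather than hand-coded per entity.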
The key issue is the direction of the links.
There are usually tradeoffs: links going from
A-to-B give advantages to some things and
disadvantages to others in comparison to
B-to-A links. We saw similar issues in the
p/r role pattern. It may be possible to have
links go both ways, but it is often not
economical to manage two-way links. However,
I see nothing in either paradigm that
prevents two-way links if the language or
database implementor really wants them.
It is true that many RDBMS designs embed the contact
fields within the entity table(s) instead of
referencing a separate Contact table. I believe
the main reason for this is to reduce the
number of joins. This is not only for performance
reasons, but to simplify the code also, since
there is one less join to worry about. (Some
ideas for sharing schemas without doing actual
joins were discussed above.) Arguments can be
made for both approaches.