Originally published 29 July 2010
Over many years, an intellectual discipline has grown up around data models. There have been conversations about entities and definitions, about cardinality and physical characteristics, and about granularity and keys. In short, the world has learned a great deal about the intellectual activity of data modeling. What has gone unstated is that data modeling applies to structured data. Without anyone quite realizing it, the discipline of data modeling was created for structured, repetitive data.
Structured data, of course, refers to the world of transaction processing. In transaction processing, the same type of data occurs over and over again. When a bank withdrawal is made, the information from one transaction to the next is repeated, or at least the same type of information is repeated. Standard DBMSs are geared to handling this repetitive occurrence of data, which is why this kind of data is called structured data.
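To make the idea of repetitive, structured data concrete, here is a minimal Python sketch. The record type `Withdrawal` and its fields (`account_id`, `amount`, `timestamp`) are hypothetical, chosen only for illustration. The point is that every transaction carries the same fields, so the column list is defined once and fits every row, which is exactly the shape a standard DBMS table is built for.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from decimal import Decimal

# Hypothetical record layout for a bank withdrawal; field names are illustrative only.
@dataclass
class Withdrawal:
    account_id: str
    amount: Decimal
    timestamp: datetime

# Every occurrence repeats the same structure; only the values differ.
withdrawals = [
    Withdrawal("ACC-001", Decimal("100.00"), datetime(2010, 7, 29, 9, 15)),
    Withdrawal("ACC-002", Decimal("40.00"), datetime(2010, 7, 29, 9, 17)),
    Withdrawal("ACC-001", Decimal("25.50"), datetime(2010, 7, 29, 11, 2)),
]

# The column list is derived once from the record definition and applies to every row.
columns = [f.name for f in fields(Withdrawal)]
rows = [tuple(getattr(w, name) for name in columns) for w in withdrawals]

print(columns)
for row in rows:
    print(row)
```

Because the structure is fixed in advance, the same definition serves the first transaction and the millionth one alike.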
Now that the world recognizes there is a wealth of information to be found in unstructured text, it is natural to ask: does data modeling apply to unstructured data?
The answer is that INDIRECTLY data modeling applies to unstructured data. But DIRECTLY data modeling does not apply to unstructured data. An explanation is in order.
Consider one very significant difference between structured and unstructured data: structured data can be changed, but unstructured text cannot. Suppose that an analyst building a data model in the structured world discovers that a piece of data is missing. The analyst has the power to add the missing data element to the system specifications, and the structured system will then include and handle that data.
The analyst has no such power when it comes to unstructured data. When the analyst finds text that he/she disagrees with, the analyst cannot go in and change the text. In some cases, changing the text is actually illegal; in others, it is merely unethical. So when an error is found in the unstructured textual environment, the analyst cannot correct the data.
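The asymmetry between the two worlds can be sketched in Python, under stated assumptions. On the structured side, the analyst who finds a missing data element simply revises the specification: here a hypothetical `branch_id` field is added in a hypothetical `WithdrawalV2` record, with a default so that records captured under the old specification can still be carried forward. On the unstructured side, the text is modeled as read-only with a frozen dataclass, which stands in for the legal or ethical constraint, so any attempt to correct it in place fails. All names are illustrative and do not come from any particular system.

```python
from dataclasses import FrozenInstanceError, dataclass
from decimal import Decimal

# Original (hypothetical) specification for a structured record.
@dataclass
class Withdrawal:
    account_id: str
    amount: Decimal

# The analyst discovers a missing data element and adds it to the specification.
# The default lets records captured under the old specification be migrated.
@dataclass
class WithdrawalV2:
    account_id: str
    amount: Decimal
    branch_id: str = "UNKNOWN"

def migrate(old: Withdrawal) -> WithdrawalV2:
    """Carry an old-spec record forward into the revised spec (hypothetical helper)."""
    return WithdrawalV2(account_id=old.account_id, amount=old.amount)

# Unstructured text is kept exactly as written: frozen, so it cannot be edited in place.
@dataclass(frozen=True)
class TextDocument:
    doc_id: str
    body: str

old_row = Withdrawal("ACC-001", Decimal("100.00"))
new_row = migrate(old_row)
new_row.branch_id = "BR-42"  # the structured side accepts the new data element

memo = TextDocument("MEMO-7", "The customer was refunded on Tuesday.")
try:
    memo.body = "The customer was refunded on Wednesday."  # attempted "correction"
except FrozenInstanceError:
    print("text is read-only; the original wording stands")
```

The structured record grows to meet the analyst's needs; the text stays exactly as its author wrote it, whatever the analyst thinks of it.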
The analyst uses taxonomies to organize and understand textual, unstructured data. A taxonomy is merely a large classification of data. A simple taxonomy might be:
SOURCE: Taxonomies and Data Models
Recent articles by Bill Inmon