The types of data validation can be broadly classified by the scope, complexity, and purpose of the validation operations to be carried out. For example, data type validation is customarily carried out on one or more simple data fields.
In computer security, there is often known good data: data the developer is completely certain is safe. There are also known bad characters: data the developer is certain is unsafe (for example, because it can cause code injection).
Data validation should start with a definition of the business process and a set of business rules within that process. These rules can be collected through a requirements capture exercise. Validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records). Some data cleansing solutions clean data by cross-checking it against a validated data set.
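The difference between strict and fuzzy validation can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the five-digit postal code format, the KNOWN_CITIES reference list, the function names validate_strict and correct_fuzzy, and the similarity cutoff of 0.8. A real system would take these from its own business rules.

```python
import difflib
import re

# Hypothetical validated data set used for cross-checking.
KNOWN_CITIES = ["Springfield", "Shelbyville", "Capital City"]

# Assumed postal code format for this sketch: exactly five digits.
POSTAL_CODE = re.compile(r"\d{5}")


def validate_strict(record):
    """Strict validation: accept only records with a valid postal code."""
    return POSTAL_CODE.fullmatch(record.get("postal_code", "")) is not None


def correct_fuzzy(record, cutoff=0.8):
    """Fuzzy validation: replace the city with a close match from the known set.

    If no known city is similar enough, the record is returned unchanged.
    """
    matches = difflib.get_close_matches(
        record["city"], KNOWN_CITIES, n=1, cutoff=cutoff
    )
    if matches:
        return {**record, "city": matches[0]}
    return record


if __name__ == "__main__":
    record = {"city": "Springfeild", "postal_code": "12345"}
    print(validate_strict(record))  # strict check on the postal code
    print(correct_fuzzy(record))    # misspelled city corrected from the known set
```

The strict check either accepts or rejects a record, while the fuzzy check modifies the record, so the two are usually applied at different stages and logged differently.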
Data validation is intended to provide certain well-defined guarantees of fitness, accuracy, and consistency for any of various kinds of user input into an application or automated system. If implemented poorly, however, it can lead to a denial-of-service attack in which the attacker floods the system with unexpected input, forcing the system to expend scarce processing and communication resources on rejecting it.
Based on the distinction between known good and known bad data, there are two approaches to handling input: whitelists, which accept only known good data, and blacklists, which reject known bad data. Whitelists are generally preferred, because blacklists may accidentally treat bad data as safe. However, in some cases a whitelist solution may not be easily implemented. A strategy that is usually insufficient is to filter out known bad characters. For example, if the characters in the set [:;.-/] are known to be bad and the input ; ls -l / is received, the original input is replaced with ls l (the characters ;, -, and / are thrown away). Some languages throw compile-time or run-time exceptions whenever a variable derived from user input is used in a risky way.
Data cleansing, or data cleaning, is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. It involves identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.
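The character-filtering example given earlier for the set [:;.-/] can be sketched in Python and contrasted with a whitelist check. The function names strip_known_bad and is_known_good, and the ALLOWED pattern (letters, digits, and underscores, 1 to 32 characters), are hypothetical choices for this sketch, not a recommended policy.

```python
import re

# Characters assumed to be bad, taken from the example set [:;.-/].
BAD_CHARS = ":;.-/"

# Hypothetical whitelist: letters, digits, and underscores, 1 to 32 characters.
ALLOWED = re.compile(r"[A-Za-z0-9_]{1,32}")


def strip_known_bad(text):
    """Blacklist approach: silently drop every character listed in BAD_CHARS."""
    return "".join(ch for ch in text if ch not in BAD_CHARS)


def is_known_good(text):
    """Whitelist approach: accept the input only if all of it matches ALLOWED."""
    return ALLOWED.fullmatch(text) is not None


if __name__ == "__main__":
    user_input = "; ls -l /"
    # Filtering leaves the command text behind (surrounded by spaces).
    print(repr(strip_known_bad(user_input)))
    # The whitelist check rejects the same input outright.
    print(is_known_good(user_input))
```

After filtering, the remaining text still contains the command name ls, which shows why removing known bad characters is usually insufficient, whereas the whitelist check rejects the whole input.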