Tsql - CodeAnt AI

Jump statements, such as RETURN and CONTINUE let you change the default flow of program execution, but jump statements that direct the control flow to the original direction are just a waste of keystrokes.

`NOCOUNT is by default deactivated (OFF) at server level. It means by default, the server will send to the client the number of rows affected by the SQL query executed which is, in most cases, useless because no one will read this information.

Deactivating this feature will save some network traffic and improve the execution performance of stored procedures and triggers that’s why it is recommended to define SET NOCOUNT ON at the beginning of the definition of PROCEDUREs and TRIGGERs, before any query is processed.

This rule raises an issue when NOCOUNT is not set or is set to OFF between the beginning of the PROCEDURE (or TRIGGER) definition and the first statement that is not a SET, IF or DECLARE`.

`COALESCE and IIF (which evaluate to CASE expressions under the covers), as well as CASE input expressions should not be used with subqueries because the subquery will be evaluated once for each option in the expression, and each evaluation could return different results depending on the isolation level. To ensure consistent results, use the SNAPSHOT ISOLATION isolation level. To ensure consistent results and better performance, move the subquery out of the expression.

Note it is also an option to replace COALESCE with ISNULL`.

The complexity of an expression is defined by the number of AND and OR operators it contains.

A single expression’s complexity should not become too high to keep the code readable.

String data types (`char, varchar, nchar, nvarchar) default to a size of 1 if no size is specified in the declaration. For char and nchar this is confusing at best, but it is most probably a mistake for varchar and nvarchar.

This rule raises an issue when no size is specified for varchar or nvarchar`.

SQL Server can be tuned at `PROCEDURE and TRIGGER levels thanks to several SET statements that change the current session handling of specific information.

This rule raises an issue when expected configuration is not set or is set with an unexpected value between the beginning of the PROCEDURE (or TRIGGER) definition and the first statement that is not a SET, IF or DECLARE`.

A `WHILE loop with at most one iteration is equivalent to the use of an IF statement to conditionally execute one piece of code. No developer expects to find such usage of a loop statement. If the initial intention of the author was really to conditionally execute one piece of code, an IF statement should be used in place.

At worst that was not the initial intention of the author and so the body of the loop should be fixed to use the nested RETURN, BREAK or THROW` statements in a more appropriate way.

Theoretically, CASE, COALESCE, and IIF evaluate conditions sequentially, stopping with the first one that evaluates to TRUE. In reality, these expressions evaluate aggregate functions and non-native service calls such as CONTAIN and FREETEXT first, and then pass the results into the expression. That means that if you’re relying on short-circuit behavior to avoid runtime errors with these arguments, you will not get it. You can work around the issue by wrapping such calls in a subselect.

Deprecated system tables and views are those that have been retained temporarily for backward compatibility, but which will eventually be removed from the language. In effect, deprecation announces a grace period to allow the smooth transition from the old features to the new ones.

This rule raises an issue when system tables or views are used. Catalog tables and views should be used instead.

A CASE and a chain of IF/ELSE IF statements is evaluated from top to bottom. At most, only one branch will be executed: the first one with a condition that evaluates to true.

Therefore, duplicating a condition automatically leads to dead code. Usually, this is due to a copy/paste error. At best, it’s simply dead code and at worst, it’s a bug that is likely to induce further bugs as the code is maintained, and obviously it could lead to unexpected behavior.

Jump statements (BREAK, CONTINUE, RETURN, GOTO, and THROW), move control flow out of the current code block. So any statements that come after a jump are dead code.

Using `TOP in a SELECT without ordering the results from which the “top” results are chosen will return a seemingly random set of rows, and is surely a mistake.

The same random behavior also occurs when using TOP in a DELETE, INSERT, UPDATE and MERGE`.

This rule applies whenever an `IF statement is followed by one or more ELSE IF statements; the final ELSE IF should be followed by an ELSE statement.

The requirement for a final ELSE statement is defensive programming.

The ELSE statement should either take appropriate action or contain a suitable comment as to why no action is taken. This is consistent with the requirement to have a final ELSE clause in a CASE` statement.

When you add a new constraint to a table, (`ALTER TABLE … ADD CONSTRAINT …), WITH CHECK is assumed by default, and existing data are automatically validated.

But when you disable/enable an existing constraint, WITH NOCHECK is assumed by default, and existing data are no longer trusted. In this case you will face an integrity issue that prevents some rows from being updated, and a performance issue because the query optimizer cannot trust this constraint anymore.

Of course, WITH CHECK is obviously preferred, but if NOCHECK behavior is desired, it should not be selected by omission, but specified explicitly because WITH NOCHECK has such a significant impact. By making NOCHECK explicit, the developer documents that this behavior has been selected on purpose.

Note: You can list the existing constraints that are in an untrusted state using:

SELECT * FROM sys.foreign_keys WHERE is_not_trusted = 1;

SELECT * FROM sys.check_constraints WHERE is_not_trusted = 1;`

The repetition of a prefix operator ( +, -, ~, or NOT) is usually a typo. The second operator invalidates the first one:

Nested control flow statements IF…ELSE, WHILE and TRY…CATCH are often key ingredients in creating what’s known as “Spaghetti code”. This code smell can make your program difficult to understand and maintain.

When numerous control structures are placed inside one another, the code becomes a tangled, complex web. This significantly reduces the code’s readability and maintainability, and it also complicates the testing process.

A CATCH clause that only rethrows the caught exception has the same effect as omitting the CATCH altogether and letting it bubble up automatically.

While not technically incorrect, the omission of `BEGIN…END can be misleading and may lead to the introduction of errors during maintenance.

In the following example, the two statements seem to be attached to the IF statement, but only the first one is, and somethingElse` will always be executed:

In Transact-SQL, the semicolon statement terminator is in most cases optional. Therefore many developers don’t use semicolons. However, in some situations missing semicolons may yield insidious errors.

Semicolons are required by the ANSI standard, and Microsoft recommends the consistent usage of semicolons and might make semicolons mandatory in a future version of SQL Server. Also, semicolons make the code more portable.

Disabling “ANSI_WARNINGS” and/or “ARITHABORT” in a procedure may silence errors, decrease performance, or block index creation.

From the documentation (ARITHABORT, ANSI_WARNINGS), disabling these options could result in (among others):

`SET ANSI_WARNINGS OFF

CREATE, UPDATE, INSERT, and DELETE statements on tables with indexes on computed columns or indexed views will fail
No warning issued if null values appear in aggregate functions, such as SUM, AVG, MAX, MIN, STDEV, STDEVP, VAR, VARP, or COUNT
The divide-by-zero and arithmetic overflow errors cause null values to be returned, no roll-back

SET ARITHABORT OFF

It can negatively impact query optimization, leading to performance issues
An arithmetic, overflow, divide-by-zero, or domain error, during INSERT, UPDATE, or DELETE statement will cause SQL Server to insert or update a NULL value

This rule raises an issue when “ANSI_WARNINGS” and/or “ARITHABORT” are set to OFF` in a procedure.

Changing the configuration of database options ANSI_NULLS, ANSI_PADDING and CONCAT_NULL_YIELDS_NULL is deprecated. The future versions of SQL Server will only support the ON value, and the SET statement for those options to OFF will eventually generate an error.

Declaring multiple variable on one line is difficult to read.

Under the covers, Simple `CASE expressions are evaluated as searched CASE expressions. That is,

CASE @foo
WHEN 1 THEN ‘a’
WHEN 2 THEN ‘b’

is actually evaluated as

CASE
WHEN @foo = 1 THEN ‘a’
WHEN @foo = 2 THEN ‘b’

In most situations the difference is inconsequential, but when the input expression isn’t fixed, for instance if RAND() is involved, it is likely to yield unexpected results. For that reason, it is better to evaluate the input expression once, assign it to a variable, and use the variable as the CASE’s input expression.

This rule raises an issue when any of the following is used in a CASE input expression: RAND, NEWID, CRYPT_GEN_RANDOM`.

Functions or procedures with a long parameter list are difficult to use, as one must figure out the role of each parameter.

As soon as a WHEN clause contains too much logic this highly decreases the readability of the overall expression. In such case, the content of the WHEN clause may be extracted into a dedicated function.

Referencing a column by specifying the schema or the database is deprecated. It is retained temporarily for backward compatibility, but it will eventually be removed from the language. You should only use one part (column_name) or two part (table_name.column_name) references.

A FETCH statement fails when the number of variables does not match the number of columns selected in the CURSOR definition.

Having all branches of a CASE, IF or IIF chain with the same implementation indicates a problem.

In the following code:

@@IDENTITY returns the last identity column value created on a connection, regardless of the scope. That means it could return the last identity value you produced, or it could return a value generated by a user defined function or trigger, possibly one fired because of your insert. In order to access the last identity value created in your scope, use SCOPE_IDENTITY() instead.

In a Zen-like manner, “NULL” is never equal to anything, even itself. Therefore comparisons using equality operators will always return `False, even when the value actually IS NULL.

For that reason, comparison operators should never be used to make comparisons with NULL; IS NULL and IS NOT NULL` should be used instead.

The requirement for a final ELSE clause is defensive programming. The clause should either take appropriate action, or contain a suitable comment as to why no action is taken.

When the line immediately after a conditional has neither an enclosing BEGIN…END block nor indentation, the intent of the code is unclear and perhaps not what is executed. Additionally, such code is confusing to maintainers.

There are valid cases for passing a variable multiple times into the same function or procedure call, but usually doing so is a mistake, and something else was intended for one of the arguments.

It is needlessly complex to invert the result of a boolean comparison. The opposite comparison should be made instead.

Putting multiple statements on a single line lowers the code readability and makes debugging the code more complex.

Unresolved directive in <stdin> - include::{noncompliant}[]

Write one statement per line to improve readability.

Unresolved directive in <stdin> - include::{compliant}[]

When a SQL query is joining n tables (with n>=2), it is expected to have join conditions defined to determine on which columns these n tables should be joined. At minimum, for n joined tables, the SQL query should contain (n-1) join conditions involving all the joined table to avoid a full cartesian product between the rows of the n tables.

Not doing so will imply that too many rows will be returned. If this is not the case and unless this has been done on purpose, the SQL query should be reviewed and missing conditions should be added or useless tables should be removed from the SQL query.

This rule is raising no issue when the SQL query is involving CROSS JOIN, NATURAL JOIN statements.

Nested code - blocks of code inside blocks of code - is eventually necessary, but increases complexity. This is why keeping the code as flat as possible, by avoiding unnecessary nesting, is considered a good practice.

Merging if statements when possible will decrease the nesting of the code and improve its readability.

Each source file should start with a header stating file ownership and the license which must be used to distribute the application.

This rule must be fed with the header text that is expected at the beginning of every file.

The use of parentheses, even those not required to enforce a desired order of operations, can clarify the intent behind a piece of code. However, redundant pairs of parentheses could be misleading and should be removed.

UPDATE and DELETE statements should contain WHERE clauses to keep the modification of records under control. Otherwise unexpected data loss could result.

A typical code smell known as unused function parameters refers to parameters declared in a function but not used anywhere within the function’s body. While this might seem harmless at first glance, it can lead to confusion and potential errors in your code. Disregarding the values passed to such parameters, the function’s behavior will be the same, but the programmer’s intention won’t be clearly expressed anymore. Therefore, removing function parameters that are not being utilized is considered best practice.

There is no reason to re-assign a variable to itself. Either this statement is redundant and should be removed, or the re-assignment is a mistake and some other value or variable was intended for the assignment instead.

Executing code dynamically is security-sensitive. It has led in the past to the following vulnerabilities:

CVE-2017-9807
CVE-2017-9802

Some APIs enable the execution of dynamic code by providing it as strings at runtime. These APIs might be useful in some very specific meta-programming use-cases. However most of the time their use is frowned upon because they also increase the risk of maliciously Injected Code. Such attacks can either run on the server or in the client (example: XSS attack) and have a huge impact on an application’s security.

This rule marks for review each occurrence of such dynamic code execution. This rule does not detect code injections. It only highlights the use of APIs which should be used sparingly and very carefully.

If a label is declared but not used in the program, it can be considered as dead code and should therefore be removed.

This will improve maintainability as developers will not wonder what this label is used for.

Nested ternaries are hard to read and can make the order of operations complex to understand.

Unresolved directive in <stdin> - include::{noncompliant}[]

Instead, use another line to express the nested operation in a separate statement.

Unresolved directive in <stdin> - include::{compliant}[]

Duplicated string literals make the process of refactoring complex and error-prone, as any change would need to be propagated on all occurrences.

SELECT * should be avoided because it releases control of the returned columns and could therefore lead to errors and potentially to performance issues.

Marking a parameter for output means that callers will expect its value to be updated with a result from the execution of the procedure. Failing to update the parameter before the procedure returns is surely an error.

The MD5 algorithm and its successor, SHA-1, are no longer considered secure, because it is too easy to create hash collisions with them. That is, it takes too little computational effort to come up with a different input that produces the same MD5 or SHA-1 hash, and using the new, same-hash value gives an attacker the same access as if he had the originally-hashed value. This applies as well to the other Message-Digest algorithms: MD2, MD4, MD6, HAVAL-128, HMAC-MD5, DSA (which uses SHA-1), RIPEMD, RIPEMD-128, RIPEMD-160, HMACRIPEMD160.

Consider using safer alternatives, such as SHA-256, SHA-512 or SHA-3.

Shared coding conventions allow teams to collaborate efficiently. This rule checks that all function names match a provided regular expression.

Hardcoding IP addresses is security-sensitive. It has led in the past to the following vulnerabilities:

CVE-2006-5901
CVE-2005-3725

Today’s services have an ever-changing architecture due to their scaling and redundancy needs. It is a mistake to think that a service will always have the same IP address. When it does change, the hardcoded IP will have to be modified too. This will have an impact on the product development, delivery, and deployment:

The developers will have to do a rapid fix every time this happens, instead of having an operation team change a configuration file.
It misleads to use the same address in every environment (dev, sys, qa, prod).

Last but not least it has an effect on application security. Attackers might be able to decompile the code and thereby discover a potentially sensitive address. They can perform a Denial of Service attack on the service, try to get access to the system, or try to spoof the IP address to bypass security checks. Such attacks can always be possible, but in the case of a hardcoded IP address solving the issue will take more time, which will increase an attack’s impact.

Developers often use TODO tags to mark areas in the code where additional work or improvements are needed but are not implemented immediately. However, these TODO tags sometimes get overlooked or forgotten, leading to incomplete or unfinished code. This rule aims to identify and address unattended TODO tags to ensure a clean and maintainable codebase. This description explores why this is a problem and how it can be fixed to improve the overall code quality.

Tables without primary keys are largely unusable in a relational database because they cannot be joined to. A primary key should be specified at table creation to guarantee that all its records have primary key values.

Cryptographic hash algorithms such as MD2, MD4, MD5, MD6, HAVAL-128, HMAC-MD5, DSA (which uses SHA-1), RIPEMD, RIPEMD-128, RIPEMD-160, HMACRIPEMD160 and SHA-1 are no longer considered secure, because it is possible to have collisions (little computational effort is enough to find two or more different inputs that produce the same hash).

Having inconsistent indentation and omitting curly braces from a control structure, such as an if statement or for loop, is misleading and can induce bugs.

This rule raises an issue when the indentation of the lines after a control structure indicates an intent to include those lines in the block, but the omission of curly braces means the lines will be unconditionally executed once.

The following patterns are recognized:

Unresolved directive in <stdin> - include::{example}[]

Note that this rule considers tab characters to be equivalent to 1 space. When mixing spaces and tabs, a code may look fine in one editor but be confusing in another configured differently.

An unused local variable is a variable that has been declared but is not used anywhere in the block of code where it is defined. It is dead code, contributing to unnecessary complexity and leading to confusion when reading the code. Therefore, it should be removed from your code to maintain clarity and efficiency.

A naming convention in software development is a set of guidelines for naming code elements like variables, functions, and classes. {identifier_capital_plural} hold the meaning of the written code. Their names should be meaningful and follow a consistent and easily recognizable pattern. Adhering to a consistent naming convention helps to make the code more readable and understandable, which makes it easier to maintain and debug. It also ensures consistency in the code, especially when multiple developers are working on the same project.

This rule checks that {identifier} names match a provided regular expression.

Having two branches in an IF/ELSE IF chain with the same implementation is at best duplicate code, and at worst a coding error.

Get Started