Clean up your data using SQL for effective analysis

Clean up your data using SQL for effective analysis

03 March 2022

Clean and reliable data is very essential for an effective analysis. There are a number of tools with which you can do that. SQL is one such tool with which you can clean your data. There are various functions in SQL with which you can carry that out.

LEFT

LEFT can be used to pull out a certain number of characters from a string and provide it as a separate string.

Syntax:

LEFT(string, number of characters)

Example:

SELECT employee_id,

       fullname,

       LEFT(joining_date, 10) AS cleaned_joining_date

FROM employees

The  above query will return the first ten characters from the strings as separate strings from the column full_name.

RIGHT

RIGHT performs a similar operation as LEFT but from the right side. It pulls a certain number of characters from the right side of the string.

Syntax:

RIGHT(string, number of characters)

Example:

SELECT purchase_id,

       RIGHT(purchase_date, 10) AS cleaned_purchase_time

FROM purchases

The  above query will return the last ten characters from the strings as separate strings from the column full_name.

TRIM

TRIM function in SQL is used to remove characters from the beginning and end of a string. It takes 3 arguments

Syntax:

TRIM([ LEADING | TRAILING | BOTH ] trim_character FROM source_string);

Examples:

If you want to remove the leading and trailing characters from a string:

SELECT store_address,

       TRIM(both '()' FROM store_address)

FROM stores;

The TRIM function takes three arguments. First and foremost, you have to specify whether you want to remove characters from the leading side, the trailing side, or both. Next you must specify that all characters to be trimmed. Any characters included in the single quotes will be removed from leading, trailing or both sides of the string. Finally, you must specify that the text you want to trim using FROM.

LENGTH

LENGTH helps in returning the length of the string. This means that it can be combined with LEFT and RIGHT while cleaning up the strings in our table

You can use length in the following way

SELECT purchase_id,

       date,

       LEFT(date, 10) AS purchased_date,

       RIGHT(date, LENGTH(date) - 11) AS purchased_time

FROM purchase;

SUBSTR

LEFT and RIGHT can create substrings of a string but only from the end or the beginning of a string. If you want to extract a substring from the middle of a string, you need to work with the SUBSTR function.

Syntax:

SUBSTR(*string*, *starting position of the character*, *number of characters*);

Example:

SELECT prchase_id,

       date,

       SUBSTR(date, 4, 2) AS day

FROM purchase;

CONCAT

CONCAT function comes in handy when you want to combine the strings from multiple columns.

You can separate these combined values with a character or symbol of your choice.

Syntax:

CONCAT(string_1, string_2, string3, ...);

Example:

SELECT customer_id,

       first_name,

       last_name

       CONCAT(first_name, '  ', last_name) AS full_name

FROM customers;

In the above example the first_name and the last_name are separated by a space.

UPPER and LOWER

The UPPER and LOWER functions can be used to change the strings into upper and lower cases.

Syntax:

UPPER(string);

LOWER(string);

Example:

SELECT shop_id,

       shop_address,

       UPPER(address) AS address_upper,

       LOWER(address) AS address_lower

FROM shops;

COALESCE

There are instances where there could be null value in your table and you’d like to add another value in their place. It can be done by using the COALESCE function. COALESCE does not change the values in the original table, it just returns a temporary view of the table with the values changed

Syntax:

COALESCE(column, value to be added)

Example:

SELECT 

    fname,

    COALESCE (address, 'Address unknown') AS address 

FROM employees;

search
Blog Categories
Request a quote