viernes, 19 de agosto de 2016

SELECT INTO vs INSERT INTO on Columnstore

SELECT INTO vs INSERT INTO on Columnstore

By Ramya Makam, (first published: 2015/06/09)

Introduction

There were many enhancements in SQL Server 2014 and one amongst them is the fact that SELECT INTO now operates with parallelism. How does that help us if we need to use it on tables with clustered columnstore indexes? This article compares SELECT INTO and INSERT INTO under different scenarios, and the best approach preferred.

Explanation

I considered a table, test_source, containing 50 million rows for my tests. The space used by this table without any index on it is 23.8GB. A screenshot of the space used is shown below.

Scenario 1

Let us consider the following scenario where the source table, test_source, has a clustered columnstore index on it, and the destination table, test_dest, requires a clustered columnstore index to be created on it. This is shown in the table below.
Source table has cci index on it?
Destination table requires cci index on it?
yes
yes
First, I used “SELECT INTO” from test_source to test_dest. I then created a clustered columnstore index on the destination table. The observations are shown below. First is the output of the SELECT INTO statement:

Let’s see the space used by the table, “test_dest”.

The datafile has grown to 23.8 GB and log file has grown to 555MB.
I then created a clustered columnstore index on the destination table “test_dest” and observations are below.

Space used by the table test_dest is 3 GB

The datafile has grown to 26.8 GB and logfile has grown to 555 MB. The total time taken for the SELECT INTO + Create columnstore is 22 minutes.
Now let us consider the following scenario where we create the table first, then create clustered columnstore index and insert the data. The observations are shown below.

The space used by the table “test_dest” is shown below

The datafile has grown to 3.01 GB and log file has grown to 132 MB. The space used by the database now is 3.1 GB

The total time taken for creating the table, clustered columnstore index and inserting the data is 30 minutes. 
When comparing these two scenarios, we can easily notice that SELECT INTO is faster than INSERT INTO. However, SELECT INTO consumes more space. INSERT INTO is a little bit slower but doesn’t cause space issues, even when the datafile has less free space. Having free space of 3.01 GB is enough for the INSERT INTO operation whereas SELECT INTO requires 23.8 GB for the operation to complete.

Scenario 2

Let’s consider the following scenario where source table “test_source” has clustered columnstore index on it and destination table “test_dest” doesn’t require clustered columnstore index to be created on it.
source table has cci index on it ?
Destination table requires cci index on it ?
yes
no
First, I used SELECT INTO from source table to destination table and then created clustered columnstore index on the destination table. The observations are shown below.
This is the output of the SELECT INTO statement:

Let’s see the space used by the table, test_dest.

The datafile has grown to 23.8 GB and log file has grown to 555MB. The total time taken for SELECT INTO is 17 minutes.
Now let’s consider the following scenario where we create the table first and insert the data. The observations are shown below.

The space used by the table, test_dest, is shown below

The datafile has grown to 23.8 GB GB and log file has grown to 43.5 GB. The space used by the database is now 67.3 GB

The total time taken for creating the table and inserting the data is 25 minutes.
When comparing these two scenarios, we notice that SELECT INTO is faster and consumes less log size than inserting the data using INSERT INTO. SELECT INTO is faster because it operates in parallel and is a bulk operation from behind, whereas INSERT INTO is a single threaded operation and consumes more log space when destination table is a rowstore.

Scenario 3

Let us consider the following scenario where the source table doesn’t have a clustered columnstore index on it and the destination table requires a clustered columnstore index to be created on it.
source table has cci index on it ?
Destination table requires cci index on it ?
no
yes
First, I used SELECT INTO from the source table to the destination table and then created a clustered columnstore index on the destination table. The observations are mentioned below. This is the output of select into statement

Let’s see the space used by the table, test_dest.

The datafile has grown to 23.8 GB and log file has grown to 555MB.
I then created a clustered columnstore index on the destination table, and the observations are below.

Space used by the table test_dest is 3 GB.

The datafile has grown to 26.8 GB and the log file has grown to 555 MB. The total time taken for SELECT INTO + Create columnstore is 22 minutes.
Now let’s consider the following scenario where we create the table first, then create clustered columnstore index and insert the data. The observations are shown below.

The space used by the table, test_dest, is shown below.

The datafile has grown to 2.98 GB and log file has grown to 109 MB. The space used by the database is now 3.1 GB.

The total time taken for creating the table, creating the clustered columnstore index, and inserting the data is 28 minutes.
When comparing these two scenarios, we can easily notice that SELECT INTO is faster than INSERT INTO. However, SELECT INTO consumes more space. INSERT INTO is a little bit slower but doesn’t cause space issue even when the datafile has less free space. Free space of 3.01 GB is enough for the INSERT INTO operation whereas SELECT INTO requires 23.8 GB for the operation to complete.
Below is the comparison of all the scenarios discussed in this article.

Conclusion

We can use either of the approaches among “SELECT INTO” and “INSERT INTO” for performing a table copy. When there are no space issues and we need the operation to complete faster, SELECT INTO is preferred. When there are space issues, INSERT INTO is preferred because operation might be slow but definitely succeeds.
Hope my article was helpful to you.


Fuente: www.sqlservercentral.com  

No hay comentarios.:

Publicar un comentario

DB PostgreSQL