Monday, November 3, 2014

Assignment beats SELECT INTO

While working on some customer code, I noticed that they have a lot of code that reads like this:
SELECT a,b,c
INTO foo.x, foo,y, foo.z;
I wondered why they were doing it that way, and if it might be easier to read if it was just:
foo := (a,b,c);
Now, these aren't quite the same, especially if foo has more than three fields. But even that could be got around.

But before I tried this out I decided to see how they performed. Here's what happened:
andrew=# do $x$ 
declare 
   r abc; 
begin 
   for i in 1 .. 10000000 
   loop 
      select 'a','b',i into r.x,r.y,r.z; 
   end loop; 
end; 
$x$;
DO
Time: 63731.434 ms
andrew=# do $x$ 
declare 
   r abc; 
begin 
   for i in 1 .. 10000000 
   loop 
      r := ('a','b',i); 
   end loop; 
end; 
$x$;
DO
Time: 18744.151 ms
That's a very big difference! Direct assignment takes less than 30% of the time that SELECT INTO takes.

I'm going to dig into why this happens, but meanwhile, I have quite a lot of low hanging performance fruit to pick as a result of this.

2 comments:

  1. I check you reproducing the same relative speeds (approx. 3x), though my laptop seems to be twice as fast as yours :-) I was at first suspicious of the test constants, since in SELECT context those tend to stay 'unknown' as long as possible, I wasn't sure if they're resolv3ed earlier in the pgsql assignment. However, recasting the 10 million loop as
    three nested loops i,j,k of 1000, 100, and 100, and making the assignment then k,j,i yields exactly the same result.

    ReplyDelete
  2. You're context-switching from PL/PgSQL->SQL>PL/PgSQL every time you do a SELECT INTO, so you're paying for query overhead of the parse once, and the switches on every iteration, and not paying for either with direct assignment. Doing SELECT INTO to fetch data that isn't actually in the database is a waste.

    ReplyDelete