Skip to content

How to Optimize SQL SELECT Statement When Retrieving Data from Invoice History

Release Date:11/29/2017

Q - I used the following SQL statement to retrieve invoice history data for a certain item within a certain date range.  It took forever to run.  Please help.

SELECT CPINVLIN.INV_ITM_ITM_NO, CPINVLIN.INV_ITM_INV_DATE, CPINVLIN.INV_ITM_DESC_1,
CPINVLIN.INV_ITM_REASON_CODE, CPINVLIN.INV_ITM_DESC_2, CPINVLIN.INV_ITM_QTY_ORDER,
CPINVLIN.INV_ITM_TOT_QTY_SHP, CPINVLIN.INV_ITM_UNIT_PRICE, CPINVLIN.INV_ITM_UNIT_COST,
(INV_ITM_UNIT_PRICE*INV_ITM_QTY_ORDER) AS ExtPrice, CPINVLIN.INV_ITM_INV_NO,
CPINVLIN.INV_ITM_PROD_CATE, CPINVHDR.INV_TYPE, CPINVLIN.INV_ITM_CUST_NO,
CPINVHDR.INV_SHIP_TO_NO, CPINVHDR.INV_SHIP_TO_NAME, CPINVHDR.INV_SHIP_TO_ADDR_1,
CPINVHDR.INV_SHIP_TO_CITY, CPINVHDR.INV_SHIP_TO_ST, CPINVHDR.INV_SHIP_TO_ZIPCD,
CPINVHDR.INV_SHIP_TO_COUNTRY
FROM CPINVHDR INNER JOIN CPINVLIN ON CPINVHDR.INV_NO = CPINVLIN.INV_ITM_INV_NO
WHERE CPINVLIN.INV_ITM_ITM_NO='RU1022-22'
AND CPINVLIN.INV_ITM_INV_DATE>=20160101
And CPINVLIN.INV_ITM_INV_DATE<=20171130;

A - I used the same SELECT statement against my test database where CPINVHDR.BTR is 750MB and CPINVLIN.BTR is 540MB. These two tables initially are not cached in the server memory, so it took 15 minutes before it returned the data. Then I tried it second time. Since the data was already cached in server memory, this time around it took 26 seconds. Even though the second time is significantly faster, I still don’t like the fact that it took 26 seconds. So I tried to see what I could change to make it faster. The following revised SQL SELECT statement took only 2 seconds:

SELECT CPINVLIN.INV_ITM_ITM_NO, CPINVLIN.INV_ITM_INV_DATE, CPINVLIN.INV_ITM_DESC_1,
CPINVLIN.INV_ITM_REASON_CODE, CPINVLIN.INV_ITM_DESC_2, CPINVLIN.INV_ITM_QTY_ORDER,
CPINVLIN.INV_ITM_TOT_QTY_SHP, CPINVLIN.INV_ITM_UNIT_PRICE, CPINVLIN.INV_ITM_UNIT_COST,
(INV_ITM_UNIT_PRICE*INV_ITM_QTY_ORDER) AS ExtPrice, CPINVLIN.INV_ITM_INV_NO,
CPINVLIN.INV_ITM_PROD_CATE, CPINVHDR.INV_TYPE, CPINVLIN.INV_ITM_CUST_NO,
CPINVHDR.INV_SHIP_TO_NO, CPINVHDR.INV_SHIP_TO_NAME, CPINVHDR.INV_SHIP_TO_ADDR_1,
CPINVHDR.INV_SHIP_TO_CITY, CPINVHDR.INV_SHIP_TO_ST, CPINVHDR.INV_SHIP_TO_ZIPCD,
CPINVHDR.INV_SHIP_TO_COUNTRY
FROM CPINVLIN, CPINVHDR 
WHERE CPINVLIN.INV_ITM_ITM_NO='RU1022-22'
AND CPINVLIN.INV_ITM_INV_DATE>=20160101
And CPINVLIN.INV_ITM_INV_DATE<=20171130
AND CPINVHDR.INV_NO = CPINVLIN.INV_ITM_INV_NO;

So why is the second SQL statement significantly faster? Essentially, I took the inner join out of the “FROM” clause. Instead, I implied the inner join in the last WHERE condition. Why does this help? This is because in the second SELECT statement the WHERE clause is constructed so that PSQL will use the first condition to narrow down CPINVLIN records to that item number = ‘RU1022-22’ first. INV_ITM_ITM_NO is a key field of the CPINVLIN table. So this is done quickly and produced a small data set. Then it further narrowed down the number of records with the line item invoice date range. Finally, with a small data set, PSQL performed the join from CPINVLIN to CPINVHDR in the last WHERE condition.

On the other hand, the first SQL statement caused PSQL to perform an inner join of the entire two database tables CPINVHDR and CPINVLIN first. Both tables are big, so this is why it was slow. Finally, when these two tables were joined, the system narrowed down the records with the WHERE condition. This is why the first statement was so slow.

Unfortunately, PSQL does not always know whether it should perform the join first or the where condition first to optimize performance. So we have to evaluate each scenario to influence PSQL to make the right choice in order to run faster. Most Elliott users do not have the expertise to do this kind of SELECT statement optimization, so don’t feel bad if you don't understand this article. This is what Netcellent does best, so just talk to us and we will help you.


EMK

Feedback and Knowledge Base