An SQL based solution
What is the Subset Sum Problem?
Meet Noor. Noor leads a deal team at a major global management consultancy firm. Her team is currently working on an M&A deal. Her client's accounting division is working late hours to get the current year's financials in order. They have found some discrepancies in the overall costing figures. The total expense amount shown in a report generated by their financial management system is $575,010,000. However, when summing up the expense amounts received from individual divisions and project teams, the total comes to $578,020,000. There is a discrepancy of $3,010,000. This is because all the individual divisions and projects run their own financial management systems which are not very well integrated with the central reporting system. The client has asked Noor for help in reconciliation. Her task is to figure out which of the individual accounts have not been factored in the total generated from the central system.
What Noor is faced with here is a variation of a subset sum problem. "Given a set of integers, does a non-empty subset of integers exist that sum up exactly to zero?" In this example, the set of given integers are the expense account values from each individual division or project team. Instead of zero, Noor needs to figure out which combination of these expense values sums up exactly to the value of $3,010,000.
The solution that I am detailing here is very similar to the solution given in Algorithms: The Traveling Salesman. In this post, I would focus on the differences that need to be made to the Traveling Salesman Problem (TSP) solution to arrive at the answer to the subset problem. So, I would recommend reading the post on solving TSP (8-minute read) first before proceeding.
STAGE 1: Data!
The dataset that we need to work on this problem is a list of divisions and their annual expense amounts. I create a table to store this dataset.
create table expense_accounts
(account_name varchar2(100)
,expense number);
I am inserting 10 rows of expense account values into this table. Three of those rows would sum up to $3,010,000. The aim of the exercise would be to build a query that would correctly fetch those three rows.
Here is a script that loads data into the table.
STAGE 2: Hierarchical Query! Again!
Just like the solution in the TSP problem, we will be leveraging Hierarchical Queries again in this solution.
Step 1: a simple select * query to fetch all the rows in my expense_accounts table.
select *
from expense_accounts;
Step 2: converting this into a hierarchical query. We need to convert the query into a Hierarchical Query as that would give us access to use the SYS_CONNECT_BY_PATH function. To convert the query to a Hierarchical Query, we need to add a CONNECT BY clause. However, unlike the dataset in TSP, none of the rows have a relationship with each other.
Instead of finding a direct parent-child relationship, we shall link the rows in an ascending order of account_names. Just like we did in the TSP solution, I'll add the NOCYCLE instruction, and use SYS_CONNECT_BY_PATH and LTRIM functions to generate a "path" of expense accounts.
select ltrim(sys_connect_by_path(account_name, ' / '), ' / ') accounts
,ltrim(sys_connect_by_path(expense, '+'),'+') total_expense
from expense_accounts
connect by nocycle account_name > prior account_name
Step 3: add a WITH clause
I'll then define this result set as a table from which I can query later using the WITH clause.
with account_combinations
as
(select ltrim(sys_connect_by_path(account_name, ' / '), ' / ') accounts
,ltrim(sys_connect_by_path(expense, '+'),'+') total_expense
from expense_accounts
connect by nocycle account_name > prior account_name)
select *
from account_combinations
The output for this query would be the same as the output of the query in the previous step.
STAGE 3: The hack!
Now, we will straight away implement the EXECUTE_IMMEDIATE hack in a WITH clause, the same way we did in the TSP solution.
with function sum_exp (p_expenses varchar2)
return number
as
ln_total number := 0;
begin
execute immediate 'select '||p_expenses||' from dual '
into ln_total;
return ln_total;
end sum_exp;
account_combinations
as
(select
ltrim(sys_connect_by_path(account_name, ' / '), ' / ') accounts
,sum_exp(ltrim(sys_connect_by_path(expense, '+'),'+')) total_expense
from expense_accounts
connect by nocycle account_name > prior account_name)
select *
from account_combinations
STAGE 4: The final result!
Now that we have a result set of all the combination of expense accounts and the corresponding total, we just add a condition to filter out the row whose total expense amount is $3,010,000.
with function sum_exp (p_expenses varchar2)
return number
as
ln_total number := 0;
begin
execute immediate 'select '||p_expenses||' from dual '
into ln_total;
return ln_total;
end sum_exp;
account_combinations
as
(select
ltrim(sys_connect_by_path(account_name, ' / '), ' / ') accounts
,sum_exp(ltrim(sys_connect_by_path(expense, '+'),'+')) total_expense
from expense_accounts
connect by nocycle account_name > prior account_name)
select *
from account_combinations
where total_expense = 3010000
It looks like some "entertainment" and travel-related expenses were the ones that were not accounted for in the central financial management system. Noor passes on this info to her client's accounting team, who then take it forward to get the records reconciled.
Now, this solution works very well for a small set of integers. As the number of integers in your original set increases, the time taken by the query to return a value increases as well drastically. In such cases, you would need to look at implementing algorithms and leveraging performance improvement features in Oracle DB to improve the solution.
One key takeaway from this for me is how we have managed to make use of Hierarchical Queries in imaginative ways. I doubt if the use-cases for Hierarchical Queries in the original design included it being used for solving TSP or Subset Sum Problem. The kind of applications that some of these Oracle Database functions have, go way beyond the examples and use-cases that are talked about in the documentation. It's not just for building the relationship between managers and employees in the emp table :)