Reading an IFS Directory from SQL with a UDTF

I teach and write about IFS APIs quite often, and inevitably someone is frustrated by the limitations of the opendir(), readdir(), and closedir() APIs. They do a great job of reading a list of files in a directory, but that's about it. They don't provide wildcard matching, sorting, or other useful querying capabilities. If you want those capabilities you have to write the code yourself, which can be a little cumbersome.

The SQL SELECT statement has all of these capabilities, but of course, it's for reading a database table, not for reading files in the IFS... or is it? With a User-Defined Table Function (UDTF), I can use an SQL SELECT statement on data that I calculate in an RPG program. I thought it'd be pretty nifty to write a UDTF that returns a directory listing so that an SQL SELECT statement can be used to query it.

Once I've written my UDTF, any language that can run SQL queries (including embedded SQL from RPG, Java, Cobol, C, PHP, QShell, Visual Basic, .NET, Query Manager, RUMSQLSTM, STRSQL, and iSeries Navigator) can get the IFS directory listing simply by running a SELECT statement. How cool is that?

About UDTFs

If you're not familiar with UDTFs, you can think of them as stored procedures. Both stored procedures and UDTFs calculate the contents of a result set and let your programs read it. However, I've noted the following differences:

  1. The fields are known before the statement is executed (stored procedures can potentially change the fields on each call). This is great because it enables compatibility with embedded SQL in i5/OS programs.

  2. UDTFs are called repeatedly, returning one record each time. Stored procedures return all records at once using an array or cursor.

  3. UDTFs are run with a SELECT statement, stored procedures are run by a CALL statement.

The fact that a UDTF is run from a SELECT statement is very powerful, because it means that I can use WHERE, ORDER BY, and GROUP BY on the results. I can also use the various SQL functions.

To learn more about the basics of UDTFs and how they work, see the following articles at SystemiNetwork.com:

Using A UDTF to Read an IFS Directory

I've written a UDTF called IFSDIR. It receives an IFS directory name as a parameter. It uses the opendir(), readdir(), and closedir() APIs to read the contents of the specified directory. It also uses the stat() API to get additional information about about the objects in the directory, because this information is useful in my queries.

I don't print the code for the UDTF in the text of this article. However, there's a link at the end to download the code so you can try it out.

When you install my UDTF, you have to run an SQL statement to let SQL know how the UDTF works. Here's what that SQL statement looks like:

          CREATE FUNCTION my-library/IFSDIR(dirname VARCHAR(500))
                RETURNS TABLE
                (
                  filename    varchar(640),
                  size        decimal(20,0),
                  type        varchar(10),
                  access_time timestamp,
                  modify_time timestamp,
                  change_time timestamp,
                  ccsid       decimal(5,0),
                  owner       char(10),
                  group       char(10)
                )
          EXTERNAL NAME 'my-library/IFSDIR(IFSDIR)'
          LANGUAGE RPGLE
          PARAMETER STYLE DB2SQL
          NO SQL
          NOT DETERMINISTIC
          NO EXTERNAL ACTION
          FENCED
          NO SCRATCHPAD
          FINAL CALL
          DISALLOW PARALLEL
          CARDINALITY 1

The interesting part of the preceding CREATE FUNCTION statement is that you tell SQL that it returns a table, and you actually list all of the fields that are in this table! SQL treats the output of my UDTF as if it were a file on disk with the listed fields, and it can query that file just like any other file.

EXTERNAL NAME 'my-library/IFSDIR(IFSDIR)' means that it calls a subprocedure named IFSDIR that's in a service program named IFSDIR. Since it's a subprocedure in a service program, there's very little performance impact from calling the procedure repeatedly. In other words, a service program works much better for this than a normal program object would because the calls are faster.

To try it out, I ran the following SELECT statement from the Run SQL Scripts option in iSeries Navigator:

As you can see, the statement gives me a list of the files in the directory, along with some of their attributes. The files are in no particular order, but if I want them to be in order, I can run the following statement instead:

SELECT * FROM TABLE(IFSDIR('/qibm/proddata')) as t order by filename

As with any SELECT statement, I can use other functions as well. Let's say I want to know the size of all non-directories that are in my personal home directory. I could run the following SQL statement:

SELECT SUM(SIZE) FROM TABLE(IFSDIR('/home/klemscot')) as t 
       where TYPE<>'*DIR';

The WHERE clause lets me select any directories. The SUM() function lets me add up all of the file sizes, so the statement tells me the total size of the data in the directory almost effortlessly.

Or, maybe I want to receive a list of files that end in .CSV and get a list of their sizes as well.

SELECT filename,size FROM TABLE(IFSDIR('/home/klemscot')) as t where FILENAME like '%.csv';

Maybe I'd like a list of files in the /tmp directory that haven't been used in seven days. Why not?

SELECT filename from TABLE(IFSDIR('/tmp')) as t
              where ACCESS_TIME < CURRENT TIMESTAMP - 7 DAYS;

Hopefully I've made my point with these examples. You can perform queries against the UDTF just as you would against a database file. It's a very powerful capability!

Reusability

In the preceding examples, I used the Run SQL Scripts option of iSeries Navigator because it makes a nice-looking screen. I love using that tool for ad-hoc queries. But the great thing about UDTFs is that they can be used anywhere that you can run a SELECT statement. I can use them from my HLL programs, or from an ODBC or JDBC query from another system, or anywhere that I can run an SQL statement.

Here's an example that uses embedded SQL from an RPG program:

     D Filename        s             50a

     c/EXEC SQL
     c+     Set option naming=*SYS
     c/end-exec

     C/EXEC SQL
     C+                  Declare   C1 Cursor for
     C+                      Select Filename from TABLE(IFSDIR('/tmp')) as t
     C+                      where ACCESS_TIME < CURRENT TIMESTAMP - 7 DAYS
     C/END-EXEC

     C/EXEC SQL          Open      C1
     C/END-EXEC

     C/EXEC SQL          Fetch Next from C1 into :Filename
     C/END-EXEC

     C                   dow       sqlstt = '00000'

     C**  You might change the following line to do something
     C**  more interesting, such as delete the file...
     C                   dsply                   filename

     C/EXEC SQL          Fetch Next from C1 into :Filename
     C/END-EXEC
     c                   enddo

     C/EXEC SQL          close     C1
     C/END-EXEC
     C                   eval      *inlr = *on

I think this is pretty cool. Remember, you can write a UDTF that outputs anything you want. I used IFS directories in this example, but you write the code that determines the contents of the output, so you can write your own routine that outputs anything that makes sense for your organization.

Code Download

Click here to download the code for my IFSDIR function