sin: Find files by date very simply

Posted on June 10, 2010. Filed under: Scripts

This is one of my small, simple work-horse scripts.  It’s called “sin”, short for “since”.  It finds and displays files based on the date last modified (or last accessed, if option -a is specified).

Yes, I know that you can use “find” to do all this, but not so simply and easily.  And, unlike find, this command automatically sorts all the files found in date order, oldest to newest.
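
For comparison, here is roughly the find command that the script ends up building for a non-recursive "older than N days" search. This is only a sketch: list_older is a made-up name, and the find tests are copied from the way the script assembles its own command (prune sub-directories, match by age and name, skip directories). Note that the output comes back in whatever order find walks the directory, not in date order.

```shell
#!/bin/sh
# list_older is a hypothetical helper, not part of sin itself.
list_older() {
    days=$1     # minimum age in days, as in "sin +N"
    spec=$2     # quoted filespec, as in "*.log"
    # "! -name . -prune" stops find from descending into sub-directories;
    # "! -type d" keeps directories out of the listing.
    find . \( ! -name . -prune \) -mtime "+$days" -name "$spec" ! -type d \
        -exec ls -l {} \;
}

# Rough equivalent of: sin +30 "*.log" (minus the date sort)
list_older 30 "*.log"
```

That is a lot to type and remember for an everyday job, and it still leaves the sorting to be done.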

The script outputs a help display if option -h is specified.

You can find the most recent files, the oldest files or files just from a specific number of days ago.
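
Those three forms map directly onto find's -mtime (or -atime) test: the script passes each day specification through unchanged, and a second specification simply adds a second test to form a range. A minimal sketch of that mapping, where build_time_tests is a hypothetical helper written for illustration:

```shell
#!/bin/sh
# build_time_tests is a hypothetical helper, not part of sin itself.
# It turns one or two day specifications into find time tests.
build_time_tests() {
    kind=$1; shift          # "mtime" or "atime"
    tests=""
    for spec in "$@"        # e.g. "-7", "+7", or plain "2"
    do
        tests="$tests -$kind $spec"
    done
    printf '%s\n' "${tests# }"      # drop the leading space
}

build_time_tests mtime -7       # modified less than 7 days ago
build_time_tests mtime +7 -30   # range: more than 7, less than 30 days ago
build_time_tests atime 2        # accessed in the 24 hours ending 2 days ago
```

The second call prints "-mtime +7 -mtime -30"; find ANDs the two tests together, which is what makes the range work.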

I use this script for a number of jobs.  The command “sin +30 '*.log'” lets me quickly locate all *.log files in a directory older than 30 days, which is quite handy for cleaning up log directories.  (The filespec is quoted so that the shell passes it to the script unexpanded.)

Just plain “sin -r” lists every file modified within the last day anywhere in a directory tree.

The script code is as follows:


#!/usr/bin/ksh
#
# sin - Select files by date "since".
#
# $Id: sin,1.10-c 1.10 2010/02/11 11:14:49 hawkinsk init  $
#
# Copyright 2010 Kimball Hawkins <khawkins@acm.org>. All rights reserved.
#
# This program is free software; you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 2 of the License, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
# more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 59 Temple Place, Suite 330, Boston, MA  02111-1307  US
#
# MODIFICATION LOG (most recent first)
# DATE      WHO DESCRIPTION
# ========= === =============================================================
# 08-Feb-10 KRH Corrected filename extraction when BRIEF for symbolic link.
#               Added license information.
# 03-Feb-10 KRH Old "find ... | xargs" also collapses spaces on some systems
#               (see 03-Dec-08). New solution is to pipe to awk and use awk's
#               printf() to fix the formatting.
# 30-Jul-09 KRH Minor corrections when a directory is specified.
# 17-Jul-09 KRH Converted to work with AIX's crippled 'find' and 'xargs'.
# 04-Dec-08 KRH Xargs can't handle the "'" character in an argument, even
#               within quotes. Needs to be quoted again.
#               Added option -c.
# 03-Dec-08 KRH Can't use "-exec ls ..." within find, because some versions
#               of find collapse space in -exec's output, spoiling the format.
# 19-Jun-08 KRH Improved help, improved some comments.
# 30-May-08 KRH Corrected output to include path where appropriate.
#               Greatly simplified the "find" command.
# 02-May-07 KRH Improved help.
# 05-May-06 KRH Rewrote options handling to not use getopts. This allows us
#               to process the "+n" and "n" options easily under bash. Added
#               support for specifying a range with a second DAYS entry.
#               Allow for a directory name with embedded spaces.
# 08-Mar-06 KRH Corrected year calculation for sort.
# 20-Sep-05 KRH Modified to use bash. The bash getopts does not support "+"
#               options.
# 23-Sep-04 KRH Modified to NOT use color where not appropriate or not wanted
# 17-Sep-04 KRH Added --color to display, if supported.
# 08-Sep-04 KRH Ensured it works with different formats for ls output.
# 07-Sep-04 KRH Added filespec support. Sort's month feature seems broken.
#               This sorting by date has really been a pain. Final fix is to
#               simply create my own numeric sort field using awk.
# 01-Apr-02 KRH Rewrote post-sort fiddle to correct the sort so that it
#               ALWAYS sorts year/month/day correctly. Simplified the -b code
#               so that it retains the oldest to newest sort order.
# 21-May-99 KRH Suppress blank line if nothing found.
# 14-May-99 KRH Corrected order of older files.
# 25-Jan-99 KRH Corrected help. Improved LSARG usage.
# 12-Aug-98 KRH List all types of files. Reorder modlog.
# 31-May-98 KRH Changed find for clarity.
# 29-May-98 KRH When option -a is specified, change LSARGS as well.
# 30-Apr-98 KRH Added option -a to specify access instead of modify time.
# 21-Oct-97 KRH Corrected so that one can enter an unsigned number for SINCE.
# 21-May-96 KRH Fixed so that it works when path is specified.
# 08-Apr-96 KRH Do not scan sub-directories unless -r specified.
# 23-Jan-96 KRH Added second parameter for directory path.
# 16-MAR-95 KRH Initial creation.
# ========= === =============================================================

# ===========================================================================
# function sin_help
function sin_help
{
    echo "\
sin

NAME
      sin - select and list files by date.

SYNOPSIS
      sin [-abchnr] [[+|-]{n} ...] [{path}] [{filespec}]

DESCRIPTION
      The name is short for \"since\".  The default action is to display all
      files modified \"since yesterday\".

      This script finds and displays files in the specified directory based on
      the file modification (or access) time.

      You can list files modified less than, greater than or exactly {n} days
      ago.  You may also specify a range of days by specifying both +{n} and
      -{n}.

OPTIONS
      -a        Use access time instead of modified time.

      -b        Brief format, just display the file name. Note that the list is
                always sorted oldest to newest, even when the date is not
                displayed.  Default is a \"ls -l\" display.

      -c        Only display a count of files found.

      -h        Display this help message.

      --license
                Display license information.

      -n        Do not use color in the file display.  Color is displayed by
                default if supported and if output is to a terminal.

      -r        Recursively scan sub-directories

       {n}      Where {n} is an unsigned number. List files modified in the 24
                hour period ending exactly {n} days ago.

      +{n}      List files modified more than {n} days ago.

      -{n}      List files modified less than {n} days ago.

      {path}    Directory to look in. Default is the current directory.

      {filespec}
                Restrict display to files matching {filespec}.  If the
                {filespec} contains wildcards or other special characters, it
                must be quoted.

NOTES
      For convenience, the {path} and {filespec} arguments may be specified
      together, as \"{path}/{filespec}\".

      However, in a recursive search (when option -r is specified), these
      arguments are still considered as separate arguments.  That is, the
      recursive search will start at {path} and search in and below that
      directory for files matching {filespec}. This is a feature, not a bug.

EXAMPLES
      List all files in the current directory that have been modified within
      the last week.

        sin -7

      List all files recursively through the /etc directory tree that have not
      been modified within the last thirty days.

        sin +30 -r /etc

      List all \"*.sql\" files found under the oracle home directory that were
      modified exactly two days ago.

        sin -r 2 ~oracle \"*.sql\"
"
    exit
}

# ===========================================================================
# function license
function license
{
    echo "\
sin
        \$Revision: 1.10 $
        \$Author: hawkinsk $

Copyright  1995,1996,1997,1998,1999,2002,2004,2005,2006,2007,2008,2009,2010
Kimball Hawkins <khawkins@acm.org>. All rights reserved.

This program is free software; you can  redistribute  it  and/or modify  it
under  the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at  your  option)
any later version.

This program is distributed in the hope that it will be useful, but WITHOUT
ANY  WARRANTY;  without  even  the  implied  warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public  License  for
more details.

You  should  have  received  a copy of the GNU General Public License along
with this program; if not, write to the  Free  Software  Foundation,  Inc.,
59 Temple Place, Suite 330, Boston, MA  02111-1307  US
"
    exit
}

# ===========================================================================
# INITIALIZATION

# For bash, turn on extended globbing and echo escape
[[ -n $BASH ]] && shopt -s extglob xpg_echo

BRIEF=false             # False = full "ls -l" display
COLOR=y                 # Display color in the "ls" display
COUNT_ONLY=false        # False = list the files, not just a count
DIR="."                 # Default is current directory.
PRUNE="norecurse"       # Default is not recursive
NAME=""                 # Default name spec is everything
SINCE1=""
SINCE2=""
MODTIME=mtime           # Default is by modify time.

# Don't use getopts, we need to process "+n" and "n" as pseudo-options
for PARAM
do
    # Special options
    [[ "X$PARAM" = "X--license" ]] && license
    [[ "X$PARAM" = "X--" ]] && continue # Ignore the "--" option

    unset OPTS
    if [[ "X$PARAM" = X-* ]]
    then
        # Parsing concatenated options (like "-ab"). Note that this will
        # NOT screw up single options ("-a") or numeric options ("-20").
        # printf is used instead of echo because some echo builtins swallow
        # an argument of "-n".
        ####OPTS=( $( printf '%s\n' "$PARAM" | sed 's/[a-z]/ -&/g; s/- //;' ) )  # BASH
        set -A OPTS -- $( printf '%s\n' "$PARAM" | sed 's/[a-z]/ -&/g; s/- //;' ) # KSH
    else
        OPTS[0]="$PARAM"        # No parsing
    fi

    # Process what was found, one at a time.
    for PAR in "${OPTS[@]}"
    do
       case "$PAR" in
        -a)     MODTIME=atime;;         # Use access time instead of mod time
        -b)     BRIEF=true              # Just filenames
                COLOR=n;;
        -c)     COUNT_ONLY=true;;
        -h)     sin_help;;
        -n)     COLOR=n;;               # No color in output
        -r)     PRUNE="";;              # Check sub-directories also
        # "-" or "+" followed by one or more digits:
        [-+]+([0-9]))
                if [[ -z $SINCE1 ]]
                then
                    SINCE1="$PAR"
                elif [[ -z $SINCE2 ]]
                then
                    SINCE2="$PAR"
                else
                    echo "ERROR: sin: Too many days specifications" >&2
                    exit 1
                fi;;
        # Just one or more digits:
        +([0-9]))
                    SINCE1="$PAR"
                    ;;
        # Starts with "+" or "-", but isn't a known option.
        [+-]*)
                    echo "ERROR: Unrecognized option '$PAR'" >&2
                    sin_help                # invalid option
                    ;;
        # Not one of the above options. May be name of directory or filespec
        *)      if [[ -d "$PAR" ]]
                then
                    DIR="$PAR"          # Directory specification

                elif [[ -n $NAME ]]
                then
                    # Must be filespec, but we already have one
                    echo "ERROR: sin: Too many filenames (unquoted\c" >&2
                    echo " filespec?)" >&2
                    exit 1

                # Assume this is a filespec or dir/filespec combination
                elif [[ $PAR = */* ]]
                then
                    if [[ $DIR != \. ]]
                    then
                        # Directory, but we already have one
                        echo "ERROR: sin: Too many directories (unquoted\c" >&2
                        echo " filespec?)" >&2
                        exit 1
                    fi
                    # Was directory plus filespec
                    DIR="$( dirname "$PAR" )"
                    NAME="-name \"$( basename "$PAR" )\""
                    if [[ ! -d $DIR ]]
                    then
                        echo "ERROR: Directory '$DIR' not found" >&2
                        exit 1
                    fi

                else
                    # Just a filespec
                    NAME="-name \"$PAR\""
                fi;;
        esac
    done
done

# ===========================================================================
# VALIDATION

# If SINCE was not specified, defaults to "-1".
[[ -z $SINCE1 ]] && SINCE1="-1"

# Always do the ls long display
LSARG="-l"

# Change displayed date to access time instead of modification time
[[ $MODTIME = atime ]] && LSARG="$LSARG -u"

# Add color if not blocked, if output is to a terminal and if supported
[[ -t 1 && $COLOR = y && $(uname -s) = Linux ]] && LSARG="$LSARG --color"

# Create the TIME command
TIME="-$MODTIME $SINCE1"
# A range was specified
[[ -n $SINCE2 ]] && TIME="$TIME -$MODTIME $SINCE2"

# If option -r not specified, do not recurse
[[ -z $PRUNE ]] || PRUNE="\\( ! -name \".\" -prune \\)"

# ===========================================================================
# PROCESSING

# Find files in the time range
REPL=""
if [ "$DIR" != "." ]
then
    cd "$DIR" || exit 1     # Quoted: directory name may contain spaces
    REPL="${DIR%/}/"
fi

# The following command has some oddities:
# * Some versions of find collapse the space in -exec's output, spoiling the ls
#   format.  Using "| xargs ls" sometimes has the same bug.  So now we pipe the
#   whole thing to awk and reformat it correctly using awk's printf().
# * Also, non-display codes like "<ESC>[00m" appear when "--color" is specified
#   (the default).  That's what the "s@\([0-9]m\)\./@\1@;" command is working
#   around.
OUTPUT="$( eval find . $PRUNE $TIME $NAME ! -type d \
                -exec ls $LSARG \"{}\" \\\; | awk '{
            if ( NF == 9 ) { FNAME = $9; } else { FNAME = $9 " " $10 " " $11; }
            printf("%-10s %3s %-6s %-6s %12s %3s %2s %5s %s\n",
                $1, $2, $3, $4, $5, $6,  $7, $8, FNAME); }' - |
        sed "s@ \./@ $REPL@; s@\([0-9]m\)\./@\1@;" )"

if [[ -n $OUTPUT ]]
then
    # Sort's date sorting is not reliable on all platforms. So we use awk to
    # create an explicit sort field in the front, sort by it, then discard it.
    THISYEAR=$( date +%Y )
    LASTYEAR=$(( THISYEAR - 1 ))
    MONTH=$( date +%m )
    # Get rid of leading zero
    MONTH=${MONTH#0}

    OUTPUT=$( echo "$OUTPUT" | awk '
        BEGIN {
                split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec",
                months )
        }
        $6 ~ /....-..-../ {
                # Simple: date is "YYYY-MM-DD HH:MI"

                printf("%s\t%s\n", $6 $7, $0)
        }
        $6 ~ /[A-Z][a-z][a-z]/ {
                # Complex: date is "Mon DD YYYY" >or< "Mon DD HH:MI"

                for( i = 0; $6 != months[i] && i < 12; i++ ) {}
                MONTH = i

                if ( $8 ~ /..:../ ) {
                    TIME = $8
                    # Date is in "Mon DD HH:MI" format. Determine year.
                    # If month is > current month, it is last year
                    if ( MONTH > '$MONTH' ) {
                        YEAR = '${LASTYEAR}'
                    } else {
                        YEAR = '${THISYEAR}'
                    }
                } else {
                    # Date is in "Mon DD YYYY" format. No time.
                    TIME = "00:00"
                    YEAR = $8
                }
                DAY = sprintf("%02d", $7)
                printf("%4d%02d%02d%s\t%s\n", YEAR, MONTH, DAY, TIME, $0)
        }
        ' - | sort | cut -f2 )

    # Display what was found
    if $COUNT_ONLY
    then
        echo "$OUTPUT" | wc -l | sed 's/ //g;'  # Just count the lines

    elif $BRIEF
    then
        echo "$OUTPUT" | awk '{ print $9 }'     # Just print the file name

    else
        echo "$OUTPUT"
    fi  | grep -v "^$"          # Delete blank lines from output
fi

#
# End of sin
#
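
The date sort in the last part of the script is the piece most worth reusing: put a sortable key in front of each line, sort on it, then cut the key off again. Below is a stripped-down sketch of the same idea. It assumes input lines that start with "Mon DD YYYY"; the sort_by_date name and the sample data are invented for illustration, and unlike the real script it does not handle the "Mon DD HH:MI" or "YYYY-MM-DD" formats.

```shell
#!/bin/sh
# sort_by_date is a hypothetical helper showing the key/sort/cut trick.
# Input lines are assumed to look like: "Mon DD YYYY name"
sort_by_date() {
    awk '
        BEGIN {
            # Map month names to numbers 1..12
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = i
        }
        {
            # Prefix each line with a YYYYMMDD key and a tab
            printf("%04d%02d%02d\t%s\n", $3, month[$1], $2, $0)
        }
    ' | sort | cut -f2-     # sort on the key, then throw the key away
}

printf '%s\n' 'Mar 05 2009 b.log' 'Dec 31 2008 a.log' 'Jan 02 2010 c.log' |
    sort_by_date
```

With the sample data the lines come out oldest first: a.log, then b.log, then c.log. Because the key is a fixed-width number, a plain sort is enough and no platform-specific month sorting is needed.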

This script was written and tested under the Korn shell on many different *nix systems, and it has also been run under bash at times.  Search the code for “BASH” to find the places that need changing to run it under bash.
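
The main shell-specific spot is the array assignment in the option parser ("set -A" for ksh, "OPTS=( ... )" for bash). One way to sidestep the difference, shown here only as a sketch and not as something the script does, is to skip the array and loop over the split options directly. The split_opts name is made up; the sed expression is the one from the script.

```shell
#!/bin/sh
# split_opts is a hypothetical helper wrapping the script's sed expression.
split_opts() {
    # Put " -" in front of every lowercase letter, then drop the stray
    # leading "- ", so "-ab" becomes "-a -b".  Numeric options such as
    # "-20" contain no letters and pass through untouched.
    printf '%s\n' "$1" | sed 's/[a-z]/ -&/g; s/- //;'
}

# Word splitting on the unquoted substitution is deliberate here.
for opt in $(split_opts "-abr")
do
    echo "option: $opt"
done
```

This prints one line each for -a, -b and -r, and it should behave the same under ksh, bash or a plain POSIX sh since it uses no arrays.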

Kimball
