Saturday, May 13, 2006

Java Caching System

http://java-source.net/open-source/cache-solutions

Friday, May 12, 2006

AWK one liners

Taken from http://www.softpanor....org/Tools/awk.shtml

# Print the length of the longest input line:
awk '{ if (length($0) > max) max = length($0) } END { print max }' data

# Print every line that is longer than 80 characters:
awk 'length($0) > 80' data

# Print the length of the longest line in data:
expand data | awk '{ if (x < length()) x = length() }
END { print "maximum line length is " x }'

# Print seven random numbers from 0 to 100, inclusive:
awk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }

# Print the total number of bytes used by files:
ls -l files | awk '{ x += $5 }
END { print "total bytes: " x }'

# Print the even-numbered lines in the data file:
awk 'NR % 2 == 0' data

# Print first two fields in opposite order:
awk '{ print $2, $1 }' file

# Print lines longer than 72 characters:
awk 'length > 72' file

# Print length of string in 2nd column
awk '{print length($2)}' file

# Add up first column, print sum and average:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
# Print fields in reverse order:
awk '{ for (i = NF; i > 0; --i) print $i }' file

# Print the last line
awk '{line = $0} END {print line}' file

# Print the total number of lines that contain the word Pat
awk '/Pat/ {nlines = nlines + 1}
END {print nlines}' file

# Print all lines between start/stop pairs:
awk '/start/, /stop/' file

# Print all lines whose first field is different from previous one:
awk '$1 != prev { print; prev = $1 }' file

# Print column 3 if column 1 > column 2:
awk '$1 > $2 {print $3}' file

# Print line if column 3 > column 2:
awk '$3 > $2' file

# Count number of lines where col 3 > col 1
awk '$3 > $1 {print i + "1"; i++}' file

# Print sequence number and then column 1 of file:
awk '{print NR, $1}' file

# Print every line after erasing the 2nd field
awk '{$2 = ""; print}' file

# Print hi 28 times
yes | head -28 | awk '{ print "hi" }'

# Print hi.0010 to hi.0099 (NOTE IRAF USERS!)
yes | head -90 | awk '{printf("hi00%2.0 f \n", NR+9)}'

# Find maximum and minimum values present in column 1
NR == 1 {m=$1 ; p=$1}
$1 >= m {m = $1}
$1 <= p {p = $1}
END { print "Max = " m, " Min = " p }

# Example of using substrings
# substr($2,9,7) picks out characters 9 thru 15 of column 2
{print "imarith", substr($2,1,7) " - " $3, "out."substr($2,5, 3)}
{print "imarith", substr($2,9,7) " - " $3, "out."substr($2,13 ,3)}
{print "imarith", substr($2,17,7) " - " $3, "out."substr($2,21 ,3)}
{print "imarith", substr($2,25,7) " - " $3, "out."substr($2,29 ,3)}

# Single space to Double space
awk '{print ; print ""}' infile > outfile

Data Normalization in Weka

weka.filters.unsupervised.attribute.Normalize

Normalizes all numeric values in the given dataset. The resulting
* values are in [0,1] for the data used to compute the normalization
* intervals.

awk

The command to print the first attribute from a file with attribute values as

attr1, attr2, attr3 ..
is

awk -F, '{print $1}' filename

Wednesday, May 03, 2006

Upper Triangular Matrix

For allocating upper triangular matrix, we use the following technique

(Num) * ( Num -1 ) >> 1 + Num

where num is the number of dimensions of the matrix, what essentially this formulae does is m multiplying ( Num *Num -1 is always divisible by 2 and adding num gives the upper traingular matrix)

Nice trick for allocation of upper triangular matrix.

-Kalyan