Kalyan Chakravorty Blog: April 2005

Thursday, April 28, 2005

XML parsers

XML Parsers from Apache

1. Up to Java 1.4: Crimson (no schema support), from Apache
2. From Java 1.5: Xerces (with schema support), from Apache

Xalan is the XSLT transformer from Apache.

XSLT 1.0 has no schema support.
XSLT 2.0 adds schema support.

DOM and SAX are two programming paradigms for processing XML documents. DOM is based on building a tree out of node, element and attribute classes, whereas SAX is based on an event model: while parsing, when an element is reached, the startElement method of the default handler is called. Similarly, the startDocument, endDocument and endElement events are caught by the default handler.

The assignment we did tried to get a good mix of both by using SAX events to build the DOM tree.
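
A minimal sketch of that idea, assuming only the standard JAXP classes: a SAX handler that grows a DOM tree as the events arrive. The class name SaxToDomBuilder and the sample XML are made up for illustration; this is not the actual assignment code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: builds a DOM tree out of SAX events.
public class SaxToDomBuilder extends DefaultHandler {
    private final Document doc;
    private Node current; // the node that receives the next child

    public SaxToDomBuilder() throws Exception {
        doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        current = doc;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Element e = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(e);
        current = e; // descend into the new element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        current = current.getParentNode(); // climb back up
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        current.appendChild(doc.createTextNode(new String(ch, start, length)));
    }

    // Parses the XML string with SAX and returns the DOM built by the handler.
    public static Document build(String xml) throws Exception {
        SaxToDomBuilder handler = new SaxToDomBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.doc;
    }

    public static void main(String[] args) throws Exception {
        Document d = build("<food><drink type=\"beer\">Carlsberg</drink></food>");
        Element drink = (Element) d.getDocumentElement().getFirstChild();
        System.out.println(drink.getTagName() + " " + drink.getAttribute("type")
                + " " + drink.getTextContent());
    }
}
```

The handler keeps a single "current" pointer, so memory use is still that of a full DOM tree; the gain is only that the tree construction is under our control.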

http://mia.ece.uic.edu/~papers/WWW/MultimediaStandards/Parsers.pdf

-Kalyan

Monday, April 25, 2005

How to use Enumerations

// v is a java.util.Vector; elements() returns a java.util.Enumeration
for (Enumeration e = v.elements(); e.hasMoreElements();) {
    System.out.println(e.nextElement());
}

Thursday, April 21, 2005

Finally Success with Request Tracker

I finally had success with Request Tracker; maybe I had just forgotten the password.
So, the steps to get RT working:

1. Start Apache with httpd start (in the Apache installation, /usr/local/apa...)
2. Start MySQL:
mysqld --user=kosh --basedir=
3. This is the worst part: log in to the database as root, then
use mysql
update user set password = old_password('rtuser') where user = 'rtuser';
flush privileges;
4. Then you are ready to go

So, come to think of it, it is not that bad.

-Kalyan

Wednesday, April 20, 2005

What is up with Maths and Computer Scientists

So even though I might sound technical, I make a genuine effort to be non-technical :) as my friend Harini wants some posts to be non-technical.

Well, I had this pending paper that I had needed to finish for a long time, so I made a genuine effort today to finish it. All was well and I was getting the hang of it till I hit the mathematics. The Gaussian I thought I had a general idea of, because it is easy to take derivatives of and it needs only two parameters, mean and variance, to establish the bell curve.

Just when I thought I was getting the hang of it, I hit multivariate Gaussians (what on earth is that?)
and some properties of Gaussians:

the sum of independent Gaussians is a Gaussian,
their average is a Gaussian

To add to that, there is so much math in the paper that I got frustrated, and now I understand nothing in the paper :(

Sunday, April 17, 2005

XML Processing

An XML file can be processed with the DefaultHandler, which is called event-driven programming, i.e.
during parsing, when we come across the start and end of an element, we get the startElement and endElement
events, and to display the text we use the characters event.
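
A small sketch of such a handler, assuming the standard JAXP SAX classes. The class name SaxEventLogger and the sample XML are made up for illustration; it records each event instead of doing real work.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the SAX events it receives, in order.
public class SaxEventLogger extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks; whitespace-only chunks are skipped.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    // Parses the XML string and returns the recorded events.
    public static List<String> parse(String xml) throws Exception {
        SaxEventLogger handler = new SaxEventLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<food><drink>Carlsberg</drink></food>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory except what the handler chooses to store, which is the main contrast with DOM.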

Now we are using the DOM,
http://www.w3schools.com/dom/dom_intro.asp

Some points of interest:
1. The objective of DOM is to provide a standard programming interface to a wide variety of applications.
2. With the XML DOM one can create an XML document, navigate its structure, and add, modify or delete its elements.
3. DOM represents a tree view of the XML document.

4. Node types, with examples:

Node Type	Example
Document type	<!DOCTYPE food SYSTEM "food.dtd">
Processing instruction	<?xml version="1.0"?>
Element	<drink type="beer">Carlsberg</drink>
Attribute	type="beer"
Text	Carlsberg

One of the first things many programmers want to know is how to read an XML file and generate a DOM Document object from it. Use the DOMEcho example to learn how to do this in three steps. The important lines are:

    // Step 1: create a DocumentBuilderFactory
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

    // Step 2: create a DocumentBuilder
    DocumentBuilder db = dbf.newDocumentBuilder();

    // Step 3: parse the input file to get a Document object
    Document doc = db.parse(new File(filename));
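
The three steps can be wrapped into a self-contained sketch like the one below. It is not the DOMEcho example itself; the class name DomParseSketch is made up, and it parses an in-memory string (reusing the drink example from the table) instead of a file so that it runs without any input file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical wrapper around the three DOM parsing steps.
public class DomParseSketch {
    public static Document parse(String xml) throws Exception {
        // Step 1: create a DocumentBuilderFactory
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Step 2: create a DocumentBuilder
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Step 3: parse the input (here an in-memory stream) to get a Document object
        return db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<food><drink type=\"beer\">Carlsberg</drink></food>");
        NodeList drinks = doc.getElementsByTagName("drink");
        for (int i = 0; i < drinks.getLength(); i++) {
            Element drink = (Element) drinks.item(i);
            // Element node, its attribute node value, and its text node content
            System.out.println(drink.getTagName() + " type=" + drink.getAttribute("type")
                    + " text=" + drink.getTextContent());
        }
    }
}
```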

Thursday, April 14, 2005

Request Tracker system

I need to move all the directories, but a better idea might be to reinstall:

1. apache_1.3.33
2. mysql
3. request tracker
4. request tracker faq manager. (RTFM)

One fix for MySQL is the password thing. Perl is already installed, so we need to do the following:

1. Run CPAN for the first time:
perl -MCPAN -e shell

2. Install MySQL.

Start MySQL with
bin/safe_mysqld --user=mysql (here we need a password fix)

3. Compile and install MM.

4. Install RT.

5. Install RTFM.

Then Request Tracker is up and running.

-Kalyan

To Do

My to-do list seems to be growing all the time:
1. XML assignment
2. Sensor networks (finish the query cost paper, finish the BBQ paper)
3. Neural networks (data collection, Kohonen networks for clustering)

Study sensor networks, neural networks and XML. This quarter I have not been slogging the way I did the last two; I need to catch up, and hopefully I shall start today.

.* and * differences in unix.

So I am installing the PHP/XSLT functions on a web server.

1. A way to check that everything is working fine:

<?php phpinfo(); ?>

2. Check the PHP installation and install:

./configure --prefix=/opt/apache --with-apxs2=/opt/apache/bin/apxs --with-oracle --with-oci8


nwrecover
-xauth list | grep 
-xauth add :0  MIT-MAGIC-COOKIE-1  ac8db4b4a780bf918eb5aab1b580948e
nwrecover

-Kalyan

Tuesday, April 12, 2005

Pattern of the day

If you want to solve a problem and the data does not suit your needs,

then change the data into a form that does. E.g. tree traversals in XSLT: when we want a traversal with the predecessor and successor elements printed, I can't achieve it with the tree structure as such, because I am not really aware in which direction I need to go. In one case
I perhaps need to go left, and in another I need to travel up or whatever, so it might be a
good idea to flatten the structure out, traverse it and get the predecessors and successors.

So that is a really awesome pattern for solving problems.

//* in XPath gives a preorder traversal of the tree (all elements in document order).
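
Here is a rough sketch of the flattening idea in Java rather than XSLT, assuming the javax.xml.xpath API that ships with the JDK. The class name FlattenTree and the tiny sample document are made up. It uses //* to get the elements as a flat list, after which predecessor and successor are just the neighbours in the list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XML tree into a preorder list of element names.
public class FlattenTree {
    public static List<String> flatten(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // "//*" selects every element; the node set comes back in document order
        NodeList nodes = (NodeList) xpath.evaluate("//*",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<String> flat = flatten("<a><b><c/></b><d/></a>");
        for (int i = 0; i < flat.size(); i++) {
            // In the flat list, predecessor and successor are simply the neighbours
            String pred = i > 0 ? flat.get(i - 1) : "-";
            String succ = i < flat.size() - 1 ? flat.get(i + 1) : "-";
            System.out.println(pred + " <- " + flat.get(i) + " -> " + succ);
        }
    }
}
```

Once the data is a list, there is no longer any question of whether to go left or up in the tree.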

<xsl:copy> and <xsl:copy-of>: the copy one operates on the current node.

The difference between copy-of and value-of is that value-of does an implied conversion to text, whereas
copy-of does not. copy-of is a deep copy and copy is a shallow copy.
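
A quick way to see the three side by side is to run tiny stylesheets through the JDK's built-in XSLT processor. This is only a sketch: the class name CopyVsValueOf, the wrapper element out and the sample XML are made up, and each instruction is pasted into a one-template stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness: applies one XSLT instruction to an XML string.
public class CopyVsValueOf {
    public static String transform(String xml, String instruction) throws Exception {
        String xslt =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/\"><out>" + instruction + "</out></xsl:template>"
          + "</xsl:stylesheet>";
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<food><drink type=\"beer\">Carlsberg</drink></food>";
        // value-of: the node is converted to its text value
        System.out.println(transform(xml, "<xsl:value-of select=\"food/drink\"/>"));
        // copy-of: deep copy, element with its attributes and children
        System.out.println(transform(xml, "<xsl:copy-of select=\"food/drink\"/>"));
        // copy: shallow copy of the current node only
        System.out.println(transform(xml,
                "<xsl:for-each select=\"food/drink\"><xsl:copy/></xsl:for-each>"));
    }
}
```

Note that copy has no select attribute, which is why the third case needs for-each to make drink the current node first.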

Comment nodes are important: what if I want to count the comment nodes, or visit them explicitly?

* (in apply-templates) does not visit comment nodes, so a template that matches comment() might be
useful.

copy actually grabs a node
value-of implies a conversion; with copy-of there is no conversion, nodes are output
a template sits on one node

The entire motivation of XPath is: I am not worried about memory, I am going to load it all and then process it with a powerful syntax.

There is no order to what I have written today; these are more my class notes. I will give them an order one fine day :)

-Kalyan

Monday, April 11, 2005

Mathematics Mathematics .....

So what is a random variable, a probability distribution function, a cumulative distribution function,
Markov models?

Computer scientists are usually supposed to be good at mathematics, but that seems not to be the case these days. We see math in the papers and then we want to avoid it as much as possible :) Although I am writing this, I do the same :) even though I like math.

A genuine effort now to understand the mathematics behind the papers. I started off the day reading the paper on data acquisition in sensor networks...

Basic Math behind the paper

Random Variable: A function that maps outcomes to numbers, sounds cool ...

$X(\omega) = \begin{cases}0,& \omega = \texttt{H},\\1,& \omega = \texttt{T}.\end{cases}$

is an example of a random variable, where H and T represent heads and tails respectively. A random variable is also defined as a measurable function from a probability space to a measurable space.

Learning this definition introduces a few more new concepts.

Distribution Functions:
Recording all these probabilities of output ranges of a real-valued random variable X yields the probability distribution of X.

I seem to get these things, but somehow something still seems to be missing :)
For more references:
http://en.wikipedia.org/wiki/Probability_distribution

More on this later ...

Normal Distributions
-----------------------
A very important class of statistical distributions, with bell-shaped density functions that have a single peak. Mathematically, two quantities have to be specified: the mean $\mu$, where the peak of the density occurs, and the standard deviation $\sigma$, which indicates the spread or girth of the bell curve.

http://www-stat.stanford.edu/~naras/jsm/NormalDensity/NormalDensity.html

The standard deviation determines the shape (width) of the curve; the mean determines its shift.
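
For reference, the density that these two parameters define can be written as below, using the conventional symbols $\mu$ for the mean and $\sigma$ for the standard deviation (the formula is the textbook one, not copied from the linked page):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

Changing $\mu$ slides the peak along the x-axis. Increasing $\sigma$ makes the bell wider and lower, since the area under the curve always stays 1.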
Covariance matrix ..............

-Kalyan

Saturday, April 09, 2005

DTD finding of the day

In the DTD, attribute datatypes have to be written in capitals, obviously because the
datatypes are defined that way.

I was trying to put "id" instead of "ID", so do keep this in mind; XMLSpy throws a weird error for this.

From the specification
-----------------------
NMTOKEN type can contain only letters, digits and point [ . ] , hyphen [ - ], underline [ _ ] and colon [ : ] . NMTOKENS can contain the same characters as NMTOKEN plus whitespaces. White space consists of one or more space characters, carriage returns, line feeds, or tabs.
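
As a rough check of that rule, here is a small Java sketch. It is an ASCII-only approximation of the summary above (the real XML name-character set is wider and includes many non-ASCII letters), and the class and method names are made up.

```java
import java.util.regex.Pattern;

// Hypothetical checker: ASCII-only approximation of NMTOKEN / NMTOKENS.
public class NmtokenCheck {
    // letters, digits, period, underscore, colon and hyphen, at least one character
    private static final Pattern NMTOKEN = Pattern.compile("[A-Za-z0-9._:-]+");

    public static boolean isNmtoken(String s) {
        return NMTOKEN.matcher(s).matches();
    }

    // NMTOKENS: one or more NMTOKEN values separated by white space
    public static boolean isNmtokens(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        for (String token : trimmed.split("\\s+")) {
            if (!isNmtoken(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNmtoken("item-1.2"));     // allowed characters only
        System.out.println(isNmtoken("two words"));    // contains a space
        System.out.println(isNmtokens("two words"));   // white space separates tokens
    }
}
```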

More findings
----------------

When you want to specify the allowed values for
status
(data | active | updated) #REQUIRED
note that the datatype is not required here.
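
To see such an enumerated attribute being enforced, here is a sketch that validates small documents against an inline DTD with the JDK's validating parser. The element name item and the class name EnumAttributeCheck are made up; only the status declaration comes from the note above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker: validates one element against an inline DTD.
public class EnumAttributeCheck {
    // "item" is an illustrative element; status is declared as an enumeration, no datatype keyword
    private static final String DTD =
        "<!DOCTYPE item [\n"
      + "  <!ELEMENT item EMPTY>\n"
      + "  <!ATTLIST item status (data | active | updated) #REQUIRED>\n"
      + "]>\n";

    // Returns the validity error messages reported for the given document body.
    public static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true); // turn on DTD validation
        DocumentBuilder db = dbf.newDocumentBuilder();
        final List<String> errors = new ArrayList<>();
        db.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // collect validity errors instead of printing them
            }
        });
        db.parse(new InputSource(new StringReader(DTD + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<item status=\"active\"/>")); // value from the list
        System.out.println(validate("<item status=\"bogus\"/>"));  // value not in the list
        System.out.println(validate("<item/>"));                   // required attribute missing
    }
}
```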

-Kalyan

Friday, April 08, 2005

Things to do

I am lagging behind on some things; things that I need to take care of immediately:

1. Subversion Install.
2. Request Tracker.
3. Oracle Scripts for sharing the tablespace.

Read the Barbie-Q paper. Run the Simulator TOSSIM and get the data and build the code.
Start the XML assignment. Read a bit of Neural networks for the project...

Too much to cover ....

-Kalyan

Thursday, April 07, 2005

Mbox and JAXB

Well, what I learnt today:

An mbox normally keeps all the mails in one file, and the first line of the mbox needs to be a "From" line. Usually when we have lots of mail clients open at one time, the mbox gets corrupted and some stray lines get added as part of the header.

To solve the problem, just manually remove the lines that got added and then you are all set.
Isn't it interesting :).
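
A minimal sketch of how a program sees that layout, assuming the usual convention that each message starts with a line beginning with "From " (with a trailing space). The class name MboxSplitter is made up, and real mbox readers handle more cases (quoting of "From " inside message bodies, for one).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter: cuts an mbox string into messages at "From " lines.
public class MboxSplitter {
    public static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // null until the first "From " line is seen
        for (String line : mbox.split("\n")) {
            if (line.startsWith("From ")) {
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first "From " line belong to no message and are dropped
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        String mbox = "stray line\n"
                + "From alice@example.com Thu Apr  7 10:00:00 2005\nSubject: one\n\nhello\n"
                + "From bob@example.com Thu Apr  7 11:00:00 2005\nSubject: two\n\nbye\n";
        List<String> messages = split(mbox);
        System.out.println(messages.size() + " messages");
    }
}
```

With this reading, anything sitting before the first "From " line is exactly the kind of stray content that has to be removed by hand.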

Today there was a presentation on JAXB, pretty good; I think I learnt a bit more about it.
JAXB binds XML to classes. Why do you need that? The motivation is much like
that of JDBC or ODBC: after one level I don't want to know about the XML, so I would rather play with the
program than with the XML as such.

I don't understand why it is more powerful than Saxon parsers. DOM, I think, loads the entire thing into memory, but JAXB can act on parts of the document.

It uses the factory pattern. I did not understand too much of that, but I need to go through the
typical example of it, getting the context and using formatted output etc.

More on this later.

-Kalyan

Wednesday, April 06, 2005

Meeting with a Prof

Well, apart from the career fair thing it was actually an interesting day, after we discussed some things with Prof Srivastava. I think I like this guy's ideas; he sticks to the simple ideas of Codd's relational model, lossy and lossless decomposition.

I was always under the impression that OLAP was not data mining, and that is what you would read too in the commandments of data mining: "OLAP is not data mining".

Nowadays people are talking about multi-relational mining, because till now most mining has focused on getting data from one single table.

This entire thing motivates us to learn about some interesting things, like mining in data warehouses. Why have multi-relational mining as such? Why can I not do single-table
mining over a set of tables, since I can represent the data in one table, although I will have data duplication? (Some food for thought: would my frequent itemsets not change because of that, if I do a join and end up with redundant data?)

But it is a nice digression and something that deserves some thought.

-Kalyan

Friday, April 01, 2005

Sensor Networks

Need to catch up with some sensor networks stuff, things to do

1. Review of the lecture
2. Paper writing, how I want to reorganize
3. What all i want to write
4. Write something by the end of the day

Need to study Support Vector Machines, XML (XSLT) assignment.
Project of Sensor