Bala's Blog

JOY OF PROGRAMMING

How to add the & symbol in an XML element value?

XML does not accept a bare “&” inside an element value; the parser reports an error. To include one, we need to write the entity

&amp;

in its place. The parser then reads it as “&” and no longer throws the error.
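
As a minimal sketch of the same idea, the shell function below escapes a value before it is placed in an element. The function name xml_escape and the element name company are only illustrative, and the sketch assumes the input is not already escaped (otherwise "&amp;" would be escaped a second time).

```shell
# xml_escape TEXT: print TEXT with &, < and > replaced by XML entities.
# "&" is handled first so the entities produced for < and > are not
# escaped a second time.
xml_escape() {
    printf '%s\n' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Build an element whose value contains "&" (element name is illustrative).
printf '<company>%s</company>\n' "$(xml_escape 'Johnson & Johnson')"
# prints: <company>Johnson &amp; Johnson</company>
```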

 

Thanks

Bala

Running hadoop in eclipse

Hi folks,

These are the steps I followed to run a Hadoop MapReduce program in Eclipse.

1) Make sure that Eclipse and Hadoop 0.20.2 are installed.

2) Then follow the steps given below.

i) Create a new Java project in Eclipse and name it "hadoop-0.20.2".
ii) Import hadoop-0.20.2.tar.gz into that project.
iii) Import ant.jar into the library folder of the project.
iv) Rewrite eclipse/workspace/hadoop-0.20.2/.classpath so that it adds the source folders and the necessary libraries to the project:

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
	<classpathentry kind="src" path="hadoop-0.20.2/src/core"/>
	<classpathentry kind="src" path="hadoop-0.20.2/src/mapred"/>
	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER/org.eclipse.jdt.internal.debug.ui.launcher.StandardVMType/JavaSE-1.6"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-logging-1.0.4.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/xmlenc-0.52.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-net-1.4.1.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/kfs-0.2.2.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jets3t-0.6.1.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/jetty-util-6.1.14.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-codec-1.3.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/log4j-1.2.15.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-cli-1.2.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/ant.jar"/>
	<classpathentry kind="lib" path="hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar"/>
	<classpathentry kind="output" path="bin"/>
</classpath>

v) Then refresh the project.

Next, create a new Java project for the MapReduce program itself, e.g. WordCount, and set it up as follows:

1) In its build path, add the Eclipse project hadoop-0.20.2.

2) On the Libraries tab, add hadoop-0.20.2/hadoop-0.20.2/lib/commons-cli-1.2.jar.

3) Also add commons-httpclient-3.1.jar and Apache Commons Logging (commons-logging-1.0.4.jar).

4) Create a file called log4j.properties inside bin/wordcount/ so that the job output shows up in the Eclipse console:

log4j.rootLogger=INFO,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

5) Now we can run the program wordcount.java in Eclipse.

Thanks
Balasundaram J K 


Restore Panels In Ubuntu Back To Their Default Settings

Messed up your panels in GNOME? Maybe you’re new to Ubuntu and accidentally deleted items, or the panel itself, and now you can’t figure out how to get it back.

Sure, you can add a new panel and rebuild it by adding the items back on the panel.

Instead of going through the trouble, there is an easy fix that will restore your panels back to their default settings quickly.

Open a Terminal window by clicking Applications > Accessories > Terminal. Or, if you deleted the top panel and cannot access the menus, press ALT+F2, type gnome-terminal in the Run dialog box, and click Run.

You can also browse for applications such as Terminal from the Run window: click the arrow icon next to “Show list of known applications” and look for Terminal.


Once the Terminal window opens, enter the following command at the prompt:

gconftool-2 --shutdown

(Note: the option starts with two hyphens, with no space between them.)

EDIT: Reader nickrud has suggested a better method than shutting down gconfd. Use the following command instead (thanks nickrud!):

gconftool --recursive-unset /apps/panel

(Remember: there should be no space between the two hyphens before recursive-unset.)

Then enter the next command:

rm -rf ~/.gconf/apps/panel

And enter one more command:

pkill gnome-panel

That’s it!

Both top and bottom panels will appear (if missing) with their default settings. Now you can customize them to your preference and get on with using Ubuntu.

Shell Script for Reading the properties file of Java type

sed '/^\#/d' property_file_name | grep 'property_name' | tail -n 1 | cut -d "=" -f2- | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'

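This command prints the value of property_name from the properties file. Comment lines are dropped first; if the property occurs more than once, the last occurrence wins, and whitespace around the value is trimmed.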
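
The one-liner can be wrapped in a function so that the file and the property name become arguments. This is only a sketch: the name get_prop is made up here, and unlike the original it anchors the match at the start of the line so that the property name is not also found inside other values. It assumes plain "key=value" lines; line continuations and ":" separators are not handled, and a "." in the key is treated as a regular-expression wildcard.

```shell
# get_prop FILE KEY: print the value of KEY from a Java-style properties file.
get_prop() {
    sed '/^[[:space:]]*#/d' "$1" |                  # drop comment lines
        grep "^[[:space:]]*$2[[:space:]]*=" |       # keep lines that define KEY
        tail -n 1 |                                 # last definition wins
        cut -d '=' -f2- |                           # everything after the first "="
        sed 's/^[[:space:]]*//;s/[[:space:]]*$//'   # trim surrounding whitespace
}

# Usage (file and key are placeholders):
#   get_prop property_file_name property_name
```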

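# gres.sh
# Usage: ./gres.sh property_name new_value property_file_name
pattern=$1
replacement=$2
# Current value: drop comment lines, take the last match, strip the key and surrounding whitespace
propvalue=$(sed '/^\#/d' "$3" | grep "$pattern" | tail -n 1 | sed 's/^.*=//;s/^[[:space:]]*//;s/[[:space:]]*$//')
# Use Ctrl-A (octal 001) as the sed delimiter so that "/" in values needs no escaping
A="$(echo | tr '\012' '\001')"
sed -i -e "s${A}${pattern}=${propvalue}${A}${pattern}=${replacement}${A}" "$3"
# end script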

This replaces a property value within a given properties file. Pass the property name, the new value and the file:

./gres.sh property_name new_value property_file_name

Spell Check Configuration in Solr

Spell checking is one of the essential features an application needs for correcting misspelt queries.
In Solr it is set up by first declaring the spell check component in the solrconfig.xml file.

Below is the configuration of the spell check component:

<searchComponent name="keyspellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">textSpell</str>

  <!-- Multiple "Spell Checkers" can be declared and used by this
       component
  -->

  <!-- a spellchecker built from a field of the main index, and
       written to disk
  -->
  <!--   <lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">keyword</str>
  <str name="spellcheckIndexDir">spellchecker</str> -->
  <!-- uncomment this to require terms to occur in 1% of the documents in order to be included in the dictionary
  <float name="thresholdTokenFrequency">.01</float>
  -->
  <!-- </lst> -->
  <lst name="spellchecker">
    <!--
      Optional, it is required when more than one spellchecker is configured.
      Select non-default name with spellcheck.dictionary in request handler.
    -->
    <str name="name">default</str>
    <!-- The classname is optional, defaults to IndexBasedSpellChecker -->
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!--
      Load tokens from the following field for spell checking,
      analyzer for the field's type as defined in schema.xml are used
    -->
    <str name="field">keyword</str>
    <!-- Optional, by default use in-memory index (RAMDirectory) -->
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Set the accuracy (float) to be used for the suggestions. Default is 0.5 -->
    <str name="accuracy">0.4</str>
    <!-- Require terms to occur in 1/100th of 1% of documents in order to be included in the dictionary -->
    <!--<float name="thresholdTokenFrequency">.0001</float> -->
  </lst>
  <!-- Example of using different distance measure -->
  <lst name="spellchecker">
    <str name="name">jarowinkler</str>
    <str name="field">lowerfilt</str>
    <!-- Use a different Distance Measure -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>

Here the field named in “field” must be defined in the schema file with the analyzers and tokenizers that it needs.

Then the spell check component has to be attached to the “search” request handler so that the suggestions appear in the Solr response:

<requestHandler name="search" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
  -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">3</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.extendedResults">true</str>
  </lst>
  <!-- run the spell check component declared above after the standard components -->
  <arr name="last-components">
    <str>keyspellcheck</str>
  </arr>
</requestHandler>

 

Then the data must be reindexed, after which suggestions for a misspelt word are returned along with the search results.
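
A rough sketch of how such a request could be sent from the shell is given below. The base URL is an assumption (the stock Solr example on localhost:8983), the function name spellcheck_url and the sample word hadop are made up, and the word is assumed to need no URL encoding. With an index-based spellchecker the dictionary usually has to be built once after reindexing, which the spellcheck.build=true parameter requests.

```shell
# Assumed base URL of a local Solr example instance; adjust to your setup.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# spellcheck_url WORD [build]: print the select URL that asks for suggestions.
# Pass "build" as the second argument to also rebuild the spellcheck dictionary.
spellcheck_url() {
    url="$SOLR_URL/select?q=$1&spellcheck=true&spellcheck.collate=true"
    if [ "${2:-}" = "build" ]; then
        url="$url&spellcheck.build=true"
    fi
    printf '%s\n' "$url"
}

# Usage (needs a running Solr, so it is left commented out):
#   curl "$(spellcheck_url hadop build)"   # first request after reindexing
#   curl "$(spellcheck_url hadop)"         # later requests
```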

 

JKB


KeywordTokenizerFactory vs StandardTokenizerFactory in solr

KeywordTokenizerFactory does no actual tokenizing, so the entire
input string is preserved as a single token.

StandardTokenizerFactory :-
It splits the input on whitespace and punctuation and strips the punctuation characters.

Documentation :-
Splits words at punctuation characters, removing punctuations. However, a dot that’s not followed by whitespace is considered part of a token.
Splits words at hyphens, unless there’s a number in the token. In that case, the whole token is interpreted as a product number and is not split.
Recognizes email addresses and Internet hostnames as one token.

Use this for fields where you want to search on the field data.

e.g.

http://example.com/I-am+example?Text=-Hello

would generate 7 tokens (shown separated by commas):

http,example.com,I,am,example,Text,Hello

KeywordTokenizerFactory :-

The keyword tokenizer does not split the input at all.
No processing is performed on the string; the original text is returned as one term.

It is mainly used for sorting and faceting, where a facet value made of several words has to match exactly when filtering, and where sorting is needed, since sorting does not work on tokenized fields.

e.g.

http://example.com/I-am+example?Text=-Hello

would generate a single token:

http://example.com/I-am+example?Text=-Hello

Customized Stemming Dictionary in solr Porter stemmer

A simple way to use many custom stemming rules in Solr is to put a StemmerOverrideFilterFactory in front of the stemmer. The analyzer of a field that uses it will look like the following:
<filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" />
<filter class="solr.PorterStemFilterFactory" />

The dictionary stemdict.txt must look like this:
#-----------------------------------------------------------------------
# test that we can override the stemming algorithm with our own mappings
# these must be tab-separated
monkeys	monkey
otters	otter
# some crazy ones that a stemmer would never do
dogs	cat

Note: each word and its stem substitute must be separated by a tab character, not by spaces.
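
Since a space in place of the tab is easy to miss, a small check along the following lines can be run on the dictionary before Solr loads it. This is only a sketch; the function name check_stemdict is made up and it simply assumes one "word<TAB>stem" pair per line.

```shell
# check_stemdict FILE: print every non-comment, non-blank line that is not
# exactly two non-empty tab-separated fields. The exit status is non-zero
# if any such line is found.
check_stemdict() {
    awk -F '\t' '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF != 2 || $1 == "" || $2 == "" {
            print FILENAME ":" NR ": " $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage:
#   check_stemdict stemdict.txt && echo "stemdict.txt looks fine"
```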

Embedded Solr Server using Java

Hi,

The Solr server can be used without going through HTTP by using the embedded Solr server. Here we use the Java API (SolrJ) to talk to Solr directly. The program for the embedded Solr server is as follows:

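import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSolrExample {
    public static void main(String[] args) throws Exception {
        System.setProperty("solr.solr.home", "/home/user1/apache-solr-3.4.0/example/solr");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();
        // An empty core name selects the default core
        EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
        SolrQuery solrQuery = new SolrQuery();
        solrQuery.setParam("fl", "id,score");  // fields to return
        solrQuery.setQuery("id:bala");         // sets the q parameter
        QueryResponse rsp = server.query(solrQuery);
        SolrDocumentList docs = rsp.getResults();
        System.out.println(docs);
        coreContainer.shutdown();
    }
}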

 

With this we can search the documents indexed in a single Solr core.
setParam passes request parameters such as the field list (fl), setQuery sets the query string,
and the results come back based on them.

 

thanks

bala

USAGE OF PING AND IFCONFIG COMMANDS IN LINUX

Hi,

Today I am going to describe some uses of the ifconfig and ping commands on Linux systems.

1) ifconfig

The ifconfig command shows the IP address of the system we are working on. It also has many options, which can be used to change the IP address, bring a network interface up or down, and so on.

2) ping

The ping command checks whether the host whose IP address is given as its argument can be reached over the network from our system.

If the host is reachable, ping receives reply packets from it and prints one line per reply.
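
For use in scripts, the reachability check can be reduced to an exit status. The following is a minimal sketch; the function name is_reachable and the address in the usage comment are only examples.

```shell
# is_reachable HOST: succeed if HOST answers a single echo request.
# "-c 1" makes ping stop after one packet instead of running until Ctrl+C.
is_reachable() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Usage (the address is a placeholder):
#   if is_reachable 192.168.1.1; then echo "host is up"; else echo "no reply"; fi
```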

That’s it for today.

With regards,

JKBS

MOGRIFY COMMAND FOR IMAGE RESIZING IN LINUX

If you just need to resize the images in a folder:

1) Open a terminal.

2) Use the cd command to change to that folder.

3) Then type the following:

mogrify -resize 800x600 -verbose *.JPG

This resizes all the JPG files in the folder in a single go. Note that mogrify overwrites the original files, so work on a copy if you need to keep them.
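
If the originals should stay untouched, mogrify's -path option writes the results into another folder instead. The wrapper below is only a sketch; the function name resize_copies and the output folder name resized are arbitrary choices.

```shell
# resize_copies DIR [SIZE]: write resized copies of DIR/*.JPG into DIR/resized,
# leaving the original files as they are. SIZE defaults to 800x600.
resize_copies() {
    dir=$1
    size=${2:-800x600}
    mkdir -p "$dir/resized" || return 1
    mogrify -path "$dir/resized" -resize "$size" -verbose "$dir"/*.JPG
}

# Usage:
#   resize_copies . 800x600
```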

jkbs