In the early 1990s, the internet was beginning to take off, and more people were introduced to it daily. But in those days, web URLs were not seen on billboards or painted on the sides of vehicles. Information retrieval meant going to one of the new search engine portals, typing in a few words relevant to what you were looking for, evaluating the list of web sites returned, and then clicking on one or more of the links. In concept, these search engine portals have remained basically the same for the last 15+ years. The portals have added features like free email, news feeds, and messenger services, but the idea of using a search engine portal for information retrieval is as widely used today as it has ever been. Let’s take a look at three major players in the history of search engines and information retrieval:

Excite (1993): This is one of the first user-friendly search engine portals in the history of information retrieval. A group of Stanford University students developed Excite, which originally went by the name Architext. Excite introduced a new approach to information retrieval: you could enter search queries phrased much like plain English questions and get back a listing of relevant web sites. In 1996, Excite bought WebCrawler and Magellan, which were among the earliest search engines. Like many internet companies, Excite was plagued by financial troubles during the 2000/2001 dot-com crash. It is still in use on the internet today.

Yahoo (1994): Yahoo is an acronym for “Yet Another Hierarchical Officious Oracle.” It actually started out as “Jerry and David’s Guide to the World Wide Web,” named after its founders, Jerry Yang and David Filo, who were electrical engineering graduate students at Stanford. At the time, Yahoo was just one of many search engines competing to be the central place where everyone did their internet searches and information retrieval. However, Yahoo gained popularity quickly and achieved its first million-hit day towards the end of 1994. In March 1995 it received funding from Sequoia Capital and began to grow. When the company held its initial public offering of stock in 1996, it had 49 employees. Today Yahoo is a global corporation. Users of Yahoo can enjoy a customized portal with free e-mail and news feeds. It offers messenger software, and you can even set it to notify you via SMS on your cell phone when an e-mail message arrives.

Google (1998): Even though Google was a latecomer in the history of internet search engines and information retrieval, today it is the most popular of them all. It was founded by two graduate students at Stanford University, Sergey Brin and Larry Page. Their original effort was a search engine called BackRub; encouraged by Yahoo founder David Filo to incorporate, they went into business on September 7, 1998. Google became quite popular among web users because it was quick, reliable, and returned search results highly relevant to the search query. For a time, Google even supplied search results to other search engines such as Yahoo. Google held its initial public offering of stock in 2004. Over time it has added a free email service known as Gmail and mapping software. It is the dominant player in the information retrieval and search engine industry today.

Today when we think of search engines and information retrieval, we think of Google. In fact, there is even a word for it: “googling.” If you are a serious internet user and rely on the internet for work, entertainment, and schoolwork, you probably hit Google many times each session. Market surveys consistently show Google holding the largest share of the search engine and information retrieval market. But why is this? Why is the latecomer to the search engine market today the king of the hill? Let’s consider some reasons why:

It sticks to the main thing. In other words, Google is, and always has been, about searching and information retrieval that is quick and efficient, without a lot of advertising cluttering the space. For contrast, go to Google’s competitor Yahoo and take a look at its main screen: it is filled with news, entertainment, sports, video, advertisements, and classifieds, and at the very top is the search toolbar.

It has automatic localization. Go to Google from a computer in Thailand and the interface will be in Thai, with a quick link back to English if you need it. Other search engines like Yahoo also have sites in other languages should you need them, but switching to them is not automatic.

It is rated as a top place to work. In 2007 the company made number one on Fortune Magazine’s list of the 100 best companies to work for. While this may be of no concern to you if you do not work there, the attitudes of its employees are bound to rub off on the end product. One interesting note about working for Google is its generosity with employee education: an employee can get $8000/year in tuition reimbursement. On top of that, employees on Google’s campus are served free meals, saving them out-of-pocket expenses just for going to work.

It has mechanisms in place to minimize spamming. When serious internet users come to a search engine for information retrieval, that’s exactly what they want to do. They don’t want to run a search query and get fed a lot of pages with no relevance and useless content. The Googlebot can detect attempts to spam the search engine, and websites that violate Google’s policies can end up in what is informally known as the Google sandbox.

It has sophisticated extras. In addition to its main line of information retrieval and search service, you get other office-related features: the free email service Gmail with generous storage space, Google Calendar for keeping track of your events, and Google Docs, an easy-to-use place to create and store documents online.

And then there is the Google Toolbar. You have to watch out with many toolbars, because they are really doorways for spyware to pull information from your computer (information retrieval, but the wrong way around), and they often slow down your browser once installed. Not so with this toolbar, and it is simple to use: you have one field where you type in your search text, and that’s it. You can easily pull up your search history and clear it if you like.

In order for the internet to be useful to the general public, information retrieval interfaces had to be intuitive and easy to use. The first search engines started out on UNIX systems using the TCP/IP protocol. Anyone who worked on UNIX systems in the early 1990s knows that it was an operating system used mainly at universities, scientific facilities, and government labs. As computers became more popular and widely used, data stores grew quite large, and search engines came on the scene to find information in this growing body of data.

The Archie search engine: To respond to the need for better information retrieval, the first computer search engine, named Archie, was written in 1989. It was written by J. Peter Deutsch, Alan Emtage, and Bill Heelan, who were students at Montreal’s McGill University. The name Archie was derived from the word “archive.” The main information retrieval technology Archie relied on was the file transfer protocol (FTP). Archie built a database of file names gathered from different anonymous FTP servers: each site was contacted over FTP about once a month to extract the available filenames and rebuild the master database. The UNIX grep command was then used to search the database.
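
To make the idea concrete, here is a minimal Python sketch of that approach (not Archie’s actual code): gather file listings from anonymous FTP servers into one flat index, then search the index much the way grep searches a text file. The server names are hypothetical placeholders.

import re
from ftplib import FTP

SERVERS = ["ftp.example.edu", "ftp.example.org"]   # hypothetical anonymous FTP sites

def build_index(servers):
    # Archie rebuilt its master database roughly once a month; here we just
    # collect (server, filename) pairs from each server's top-level directory.
    index = []
    for host in servers:
        try:
            ftp = FTP(host, timeout=30)
            ftp.login()                      # anonymous login
            for name in ftp.nlst():
                index.append((host, name))
            ftp.quit()
        except Exception:
            continue                         # skip unreachable or uncooperative servers
    return index

def search(index, pattern):
    # The "grep" step: match the query against the stored file names.
    regex = re.compile(pattern, re.IGNORECASE)
    return [(host, name) for host, name in index if regex.search(name)]

index = build_index(SERVERS)
print(search(index, r"readme"))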

Gopher technology: In 1991, Mark McCahill of the University of Minnesota developed an information retrieval protocol known as Gopher that actually indexed text documents. Information retrieval with Archie meant the program had to log in to each individual server over FTP, which proved slow and often error-prone. Gopher, by contrast, was its own protocol designed for information retrieval, and Gopher servers communicated with one another. At this point, information retrieval technology stepped up a notch; it was still cryptic, but progress was being made.

The Veronica search engine: An information retrieval protocol like Gopher needs a utility program to make use of it. Veronica, developed in 1993 at the University of Nevada by Steven Foster and Fred Barrie, was basically a program designed to search the titles of files held on Gopher servers.

The WAIS search engine: WAIS stands for Wide Area Information Server; it was developed in 1991 by a company called Thinking Machines Corporation. A WAIS server, like a Gopher server, was a separate machine that you logged in to. WAIS was a step up in that it could look for keywords inside the body of documents, much like consulting the index of a book.

The Jughead search engine: This was another search engine developed in 1993 at the University of Utah by Rhett Jones. It was like Veronica except that it only looked at one Gopher server at a time whereas Veronica checked all known Gopher servers.

Wanderer – the first web robot: The only way to find something new on a network is to traverse its nodes and note what is there today that was not there yesterday. That, in simple terms, is what the first web robot did. It was created at the Massachusetts Institute of Technology by a student named Matthew Gray in 1993. This creation in search engine history would lead to what we know today as the Googlebot, which regularly crawls the web looking for new pages to index.
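
As a toy illustration of what such a robot does (nothing like the original Wanderer code), the short Python sketch below starts from a seed URL, fetches pages, pulls out the links, and records every page it has seen so that newly discovered pages stand out. The seed URL is only an example.

import re
import urllib.request
from urllib.parse import urljoin

def crawl(seed, max_pages=10):
    seen, queue = set(), [seed]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="ignore")
        except Exception:
            continue              # skip pages that cannot be fetched
        seen.add(url)
        # Very crude link extraction; a real robot would use an HTML parser
        # and respect robots.txt.
        for link in re.findall(r'href="([^"]+)"', html):
            queue.append(urljoin(url, link))
    return seen

print(crawl("https://example.com"))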

One thing is certain about the early search engines: they were designed for the computer gurus of their time, not necessarily for the everyday home user. Interfaces were character-based and not intuitive to follow. Even so, in that chapter of history they were very popular.

The first thing you need to remember about search engine optimization is that it is not an exact science; too many variables are involved. So you will see many different optimization tips from different experts. This article covers a few of the more common tips you’ll come across when optimizing a website for search engines.

Tip #1: Plan your website with search engine optimization in mind. Don’t rush to get your website up and forget about search engine optimization. First of all, plan your keywords: the words a person is most likely to use to find your website. One free keyword analysis tool is http://adwords.google.com. You type in your proposed keywords and it returns the primary results along with similar keyword combinations that others have searched for, plus a graphical chart showing how frequently each related combination is used. These keywords will be worked into the content of your website. Another free keyword analysis tool is found at http://freekeywords.wordtracker.com. It is similar, though it lacks the graphical display, and it is good for identifying keyword tails.

Tip #2: Think about which search keywords you may want to target. For example, go to http://freekeywords.wordtracker.com and type in the search phrase “web hosting.” It should return quite a few results; for the purposes of this article, only a subset is listed here. The search in Free Keywords might return:

4731 web hosting
1822 free web hosting
895 michigan web site hosting
80 web hosting forum
80 web hosting newport
80 web hosting oc

The first column is the number of searches and the second column is the search phrase. Note that most of the search volume is concentrated in the first three results. Those top three keywords may be hard to compete for, so it may be a better strategy to go after several less popular, more specific keywords.

Tip #3: Create web pages that are search engine friendly. Search engine bots like the Googlebot “read” your pages in order to index and categorize them. The content in your web pages should always be plain, standard text, because that is what both a machine and a human can read. Advances in Googlebot technology have given the bot the ability to read Microsoft Word documents and Adobe Portable Document Format (PDF) files, but these formats are not as good for simple presentation to a user. Another thing that helps with search engine optimization is naming the files, folders, and images that make up your site with filenames that contain your targeted keywords, as sketched below.
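
For instance, here is a small, hypothetical Python helper (not part of any SEO tool) illustrating that file-naming advice: it turns a page title and its target keywords into a clean, keyword-bearing filename. The title and keywords are made up for illustration.

import re

def keyword_filename(title, keywords, extension="html"):
    # Combine the page title with its target keywords and reduce the
    # result to lowercase words joined by hyphens.
    words = f"{title} {' '.join(keywords)}".lower()
    slug = re.sub(r"[^a-z0-9]+", "-", words).strip("-")
    return f"{slug}.{extension}"

print(keyword_filename("Affordable Hosting Plans", ["web hosting"]))
# prints: affordable-hosting-plans-web-hosting.html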

Tip #4: Fill your website with relevant content. The best websites are those whose content keeps people interested and coming back. Write keyword-rich articles on topics that matter to your website’s target audience. If you are not a skilled writer, you can hire ghost writers at an affordable price through one of the freelance bidding websites.

Tip #5: Get those backlinks. Part of the criteria Google uses to rank the importance of your website is the number of related sites that link back to yours. You can get backlinks by doing link exchanges with other sites, including your link in articles you post elsewhere, asking site owners to link to you, and in many other ways.

From the very early history of computing, efficient and quick text indexing and information retrieval have been a challenge. Computers held great promise in the 1960s as a way to store massive amounts of information, but that promise could not become a reality until information retrieval methods advanced. So search engine history really begins with the introduction of computers and runs through the developments that led to the Internet we know today.

Search engine history includes Dr. Gerard Salton, who was born in Nuremberg, Germany and had to leave the country during World War II. He received his education in America, ending with a PhD from Harvard University. He was an instructor at Harvard between 1958 and 1960 and a professor there between 1960 and 1965. He was also among the first programmers to work on the Harvard Mark IV computer.

Salton is credited with developing Salton’s Magical Automatic Retriever of Text (SMART) and the Vector Space Model (VSM) for information retrieval. Initially, the VSM was not articulated as an information retrieval model for search engines but rather as a way of performing specific mathematical calculations. It was in the late 1970s that Dr. Salton began to present it as a model for computer information retrieval, and he published more about the VSM in the late 1980s.

In broad terms, the VSM allows humans to enter search requests in natural language. Once a search request is entered, the content of the candidate documents is analyzed and scored against the query, and the documents that most closely match are retrieved. One can see in this the seeds of today’s search engines.
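
A minimal Python sketch of the vector space model’s core idea follows. The documents and query are invented, and real systems add weighting schemes such as TF-IDF, but the mechanics are the same: represent documents and the query as term-count vectors and rank documents by cosine similarity to the query.

import math
from collections import Counter

def vectorize(text):
    # Term-frequency vector: word -> count.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

documents = [
    "information retrieval with the vector space model",
    "a short history of the gopher protocol",
    "retrieval of text documents by keyword",
]
query = vectorize("text information retrieval")

# Rank candidate documents by how closely they match the query, best first.
for doc in sorted(documents,
                  key=lambda d: cosine_similarity(query, vectorize(d)),
                  reverse=True):
    print(round(cosine_similarity(query, vectorize(doc)), 3), doc)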

Development of search engine and information retrieval technology was just one of the developments taking place at the time. The Internet as we know it today has its roots in research funded by the Defense Advanced Research Projects Agency (DARPA), much of it carried out at the Massachusetts Institute of Technology (MIT). In 1962, J.C.R. Licklider of MIT wrote a series of memos describing a globally interconnected set of computers through which anyone could access data and programs; he went on to become the first head of DARPA’s computer research program. Between 1961 and 1964, Leonard Kleinrock developed the theory behind what became known as packet switching. Packet switching would lead to the first wide area network (WAN) experiments and, in 1965, the connection of two remote computers: a TX-2 at MIT and a Q-32 in California.

What is significant about Licklider was his vision of what this global network could someday be. In 1965, he published a book named “Libraries of the Future” that discussed how information could be stored and retrieved electronically, creating a global library accessible to all.

So the pieces of a global information retrieval system were being put in place. Research was under way on how to quickly search for and retrieve information, research that would later feed into the search engines we know today. Research on data packet transmission was taking place so that information could be delivered over long distances at great speed. And the interconnection of computers was being studied. Kleinrock would later chair a group whose findings went to then-Senator Al Gore, who in turn sponsored and saw passed the High Performance Computing and Communication Act of 1991. At that point, development of the Internet stepped up a level, and search engines became crucial to its success.

Internal ranking factors are things you control directly (as part of your site) that can affect your rank. They can be further divided into on-page ranking factors (things you can do on a particular web page, such as providing appropriate keywords) and on-site ranking factors (things you can do to your web site as a whole).

How can one take into consideration hundreds of different ranking factors? The answer is quite simple: you can’t. The key is to narrow them down to about twenty of the most important known factors, the ones that have stood the test of time and have produced results consistently. Knowing what to optimize makes an SEO’s work easier. Instead of listening to internet hype, SEOs should rely on facts. All of the information is readily available, just waiting to be analyzed.

There are lots of myths, speculations, and theories in the SEO community, and this makes it harder to focus on what’s really important. In order to quantify the importance of different factors, we can write a Perl script to analyze the data and put some of these theories to the test.

…To Be Continued
