Planet Network Management

Understanding how the Weather affects your Network

Not too many years ago I ran a large network for the United States Air Force. For the first several years I was there we had an old copper-wire backbone and it was amazing to see the havoc that weather could wreak upon that network. Sometimes you practically...(read more)

Congratulations Saints

Yesterday I, along with a large number of other people, watched the New Orleans Saints win their first Super Bowl.

Last year I watched the game in a hotel room in Milan (when my Steelers won their sixth championship – number 43 we miss you) but this year I was able to hold a little party.

Now, like hot dogs go with baseball, pizza is fast becoming the food of choice for football, and no Super Bowl party would be complete without some Papa John’s.

Of course, my Mom also showed up (that’s her behind the pies – “Hi mom”) and she brought enough food to feed an army. I think everyone left stuffed.

Anyway, Denise Dubie at Network World wrote an article about how Papa John’s uses OpenNMS, and how they delivered 6 million slices of pizza yesterday.

They worked so we didn’t have to (well, except for Mom. Thanks Mom).

Getting Started with Zenoss Core Webinar February 9

Have you recently downloaded Zenoss Core, or do you have questions about implementing the solution in your environment? If so, please register to attend our bi-weekly Getting Started with Zenoss Core Webinar. The February 9 session is still open for sign-up; if you can’t make this session, the next one is already on the schedule. You can register here:

 

Tuesday, February 9 2:00 p.m. EST

Tuesday, February 23 9:00 a.m. EST

 

Here’s what you’ll get out of the session:

  • An introduction to the Zenoss Community
  • Installing the software properly
  • Preparing your environment
  • Logging in to get started
  • Adding, classifying and auto-discovering your devices
  • Getting and staying organized
  • Seeing the “big picture” (dashboard, network map, event console, Google Maps, etc.)
  • Avoiding common mistakes

 

We also have a Zenoss engineer available to answer questions live – and there are usually lots of questions submitted! If you’re interested in seeing past Q&A logs, take a look at some of the previous Getting Started with Zenoss Q&A sessions where we document and upload all of the questions submitted along with answers.

Does Automation Replace Humans?

Somewhere in a land a long time ago, there was this steel factory in the heartland of Pennsylvania. In this steel factory there was a bulletin board in a factory floor room, and on the board were two large posters. The first poster was a reminder of the upcoming layoffs on Friday, with procedures for filing paperwork and such. The other poster was an announcement of the new IBM mainframe computer that was arriving that week. This new marvel of modern technology just happened to be arriving on that same Friday. The IBM poster encouraged employees to gather and witness this historic event, and that they did. So when Friday rolled around, the poor IBMers with their blue suits and dark ties were met with harsh resentment. The factory floor workers harassed them unmercifully, and in some cases they were actually spat on. I take it you see the irony of the poor steelworkers’ assumed correlation between the new computer and the layoffs. It’s easy to see now that the new “computers” had nothing to do with the layoff… Or is it?

My first real experience with this debate actually occurred around 1987 when I created my first startup. A friend of mine and I had created an automation tool called Sybridge. This product was designed to help operators automate the mundane tasks they needed to perform throughout the day. Back in the day, computer operators working with large IBM mainframes had to enter all sorts of rote command sequences to perform their daily tasks. In 1987, the only programming interface for operators and system administrators was a “green” screen terminal. My friend and I came up with a clever way to screen scrape these terminals, allowing operators and systems administrators to create “Rexx” scripts to automate these tasks. For example, back then the only way to enter a problem ticket into a problem management system was to manually enter the data into one of those unforgiving “green” screen monsters. With our Sybridge product, we could capture an online transaction (e.g., CICS transaction) and automatically enter the information into the problem database. Great idea, right? Actually, we spent a lot of time and money doing market analysis on the fear that operators might not think this was a great idea. At that time there was a big debate going on about whether automation products would replace humans. However, by the turn of that decade automated operations products were a ubiquitous mainstay in almost every Fortune 5000 enterprise IT shop. In fact, most of the “good” operators became automation analysts and scripting freaks. The operational productivity of almost every enterprise running IBM mainframes increased tremendously. At least until distributed computing snuck its way into the data center.

So I had to chuckle the other night when this same subject of infrastructure automation and people losing jobs came up at the Seattle Cloud Camp. During one of the breakout sessions about Opscamp, a lengthy discussion came about regarding infrastructure automation products like Chef and Puppet and the idea that such products might replace system administrators. Ironically, I was able to use an anecdotal example from a conversation I had just a few weeks prior with a friend of mine. The discussion I had was about managing automation products for a large Fortune 1000 company. This friend of mine was a systems administrator who managed a number of proprietary IT management products. The conversation started by him asking me if I could recommend any open source products as replacements for the proprietary ones. He was currently using some of the larger vendor (Big Four) type products and was thinking about a change. I gave him my usual suspects, a list of my favorite operations tools. Then he said, “Yeah, but management doesn’t feel comfortable with open source.” At this point, I lost it. I told him, “Enough with the open source and management. It’s not about open source versus proprietary, it’s about top line value.” Then I said, “You have to ask what tools allow you to affect the business top line value for your company.” I had guessed that he spent at least 60% of his day working on the tools trying to fix them and make them work effectively, and less than 40% of his time providing real business value from the tools for his company. He said, “John, you are wrong… I spend 80% of my time working on the tool and less than 20% of my time providing value.” That’s what you need to tell management, not OSS vs non-OSS.

The bottom line is that in my 30 years of working in this IT muck, I have never seen an example where automation eliminated valued employees. Automation, however, can be used as an opportunity to clean up some dead weight. Automation, more often than not, makes good employees better and flushes out the bad ones. Therefore, as these topics of automated infrastructure, infrastructure as code, agile operations, and devops slip into the IT lexicon, we all need to be aware that good system administrators will typically agree that automation does not replace humans, whereas bad system administrators will beg to differ.

Who Do You Browse With?

I wrote a blog nearly a year ago expressing my mistrust in Google Chrome and asserting my position that when Chrome became available for Mac (my preferred computer) I would not be jumping on that bandwagon.

Well, what a difference nearly a year makes. I did stay somewhat true to my word and I was not the first, the second, or even the thousandth to jump on the Mac compatible Chrome bandwagon last spring. But jump, I eventually did.

The bugs I encountered a year ago, such as the incompatibility with Google’s own commodities like YouTube and Gmail, had been worked out. The browser offered a variety of sleek new skins with which I could customize it, and the load time for my favorite sites saw a noticeable, albeit not huge, improvement over Firefox.

And according to the numbers released by W3Schools’ Browser Statistics Month by Month last week, I wasn’t the only internet user to take notice of Chrome’s advancements: Chrome was used for 10.8% of visits to the W3Schools site in January 2010, a solid third place after Mozilla Firefox (46.3%) and Microsoft Internet Explorer (36.2%).

Out of curiosity I used Google Analytics to get an idea of how the browser war was playing out with the visitors of WhatsUp Gold’s external blog www.dailynetworkmonitor.com.

 

Sure enough, there was a marked upswing in visitors using Chrome to come to our blog in January as well. Our January figure for Chrome was about the same as W3Schools’: Chrome rang in at 10%, whereas Firefox held 33% of the use and Internet Explorer 51%.

But what was more interesting to me – and more impressive proof of this market trend – was the jump in Chrome use between November and December. In November Chrome barely showed up in our analytics. It was used for 0.03% of the visits, coming in just under Safari’s 0.035% share. Firefox accounted for 22% while Internet Explorer dominated at an impressive 69%.

But December brought a noticeable shift: Chrome suddenly jumped to 7% of the use while Safari stayed around the same. Firefox also stayed around the same overall percentage, whereas Internet Explorer saw a drop-off to 54%.

It’s my opinion that Chrome’s market increase is largely due to frustration with Internet Explorer. Security issues aren’t anything new with the browser, but renewed concern over its security led the French and German governments to advise people to switch browsers last month.

While IE still dominates the browser space, its market share has seen a steady decline, from 68.5% last March to 62.12% in January. Firefox hasn’t seen much change in its market share over the last year; Mozilla’s browser dropped about one percentage point mid-year from its 23.30% in March but climbed back to 24.43% last month. Chrome, however, held 1.62% in March 2009 and can now claim 5.22%. Google’s browser owes most of its jump to OS X users who did not have access to it until last spring.

I’d like to see Google Chrome’s market share continue to grow and challenge Microsoft to create a more user and security friendly Internet Explorer. While in corporate terms the Internet giant and computing czar are on a level playing field, Google’s Internet browser has a ways to go to catch up to Microsoft’s institution.

Google’s scrappy little browser could be the motivation Microsoft needs to create the level of browser one should be able to expect from a company with resources like Microsoft’s.

It’s my opinion that the best thing to happen to any big company is a little competition. It keeps them from getting away with laziness and keeps prices and expectations in the market fair.

But then again . . . my opinion on the matter may be a little biased.

Open Source, Social Contracts and Running a Business

When I started my first company in 2002, I had a lot of previous employers to provide examples, both positive and negative, of how to run a business. At the time IBM and Hewlett-Packard were leaders in network management, so I could have modeled my business on them.

Instead I modeled it on Ben and Jerry’s ice cream.

Many might think it was a strange choice, but it seems to have worked out well, at least for us.

First, they make a good product. This is of paramount importance in any business.

Second, they limited the amount of money the highest paid people could earn in salary. In their case, the highest paid person could not make more than seven times the lowest paid person.

I am constantly disgusted by executive salaries these days. Being a previous employee of NORTEL, now in bankruptcy, I find it highly ironic that the executives responsible for driving the company into the ground received huge retention bonuses to keep them from leaving. In a just world they would have had nowhere to go, and in particular they would not be financially rewarded for poor performance.

To me a salary should exist to cover the basic necessities of living, but the real compensation should be based on the performance of the company. Let me stress that I want there to be no limits on overall compensation – if the company is doing well I want everyone’s “upside” to be unlimited. But getting a huge salary just for showing up feels wrong, especially if the company is doing poorly.

Steve Jobs, one of the most successful CEOs ever, takes home a salary of just $1.

Back to Ben and Jerry’s. The one other thing they did that I admired was to donate a certain percentage of pre-tax profits to charity.

I like donating to charity, but I find that I am most eager to give to those organizations that are a) small and b) concerned directly with something I care about. Thus each year I give to the EFF, the FSF and the SFLC, plus a number of local charities.

When the earthquake in Haiti happened, we were shocked and saddened like most of the world. I wanted to help, but I wasn’t sure how. Luckily, the opportunity came in a most unexpected way.

Matt and Jeff (along with Alex) were hanging out in the OpenNMS IRC channel (#opennms on freenode.net) when a man named Andris Bjornson joined and started asking questions about OpenNMS. It turns out that he works for an organization called Inveneo that supplies bandwidth in rural and under-served areas in the developing world. Haiti was the perfect example of a place that needed their services, since a lot of the relief effort is run by non-government organizations (NGOs), and they rely on communications in order to maximize the good they can do.

Haiti’s communications infrastructure, such as it was, was destroyed by the earthquake, and Inveneo is using wireless technology to provide a timely replacement. Of course they need some way to manage this infrastructure (as you can imagine, it is in high demand) and they chose OpenNMS.





Andris installing an antenna in Port au Prince

Andris has been using OpenNMS for a while, but he had some questions and there were some issues in managing the radios they were using. The guys in the channel were more than happy to help out, but we wanted to be involved in a more formal way.

We decided to donate a commercial support contract to Inveneo to help them out in Haiti.

It’s pretty cool to be involved, at least in some small way, with getting Haiti back on its feet. It was also cool to have OpenNMS chosen from all possible apps out there to play a role.

You can read more about Inveneo and OpenNMS in this press release, and please consider donating to their efforts.

Open source has a large social component, and I have a theory that being involved in open source software makes one generally more interested in social issues. I want to hear from others about their experiences with social causes tied to open source. Jon “Maddog” Hall is also a fan of Inveneo, and I’d love to have more examples.

UPDATE: Here’s a network diagram of the Inveneo network, and the “How to Deploy” document mentions us by name.

Prepare for the New CCNP Tests with FREE Training Books, Videos and Cert Kits

Cisco Press will be giving away 50 copies of its new CCNP Cert Kits and other study guides to help you prepare for the revised CCNP certifications. The mega giveaway is being sponsored by Cisco Press on Network World’s Cisco Subnet – a community website.


The chances are really high that if you enter, you will win something. All you need to do is find some words that form a specific sentence in various chapters that are provided and enter the response. Not too much work for some free study materials. The contest ends March 31. Register to win one of 10 copies of the following titles:

CCNP Route Cert Kit (Read excerpt.)

CCNP Switch Cert Kit (Read excerpt.)

CCNP Tshoot Cert Kit (Read excerpt.)

CCNP Routing and Switching Official Certification Libraries

CCNP Routing and Switching Quick Reference printed bundle


Good luck and hope you win!

Oracle to Acquire AmberPoint

Oracle announced this morning that it is acquiring AmberPoint, the leading vendor of SOA management solutions. AmberPoint is widely recognized as the Cadillac in the SOA management space, especially with its ability to enforce policies that help improve application performance and security, and to diagnose transactions not only within a composite application, but also across different applications. There had been speculation for a long time about whether AmberPoint wanted to stay independent or be acquired by a larger vendor. The answer is now known, and it is good that we got it. :-) AmberPoint, along with Sun Ops Center, will add to Oracle's capabilities in delivering application-to-disk management to customers.

Click here for the official press release about this acquisition.

Reductive Labs announces Puppet training dates for London, New York, and Nuremberg

Puppet Training is popular apparently! Due to demand, we’ve scheduled 3 public Puppet training courses in NY, London, and Germany. You can register and get more information about the training at this link. If you have any questions please contact Scott Campbell.

Location & Dates

Becoming a Puppet Master – 3 Days

Puppet Training consists of 3 days of hands-on training performed by a Reductive Labs Puppet professional. Attendees will be taught the principles and best practices of Puppet in a series of lectures and labs. This training is ideal for those who want a Puppet jumpstart. Newer members of an organization already using Puppet, or experienced sysadmins wanting to bring Puppet into their team, will get everything they need to deploy solutions.

Topics covered include:

  • Configuring Puppet and Puppetmaster
  • Resource Types and the Resource Abstraction Layer
  • Virtual Resources, Exported Resources and Stored Configs
  • Meta-parameters, Dependencies and Events
  • Classes, Modules and Definitions
  • Tags and Environments
  • Puppet Language Patterns and Best Practices

Puppet Developer Curriculum – 2 Days (NY & London Only)

This is an advanced course for those Puppet users who are interested in developing skills and learning best practices for creating their own custom Resource Types and Modules.

  • Introduction to Ruby for Puppet
  • Advanced Function and Fact development
  • Resource Type and Provider development
  • Testing practices and RSpec for Puppet

Looking forward to seeing you there!

Oracle Consolidates Global Network Management with Monolith

In this blog, we talk a lot about the challenges faced by organizations trying to cope with multiplying services, diverse technologies and siloed monitoring and management tools. You will also hear us mention that these challenges are far more daunting when services like VOIP and video need to be delivered globally and without interruption.

In our latest case study, Oracle’s Senior Director of Enterprise Automation & Tooling, Tony Miranda, talks about Oracle’s decision to consolidate monitoring and management of their global networks with Monolith.

In a global implementation that took mere weeks to complete, Monolith has allowed this global business software giant to simplify fault, availability and performance monitoring across the company, while cutting licensing, headcount, hardware, and annual maintenance costs. Oracle’s team now uses Monolith’s dashboard engine to quickly create custom real-time IT dashboards, improving executive visibility across their entire network.

Oracle turned to Monolith to monitor and manage global services. Shouldn’t you? Read the full case study.


Technology lag

I was interested to see a blog post discussing the benefits of the new 4G wireless standards currently in development. It struck me just how long it really takes for a technology to be in use by the majority of people. Here we are at the dawn of the 4G world and yet 3G isn’t widely deployed. The 3G licences were auctioned in the UK around ten years ago.

I’ve had an Apple iPhone 3G for a few months now and I am able to use a 3G signal for a small fraction of the time. In fact, outside of major cities, you’ve very little chance of getting a decent 3G signal. Most of the time I’m stuck on GPRS speeds or worse. If 3G hasn’t spread outside of the main metropolitan areas ten years after the original spectrum auctions, then it seems likely that there is no business case for ever doing so. If it isn’t commercially viable to implement 3G then what hope is there for 4G?

I wonder if the auction process itself could be to blame for the patchy deployment? Whilst the government in the UK did very nicely out of the auction, the bidders did pay very handsomely for their spectrum. Perhaps a better solution would have been to cap the auction price but place a service guarantee onto the bidders to ensure a more even deployment.

A broader implementation of 3G technology would, I’m sure, have been a boon to the high-tech sector in the UK and would have had the effect of increasing economic activity. Whether the increased economic activity would have made up for the shortfall in the spectrum auction revenue is hard to say. But the auctions were one-off events, and the increased economic activity would keep paying year after year.

Will the areas that don’t already have 3G never benefit from high speed wireless internet access? It isn’t looking promising…


Splunk4 + Instant Messaging = SplunkAIM

This small, unofficial project integrates an open-source AIM (AOL Instant Messaging) Chatbot with Splunk 4, allowing ad hoc searching, running of prepared searches, and real-time search alerting via instant messaging.

What’s real-time searching? It’s new in Splunk 4.1, out shortly, and will allow users to search for “real-time” events, within seconds of them reaching Splunk. Most usefully, you can set up real-time searches and be IM’d with the matching events the second they show up. You could ask to be IM’d, for example, whenever someone logs into your system, whenever there’s an error, whenever someone logs in as root, etc.


Above is a screen capture of real-time alerts printing out for each time someone downloads Splunk!

Note: You can use this project with Splunk 4.0, and everything other than real-time searches will work. That means you can do ad hoc searches and run saved searches over historical data.

Download Project

Example Searches

    ? - prints out a help message explaining commands.
    rtsearch login root - sets up a real-time alert to IM you whenever a user logs in as root.
    rtlist - gets a list of all your real-time alert jobs.
    rtstop * - cancels all your real-time alerts.
    search login | top 5 username - runs a historical search reporting the top 5 users who logged in the most.
    admin error - IMs not starting with a known command will search existing saved searches (here we search for saved searches about admin errors).
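
To make the command grammar above concrete, here is a minimal sketch of how an incoming IM might be sorted into one of these command kinds. It is not taken from the SplunkAIM project; the function name `parse_im` and the returned tuples are assumptions made purely for illustration, and no Splunk or AIM API is called.

```python
# Hypothetical dispatcher for the chat commands listed above.
# This only classifies the message text; it does not talk to Splunk or AIM.

KNOWN_COMMANDS = ("rtsearch", "rtlist", "rtstop", "search")


def parse_im(message):
    """Classify one incoming IM into a (kind, argument) pair."""
    text = message.strip()
    if text == "?":
        return ("help", "")
    # Split off the first word; the rest is the command's argument.
    command, _, rest = text.partition(" ")
    if command in KNOWN_COMMANDS:
        return (command, rest.strip())
    # Anything else is treated as a lookup against saved searches.
    return ("saved", text)
```

For example, `parse_im("rtstop *")` yields `("rtstop", "*")`, while `parse_im("admin error")` falls through to `("saved", "admin error")`, mirroring the rule that unrecognized messages search the saved searches.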

Learning Lua Part 1 to 5

Here is a compilation of articles on the subject of learning Lua.

Part 1
Part 2
Part 3
Part 4
Part 5

Part 5 is the final part and a complete Lua example script, monitoring Apache worker threads.

User Provisioning: the right access, right now

Think back to your first day of work at a new job (could be your current one or a past one). Remember how exciting things were: you were in HR orientation learning about your benefits and vacation policy – learning about your 401k options… all that good stuff that you get filled with when you join a company. Then the HR rep walks you to your new cubicle (because who really gets an office anymore) and you’re ready to get to work. Then, the emptiness sets in. Over the next 45 minutes you try to log on to your machine, you try to get email set up, you try to get access to (enter system/application du jour here). Then you get on the phone with the help desk after you get the number from your new cube mate (who’s already annoyed that you’re there sharing their space) because you have no access to systems to look the help desk phone number up. An hour and a half later, you are finally up and running – at least for today. Sound familiar?

Welcome to the world of user provisioning. What seems like such a simple task -- giving people access to the systems, applications, and general business data they need -- is really more of a three-headed monster than most of us earthlings will ever realize. Dave Kearns wrote a great article on user provisioning earlier this week that I found particularly insightful. In the piece, Kearns reveals three key events in an employee’s life at a company where provisioning comes into play (when they join the company, when they change jobs or responsibilities, and when they leave the company) and explains why they are important.

While giving employees access to the right resources in a timely manner is critical to business productivity, just as critical is removing access to resources when they are no longer required to perform a task or after that employee has left the company. After all, do you really want someone who has moved from a marketing management position into a sales position to have access to the payroll data for the entire marketing organization? I think not. So, what’s the answer? You got it! User provisioning. How does your organization remove access to resources, while providing access to new resources in a timely manner for your employees?

While productivity and the burden on the help desk are great cases for getting user provisioning right the first time, Kearns proposes that security is an even more important reason to get your provisioning right. And I must say, I agree here. Unnecessary access to information after a user has changed roles or left the company poses one of the greatest threats to the security of your business. All it takes is one person with malicious intent who abuses their access to critical data for a data breach to occur. Yes folks, it only takes one – one person – to run the train off the tracks and the next thing you know, your company is front page news due to a data breach.

So how do you prevent these things from happening? It all starts with your approach to user provisioning. If you take the steps towards automating user provisioning, you have the ability to reduce deviation from process (and the mistakes that can easily result) and drastically reduce the time it takes to provide or revoke access to critical data. Automation really is the difference between user provisioning and securely provisioning users. The latter enables you to really protect your business, while still meeting its needs.
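
As a rough illustration of what automating the join, change, and leave events can look like, the sketch below computes which entitlements to grant and which to revoke when someone's role changes. The role names, entitlement names, and the `plan_access_change` helper are all hypothetical; a real deployment would pull these from its own directory or identity system.

```python
# Hypothetical role-to-entitlement map; every name here is illustrative only.
ROLE_ENTITLEMENTS = {
    "marketing_manager": {"email", "crm", "marketing_payroll"},
    "sales_rep": {"email", "crm", "sales_pipeline"},
}


def plan_access_change(old_role, new_role):
    """Return (grant, revoke) sets for a joiner, mover, or leaver.

    old_role is None for a new hire; new_role is None for a departure.
    """
    old = ROLE_ENTITLEMENTS.get(old_role, set()) if old_role else set()
    new = ROLE_ENTITLEMENTS.get(new_role, set()) if new_role else set()
    # Grant what the new role needs and the old one lacked;
    # revoke what the old role had and the new one does not need.
    return (new - old, old - new)
```

With these assumed roles, `plan_access_change("marketing_manager", "sales_rep")` grants `sales_pipeline` and revokes `marketing_payroll`, which is the marketing-to-sales case described earlier. Passing `None` as the new role revokes everything, covering the employee who leaves.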

OSTU: Examining A File Copy Comparison (by Tony Fortunato)

Instructor Profile - Tony Fortunato is a Senior Network Specialist with experience in design, implementation, and troubleshooting of LAN/WAN/Wireless networks, desktops and servers since 1989. His background in financial networks includes design and implementation of trading floor networks. Tony has taught at local high schools, Colleges/Universities, Networld/Interop and many onsite private classroom settings to thousands of analysts. Tony is an authorized and certified Fluke Networks and Wireshark Instructor. His Pine Mountain Group CNA Level I and II certification demonstrates his vendor neutral approach to network design, support and implementations. Tony has architected, installed and supported various types of Residential Wireless High Speed as well as hundreds of WIFI hotspots. Tony uses a variety of technologies from Powerline, Wireless and wired technologies to find the most cost-efficient and reliable solution for his customers. Tony combines custom programs, open source and commercial software to ensure a simple support infrastructure.



Tony walks you through the basics of starting an application baseline or comparison.



Continue viewing other LoveMyTool "Open Source Tools University (OSTU)" lectures »

op5 Monitor 5.0 beta release date is set


op5’s acclaimed network monitoring software, op5 Monitor, is getting a huge overhaul in the upcoming 5.0 release: a brand new PHP GUI based on the Ninja project and the database backend from the Merlin project. The new GUI will unlock the true potential of the system with the widget-based tactical overview, where users can customize the data presentation to their liking and needs. Other goodies include the Google Maps integration in the NagVis data visualization module, perfect for the network command center’s big screen.

You will be able to download the BETA release of op5 Monitor on February 26th.

Tactical Overview in new Ninja GUI

SLA report from op5 Monitor 5.0

Report from op5 Monitor 5.0

Planet Network Management Highlights 2010 Week 5

Highlights from Planet Network Management + Planet Sys Admin for Week 5.


SLA reporting with Intellipool Network Monitor

(Note: this article applies to INM version 4.0.5233 and later)

INM makes it easy to generate SLA reports on any level, whether it be a whole network
or a single object. A default 'Availability report' is included with the INM installation, so let's look at how we can tweak this report
to fit your needs.

The Availability report is a template, and as with any report template in INM it's possible to
apply it in any context. First, let's run this report on a single network.

1) select the network from the relevant view
2) select the 'View report' command and then the Availability report to generate it.



As we can see, INM reports individual Up and Downtime on each device in the network,
as well as reporting the averages for the entire network. By default, INM calculates
the average Up and Downtime for each individual entry but this can be changed to the sum if required.
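
To illustrate the difference between the average and the sum in a report like this, here is a small sketch. It does not reproduce INM's own calculation; the function name, the `Router` device, and the input shape (downtime in seconds per device over a fixed reporting period) are assumptions for the example.

```python
def availability_summary(downtime_seconds, period_seconds):
    """Per-device uptime percentage, plus average and summed downtime.

    downtime_seconds maps a device name to its downtime in seconds
    within a reporting period that is period_seconds long.
    """
    per_device = {
        name: 100.0 * (period_seconds - down) / period_seconds
        for name, down in downtime_seconds.items()
    }
    total_down = sum(downtime_seconds.values())
    average_down = total_down / len(downtime_seconds)
    return per_device, average_down, total_down


# One day (86400 s): the Fileserver was down 864 s, a hypothetical Router 0 s.
per_device, average_down, total_down = availability_summary(
    {"Fileserver": 864, "Router": 0}, 86400
)
```

Here the Fileserver comes out at 99.0% uptime, the network's average downtime is 432 seconds, and the summed downtime is 864 seconds, so switching between average and sum changes the headline figure noticeably.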

Now let's modify the report template, to let INM break down the statistics even further and
report statistics on each individual monitor.

1) click on the Availability report in the Report templates view
2) click on the edit icon for the Downtime report item
3) select the "Report downtime for monitors" option
4) we're going to uncheck the 'Include monitors with no downtime in the report' option to only list the monitors that were
down during the period.

Let's re-run the report on the Fileserver object only this time.



In this case, it was a Disk time monitor that contributed to the downtime in the Fileserver object.

On some occasions you want to generate an SLA report that only takes data within a specific time interval into account.
INM can do this as well.

1) Open the properties for the Downtime report item again, and specify a time period in the
Time limit textboxes. Now the report will only include statistics that lie within this time interval for each day.

There is one more interesting option, and that is to only base the SLA report on specific monitor types.

1) Again, open the properties for the Downtime report item.
2) In the advanced section, select monitor types from the 'Monitor limit' box and add those to be included.
Only statistical data based on the selected monitor types will be included in the final report.

Finishing up, you may want to set up a scheduled event to run the SLA report periodically.
Full details on how to set up a scheduled event can be found in the online manual, located here:
http://www.intellipool.se/4_0/doc/generate_a_report.htm

Afterbytes with Marcus Ranum - Data Leakage

BERLIN/ZURICH (Reuters) - A Swiss lawmaker likened German attempts to buy data on cross-border tax evaders to bank robbery on Tuesday and the Swiss banking lobby said Berlin was acting as a receiver of stolen goods. Reference: Swiss lawmaker accuses... Marcus J. Ranum

Video Tutorials from Paessler support team available!

Members of our support team created two nice video tutorials to make it even easier for you to start monitoring your network with PRTG Network Monitor. Lean back a few minutes and learn how to install our software and let PRTG do all the work by using the built-in auto-discovery function which searches for and adds your network devices automatically. Another tutorial shows how you can set up different notifications in PRTG to keep you informed whenever there's something special going on in your network – be notified via email, ICQ, SMS text message and more. Check out the videos at our support website.

Want to Have a Smooth Running Application? Architect Your Tools Deliberately

In Oracle Unified Methods, the elaboration phase follows the inception phase of the project. This is the time when detailed analysis is done and key design decisions get fleshed out. Traditionally, the focus of this phase is on coming up with detailed application functional design, especially the user interface, the data model, the means of integrating with other applications and data sources, or even the technical architecture of the deployment. However, the same vigor is often not applied to the tools that are needed to manage the applications. This is very different from other complex engineering endeavors such as automotive and aerospace design, in which far more thought is put into the dashboards and the avionics. Using the right set of tools and implementing the tools properly are important to successful application projects.

Several deliberate decisions need to be made in tool selection for managing applications. The first one is whether to build home-grown tools or buy packaged products. Some people prefer to build their own tools, but it is a difficult effort to sustain in the long run. Developing tools is like developing any software. To do it properly, they need to be properly designed, implemented, tested and maintained over time, which gets expensive. Instead of building tools from scratch, most organizations opt to reuse something that they already have, which in many cases is a set of generic management tools that were originally designed to manage servers or networks. The problem here is that a fair amount of effort is still needed to adapt these tools to manage applications, and they provide only generic functionality that does not address the real needs of managing applications. A better thing to do is to use tools that are designed specifically for the job of managing specific applications, while avoiding tool proliferation. I wrote about the topic of tool selection in greater depth in an article last year. Click here if you want the details.

Besides getting the application management tools, it is important to design the tools to be an integral part of the runtime environment and allocate capacity to run them. Many people treat tools as an overhead. If you go by the definition that tools do not perform any actual processing of business transactions, then they are indeed an overhead. However, tools form a critical part of an application infrastructure. Without tools and the instrumentation to collect management data, the application becomes a black box that cannot be managed. No one would design an aircraft without proper avionics, and the same thing should apply to tools.

Another set of decisions are related to the deployment architecture of the tools. The architecture needs to be designed deliberately with a similar level of care taken to design the deployment architecture of the applications, especially if the tool will be used to manage a complex application environment. In fact, there are similarities in designing tools deployment architecture and application deployment architecture. For example, one has to decide between centralized single instance tool deployment vs. multi-instance deployment of tools such as Oracle Enterprise Manager, just like one has to decide how many production application instances to deploy. This sort of decision is highly environment specific.

Oracle Enterprise Manager Grid Control, by building on Oracle Fusion Middleware and Oracle Database, allows the tool to scale both horizontally and vertically. Therefore, from a technical perspective, a single instance of Oracle Enterprise Manager can scale to manage thousands of application, database, and server targets spanning development, testing and production environments. However, some organizations may still want to deploy multiple instances so that different units within the organization can maintain control over their own instances of Enterprise Manager in order to maximize control and flexibility. Others may want total separation between Enterprise Manager instances used to manage pre-production vs. production environments in order to maximize security. The final decision needs to be based not only on technical factors, but also on organizational and other considerations.

Whether you are going to have a single or multi-instance Oracle Enterprise Manager Grid Control deployment, you still need to make sure that you set up at least one separate test instance of the tool. Before you roll a version of Oracle Enterprise Manager into production use, for example, you should have it tested in order to minimize surprises.

Another potential decision is whether high availability (HA) deployment is needed for the tools. Just like a pilot cannot fly with non-working avionics, it is virtually impossible for administrators to manage their applications effectively if the tools are not available. Some management tools support high availability deployment. For example, Oracle Enterprise Manager Grid Control servers, known as Oracle Management Service (OMS), can be set up to run in a clustered configuration or even a multi-site clustered configuration. The underlying Oracle Database repository can be made highly available by leveraging Oracle Real Application Clusters (RAC) and Data Guard technologies. More information about HA Oracle Enterprise Manager Grid Control implementation is spelled out in Oracle Maximum Availability Architecture guidelines.



Related Articles

- Want to Have Smooth Running Applications? Start with Good Planning.
- Building Application Management into Your Capacity Plan
- People, Process, Technology – The Right Tool

Mark Burgess FLOSS Interview

Mark Burgess was interviewed (with a sore throat) on FLOSS weekly.

Cfengine raising the profile of system administrators

A new feature in the Cfengine Community Core is attracting some interest from system administrators. It is the simplest of ideas, but then such ideas are often the best.

New Board Members

Moving into a new year, the privately owned and funded Cfengine company has changed its board to include some power members of the Free and Open Software community. "The time has come to change the style of our board work, as we move into a different phase of growth," says CEO Thomas Ryd.

Marketing, now Sales? WTF?

The OpenNMS Group has finally moved into double digit employee numbers with the hiring of Brad Miesner as our Vice President of Sales.

I know what you’re thinking – a sales guy? Earlier you post that you hired some folks to do marketing, and now you hire a sales guy?



First, let me point out that I’ve known Brad for over ten years and he started off in a technical role. So he’s not just some guy with no network management knowledge who’s going to pester people to spend money.

Second, interest in OpenNMS has grown to the point that it can be difficult for us to handle, in a timely manner, requests for information about our services. I always focus on our existing customers first, sometimes to the detriment of potential clients, but Brad will ensure that our future clients receive the attention they deserve.

But most importantly Brad will have the role of “customer satisfaction manager”. We tend to build close relationships with our clients, and if we should happen to drop the ball, these clients might be a little hesitant to complain directly to the people at OpenNMS with whom they are working. Brad will proactively be in touch with all of our partners to ensure that we’re providing the best service we can, and if there are ways we can improve, it is hoped he will hear about them.

Brad comes to us from Network Appliance, voted by Fortune Magazine as the number one “Best Place to Work” in 2009. He was doing really well there, and I think his decision to join our band of open source revolutionaries speaks well for both the company and our future sales prospects.

We are extremely happy to have Brad join our team.

Oh, at one time he worked for a little software company called Zenoss, but I think he’ll quickly adjust to working in open source.

(grin)

Vote for the winner of the T-Shirt tagline contest

Last month we kicked off a contest to design the new tagline for our next t-shirt and now is the time for you to tell us who should win. We received a bunch of great entries and have narrowed it down to the following 8 finalists. Please vote for your favorite and decide the winner.

The person who submitted the winning entry will receive a BugBundle from Buglabs. It’s in your hands to determine who wins.



Running Wireshark as You

Running Wireshark on Linux involves an interesting challenge [1]: Capturing packets requires root access, but Wireshark is a big program and we strongly recommend against running it with elevated privileges. On Linux it’s common to see Wireshark running as root, but this is nearly unheard of for similarly-sized applications like Firefox and GIMP. How can we avoid running Wireshark as root?

A good way

Notice how I said “capturing packets requires root” above? Here’s a secret — Wireshark doesn’t capture packets. A separate program called dumpcap does. Compared to Wireshark, dumpcap is tiny. It’s much less complex and much safer to run as root. We can make it so that dumpcap runs as root and that only users in a particular group can run it:

$ sudo -s
# groupadd packetcapture
# usermod -a -G packetcapture gerald
# chgrp packetcapture /usr/bin/dumpcap
# chmod 4750 /usr/bin/dumpcap
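If the octal mode 4750 used above looks opaque, the following small sketch decodes it with Python's standard stat module. It only inspects the number; it does not touch /usr/bin/dumpcap, and the variable names are purely illustrative.

```python
import stat

# 4750 as applied by chmod above, combined with the "regular file" type bit
# so that stat.filemode() can render it the way `ls -l` would.
mode = stat.S_IFREG | 0o4750

# Human-readable form of the permission bits.
print(stat.filemode(mode))

# The leading 4 is the setuid bit: the program runs as the file's owner (root).
is_setuid = bool(mode & stat.S_ISUID)

# The 5 in the group position means read + execute for the owning group.
group_can_execute = bool(mode & stat.S_IXGRP)

# The trailing 0 means everyone else gets no access at all.
others_have_access = bool(mode & stat.S_IRWXO)

print(is_setuid, group_can_execute, others_have_access)
```

In other words, only root and members of the owning group can run the binary, and when they do it runs with root's privileges.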

A better way

It’s also possible to let dumpcap do its job without involving root access at all. For a very long time Linux has allowed the use of fine-grained permissions called capabilities. In many recent distributions you can use the setcap utility to add capabilities to individual files.

Dumpcap needs CAP_NET_RAW and CAP_NET_ADMIN, so what do we need to feed setcap? On my Ubuntu Karmic system the setcap man page points you to cap_from_text. Cap_from_text points you to _cap_names, an array in the kernel. It would be nice if the setcap man page included a list of capability names along with a few examples. As it turns out, the names need to be in lower-case.

$ sudo -s
# apt-get install libcap2-bin
# groupadd packetcapture
# usermod -a -G packetcapture gerald
# chgrp packetcapture /usr/bin/dumpcap
# chmod 750 /usr/bin/dumpcap
# setcap cap_net_raw,cap_net_admin=eip /usr/bin/dumpcap

Fully functional filesystem capabilities are something the Linux world has needed for a very long time. I’m glad they’re finally seeing wide deployment.

1. This is a problem on other systems too, but it’s usually easier to solve. On Windows you can run the NPF service at startup. On OS X you can use ChmodBPF.

Free Tool and Free Tips for Network Configuration Management (by Josh Stephens)

Author Profile - Josh Stephens is the Head Geek and VP of technology at SolarWinds, a leading provider of network management software based in Austin, Texas. Josh has extensive experience in network management systems, network engineering, and software development. His 15-plus years of experience in technology include designing and deploying advanced networks and network management systems within organizations including the US Air Force, Sprint, MCI/UUNET, and WalMart. He has received several industry certifications including those from Cisco Systems, Microsoft, and HP.


Opening – by Tim O’Neill - I am always on the lookout for cool tools for the Lovemytool readers, especially when they are FREE!

I had recently received an announcement of this new tool that could save you some of your valuable time. Plus as I get older I realize that one simply cannot remember everything including Configurations and I HATE Scripting ... It seems to me that as advanced as we are, everything should be visualized and controlled by simple, traceable, change management focused, safe and secure GUI’s. Well, that is not the case so here is a sweet tool to help every one of you command config people save some time and burning up those “little grey cells”!

I remember meeting the Younce brothers at Walmart when I was with the original Network General Team as the CTM – That is “Chief Trouble Maker”. Then they started Solarwinds and WOW has it come far and done well. I had used their very cool and advanced Engineering Tools for years, until I lost my license.

Well back to the subject - Check out this new Free Tool and look over their exciting Orion technology. I have often wondered why companies do not give away more solutions to prove that their Technology is the best and that they want to be part of THE SOLUTION?

Many Thanks to SolarWinds and Josh –

I wish you less Stress and More Success - Oldcommguy



Josh Stephens was kind enough to write some tips and the following description of the New Free Tool -

Engineers spend a fair amount of time manually typing in command line interface (CLI) commands to configure network devices. Common tasks like configuring VLANs and enabling NetFlow require them to develop detailed scripts specific to their environment. While this is not a difficult task, the command can be easily forgotten and is a pain to rebuild. It’s just one of those “there’s got to be a better way” tasks. Network engineers are already a time-pressed bunch, so despite our love for all things CLI, SolarWinds took a look at the problem and came up with a solution.

We built a free tool called Network Config Generator to allow engineers to create network configuration change templates that they can apply to any CLI-based network device. With Network Config Generator, network engineers can create a change template to re-use for future configurations, saving time and frustration as well as ensuring consistency in configurations across devices. They can download new change templates created by the community and share their own, all without leaving the tool.

One common example is configuring NetFlow on a Cisco ASA. While this is a standard configuration change, there are a few variables that change depending on your environment. Here’s a quick walkthrough of how you might configure this change using our new free tool:

  1. Download and install a free copy of SolarWinds Network Config Generator.

  2. For additional content that SolarWinds’ community members have already created, visit www.thwack.com. As a thwack member, you can import and export config change templates directly from the tool.

  3. The config change template for enabling NetFlow on Cisco ASA devices is already included out of the box.

  4. The tool will prompt you for a few inputs specific to your environment, such as the target device’s IP address and community string as well as your NetFlow collector’s IP address and export port.

  5. Network Config Generator will automatically generate a config specific to your environment that enables NetFlow on the Cisco ASA that you specified. Simply copy and paste the output config into your favorite CLI client to execute the change and enable NetFlow on your Cisco ASA device.

  6. Check out the following video explaining the process:



For novice users, Network Config Generator simplifies advanced configuration change tasks by leveraging community-generated templates in a step-by-step GUI. These users can quickly enable advanced network device features, configure VLANs and change interface descriptions with just a few clicks of the mouse. We hope that some of the more advanced users will continue to create and share more templates to help the junior community.

For advanced users, this is a great tool to build a few common templates that you can save and execute on specific devices. The tool also works with SolarWinds Orion Network Configuration Manager (NCM) if you want to execute changes on multiple devices simultaneously.

Happy Config Generating,


Improving Network Discovery by using SNMP OID Include/Excludes

An issue that frequently comes up for IT managers is the need to find only certain types of devices within a heterogeneous network that contains many types and manufacturers of networked devices. I recently worked with a customer that wanted to locate about a hundred Windows Servers from a network that contained several thousand devices.

One way to approach this task is to discover all the devices, then pick out the ones you are looking for, the old needle in the haystack routine. This approach is time consuming and error prone. A better method is to leverage the information available from devices that support the SNMP protocol, which includes most operating systems. SNMP includes an object library of OIDs (Object Identifiers) that are set up by each manufacturer. A Google search for “Windows OIDs” found this site which listed the OIDs to identify Microsoft Server Operating Systems.

As you can see (table below), the OIDs are built in a hierarchy so, if I could search my network for servers which contain the OIDs below for workstations, servers and domain controllers, I should find all my Windows Server boxes.








You can make the difficult task of finding and sorting networked devices much more manageable. I use dopplerVUE, a network management tool that simplifies the whole process and helps find the needle in the haystack faster and without issues.

dopplerVUE provides an OID include/exclude discovery feature that makes it easy to accomplish this task.
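The idea behind an include/exclude filter can be sketched in a few lines of Python. This is not dopplerVUE's implementation, just prefix matching on sysObjectID values. The Microsoft OIDs and the sample devices below are assumptions for illustration, since the table from the original post is not reproduced here, so check them against your own OID reference.

```python
def oid_tuple(oid):
    """Turn '1.3.6.1.4.1.311' into (1, 3, 6, 1, 4, 1, 311)."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def is_under(oid, prefix):
    """True if oid equals prefix or sits below it in the OID hierarchy."""
    o, p = oid_tuple(oid), oid_tuple(prefix)
    return o[:len(p)] == p


def keep_device(sys_object_id, include=(), exclude=()):
    """Apply exclude prefixes first, then include prefixes (empty = keep all)."""
    if any(is_under(sys_object_id, p) for p in exclude):
        return False
    if not include:
        return True
    return any(is_under(sys_object_id, p) for p in include)


# Assumed values: branch commonly listed for Windows workstations (.1),
# servers (.2) and domain controllers (.3).
WINDOWS_PREFIX = "1.3.6.1.4.1.311.1.1.3.1"

# Hypothetical discovery results: address -> sysObjectID.
discovered = {
    "10.0.0.10": "1.3.6.1.4.1.311.1.1.3.1.2",   # Windows server
    "10.0.0.11": "1.3.6.1.4.1.311.1.1.3.1.3",   # Windows domain controller
    "10.0.0.20": "1.3.6.1.4.1.9.1.516",         # some Cisco device
    "10.0.0.30": "1.3.6.1.4.1.8072.3.2.10",     # net-snmp on Linux
}

windows_hosts = sorted(
    ip for ip, oid in discovered.items()
    if keep_device(oid, include=[WINDOWS_PREFIX])
)
print(windows_hosts)
```

Comparing integer tuples rather than raw strings avoids false matches such as a prefix ending in 311 matching an enterprise number like 3110.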

Here are some steps for using dopplerVUE to improve the network discovery process. To get started, the server must have the SNMP agent service running, and you need the credentials (called a community string) to enter in the SNMP service “security” tab. Most servers use “public” as a default; note that community strings are case sensitive. The SNMP service is usually turned off by default, so you’ll need to enable it and restart the service when you are done making changes.


Once you have the servers set up, you should create a discovery job within dopplerVUE to find the Windows Servers. dopplerVUE provides a discovery wizard that guides you through the step by step process as follows:


Step 1: Select a discovery method appropriate to the task. Use an IP address range that provides the most control over your discovery results.


Step 2: Set an IP range that includes the Windows Servers you are looking for in your search. Be careful, the larger the range you select, the longer it will take to complete the discovery.


Step 3: Select SNMP protocol.


Step 4: Enter the community strings for the servers. Your admin can provide these and you can always use public which is set as default on most servers.


Pictured below is a tab marked “Show sysObjectID include/exclude options”. You can click on the tab, expand the window and then select “include”. You can then enter the OIDs we found earlier.
















Step 5: In the workstation column you’ll want to select SNMP poller and then Host MIB if you want to collect information about processor utilization, memory usage and disk space.



Step 6: Optional: Enter a name and description for this discovery job.


Now you can click finish and go to the Inventory>Discovery Jobs tab to watch the progress of the task. The job will start automatically assuming your dopplerVUE discovery service is running and you had the “run now” checkbox selected in step 6. If not, click on the job and start it.


You can watch the progress in the job details section and keep an eye on your inventory tab to see if new devices are being found. When new devices are found, they should appear in the workstation classification. You can change classifications or create new ones easily by right clicking on the objects in the workstation classification list.


This technique works for any search where you can separate the devices by manufacturer. Since each manufacturer determines how they want to build their SNMP library, you’ll need to understand how they created their hierarchy. Fortunately there is a lot of good information available on manufacturer websites to help you. Here is more information about SNMP support within Windows.


If you’re looking to improve network discovery and automate IT tasks to save time, try dopplerVUE for free for 30 days.

3 Questions with Zenoss CEO Bill Karpovich

This past weekend at OpsCamp, Damon Edwards, President of DTO Solutions and sponsor of Control Tier, took the opportunity to do an impromptu interview with Bill Karpovich, CEO of Zenoss, on the Dev2Ops Blog.

 

Damon asks three questions, see what Bill had to say:

 

It’s Poll Time Again: Linuxquestions.org

Linuxquestions.org is running a poll on the best open source projects and there is a network monitoring application category.

If you like OpenNMS, we’d appreciate your vote.

SQL Injections: The Splunk Method for Auditing Your Application Security Model

Unless you have had your head in the sand, you know that SQL injections have made a fierce comeback to the top of the threat vector charts this year. According to the WHID (Web Hacking Incidents Database), SQL injection is still king of the attack vectors, accounting for 19 percent of attacks, followed by authentication abuse (11 percent), content spoofing (10 percent), DDoS/brute force (10 percent), configuration/admin error (8 percent), cross-site scripting (8 percent), cross-site request forgery (5 percent), DNS hijacking (5 percent), and worms (3 percent).

Reflect on the recent increase in compliance legislation requiring businesses to provide dynamic data access to customers for banking and healthcare, or on the influx of simple purchases on the web, and the concern may be scarier for all of us. Recently, Dark Reading reported on the number of companies who have been compromised through SQL Injection attacks.

What is SQL Injection, and How Does it Work?

If you don’t know what this is, and just learned what SQL is, I recommend going to OWASP.org and reading up a little. It is a great resource, and the many security professionals dedicated to the Open Web Application Security Project deserve a big shout out.

Let’s start with how SQL Injection actually works. SQL Injection occurs when an attacker is able to insert a series of SQL statements into a ‘query’ by manipulating a data input, usually a form for users to update their account information. Some common relational database management systems that use SQL are: Oracle, MSSQL Server, DB2, Sybase, Informix, MS Access, Ingres, and so on, with MSSQL being the most popular of those.
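As a minimal illustration of that mechanism, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table, the user names and the payload are invented for the example; the same pattern applies to any SQL database where queries are built by string concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (username, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)


def lookup_unsafe(username):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = "SELECT username, email FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def lookup_safe(username):
    # Parameterized: the driver passes the input as data, never as SQL.
    query = "SELECT username, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


# A classic "1=1"-style payload typed into a form field.
payload = "nobody' OR '1'='1"

leaked = lookup_unsafe(payload)   # WHERE clause becomes always true
blocked = lookup_safe(payload)    # no user is literally named like the payload

print(len(leaked), "rows leaked;", len(blocked), "rows with a bound parameter")
```

The unsafe version hands back every row in the table, while the parameterized version treats the whole payload as an (unmatched) user name.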

Whether you are a potential attacker, auditor, researcher or an application developer, you may go through the same steps to exploit or find exploitable code:

  1. Input Validation
  2. Information Gathering
  3. 1=1 attacks
  4. Data extraction
  5. OS interaction
  6. OS Cmd Prompt
  7. Expand influence

More information available at OWASP (Victor Chapela, OWASP, “Advanced Topics on SQL Injection Protection”)

Splunk and SQL Injections

Splunk approaches this attack a little differently because of our ability to make all IT data security-relevant. Within the Splunk index, organizations will collect logs, custom application logs, traps, configurations, stack traces, scripted outputs, auth data and metrics for analysis. Splunk can help in applying security/audit logic in various detective controls to aggregate IT data in one place, make simple sense of the data, and apply relationship logic to what might appear to be a standard operational issue. This logic can give you a “report”, alert, dashboard or e-mail, run a script to gather more information, or simply create a news feed of the ongoing event to send to a ticketing system, incident software or other system. Not just designed for tier 1 troubleshooting, Splunk can help incident handlers and analysts backtrack events by digging into logs across geographies, datacenters, applications and technologies. Incident handlers, auditors and security professionals can then persist the same logic in a “search” to identify the next occurrence, proactively defending sensitive assets.

How Does Splunk Do This?

Well, for this case, let’s use the Splunk Security application, Enterprise Security Suite (ESS). ESS enables simple searching to illuminate information in the muddy, challenging environment of security and operational data accessed by more people than just security androids and SysAdmins. I use it to help organize security data into categorical security areas: Access Protection, Endpoint Protection, Network Protection, Incident Response, and Governance.

You Already Have the Answers: In Your IT Data

The same Splunk rules still apply: you have to put your data in to get good information out. So we need a few key pieces of data to find a SQL Injection – some more damning than others – to identify what starts out as an operational issue, but turns into a security investigation.

IDS Logs and Events

Though there are many methods of subterfuge to avoid IDS/IPS detection of the 1=1 statement, getting a look at the application data in a purported attack via a Snort/Cisco/Juniper alert is very helpful as part of a correlated event. SQL injection may include other logic as well: depending on the input validation, a ‘;’ may help the attacker, and seeing JOIN or UNION statements may also be an indicator of misuse.

Packet Capture

Always good; certainly looking at application data in the packet with SQL statements is going to be helpful. Thing is, database replication, linked databases, etc. are often capable of using HTTP as the transport protocol, so be advised: this could be a lot of data, and it may be legitimate. Alerting on these events in Splunk would let you execute a script to trigger tcpdump or something similar based on an event, if the Splunk instance is enabled with tcpdump.

Vulnerability Assessment Tools

Nessus events, or other audit tools, can help qualify the actual threat of the injection language based on the type of systems you are protecting. MSSQL statements are more forgiving than, say, Informix, and if you are a UNIX shop, MSSQL attacks do not pose a risk, though this may mean some interrogative work.

Anti-virus

This is handy should malicious code be dropped, downloaded and/or propagated via SQL language over HTTP, FTP, SSH or other file transfer protocols. When an event turns off AV, or a failure occurs after a noticed injection, there should be concern as to the sanctity of the system it failed on.

Host Data

Perhaps the application server and database servers have file integrity monitoring, or maybe a scripted output of binaries such as top, psstat, or on Windows, netstat and ipconfig? Looking at a new listening service you didn’t install may be after the fact, but it is at least identifiable with Splunk. If you happen to have something like OSSEC installed, or other kernel monitoring software, perfect. An example SQL Injection provided by OSSEC looks like this:

200.96.104.241 - - [12/Sep/2009:09:44:28 -0300] "GET /modules.php?name=Downloads&d_op=modifydownloadrequest&%20lid=-1%20UNION%20SELECT%200,username,user_id,user_password,name,%20user_email,user_level,0,0%20FROM%20nuke_users HTTP/1.1" 200 9918 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"

Note the UNION statement followed by %20SELECT%200. This is the beginning of pulling all the users from a PHPnuke site, and getting table data on all of their usernames, IDs, passwords, and e-mails. If you aren’t careful about your web application accounts all being the same, you could be the subject of more than a Twitter DDoS.
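Outside of Splunk, the same check can be approximated in a few lines of Python: URL-decode the request and look for UNION followed by SELECT. The regular expressions and the function name are illustrative assumptions, not an OSSEC or Splunk rule, and a pattern this simple is easy for an attacker to evade.

```python
import re
from urllib.parse import unquote

# The access-log entry quoted above.
LOG_LINE = (
    '200.96.104.241 - - [12/Sep/2009:09:44:28 -0300] "GET /modules.php?name=Downloads'
    '&d_op=modifydownloadrequest&%20lid=-1%20UNION%20SELECT%200,username,user_id,'
    'user_password,name,%20user_email,user_level,0,0%20FROM%20nuke_users HTTP/1.1" '
    '200 9918 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"'
)

# Pull the request path out of the quoted request field.
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')
# Naive indicator: the word UNION followed later by the word SELECT.
SUSPICIOUS_RE = re.compile(r"\bunion\b.+\bselect\b", re.IGNORECASE)


def suspicious_request(line):
    """Return the decoded request if it looks like a UNION-based injection."""
    match = REQUEST_RE.search(line)
    if not match:
        return None
    decoded = unquote(match.group(1))  # turns %20 back into spaces
    return decoded if SUSPICIOUS_RE.search(decoded) else None


hit = suspicious_request(LOG_LINE)
benign = suspicious_request(
    '10.0.0.5 - - [12/Sep/2009:09:45:00 -0300] "GET /index.php HTTP/1.1" 200 512 "-" "Mozilla/4.0"'
)
print(hit)
```

Decoding first matters: the raw log contains UNION%20SELECT, which a pattern looking for whitespace between the two keywords would miss.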

Application Logs

Depending on verbosity, web server logs may be a good starting point to determine the source/destination centered around IP traffic, as well as success/failure and some explanation of why. There is nothing special about these logs; we use them for operations and security. Several types of HTTP status errors can be helpful in determining what is going on – for instance, a 403 or a 401 error on a customer-facing application.

User Audit Log

Finding failed and successful authentication attempts on a local system is extremely helpful- especially around the time something changed in our environment. Make sure to note, actual injection code may be executed on the application tier or webhost, rather than the database server itself. Detective controls should apply to the entire transaction architecture.

MSSQL Error Logs

Error logs provide some good information for Splunking account access, and database errors especially after isolating the application error occurring in the transaction model. To add the error log on a forwarder as an input:

[source::...MSSQL\\LOG\\ERRORLOG]
CHARSET = UTF16-LE
NO_BINARY_CHECK = true

(Make sure to do this on the indexer as well)

The most valuable source of information is actual data across the wire. Ideally you are using tcpdump or Wireshark to monitor the SQL server (if you aren’t, you should at least have the capability should there be a threat). If extended procedures like “xp_cmdshell” are being executed they may actually be logged when invoked. Xp_cmdshell enables a virtual cmd shell within an SQL statement. Maybe you have Windows event logging and registry settings baselined, and a profile of your SQL server’s persisted connections baselined, so you can apply a diff to them frequently, uncovering new network connections and services alike. Similarly, user logon/logoff events to the operating system and application, both successful and denied, are handy and can indicate if a system account keeps failing, à la a brute-force attack. We may also see a large number of auth attempts to SQL Server, or the database, through the Windows Application log as well.

Determining Contextual Relevance

The same information has different context depending on who is looking at it, what it is used for, and how it relates to other data, what we call contextual relevance. In Security, almost every technology is interrelated in either debunking a threat, or validating it. We use Splunk to help solve Operations availability problems, Information Security problems, Compliance requirements, both on the fly and proactively, bringing these problems to light with alerting and notification.

Any of these events may show up, though logging may have been turned off or the system clock reset if a system running a Splunk forwarder has been compromised. A sudden stop in logged event data is also telltale, so treat “no events” as an immediate concern, and save a search that alerts when no events appear.
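The "alert when no events appear" check can be illustrated outside of Splunk with a small Python sketch. The `last_seen` mapping of host to newest event time and the 15-minute threshold are assumptions for the example; in practice the equivalent would be a saved search with an alert condition.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: flag hosts that have gone quiet.
# `last_seen` maps a host name to the timestamp of its newest event.

def silent_hosts(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return hosts whose newest event is older than max_silence."""
    return sorted(
        host for host, newest in last_seen.items()
        if now - newest > max_silence
    )


if __name__ == "__main__":
    now = datetime(2010, 2, 8, 12, 0)
    last_seen = {
        "sql01": datetime(2010, 2, 8, 11, 58),  # still reporting
        "web01": datetime(2010, 2, 8, 11, 30),  # silent for 30 minutes
    }
    print(silent_hosts(last_seen, now))
```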

Input validation is really the first way to stop the problem in development, but stored procedures can be modified to add a regular expression for a credit card number, an SSN, or other fields. Preventing the additive SQL command in the input space of a form stops the problem ahead of time, and so does fuzzing before code is live, but secure coding is another chapter.
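As a rough illustration of input validation combined with parameterized queries, here is a minimal Python sketch. It deliberately swaps in Python's built-in `sqlite3` module in place of MSSQL and a stored procedure, and the `customers` table, the `find_customer` helper and the SSN pattern are all assumptions made for the example.

```python
import re
import sqlite3

# Hypothetical sketch: validate a field with a regular expression, then
# pass it to the database as a bound parameter instead of concatenating
# it into the SQL text. sqlite3 stands in for MSSQL here.

SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def is_valid_ssn(value):
    """Accept only the NNN-NN-NNNN form."""
    return SSN_PATTERN.fullmatch(value) is not None


def find_customer(conn, ssn):
    """Look up customer names by SSN, rejecting malformed input."""
    if not is_valid_ssn(ssn):
        raise ValueError("rejected input: not in NNN-NN-NNNN form")
    cur = conn.execute("SELECT name FROM customers WHERE ssn = ?", (ssn,))
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, ssn TEXT)")
    conn.execute("INSERT INTO customers VALUES (?, ?)", ("Alice", "123-45-6789"))
    print(find_customer(conn, "123-45-6789"))
    try:
        find_customer(conn, "' OR '1'='1")
    except ValueError as err:
        print(err)
```

The regular expression rejects the classic `' OR '1'='1` payload before it reaches the database, and the bound parameter means that even input that passes validation is never interpreted as SQL.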

Some of the information available in Splunk will also allow the operations personnel or administrators to go back and audit configuration files, to see if DB errors are being thrown to the user’s screen, enabling enumeration and information gathering. Even if the code cannot be pulled from production, enumerating poor SQL input validation can be more difficult if error reporting is turned off.

For instance, is your config file configured to throw exceptions on that server? Have a look in Splunk. Take HTTP status errors, for example a 403 Forbidden status code: this may give valuable information to a potential attacker. 403 errors may be something you don’t want to show folks, except in a development environment, so seeing them in Splunk may suggest turning off error responses. At the same time, the error generates an event that is beneficial to the application owner, but maybe doesn’t need to be broadcast. Users should see 404 errors when error notification is disabled, rather than a 403 error. When in doubt, look in Splunk! A quick search for all 403 errors may reveal an example of a potential SQL Injection occurring.
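To make the "quick search for all 403 errors" concrete, here is a small Python sketch that counts 403 responses per client in a web access log. It assumes the log is in Common Log Format; the regular expression and the function name `count_status_by_client` are illustrative, and inside Splunk the same question would be asked with a search instead.

```python
import re
from collections import Counter

# Hypothetical sketch: count responses with a given HTTP status per
# client address. Assumes Common Log Format lines such as:
#   host ident user [timestamp] "request" status size

LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def count_status_by_client(lines, status="403"):
    """Return a Counter of client address -> number of matching responses."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("status") == status:
            counts[match.group("client")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.9 - - [08/Feb/2010:12:00:01 -0500] "GET /admin HTTP/1.1" 403 199',
        '203.0.113.9 - - [08/Feb/2010:12:00:02 -0500] "GET /login.asp?id=1%27 HTTP/1.1" 403 199',
        '10.0.0.7 - - [08/Feb/2010:12:00:03 -0500] "GET / HTTP/1.1" 200 5120',
    ]
    for client, hits in count_status_by_client(sample).most_common():
        print(client, hits)
```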

Once you find a 403 occurrence, or a bunch of failed account logins to a database, or an application server unavailable (503), use Splunk to span a 15-minute window by dragging the search range to the 15 minutes around the error. Then, look for all events during that time, on the host that may have hosted the error event. Perhaps a system has been compromised, and credentialed access may allow downloading/uploading?
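The 15-minute window can be pictured as a simple filter over timestamped events. The sketch below is an assumption-laden illustration: events are plain dictionaries with `time` and `host` keys, and the window is centered on the error (7.5 minutes either side), which is one reasonable reading of "the 15 minutes around the error".

```python
from datetime import datetime, timedelta

# Hypothetical sketch: keep only the events from one host that fall
# inside a window centered on the time of the error.

def events_around(events, error_time, host, window=timedelta(minutes=15)):
    """Return events for `host` within `window` centered on `error_time`."""
    half = window / 2
    start, end = error_time - half, error_time + half
    return [
        event for event in events
        if event["host"] == host and start <= event["time"] <= end
    ]


if __name__ == "__main__":
    day = (2010, 2, 8)
    events = [
        {"time": datetime(*day, 11, 50), "host": "web01", "msg": "login ok"},
        {"time": datetime(*day, 11, 55), "host": "web01", "msg": "login failed"},
        {"time": datetime(*day, 12, 0), "host": "db01", "msg": "backup"},
        {"time": datetime(*day, 12, 5), "host": "web01", "msg": "file upload"},
        {"time": datetime(*day, 12, 10), "host": "web01", "msg": "logout"},
    ]
    for event in events_around(events, datetime(*day, 12, 0), "web01"):
        print(event["time"], event["msg"])
```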

Look for failed and successful authentications to the same system. Look for escalated account privileges. Splunk allows you to run a “System Profiler” as well to look at things like the change in listening services and ports on the Network, and things like infections or the state of anti-virus from the Endpoint Protection dashboard. When you find the behavior in all of the layers of events that correlate the event, create a search to populate a summary index as a “notable event” appearing in the Security Posture dashboard, and create an alert for it. Next time, Splunk users and Incident Handlers alike will know without re-creating the on the fly search we just did.

Splunk Helps Make the Seemingly Unrelated – Related

What does all this mean? Simply this: simple users can apply complex concepts from experts in the field to search, detect, alert and report on security threats like SQL Injections, using Splunk. ESS allows a wide variety of users to look at the same information and derive different conclusions based on contextual relevance. Most real security risks don’t flash “SQL INJECTION ATTACK” in your dashboard; you need to understand your environment and which pieces work together. Given the increased frequency of Web application threats, specifically SQL Injection, identifying the threat itself depends on multiple layers of security as well as a way to simply search through all of those layers simultaneously. Found an IP of a suspected system? Splunk it. Found an error for authentication to a back-end system? Splunk it. Can’t find the relationship between events in a given period of time? Use Splunk to make the seemingly unrelated, correlated.

For more information on how to monitor your MSSQL-driven application using Splunk, drop us a line: info@splunk.com

Microsoft has provided their own suggestions as well. http://msdn.microsoft.com/en-us/library/ms998271.aspx

Afterbytes with Marcus Ranum - Russian Stealth Fighters

Moscow, Russia (CNN) -- Russia tested its fifth-generation Sukhoi fighter jet in the Russian Far East on Friday. The plane, provisionally called T-50, is the country's first fighter jet based on the stealth technology and is viewed by military experts... Marcus J. Ranum

ntop.org Accredited as Endace Technology Partner

We’re proud to announce that ntop.org has been accredited as an Endace technology partner, in recognition of ntop’s contribution to the open-source world and as a guarantee for Endace customers that products such as ntop and nProbe run smoothly (and faster) on Endace DAG cards.

Be successful with Splunk in about an hour…

Here’s a document that can get you analyzing real data and making real charts, in about an hour or two…

Dive into Splunk

Feedback really, really appreciated.

Splunk Reports

Reports you could be making in about an hour!

Capacity, Customization, Interface, All Enhanced in Capsa Network Analyzer 7.1

Colasoft Announced the Release of Capsa Network Analyzer 7.1 FOR IMMEDIATE RELEASE: 2/2/2010 Contact Information: Jane Hu Email: jane.hu@colasoft.com Tel: +86 28-8512-0922 Website: http://www.colasoft.com Chengdu, China – Feb 4, 2010 – Colasoft, an innovative provider of all-in-one and easy-to-use network analyzer software, today announced the newest version of its flagship product- Capsa Network Analyzer. Version 7.1 is based on the second-generation Colasoft [...]

Dear Lazyweb: jQuery help

We launched a new OpenNMS Group website this week and I am having a small problem. On the home page we have a jQuery script called Crossslide that rotates some pictures in the banner:

It works fine on Firefox, Chrome, Safari and IE8. It doesn’t work on IE7 and I have absolutely no idea why. I’ve reformatted the code, used both relative and explicit paths, and … nothing. No errors either.

I have no experience debugging Javascript issues within IE, so if you can help I would appreciate it.

Using Event-Driven Analytics to Improve Knowledge Workers' Processes (by Mike Darrish)

Author Profile - Mike Darrish is a certified Lean Six Sigma Black Belt and Industry Specialist at OpenConnect Systems, where he works on teams delivering process improvement in large enterprises. Prior to OpenConnect, Mike worked for over 25 years in large enterprises and startups providing IP connectivity, network management, IT performance improvement and financial services. His roles have included programmer, systems administrator, network analyst, instructor, technical and management consultant, and a variety of sales support and sales management positions. Mike lives in the Atlanta, GA area.


World class methodologies such as Lean and Six Sigma are well known, proven approaches to process improvement in the manufacturing sector of the US economy. As a result, according to the US government's Bureau of Labor Statistics, manufacturing productivity doubled between 1987 and 2007. Since these methodologies were adopted much later in the services sector, productivity there increased only 22% in the same time period. See Figure 1.




Figure 1 - BLS Productivity by Sector index 1987=100


One area of process improvement has been a particularly intractable problem: that of knowledge workers, used here to mean people using networked computers and other digital systems to perform financial services activities, customer support, IT services and the like. In the past, a common approach to improving these processes has been to send consultants out with clipboards and stopwatches to stand behind the user and record activities, timing and process data, and then, back in the office, to draw a process map, perhaps capturing and printing out a few screen shots for process documentation. There are a number of problems with this approach. First, users at a workstation behave differently when they are actively being watched. While watched, these workers tend to be more productive and typically don't engage in personal activities like surfing the web or chatting with co-workers as they might otherwise do. Then, when the observation period is over, they generally lapse back to normal behavior. This phenomenon, which presents an inaccurate view of the normal process, is called the Hawthorne effect, after a landmark productivity study conducted at the Western Electric Hawthorne plant in Cicero, Illinois in the 1920s. Second, and more importantly, the amount of data observers can gather is a tiny fraction of the total computer and network-based activity that goes on in a large enterprise, leading to an inaccurate process model that lacks the details necessary for process improvement with the well-known approaches mentioned above.

In recent years, several vendors in the Business Process Management space have used software to collect data from applications and networks as a more robust input to process mapping. Inputs have typically been extracts of database or other enterprise application logs. This approach is more accurate and more scalable than human based observation, but is dependent on the quality of the logs, which are generally not very granular and have the disadvantage of not having user screen captures for detailed analysis of user behavior. Additionally, each application has its own log format and content, making for a difficult data collection and analysis task.

Achieving breakthrough levels of process improvement requires both very granular and scalable data collection and storage as well as world class process improvement methodologies. While collecting process data, one must also avoid materially changing the performance characteristics of the process under study and the enterprise infrastructure. Typically it is not feasible to ask an IT department to retrofit applications with performance metrics and reporting. Ideally, then, one would use passive network taps and/or switch or router based constructs like Cisco VACL Capture or mirror ports. These collection points are relatively inexpensive, non-invasive and, for a single process under study, even in the largest enterprises, can generally be collocated near the server farm in a small number of data centers. Given government regulations such as HIPAA security, and the need to protect financial transactions, the system must encrypt the collected data in place and in real time. In large enterprises, such as healthcare insurance companies, which process upwards of 100 million claims per year, there are several gigabytes per second of data to manage, requiring powerful servers and fast storage. There is also one broad case where traffic across the network, unlike web oriented applications or the still-common dumb terminal emulation, does not accurately reflect the user's behavior. That is the fat client, where an application on a PC does significant work. For fat client applications, a distributed agent with a small footprint and a low reporting traffic rate can generally capture screen shots and field changes, forwarding the information to a central server for storage and later analysis. This approach has proven economical even for VPN-based home office workers, thanks to the widespread availability of broadband data services in most homes.

As we know from Lean and Six Sigma, in order to improve the process, one must measure its key performance indicators (KPIs), including items like user think time, system time, key data inputs and granular transformation procedures. The data that is collected as described above is the key input to a business process discovery. Once we discover and analyze the process sufficiently, we generally have sufficient information to do root cause analysis of problems, hypothesize the changes needed to improve the process, do experiments to better understand the main factors and their interactions, and measure changes to the process as we make them. Since we now have powerful computers at our disposal, it is possible to automatically map the collected data into a business process along with timestamps, userids and other user interactions, as shown in Figure 2. Having large volumes of empirical data available, rather than the usual anecdotal data and estimates regarding the process, gives us the ability to analyze it scientifically, to graph metrics like end user response time, identify paths users took through the process, and analyze the status of key indicators at various times and places in the process, whether straight-through, as we'd like, or rather, down exception paths. We can also analyze how often and why the long tails of exceptions take place, then relate user and process behavior back to business rules for improvements.


Figure 2 - Process Map and Analysis
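Two of the KPIs mentioned above, user think time and system time, can be derived from nothing more than timestamped request and response events. The following Python sketch shows one way to do it; the event format (a list of `(seconds, kind)` pairs, where `kind` is `"request"` or `"response"`) is an assumption for illustration and not how any particular product stores its data.

```python
# Hypothetical sketch: derive user think time and system time from an
# ordered stream of (timestamp_in_seconds, kind) events for one session.
#   system time = delay between a request and the response that follows it
#   think time  = delay between a response and the user's next request

def think_and_system_times(events):
    """Return (think_times, system_times) as lists of seconds."""
    think, system = [], []
    last_kind, last_ts = None, None
    for ts, kind in events:
        if last_kind == "request" and kind == "response":
            system.append(ts - last_ts)
        elif last_kind == "response" and kind == "request":
            think.append(ts - last_ts)
        last_kind, last_ts = kind, ts
    return think, system


if __name__ == "__main__":
    session = [
        (0.0, "request"),
        (1.5, "response"),
        (9.5, "request"),
        (10.0, "response"),
    ]
    think, system = think_and_system_times(session)
    print("think times:", think)
    print("system times:", system)
```

With many sessions collected this way, the two lists can be summarized (means, percentiles, long tails) to show whether delay is dominated by the user or by the system.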


Process metrics don't just materialize and processes have no pre-existing context, unless they were originally imbued with BPMN or similar metadata. In 2010, the vast majority of processes in use were built before BPM standards gained traction, so it is necessary for business subject matter experts to label the discovered activities (applications) with names that reflect what they are called in everyday usage. The discovery engine can then generate meaningful event-based intelligence from transitions occurring in the monitored business process.

One approach to generating business events is commonly called screen scraping, though in reality it is a fairly sophisticated form of data analysis in its own right. One must be able to analyze the data moving between client and server, whether that data is from dumb terminals, web servers or whatever traverses the network. Then one must render the data in the same way that the target machines do, in order to recreate what the user saw and how he responded. See Figure 3 for an example of this kind of data analysis, recreating the user's experience for the business analyst and providing information needed for the second use of the original data, an analysis of user and system behavior. An additional requirement for breakthrough improvement is the need to analyze and report on process data without the restrictions of traditional Business Intelligence systems with pre-defined schema, event summarization and the subsequent restrictions on analysis and reporting. In the real world, these concepts and methods have been embodied in commercial, off the shelf software and put to use in healthcare insurance companies, improving processes like claims operations, as well as in banks to improve call center operations. By discovering and analyzing process intelligence, workforce intelligence and customer intelligence, enterprises save millions of dollars annually through identification and elimination of defects and waste, especially work in progress. Often, these enterprises write robots to automate some or all of the work previously done by humans, speeding up the process and freeing people up to do more sophisticated tasks and improving customer satisfaction.


Figure 3 - Recreating a User Session


About Comprehend and OpenConnect Systems

For more information about product improvement for knowledge workers, the reader is invited to visit the web site of the author's employer, OpenConnect Systems, to read about Comprehend, a product suite which implements the concepts described above and from which the screen shots are excerpted. OpenConnect Systems, based in Dallas, TX, delivers software and service based solutions focused on improving knowledge workers' business processes.



3 new Cisco critical vulnerabilities

Recently, the Cisco Product Security Incident Response Team (PSIRT) published three important vulnerability advisories.

Multiple Vulnerabilities in Cisco Unified MeetingPlace
Multiple vulnerabilities exist in Cisco Unified MeetingPlace. This security advisory outlines the details of these vulnerabilities:

  • Insufficient validation of SQL commands
  • Unauthorized account creation
  • User and password enumeration in Cisco MeetingTime
  • Privilege escalation in Cisco MeetingTime

Vulnerable Products
Cisco Unified MeetingPlace versions 5, 6, and 7 are each affected by at least one of the vulnerabilities described in this document.

Details
This Security Advisory describes multiple distinct vulnerabilities in the MeetingPlace and MeetingTime products. These vulnerabilities are independent of each other.

  • Insufficient Validation of SQL Commands
    An unauthenticated user may be able to send SQL commands to manipulate the database that MeetingPlace uses to store information about server configuration, meetings, and users. These commands could be used to create, delete, or alter any of the information contained in the Cisco Unified MeetingPlace database.
  • Unauthorized Account Creation
    An unauthenticated user may be able to send a crafted URL to the internal interface of the Cisco Unified MeetingPlace web server to create a MeetingPlace user or administrator account.
  • User and Password Enumeration in Cisco MeetingTime
    The MeetingTime authentication sequence consists of a series of packets that are transmitted between the client and the Cisco MeetingPlace Audio Server over TCP port 5001. An attacker may be able to alter the authentication sequence to access sensitive information in the user database including usernames and passwords.
  • Privilege Escalation in Cisco MeetingTime
    An attacker may be able to alter the packets in the MeetingTime authentication sequence to elevate the privileges of a normal user to an administrative user.

Impact
Successful exploitation of these vulnerabilities may result in a variety of conditions including: information disclosure, denial of service, privilege escalation, account creation, or alteration of configuration data.

Link: http://www.cisco.com/../advisory09186a0080b1490b.shtml

 

Cisco IOS XR Software SSH Denial of Service Vulnerability
The SSH server implementation in Cisco IOS XR Software contains a vulnerability that an unauthenticated, remote user could exploit to cause a denial of service condition. An attacker could trigger this vulnerability by sending a crafted SSH version 2 packet that may cause a new SSH connection handler process to crash. Repeated exploitation may cause each new SSH connection handler process to crash and lead to a significant amount of memory being consumed, which could introduce instability that may adversely impact other system functionality. During this event, the parent SSH daemon process will continue to function normally.

Vulnerable Products
This vulnerability affects Cisco IOS XR systems that are running an affected version of Cisco IOS XR Software and have the SSH server feature enabled.

Details
Cisco IOS XR Software is a member of the Cisco IOS Software family that uses a microkernel-based distributed operating system infrastructure. Cisco IOS XR Software runs on the Cisco CRS-1 Carrier Routing System, Cisco 12000 Series Routers, and Cisco ASR 9000 Series Aggregation Services Routers.

The SSH server implementation in Cisco IOS XR Software contains a vulnerability that an unauthenticated, remote user could exploit to cause a denial of service condition.

The vulnerability is triggered when a new SSH handler process handles a crafted SSH version 2 packet, which may cause the process to crash. During this event, a significant amount of memory may be consumed. Repeated exploitation may impact other system functionality, depending upon the size of the available memory and the duration of attack.

Although exploitation of this vulnerability does not require user authentication, the TCP three-way handshake must be completed, and some SSH protocol negotiation must occur.

Impact
Successful exploitation of the vulnerability described in this advisory could result in a crash of the SSH connection handler process. Repeated exploitation may impact other system functionality, depending upon the size of the available memory and the duration of attack.

Link: http://www.cisco.com/../advisory09186a0080b13512.shtml

 

CiscoWorks Internetwork Performance Monitor CORBA GIOP Overflow Vulnerability
CiscoWorks Internetwork Performance Monitor (IPM) versions 2.6 and earlier for Microsoft Windows operating systems contain a buffer overflow vulnerability that could allow a remote unauthenticated attacker to execute arbitrary code. There are no workarounds for this vulnerability.

Vulnerable Products
CiscoWorks IPM versions 2.6 and earlier for Windows operating systems are affected.

Details
CiscoWorks IPM is a troubleshooting application that gauges network response time and availability. CiscoWorks IPM is available as a component within the CiscoWorks LAN Management Solution (LMS) bundle. CiscoWorks IPM versions 2.6 and earlier for Windows contain a buffer overflow vulnerability when processing Common Object Request Broker Architecture (CORBA) GIOP requests. By sending a crafted CORBA GIOP request, a remote, unauthenticated attacker may be able to trigger the buffer overflow condition and execute arbitrary code with SYSTEM privileges on affected Windows systems. This vulnerability is documented in Cisco Bug ID CSCsv62350 and has been assigned the Common Vulnerabilities and Exposures (CVE) CVE-2010-0138.

Impact
Successful exploitation of the vulnerability may result in the ability to execute arbitrary code with SYSTEM privileges on affected Windows systems.

Link: http://www.cisco.com/../advisory09186a0080b1351d.shtml


© Fabio Semperboni for CiscoZine, 2010.

Opscode Announces John Willis as New Vice President of Training & Services

30-Year Systems Management Luminary to Lead Services Division of Fast Growing Infrastructure Automation Start-Up

SEATTLE, WA—(Marketwire – February 3, 2010) – Opscode, Inc., a cloud infrastructure automation company, today announced the appointment of John Willis as the company’s new Vice President of Training & Services. In this capacity, Willis will be responsible for leading the training, evangelism, and professional services functions within the company.

“John has done more to bridge the traditional Enterprise Systems Management space and the emerging Cloud Infrastructure world than anyone I know. He has a unique position in the industry and is widely respected for his deep technology, management, and market expertise. John will play a critical role in driving adoption of Chef and advancing our mission to bring infrastructure automation to the masses,” said Jesse Robbins, CEO and co-founder of Opscode. “It’s not every day someone of John’s caliber comes along. He has more than three decades working in the IT trenches managing complex infrastructures, John understands firsthand the Opscode vision and through his contributions, we will be well positioned to succeed in the market.”

Willis has worked in the IT management industry for more than 30 years. Prior to joining Opscode, Willis founded Gulf Breeze Software, an award winning IBM business partner, which specializes in deploying Tivoli technology for the enterprise. A leading infrastructure management architect in the enterprise systems management category, Willis has trained more than 10,000 people on IBM Tivoli products around the world and is recognized as an industry expert in enterprise systems management and monitoring. Willis has authored six IBM Redbooks for IBM on enterprise systems management and was the founder and chief architect at Chain Bridge Systems.

John is known internationally for his IT Management and Cloud blog, JohnMWillis.com, and is the co-host of Redmonk’s IT Management Guys podcast series as “Cloud Cafe”. Willis is also an organizer of the wildly successful CloudCamp & OpsCamp unconference movements.

“I’ve been a fan of Opscode ever since Chef was first released last January,” said John Willis, Vice President of Training & Services for Opscode. “Opscode has taken an entirely innovative approach to automating infrastructure by writing code rather than running commands. Because the cloud infrastructure of tomorrow will be in constant flux, a new generation of tools will be required to efficiently manage these incredibly dynamic environments. I’m thrilled to be working with some of the brightest minds in infrastructure to bring this vision to fruition.”

About Opscode

Opscode was co-founded in 2008 by Jesse Robbins and Adam Jacob, along with other notable technologists who have helped to shape today’s Cloud infrastructure. Opscode enables companies of all sizes to build and maintain scalable, fully automated infrastructure based on community best practices and expertise. The company is privately held and headquartered in Seattle’s historic Pioneer Square neighborhood. More information can be found at www.opscode.com

Press Contact:
Robert Nachbar
Kismet Communications
206-427-0389
email contact

Security 2010: When Resolutions Fail

We’re a few weeks into 2010 and I am interested in knowing how many of us have stuck to our overly optimistic New Year’s Resolutions. Has the gym membership already lapsed? Are you getting to work on time and not blaming fire, flood, or the family dog for your tardiness?

It is human nature to want to begin anew, to create something good from the chaos that has preceded us; however, it is also within our nature to unwittingly set ourselves up for failure, such as when we take on too many, too few, or overly ambitious resolutions.

When we fail to set ourselves up for success by establishing realistic goals, we unwittingly commit a form of human error. Similarly, within corporations worldwide, research has indicated that human error often leads to the largest and most expensive security breaches – and the margin for error tends to increase as the inadequacy of corporate security goals (or resolutions) grows.

This is very evident in InfoWorld’s 2009 Data Breach Hall of Shame article, which looks back at five of the more notable breaches of the year. From the SQL injection attack at Heartland Payment Systems – one of the largest data breaches in history - to the lost hard drive at Health Net, companies continue to be felled by simple human errors, omissions, or lapses in judgment by company personnel. According to the 14th Annual CSI Computer Crime and Security Survey, over 65% of companies indicated that at least some of their monetary losses were directly attributable to non-malicious actions by company insiders. 

Over and over again, the literature points to companies characterizing security incidents arising from insider threats as predominantly “accidental.” Only a small minority of companies believe that these threat incidents are deliberate. However, whether the threats are accidental or deliberate, the costs are still the same. A significant data breach could result in punitive regulatory actions, long-term litigation, expensive investigative and remediation efforts, competitive disadvantages, and most importantly, a potentially insurmountable loss of customer trust.

So, given human nature, what recourse does the security-conscious corporation have when “good people do bad things?” Certainly, any number of companies can and do develop, market, and deliver point solutions to help decrease the risk associated with insider threats, whether they are malicious or non-malicious. However, what we’re really seeing is that our customers are looking for an integrated security solution – one that protects sensitive data across the entire IT environment by hardening systems, controlling access to information, and managing changes.  

Good security integrates many components across a corporate infrastructure, and helps protect our most critical corporate assets from the most clandestine adversary of all – ourselves.

GIS (Geographic information system) when scifi is real

Hi.

I remember the computer systems in the film Enemy of the State (in Spanish, “Enemigo público”) that showed Will Smith running across the city (and he wasn’t doing parkour).

Here is the film’s trailer to refresh your memory:

Well, the Pandora system is very close to doing this… more or less ;) . To develop it, we use the free JavaScript library OpenLayers in the Console.

In the next video you can see a demo screencast of an early version of the GIS:

Splunk memory use patterns

From an operating-system perspective, splunk is a system of programs that work together to provide the utility that users experience. Each of these programs has its own memory use patterns, and having some idea of them is good for investigating memory exhaustion/performance problems, as well as resource planning.

The involved parties in the splunk memory picture are:

  • the operating system
  • splunkweb
  • splunkd

Programs launched by splunkd:

  • splunk-search
  • python search processors
  • splunk-optimize
  • scripted inputs such as wmi, imap, regmon, admon, vmware, or your own customized/created agents
  • scripted alerts
  • scripted index management scripts (warmtocold, coldtofrozen)
  • scripted auth

Many of these (especially the scripts) are largely external to splunk, in that splunkd runs them as requested, but their resource consumption is up to third party authors, external designs, or external factors. The size of these tools will not be covered in great detail.

Operating system

The operating system is expected to provide an efficient data cache for splunk data files, including:

  • splunk binaries
  • web assets
  • config files
  • indexed data files
  • input log files
  • etc

Since memory access is several orders of magnitude faster than disk access, a healthy splunk system should have a significant amount of memory un-allocated by any process at most times. A good ballpark ratio is half or more of the ram free for caching purposes. A corollary is that your operating system should be making use of all your memory.
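On Linux, one rough way to check the "half or more free for caching" ballpark is to look at `/proc/meminfo`. The Python sketch below parses text in that format and reports the fraction of RAM that is either free or already in use as buffers and cache. Treating `MemFree + Buffers + Cached` as the memory not allocated by any process is a simplifying assumption, and the function name `cache_headroom` is made up for the example.

```python
# Hypothetical sketch (Linux only): estimate what fraction of RAM is not
# allocated to processes, from text in /proc/meminfo format.
# Assumption: MemFree + Buffers + Cached approximates that memory.

def cache_headroom(meminfo_text):
    """Return (MemFree + Buffers + Cached) / MemTotal as a float."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # values are in kB
    unallocated = (
        fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)
    )
    return unallocated / fields["MemTotal"]


if __name__ == "__main__":
    sample = """MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          500000 kB
Cached:          3500000 kB
"""
    ratio = cache_headroom(sample)
    print("fraction available for caching: %.3f" % ratio)
    print("meets the half-or-more ballpark:", ratio >= 0.5)
```

On a real Linux host the same function could be fed the contents of `/proc/meminfo` instead of the sample string.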

General memory info

When measuring the memory use of actual programs, always remember to review the real memory usage, not the virtual. Real memory usage is sometimes called “RSS”, “RSIZE”, or “in core”. On Windows the closest approximation is “Private working set”. This can be a bit misleading, as a system under heavy memory pressure will page out more of the memory allocated to programs. Therefore it’s best to first get a sense of overall system memory pressure before reviewing process sizes.

(There are other misleading factors: it’s generally a bad idea to measure dissimilar programs simply by RSIZE to gauge their ‘bloat factor’. If you care about this sort of thing you might be interested in smem : http://www.selenic.com/smem/ on Linux )
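For a first look at real (resident) memory per program, the output of a command such as `ps -eo rss,vsz,comm` can be totalled by command name. The Python sketch below works on captured text, not on a live `ps` call, so the column layout (RSS and VSZ in kilobytes, then the command name) is an assumption to check against your platform. As noted above, summing RSS counts shared pages more than once, so treat the totals as a rough upper bound.

```python
from collections import defaultdict

# Hypothetical sketch: total resident memory (RSS, in kB) per command
# from text laid out like the output of `ps -eo rss,vsz,comm`.
# The virtual size column is parsed but deliberately ignored.

def rss_by_command(ps_output):
    """Return {command: total RSS in kB}, skipping the header line."""
    totals = defaultdict(int)
    for line in ps_output.strip().splitlines()[1:]:
        rss_kb, _vsz_kb, command = line.split(None, 2)
        totals[command.strip()] += int(rss_kb)
    return dict(totals)


if __name__ == "__main__":
    sample = """  RSS    VSZ COMMAND
81920 512000 splunkd
40960 256000 python
20480 128000 splunkd
"""
    for command, rss_kb in sorted(rss_by_command(sample).items()):
        print("%-10s %8d kB" % (command, rss_kb))
```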

Splunk Web, or the python process, does need to buffer the data being fed immediately to the browser. For the most part, the ram requirements are modest (tens to perhaps 100 MB), but there are patterns that can push it up.

If you are displaying 50 items a page, splunkweb will have to acquire 50 items in an xml document from splunkd and then render an html fragment with these 50 items. Normally this isn’t very large, and the default document trims them to a fixed number of lines (to avoid breaking the browser). However for odd cases (events containing lines that are tens or hundreds of kilobytes long) this could become significant per client.

Another example would be a case where you request display of the top 10,000 hostnames based on event quantity. splunkd will need to generate an xml document with 10k stanzas, which python will have to load and parse, and then generate an html entry with same.

Thus large display cases, times user concurrency, will cause splunkweb to expectedly grow. For so-called ‘pathological’ situations I’ve seen splunkweb grow by 200-300MB for one user.

splunkd has a few tasks in parallel:

  • reading in data from various inputs
  • processing data prior to indexing
  • building indexed datastructures
  • launching search requests and providing results, both interactively and scheduled.
  • authenticating users
  • possibly sending data outbound to other systems.

While all these tasks use memory, there are a few that dominate.

program baseline

splunkd is a big program. The program text itself will use some 30MB or more.

pipeline data

All the data flows of pre-indexed data to the index on disk or to network outputs live in memory. Typically for both forwarders and indexers, this data is some tens of megabytes. On an indexer, the data size is proportional to event size. Thus if you have a majority of very large events (java exception backtraces, web page documents) then this data will grow proportionally.

Pipeline data can grow sharply when the system is not able to keep up with the dataflow for some reason. An extremely underutilized system will have 1-2 events in each FIFO queue, while a system that is behind will fill up to 1000 events in each FIFO queue. Thus you can grow from ~1MB of pipeline data to more like 20-40MB of pipeline data quickly in situations like disk bandwidth exhaustion, or a blocked downstream splunk instance.

index structures

As part of making the data searchable, an index is built for it. This is built in memory and then flushed out when the memory buffer is full. Each index has an independent buffer.

In Splunk 3, the default per-index buffer was 10MB, while the default index buffer was 100MB. Typically adding more indexes with significant volume would have similarly large buffers, so a high volume server with two user-data indexes might have around 200+MB for indexing buffers.

In Splunk 4.0, the default per-index buffer is 5MB, while the default for the main user-data index is 20MB. A similar example on Splunk 4 would be more like 30-40MB for indexing buffers.

In both 3 and 4, if the number of indexing threads goes up, additional buffers are allocated for these additional threads. We strongly recommend against adjusting the number of threads.
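The arithmetic behind those examples can be sketched in a few lines of Python. This is only a back-of-envelope model: it assumes a hypothetical server whose two user-data indexes are both sized like the main index, and that each extra indexing thread gets a full extra set of buffers. The index names are placeholders.

```python
def indexing_buffer_mb(buffer_mb_by_index, indexing_threads=1):
    """Sum per-index buffers, assuming one full set of buffers per thread."""
    return sum(buffer_mb_by_index.values()) * indexing_threads


# Hypothetical server with two high-volume user-data indexes.
splunk3_buffers = {"main": 100, "second_index": 100}
splunk4_buffers = {"main": 20, "second_index": 20}

print(indexing_buffer_mb(splunk3_buffers))                      # 200
print(indexing_buffer_mb(splunk4_buffers))                      # 40
print(indexing_buffer_mb(splunk4_buffers, indexing_threads=2))  # 80
```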

ldap authentication data

In splunk 3.x and 4.0.x, the responses to the defined LDAP searches that gather user information and group information are buffered in RAM. In some cases, this can be quite large. Ideally these searches should be tuned to narrow the results down to the necessary data. Splunk 4.1 will not buffer significant LDAP data.

searches

In splunk 3, searches live in splunkd ram. Approximately 100k events will result in memory allocations on the order of 1GB.

In splunk 4, the only significant memory use for search will be generating xml descriptions of events. For splunkweb and well-behaved REST clients, this will be very small. It’s possible for a poorly behaved REST client to request extremely large documents which will kick this up.

splunk-search

Splunk-search (4.x+) runs all the operations requested by the search expression, including pulling data off disk, adding fields, sorting, timecharts, and so on. Some operations, like deduping can use significant memory for large numbers of events, while simple search does not. Thus, searches will vary from some tens of megabytes to multiple gigabytes.

If you have memory concerns about your expensive searches it is best to try them and measure using top, ps, etc.

Obviously, you have to consider the quota of searches configured, and the likely overlap of expensive searches by user patterns.
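If you would rather script the measurement than watch top, the following Python 3 sketch samples a process with ps. It assumes a ps that accepts the POSIX "-o rss= -o vsz= -p PID" form (Linux and Mac OS X do); the function names are mine, and the pid of the search process is something you would look up yourself.

```python
import subprocess


def parse_ps_sizes(ps_output):
    """Parse the output of `ps -o rss= -o vsz=` into (rss_kb, vsz_kb)."""
    rss, vsz = ps_output.split()[:2]
    return int(rss), int(vsz)


def sample_memory(pid):
    """Ask ps for the resident and virtual size (in kB) of one process."""
    out = subprocess.check_output(
        ["ps", "-o", "rss=", "-o", "vsz=", "-p", str(pid)])
    return parse_ps_sizes(out.decode())


# Parsing a made-up line of ps output:
print(parse_ps_sizes(" 145200 812340\n"))
```

Calling sample_memory(pid) every few seconds while an expensive search runs, and keeping the largest resident value, gives a usable peak figure.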

Search processors

In addition to search processors that run natively inside the splunk-search executable, some search processors are written in python and will be spawned as external processes. Typically these are quite small, but if you have added processors of your own design they may be significant. Ideally these do not buffer any significant amount of data, but just read and write records as they go.
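The "read and write records as they go" pattern looks roughly like the Python 3 sketch below. It is a generic CSV filter and not Splunk's actual external-processor protocol; the field names and the add_length transform are placeholders.

```python
import csv
import io


def stream_records(infile, outfile, transform):
    """Copy CSV records from infile to outfile one row at a time.

    Only the current row is held in memory, so memory use stays flat
    however many records pass through.
    """
    reader = csv.DictReader(infile)
    writer = None
    for row in reader:
        out_row = transform(row)
        if writer is None:
            # Create the writer lazily so the header includes added fields.
            writer = csv.DictWriter(outfile, fieldnames=list(out_row))
            writer.writeheader()
        writer.writerow(out_row)


def add_length(row):
    """Example transform: add the length of the raw event text."""
    row = dict(row)
    row["raw_length"] = len(row.get("_raw", ""))
    return row


demo_in = io.StringIO("host,_raw\nweb1,GET /index.html\nweb2,POST /login\n")
demo_out = io.StringIO()
stream_records(demo_in, demo_out, add_length)
print(demo_out.getvalue())
```

In a real script you would pass sys.stdin and sys.stdout instead of the StringIO objects. A processor that collects every row into a list before writing is the kind that grows with the result set.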

splunk-optimize

From a memory perspective, splunk-optimize is usually a red herring. It looks big but its real footprint is far below that.

Splunk-optimize has the task of combining small .tsidx files (bucket components) into large ones. Depending upon the files combined, the resources can vary from extremely little to significant.

splunk-optimize maps the index files into memory, so the virtual size of this program will appear to be quite large. It then walks the source files in essentially linear order, faulting all of the files into the process space. However, since the memory access patterns are so linear, there will be little effective memory pressure produced by splunk-optimize, so the footprint should decrease dramatically when memory is tighter.
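The same effect is easy to reproduce with any memory-mapped file. The Python 3 sketch below is not splunk-optimize's code; it maps a throwaway file and walks it front to back. The whole file counts toward the process's virtual size, but pages already read are plain file cache that the kernel can drop under pressure.

```python
import mmap
import os
import tempfile


def linear_checksum(path, chunk_size=64 * 1024):
    """Map a file read-only and walk it linearly, summing its bytes."""
    total = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for offset in range(0, len(mapped), chunk_size):
                # Touching each chunk faults those pages into the process.
                total += sum(mapped[offset:offset + chunk_size])
    return total


# Build a small throwaway file to map (200000 bytes, each with value 1).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x01" * 200000)
    demo_path = tmp.name

demo_total = linear_checksum(demo_path)
os.remove(demo_path)
print(demo_total)
```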

The rest of the tasks, including the various scripts, data gathering programs, alerting programs, and archiving scripts, are generally not significant. There are some notable exceptions:

  • The 3.x vmware app. Written in Java, it’s a bit large, over 1 GB of ram typically.
  • flatfileexport.sh – this coldtofrozen archive script invokes ‘exporttool’ which can be fairly memory hungry for 64bit buckets. It may take as much as 2.5GB of ram.
  • splunk-wmi – largely as a result of the Windows WMI subsystem that this program uses, the memory use of this tool grows with the number of categories it is pulling and with the number of hosts. Thus this growth can be a problem if you gather data from a very large number of hosts, or if you have, for example, a large number of custom eventlog categories, or both.

NCM - Inventory Report Web Views... See More in Fewer Pages

One of life's small frustrations now has an easy fix.

I exchanged emails with a customer who mentioned that it can be frustrating to page through pages and pages in the inventory report. If the view showed just a few more rows....it would be so much nicer as you could see much more on one screen.... but alas.... there's no setting for that.

Well, actually there is, but it's a bit roundabout.  You can change this: 

 

to this:

And... it's easy. 

 

NCM Standalone: 

Just go to your c:/inetpub/solarwindsncm directory and find the web.config file. Find the line <add key="InventoryPageSize" value="20" /> and change the value to something bigger – say 200 (or whatever you desire). Save it and voila – you’ll have a much longer page view.

 

NCM Integration Module:

Go to the Orion web.config file and add the following line to the appSettings section:

<add key="InventoryPageSize" value="100" />

Example:

<appSettings>
    <add key="SWOISv2.RemoteEndpoint" value="net.tcp://{0}:17777/SolarWinds/InformationService/Orion/ssl" />
    <add key="SWOIS.LocalEndpoint" value="net.pipe://localhost/SW/InformationService/Orion" />
    <add key="SWOIS.RemoteEndpoint" value="net.tcp://{0}:17777/SW/InformationService/Orion/ssl" />
    <add key="DisableBreadcrumbs" value="false" />
    <add key="InventoryPageSize" value="100" />
</appSettings>

Save it and you are done. Feel free to experiment with the page size until you get the report view just right.
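If you have several servers to touch, the edit can be scripted. Here is a minimal Python 3 sketch using the standard library's ElementTree; it assumes appSettings sits directly under the root element, and the sample config is a cut-down stand-in rather than a real Orion web.config.

```python
import xml.etree.ElementTree as ET


def set_inventory_page_size(config_xml, page_size):
    """Return config_xml with the InventoryPageSize key set to page_size."""
    root = ET.fromstring(config_xml)
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")
    for entry in app_settings.findall("add"):
        if entry.get("key") == "InventoryPageSize":
            entry.set("value", str(page_size))  # key exists: update in place
            break
    else:
        # Key not present yet: add it, as in the integration module case.
        ET.SubElement(app_settings, "add",
                      {"key": "InventoryPageSize", "value": str(page_size)})
    return ET.tostring(root, encoding="unicode")


# Cut-down stand-in for a web.config file.
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="DisableBreadcrumbs" value="false" />
    <add key="InventoryPageSize" value="20" />
  </appSettings>
</configuration>"""

updated = set_inventory_page_size(SAMPLE_CONFIG, 200)
print(updated)
```

ElementTree drops XML comments and may change whitespace when it writes the document back out, so keep a copy of the original file before overwriting it.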

 

 

ASA SSL Clientless VPN Plugins


These plug-ins are buried so deep in the Cisco site, it took me a good hour to track them down. These allow you to add functionality to the clientless SSLVPN on an ASA through Java. These ROCK for setting up remote administration (without a full VPN) for a network.
SSH Plugin
RDP2 Plugin (supports Win2008/W7)
VNC Plugin
*Note - for the RDP2 plugin, the ASA does not have a built-in plugin type for it. You must manually type "RDP2" as the plugin type when uploading it to the ASA.*

Afterbytes with Marcus Ranum - Under Constant Attack

Title: Critical Infrastructure Computer Systems Under Constant Attack Date: January 28 & 29, 2010 According to a report from The Center for Strategic and International Studies, utility companies’ and other critical infrastructure components’ computer systems are constantly under attack worldwide.... Marcus J. Ranum

Pandora unix agent running on ipod touch :)

A bored developer is something very dangerous… you never know what will happen…

This weekend I was quite bored and somehow thought it would be interesting to try to run the Pandora Unix Agent on my iPod Touch. It’s a jailbroken 1st generation device, so I connected to the iPod with SSH and used apt to install the perl binaries from http://coredev.nl/ (there are instructions there for using this repository).

Ipod Touch Agent on Pandora FMS console

Once I had perl working at /usr/local/bin/perl, it was just a matter of updating the first line of the scripts to point to the current location of perl to make it work… wooow the perl agent is soooo easy to port :)

Then of course most of the modules didn’t work as is… so I made some testing modules (see the end of the post) to show how it was working.

And now in my testing server there is an ipod-touch agent :)

I start / stop it manually. I actually don’t want to have it running as a daemon, as most of the time I don’t have access to the console to see the actual data.

In case anyone is interested in monitoring Darwin, here are the modules I’ve used:

module_begin
module_name proctotal
module_type generic_data
module_exec ps -A | wc -l
module_description Total number of processes
module_end

module_begin
module_name sshDaemon
module_type generic_proc
module_exec ps -Af | grep sshd | grep -v "grep" | wc -l
module_description SSH Server daemon status
module_end

module_begin
module_name loadavg1m
module_type generic_data
module_exec sysctl vm.loadavg | grep -o '[0-9]\+\.[0-9]\+*' | head -1
module_description Average process in CPU (Last minute)
module_end

module_begin
module_name freemem
module_type generic_data
module_exec top -l 1 | grep PhysMem | grep -o '[0-9]\+\.\?[0-9]\+*' | tail -1
module_description Free memory
module_end

module_begin
module_name userCPU
module_type generic_data
module_exec top -l 1 | grep "CPU usage" | grep -o '[0-9]\+\.\?[0-9]\+*' |tail -3 | head -1
module_description User CPU Usage
module_end

module_begin
module_name sysCPU
module_type generic_data
module_exec top -l 1 | grep "CPU usage" | grep -o '[0-9]\+\.\?[0-9]\+*' |tail -2 | head -1
module_description Sys CPU Usage
module_end

module_begin
module_name TCPPacketsSent
module_type generic_data_inc
module_exec netstat -s -p tcp | grep "packets sent" | grep -o '[0-9]\+'
module_description TCP Packets Sent
module_end

module_begin
module_name TCPPacketsReceived
module_type generic_data_inc
module_exec netstat -s -p tcp | grep "packets received" | grep -o '[0-9]\+'
module_description TCP Packets Received
module_end
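For anyone who wants to sanity-check a set of module definitions like these before loading them on a device, here is a small Python 3 sketch that reads the module_begin / module_end blocks into dictionaries. It is not the Pandora agent's own parser (the agent is written in perl); it simply treats the first word of each line as the key.

```python
def parse_modules(config_text):
    """Turn module_begin/module_end blocks into a list of dicts."""
    modules = []
    current = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == "module_begin":
            current = {}
        elif line == "module_end":
            if current is not None:
                modules.append(current)
            current = None
        elif current is not None and line:
            # First word is the key, the rest of the line is the value.
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return modules


SAMPLE_MODULES = """\
module_begin
module_name proctotal
module_type generic_data
module_exec ps -A | wc -l
module_description Total number of processes
module_end
"""

for module in parse_modules(SAMPLE_MODULES):
    print(module["module_name"], "->", module["module_exec"])
```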

Risky Business and OWASP Podcast Interviews with Ron Gula

Recently, I had the chance to be interviewed for two different podcasts. In Risky Business #138, I had the opportunity to chat with show host Patrick Gray about the recent Google hack, why they may have been using IE6 and... Ron Gula

SAP Plugin is certified

This normally would not be a post for this blog, but it’s our first “BIG” plugin, and now it’s officially backed by SAP (after a formal audit). I hope that in the future more powerful plugins will be made by other companies and independent developers.

SAP Plugin is made by DESET, a Spanish company without any relationship with us.

ReRun: Nobody's Fault: Taking the "F" Out of FCAPS

As we transition to a new editor at NetworkPerformanceDaily.com, we’re going to be reprinting some of the best articles from our archives for a little while. We’ll have new content up shortly.


Originally Published November 29, 2006

by Ed Tittel

The ISO/OSI Network Management Reference Model is usually rendered as FCAPS: Fault management, Configuration management, Accounting management, Performance management and Security management.

This model fails to give full weight to the impact of performance. Performance drives perception, which means that, from a user's standpoint, the source of poor performance doesn't matter as much as the fact that performance is, in fact, poor. According to Denise Dubie at Network World, network managers and engineers are being increasingly tasked to prioritize performance and user experience:

"Distributed IP networks and complex real-time applications have forced a change. Now network managers need to be in the know from the start about application performance, helping developers understand what will work on a network, spotting poorly performing applications before users feel the effects and delivering LAN-like performance over the wide area to remote and branch offices."

In other words, it's not just about monitoring devices anymore. It's about delivering services, at a reasonable cost, in a reasonable amount of time, where users are increasingly asked to decide what's reasonable, time-wise. (For more, see Network World's "User experience is key".)

A performance-first approach (see whitepaper [PDF]) to network management turns FCAPS into PFCAS -- or rather, PCAS, given that fault may be considered merely the most extreme expression of bad performance. The performance-first paradigm inverts the traditional, bottom-up device-monitoring approach and begins with top-down visibility into overall performance of applications running over the network.

Infrastructure availability and utilization aren't the only gauges of network health. Why focus the entirety of network management efforts on the small fraction of network issues caused by hardware or software infrastructure failures?

The fundamental purpose of the network infrastructure is to transport data from one end of the system to the other as quickly as possible. The more efficiently data flows at the transport layer, the better applications perform. Hence, end-to-end response time measurement is the best measure to use when deciding how to optimize the network, plan new infrastructure rollouts and upgrades, and identify the severity and pervasiveness of problems.

This approach recognizes that, between the limits of the network and application infrastructure being “up” or “down,” performance can--and does--vary widely. It is not uncommon for availability status indicators in the Network Operations Center (NOC) to be “all green” even while the help desk phones are ringing off the hook with users complaining about slow response times.

By focusing on the performance of key applications running over the network, IT organizations can focus on what's most important: making informed infrastructure investments to support business demands; delivering consistent, acceptable end-user response times; and quickly resolving business-critical problems. IT organizations that successfully make the transition to a performance-first approach typically -- and deservedly -- receive high marks from the business lines they serve.

Michael DeHaan, Creator and Community Lead for Cobbler, Joins Reductive Labs as Product Manager for Puppet

Reductive Labs is excited to announce the hiring of Michael DeHaan as Product Manager for Puppet. Michael will drive product strategy, roadmaps and community engagement for Puppet. Michael previously was the creator and architect of Cobbler at Red Hat as well as the community lead for that product.

Michael brings a strong background in open source software and community development to Reductive Labs. As the creator and community lead for Cobbler at Red Hat, Michael oversaw the growth of Cobbler, which is now used in thousands of datacenters across the world — including in the financial industry, hosting companies, render farms, grids, and universities. Cobbler is well known in the Enterprise Linux and Fedora space as the OS provisioning tool of choice for rapid deployment in medium to large-scale environments. It is very frequently used in conjunction with Puppet to maximize flexibility and efficiency in rollouts of new machines, whether physical or virtual.

In addition to the growth of the use of Cobbler, Michael guided tremendous growth of the contributing community. The Cobbler project has had over 80 code contributors and many more community members that assist with testing, advocacy, ideas, and support.

Michael is a published contributor to Red Hat Magazine and has presented at such events as Red Hat Summit, Red Hat Cloud Forum, the Fedora Users and Developers Conference, HP Tech Forum, and local software events. Michael is also a contributor to over 50 US patent applications in the area of configuration management and datacenter automation.

“Puppet has accomplished something few open source projects achieve — not only has Reductive Labs built a best of breed configuration management platform, it also has created a large and vibrant community of users and contributors that help guide its development,” said Michael about why he was joining Reductive Labs. “The future for Puppet’s ecosystem is extremely promising, and I look forward to helping it evolve and grow further in the years to come. Whether we are talking about cloud architectures, virtualization, grid, or classical server rollouts — as datacenter application deployment gets more complex, Puppet is around to help make the complicated simple and the impossible possible. For me, this is really one of the most exciting spaces in technology to be in, because not only can you help users all over the world solve their management challenges, but you also get to drive the forefront of computing.”

“We couldn’t be happier to add someone with the open source development and community credentials of Michael to our team,” said Luke Kanies, Founder and creator of Reductive Labs and Puppet. “Michael will help us identify opportunities to enhance the value of Puppet and engage further with our already strong and passionate community. We look forward to Michael helping guide the future of Puppet.”

Speaking at CSO Conference next week

Next week I'll be speaking at the CSO Executive Seminar on Data Protection and Encryption in Washington, D.C. My presentation will focus on doing more with less in a time when security and compliance teams are stretched thin due to staff and budget cuts. I will touch on aligning security investments with key business objectives and leveraging automation as a resource multiplier.

Security process automation is often viewed as a sort of holy grail that we all imagine might exist in some form even though nobody has ever actually seen it. Instead of articulating a grand vision, I will focus on the basics: implementing appropriate security controls to address business objectives; leveraging people, process and technology to mitigate risk; and automating some simple, easily repeatable processes to reduce cost and increase repeatability.

If you happen to be in the DC area next week, please stop by and say hello!

Zenoss QA Test Day February 4 - 2.5.2 RC

Thursday, February 4th will be the third Zenoss QA Test Day, this time covering our 2.5.2 release candidates. For a quick look at what is fixed in the 2.5.2 release, you can view report #6 on Trac - http://dev.zenoss.org/trac/report. A number of fixes around the Event Console have made it in, as well as hardening around ZenPack installs, network hierarchy nesting fixes, and some zenmib fixes. Altogether, 85 tickets are fixed so far in 2.5.2. The primary goal of this Test Day is to catch any issues that might make the launch difficult. Special focus will again be directed to the updated Event Console and the resulting fixes, as well as upgrades to 2.5.2.

 

As always, our goal is to get community members interacting directly with Test Link. In order for Zenoss' QA efforts to scale with new features as well as continue to increase regression coverage, we need assistance from the community. More testing effort benefits everyone, so hopefully we will continue to get more participation.

 

Download  artifacts for test (updated artifacts should be posted Monday/Tuesday):

http://alpha.zenoss.com/files/2.5.2-beta

 

Tickets  included in 2.5.2:

http://dev.zenoss.org/trac/query?group=patch_state&col=id&col=summary&col=status&col=owner&col=priority&col=milestone&col=component&col=changetime&order=status&report=6&patch=2.5.2

 

Zenoss QA  Test Day forum post:

http://community.zenoss.org/thread/12540

 

For those  of you that wish to join, we will be running this session on

Server:  irc.freenode.net (port 6667)

Channel: #zenoss-testing

 

We'll  record a transcript of the day's conversations and links will be  available from the Testing and IRC pages.

 


UPDATE: IRC transcript QA Test Day 02/04/2010

 

Zenoss IRC session Thursday February 4 at 11am EST

Zenoss developers will be available for questions on Thursday, February 4 at 11am EST in the #zenoss IRC channel on irc.freenode.net (port 6667). Please drop in and bring your questions, answers, suggestions and feedback. Zenoss Developer John Causey and other developers will be available to answer your questions on Zenoss, and to discuss the ongoing Zenoss in the Clouds ZenPack Contest and the Zenoss 2.5.2 Beta RC - QA Test Day, which will be running at the same time.

 

We’ll log the session and repost it here if you can’t make it.

 

Don’t forget you can search for answers to common  questions by visiting the Zenoss Forum.


UPDATE: IRC transcript Dev chat 02/04/2010

Parsing the Splunk Timezone Format

Once in a rare while, you may get a splunkd.log error that looks something like this:

12-07-2009 14:32:06.894 ERROR bucket - Failed to resurrect timezone ('
' delimited): '### SERIALIZED TIMEZONE FORMAT 1.0
C0
Y0 NW 47 4D 54
$'

This is splunk saying it can’t parse the timezone description it just got. This can be a problem when you’re in a distributed environment, and you’re asking for data to be bucketed (collected) into time-specific chunks. A typical example is when using timecharts.

The fix for this particular issue is called Splunk 4.0.7, but if you’re curious to know what timezone it actually is, the hex digits are the timezone name, with each pair giving the ASCII value of one character.

A quick trip to python shows us a more familiar name:

jrodman@joshbook:~> python
Python 2.6.1 (r261:67515, Jul 7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 0x47, 0x4D, 0x54
(71, 77, 84)
>>> chr(71)
'G'
>>> map(chr, (0x47, 0x4D, 0x54))
['G', 'M', 'T']
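The same decoding can be wrapped in a one-line helper. This sketch is written for Python 3 (the session above is Python 2.6), and it assumes you have already copied the hex pairs out of the log line by hand; the function name is mine.

```python
def decode_hex_name(hex_pairs):
    """Decode space-separated hex byte values into an ASCII string."""
    return bytes.fromhex(hex_pairs).decode("ascii")


print(decode_hex_name("47 4D 54"))  # prints GMT
```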

HNAP Protocol Vulnerabilities - Pushing The "Easy" Button

Ease and Security Don't Mix In the eternal quest to create easy ways for systems to communicate with people and other systems, embedded device manufacturers have created new protocols. One of the first was UPnP, or Universal Plug and Play,... Paul Asadoorian

Using a weather map as your background for your maps

So I am sitting here in our European office trying to decide what to write on.  I was catching up on my thwack posts since I was in Barcelona last week for Cisco Live (aka Networkers) and have seen some discussions on thwack recently from some of you and I keep hearing about the Weather Map like we have on the online demo.   Hmmm seems like a great idea for a post!!

I am going to describe this setup using 9.5 and above. 

1. Using Network Atlas, create a new map and click on Linked Background in the top ribbon bar and you will receive a dialog to specify the URL to the weather map image you wish to use. 

2. Enter the URL and click Validate to ensure the image can be retrieved from the Orion server. Once the validation is successful, click OK. In this case I specified Europe since this is where I currently am; as you can see, it is freaking cold here.

3. Drag onto the map your nodes or other maps you want to have on this image and save the map.

4. You can edit your map resource on the Summary Home page to show this map.

Now the map on your Network Summary home page will always show the current weather as of the last page refresh.
