Generating an OpenSSL key and cert via shell script

Sometimes certain programs require a PKI (public key infrastructure) to secure data transfers or to encrypt files. The Open Source backup software Bacula is such an animal: it uses a private key and a self-signed certificate to encrypt the backup data on the client. It adds a so-called master key, too. In case of disaster you can use the master key to decrypt your backup data.

The idea is great and it’s very simple. However, what’s the best way to automate the installation on the clients? Puppet comes to the rescue, of course; all we need is a snippet which can be used to generate the key and the certificate on the client without any interaction.

 

Here is a small shell snippet (called bacula-keygen.sh) which can handle this.

#!/bin/sh
#
# This program creates a RSA key and a certificate
# on the command line without asking questions
#
# Useful for Bacula client-fd installation.
#
# You can override the default values in the ENV
# aka:
# $ DAYS=5 ./bacula-keygen.sh
#
# Which will create a certificate which is valid
# for 5 days.
#
# Ulrich Habel <rhaen@pkgbox.de>

set -eu

: ${CERT="`hostname`.pem"}
: ${KEYLENGTH=4096}
: ${DAYS=3650}
: ${COUNTRY="DE"}
: ${STATE="."}
: ${LOCATION="Starnberg"}
: ${COMPANY="ACME Ltd"}
: ${DEP="ACME Support Coordination"}
: ${CN="`hostname`"}
: ${MAIL="support@acme.org"}
/usr/bin/openssl req -new -x509 -newkey rsa:${KEYLENGTH} \
-nodes -days ${DAYS} -out ${CERT} -keyout ${CERT} << EOF
${COUNTRY}
${STATE}
${LOCATION}
${COMPANY}
${DEP}
${CN}
${MAIL}
EOF

You can download this script here: RAW on GitHub, as GIST on GitHub

If you are using this kind of encryption your backup server will store encrypted data and will never see unencrypted files – which is a great thing in terms of PCI compliance.


Using git with custom CA certificates (https)

Well, well – there seem to be some misunderstandings in the world of git and its usage with SSL certificates which don’t belong to standard CAs. Here is a quick way to tell git where to look for certificates. Please note – this might vary from operating system to operating system. I’ll explain it for CentOS / RHEL based systems, however it should work everywhere with a little adaptation. Leave your solution in the comments.

 

Almost every git distribution relies on libcurl for the HTTP transactions, which is a good thing. On CentOS, libcurl checks the certificates against certain locations. Usually these are:

  • Initializing NSS with certpath: sql:/etc/pki/nssdb
  • CAfile: /etc/pki/tls/certs/ca-bundle.crt

The first step is to copy your CA file (the exported certificate of the CA which signed your certificates) to /etc/pki/tls/certs.

The only thing we have to do now is to tell git to add a CApath to its configuration by running:

$ sudo git config --system http.sslCAPath /etc/pki/tls/certs

There you go. If you run your git command now, libcurl will also check the path /etc/pki/tls/certs for additional certificates, find yours and be able to verify the custom certificate. Of course, this works with the cacert.org based certificates, too – just place them into your certs directory.

You can turn on debugging by prefixing your git command with the environment variable GIT_CURL_VERBOSE. For a clone, the example would be:

GIT_CURL_VERBOSE=1 git clone <url>

This way you can see exactly what git is doing during the verification. As this is a configuration setting it will survive the next updates – nice, eh?

 

Oh, and please stop recommending to turn off SSL verification if you just lack the understanding… (same with SELinux, btw – check this article for help)

 

Getting a grip on PHP

Sometimes things in your life change rapidly. For several reasons I want to look into the world of PHP. Having been a Perl guy for a long, long time I’m wearing my Perl glasses and started to look into the different aspects of the language.

 

When I started to look at PHP I wasn’t interested in the old versions – 5.4.11 is the version to look at, everything else is history. I wanted to start with OO from the beginning and I wanted to start with a testing approach. TDD was the way to go: how do I define my tests, how do I write a class, what is the best way to ship it, how do I structure my code/applications/modules?

 

And finally – how do I write code? Is it the emacs/vi approach or should I go for NetBeans, Eclipse or other IDEs?

 

This starts a short series of PHP related articles and my attempt to learn a new language. I will use the tag PHP for these articles; hopefully I can manage the separation so that the Perl RSS aggregators don’t pick up the PHP content – I will notify them.

 

If you want to follow the blog with all the contents use this feed: All articles

 

If you want to see the Perl stuff use this feed: Perl related

 

Want to see the PHP stuff? Take this route: PHP related

Testing the scraper – with Perl (of course)

There has been quite some feedback about the last blog article on web scraping using Web::Scraper. I promised to put up a GitHub repository but didn’t publish the link so far. So here it is – the full scraper application for the Tagesschau::Video::Asset module.

 

The distribution is CPAN ready – it has all the documentation which is needed, a Makefile.PL, tests and an example. All you need to get started is to point your browser at the repository listed above and you are ready to go.

 

Here is something which was rather interesting to me. I am scraping a webpage which is produced by a content management system; if the layout changes or the CSS classes change, there is a chance that my module will break. I therefore decided to add a small check inside the module. Normally I return a data structure with the correct data after the scraping was successful; I decided to fail inside the module when the result differs from that.

 

Usually the structure looks like this:

$VAR1 = {
          'formats' => [
                         {
                           'link' => '...',
                           'format' => '...'
                         },
                         {
                           'link' => '...',
                           'format' => '...'
                         }
                       ],
          'timestamp' => '...',
          'story' => '...',
          'headline' => '...'
};

So you know before the call what you’ll get back if the call and the parsing are successful. I decided to implement a quick check for the keys of the hash inside the code. If one of the keys is missing, the whole thing is going to fail. I am not sure if this is a great idea, however, it will certainly help you to detect changes inside the web page.

for (qw/formats timestamp story headline/) {
  croak "Missing key from scrape ($_)"
    if !exists $res->{$_};
}

Please provide some feedback about the idea. Do you think it makes sense to validate the return value before returning from the subroutine?

 

I’ve also had some great fun writing the tests. I had never messed with Test::Deep before; it provides a great way to test multilevel data structures. I ship a very small part of the original webpage to run the tests against, so there is no need to fetch the actual data from the web page when you are installing the module – you can use the example asset file. The test runs the calls against a file:/// resource, parses aka scrapes the HTML document and returns a data structure. Here is the test inside the basic.t test file.

cmp_deeply(
  $res,
  { timestamp => ignore(),
    formats   => ignore(),
    headline  => ignore(),
    story     => ignore(),
  }, "Testing data structure"
);

 

This is redundant with the check we’ve seen earlier. However, I just wanted to have a start – now I can use this as a personal reference. Yep, I’ve looked into Test::Deep – at least I’ve read the module’s pod page.
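
For my own reference, here is a slightly stricter variant of the same test. This is only a sketch, not part of the shipped test suite: it drops into basic.t where $res and the test plan already exist, and it uses Test::Deep’s re() and array_each() instead of ignore(), so the values get at least a minimal sanity check.

use Test::Deep;

cmp_deeply(
  $res,
  { timestamp => re(qr/\S/),               # any non-empty string
    headline  => re(qr/\S/),
    story     => re(qr/\S/),
    formats   => array_each(               # checked for every element of the array
      { link   => re(qr/\S/),
        format => re(qr/\S/),
      }
    ),
  }, "Testing data structure with value checks"
);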

Scraping the web (Web::Scraper) and CSS

There is more than one way to extract contents out of a webpage. The CPAN module Web::Scraper provides a great way to scrape (hence the name) contents out of webpages. It makes use of CSS selectors or XPath queries to extract the data out of the HTML content. I prefer to use CSS selectors as I am already familiar with them. The module provides decent documentation, however, examples always make things easier to understand. CSS selectors are common in the world of HTML. Usually you use them to describe certain elements in an HTML document and assign CSS styles to them. However, with Web::Scraper you can use selectors to navigate through the document.

The challenge

There is a webpage of the German national broadcaster which hosts video streams of reporters from all over the world in video blogs. Unfortunately they don’t provide an RSS feed for the video streams themselves. So every time you want to watch a new episode you have to follow their regular RSS feed to a webpage with the new stream. I decided to pull their RSS feed, query the links inside it and check them for video streams. I wrote a small wrapper around this so I can use a small Web::Simple based application to generate an RSS feed with the video streams included – which I can subscribe to. Voilà, there is my RSS feed with videos for my tablet.
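
To give a rough idea of what that wrapper looks like, here is a minimal sketch. The feed URL is a placeholder and the Tagesschau::Video::Asset constructor and scrape method are assumptions about the module’s interface, not the exact production code:

#!/usr/bin/env perl
package VideoFeed;
use Web::Simple;
use LWP::Simple qw(get);
use XML::RSS;
use Tagesschau::Video::Asset;    # the scraper module described in the other article

sub dispatch_request {
  sub (GET + /videos.xml) {
    # Pull the original RSS feed and walk its items
    my $in = XML::RSS->new;
    $in->parse( get('http://www.example.org/videoblog-feed.xml') );   # placeholder URL

    # Build a new feed which points directly at the video streams
    my $out = XML::RSS->new( version => '2.0' );
    $out->channel(
      title       => 'Videoblog streams',
      link        => 'http://www.example.org/',
      description => 'Direct links to the video streams',
    );

    for my $item ( @{ $in->{items} } ) {
      my $asset = Tagesschau::Video::Asset->new( url => $item->{link} );  # assumed interface
      my $data  = $asset->scrape;                                         # assumed interface
      $out->add_item(
        title => $data->{headline},
        link  => $data->{formats}[0]{link},
      );
    }

    [ 200, [ 'Content-type' => 'application/rss+xml' ], [ $out->as_string ] ];
  },
}

VideoFeed->run_if_script;

As Web::Simple apps are plain PSGI, you can start this with plackup and point the feed reader of the tablet at /videos.xml.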

The solution

Here are some elements which I wanted to extract, and I’ll show you the CSS selectors which I used to navigate through the content.

Here is the first snippet of the webpage that I want to extract:

 

<html>
  …
  <span>Videoblog …</span>
  <h1></h1>
  <p></p>

There is only one h1 headline inside the document. I wanted the content of the headline and the content of the paragraph which follows the h1 tag. Here is the selector to get it, in the language of Web::Scraper.

 

my $scrap = scraper {
  process 'h1',   headline => 'TEXT';
  process 'h1+p', story    => 'TEXT';
};

The h1 tag is pretty self-explanatory. The notation h1+p describes the p tag immediately following the h1 tag. The 'TEXT' says that I want to extract the content which is enclosed by the tags.

 

The webpage has some download links for video streams on it. I wanted to grab the URLs of the streams and the title of the corresponding stream. Here is the HTML snippet.


<ul id=…>
  <li>
    <a href="…" title="…" class="downloadLink">
      <span class="title"></span>
    </a>
  </li>
  <li>…multiple times, one per stream…</li>
</ul>


Well, things are getting a little bit complicated here. I am interested in the href attribute and in the content enclosed by the span tag. As there are many li elements, one for each stream, we need to store the return values in an array. So, let’s look into the CSS selector.

process 'a.downloadLink', "formats[]" => scraper {
  process 'a.downloadLink[href]', link   => '@href';
  process 'span.title',           format => 'TEXT';
};

Huh, ok – so let’s proceed step by step. The selector a.downloadLink selects every a tag which is of the class downloadLink. I am using the array formats[] to store the return values of the included scraper. This inner scraper selects the href attribute ('@href') of the link and the text ('TEXT') of the span element of the class title.


The result is a data structure which can be processed. Here is an extract of it:

$VAR1 = {
  'formats' => [
                 {
                   'link'   => ...,
                   'format' => ...,
                 },
                 {
                   'link'   => ...,
                   'format' => ...,
                 },
               ],
  'story'    => "...",
  'headline' => "..."
};
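
Processing that result is then just a matter of walking the structure – for example (assuming $res holds the return value of the scrape call):

for my $format ( @{ $res->{formats} } ) {
    printf "%-10s %s\n", $format->{format}, $format->{link};
}
print "$res->{headline}\n";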

 

The display of the source code and the data structure is nowhere near perfect in this blog engine so I’ve provided a small gist for it on my GitHub account. You can find it here.

 

After you have walked through the source code of the example you’ll know how to extract portions of a webpage using CSS selectors. You can use Mojolicious for this purpose, too! The Mojo::DOM::CSS module provides everything you need. Which one you choose is just a matter of taste. If you are working with long HTML documents I would go for Web::Scraper, which might be faster due to the fact that it can use LibXML if properly installed (plus you’ll have validation support if needed).
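
For comparison, here is a rough sketch of the same extraction done with Mojo::DOM (which uses Mojo::DOM::CSS for the selectors). It is untested against the real page and assumes $html holds the fetched document:

use Mojo::DOM;

my $dom      = Mojo::DOM->new($html);    # $html holds the fetched page
my $headline = $dom->at('h1')->text;
my $story    = $dom->at('h1 + p')->text;

my @formats = map {
    { link   => $_->attr('href'),
      format => $_->at('span.title')->text,
    }
} $dom->find('a.downloadLink')->each;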

 

Conclusion

CSS selectors provide a powerful way to parse HTML/XML contents and to extract data out of them. Web::Scraper and Mojo::DOM::CSS are two modules on CPAN which are well maintained, documented and have an active community around them. I think Web::Scraper deserves to be included in the toolchain of everyone working with Perl.

 


What are the key elements of a Perl workshop?

There is a Linux workshop in Augsburg in March 2013. I used to give presentations there, mostly about Open Source projects and thinking, some about Perl aspects, too. This year I want to run a Perl starters workshop to get more people into the modern way of thinking and coding in Perl. I am not sure if I can run a practical session with laptops; probably it will be a mixture of practical and theoretical things. The workshop time slot is 2 hours.

 

Imagine that you take part in the workshop as someone who is starting to learn Perl. What would you expect to learn? What do you want to hear about?

 

Please leave your feedback in the comments – I have to submit the workshop proposal by 13th of Jan 2013.

Old Perl books – still useful?

Here is an oldie. I found it when I was cleaning my bookshelf. Is a short reference about Perl 5.6 still usable in 2012, after Perl 5.16.2 has been released? It depends on your usage of such references: do you like to have something on your desk which looks nice, something which keeps the wind from messing with your sheets of paper – a so-called paperweight? Or do you prefer current references? I am pretty indifferent about such references. The code inside it will still work, even in 2012. If you try to compile stuff from the original K&R C book, current C compilers will probably remove your user account and lock down the root access to the server. In Perl, however, nearly everything still works – starting from the geologically stable Perl versions ((c) mst) up to the current Perl version. This leads to the question of deprecation of features. brian d foy wrote a nice article about this (Perl 5.12). Make sure to look into the perldelta pod pages; usually a few things are listed as deprecated from version to version (you already did this, right?).

So I am still undecided about the book. Keep it or trash it? Maybe it’s worth keeping – to stay in touch with the ancients. You decide – what’s your opinion on this?

Perl 5 - Kurz und gut

Perl 5 Reference

 

LPW2012 – A short summary

First things first – it has been a great conference with great talks, great organization, a great pub and a very inspiring community. The travel to London was ok, the flight was on schedule and I started my Oyster card master course in time. The hotel room was amazingly small (that’s probably common in the UK). Anyway – everything was clean and nice.

I was tired after the journey and therefore I didn’t attend the pre-conference meeting on Friday. I got up early on Saturday and left the hotel at 08:00. Sorry, there seems to be a problem with my personal and social life! I am getting up early on a Saturday to attend a Perl conference? Yes, …
There has also been a tweet about that from someone – great, I had the same idea.

The University of Westminster hosted the event. Several tracks ran in parallel, so make sure to set up your personal schedule! I finished my slides and started with my first talk at 11:20. That was the 40 minute talk “There is no single server”, and it dealt with server deployments which run Perl applications. The talk was ok; however, I was still tired, so it probably wasn’t my best one. I attended several other talks – a very interesting one was the talk by mst about fatpacking and messing with Web::Simple. Holy! One day I will ask him what he is into when he builds such applications. Free range mushrooms – this guy is totally insane!
Here is one thing which took a while until I realized it: mst was using Windows on his laptop. That’s fine with me; what was interesting is that he showed some code snippets in his presentation about assembling paths and directories. Having dealt with all kinds of problems around this, I was very pleased to find someone like him caring about Perl specifics on Windows. That’s great, and I really encourage people (that’s you, readers) to get in touch with the in-core modules like File::Path to create directory structures.
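
As a tiny illustration of the kind of portable directory handling meant here (the File::Spec part is my addition for assembling the path), this runs the same way on Windows and on Unix:

use File::Path qw(make_path);
use File::Spec;

# Assemble the path in a platform-independent way, then create the whole tree
my $dir = File::Spec->catdir('var', 'cache', 'myapp');
make_path($dir);
print "created or found: $dir\n";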

Finally – at the end of the conference it was lightning talk time! Well, I had prepared a lightning talk; however, the problem I was dealing with in the talk seems to be non-existent in the UK. Amazing! The talk is about the challenge of getting new people into Perl. Perl seems to be dead in company terms. There is just no reason to pick it for a new project here in Germany, at least none that I am aware of. Perl for scripts is fine; Perl for applications is just not an option in the companies I am working for or have worked for. It’s all Java and nowadays Ruby or Python.
Anyway – “I come from the dead” were the first words of my talk and then I started to explain the problems with Perl in Germany from my point of view. The presentation was about asking your boss if you should port everything business critical to Perl. (oh, btw. I am still employed there…)

The talk worked really well and I got some very nice feedback. After the conference we went to the pub King and Queen and had some drinks and some food. It was a great conference – thanks to all the people who took part in the organization of this event.

Today (on Sunday) was London sightseeing time, expect some images once I have a nice network connection again.

Right now I am sitting at Heathrow waiting for my flight (which will take some more minutes…)

 

(Please note – that was one week in the past – I just forgot to publish the post)

Everything in a box