June 20, 2009

Twitter and Powershell

This is a cool little script that I found today.

June 19, 2009

Temp Directory in Powershell

I sometimes deal with very large files that I need to unzip and then parse. More than once I have forgotten to delete one of those huge files, and it has come back to bite me. So I started unpacking these files into the temp directory, so that they get cleaned up whenever I clean out my temp folder. Here is the simple function I use:
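function CreateTempDir
{
    # Start from the current user's temp path (e.g. %TEMP%)
    $tmpDir = [System.IO.Path]::GetTempPath()
    # Append a random folder name so each call gets its own directory
    $tmpDir = [System.IO.Path]::Combine($tmpDir, [System.IO.Path]::GetRandomFileName())

    # Create the folder; Out-Null hides the DirectoryInfo object that CreateDirectory returns
    [System.IO.Directory]::CreateDirectory($tmpDir) | Out-Null

    # Return the full path to the caller
    $tmpDir
}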
and then I use it like this:
$tmpDir = CreateTempDir
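If you would rather not depend on remembering to clean the temp folder at all, a try/finally wrapper can delete the folder as soon as the work is done, even when the work fails partway through. The block below is a minimal sketch that assumes PowerShell 2.0 or later. Invoke-WithTempDir is a hypothetical name, not part of any module. The block repeats CreateTempDir so it stands on its own.

```powershell
# Same helper as above: a uniquely named folder under the user's temp path
function CreateTempDir
{
    $tmpDir = [System.IO.Path]::GetTempPath()
    $tmpDir = [System.IO.Path]::Combine($tmpDir, [System.IO.Path]::GetRandomFileName())
    [System.IO.Directory]::CreateDirectory($tmpDir) | Out-Null
    $tmpDir
}

# Hypothetical wrapper: runs $Work with a fresh temp folder and always deletes the folder afterwards
function Invoke-WithTempDir([scriptblock]$Work)
{
    $tmpDir = CreateTempDir
    try {
        # The script block receives the folder path as its first argument;
        # anything it outputs is passed back to the caller
        & $Work $tmpDir
    }
    finally {
        # Runs even if $Work throws, so the big unpacked files never linger
        if (Test-Path -LiteralPath $tmpDir) {
            Remove-Item -LiteralPath $tmpDir -Recurse -Force
        }
    }
}

# Example usage (unzip and parse inside $dir):
# Invoke-WithTempDir { param($dir) <# unpack the large file into $dir and parse it here #> }
```

The design choice here is to tie the folder's lifetime to one block of work rather than to a periodic temp cleanup. The trade-off is that you cannot inspect the unpacked files after the block finishes.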

June 18, 2009

Remove SVN folder using Powershell

I use Subversion as my source code repository, and it creates an svn folder in every directory under version control. There are times when I would like to remove all of those svn folders and the files inside them. Here is a simple one-liner in PowerShell to do just that:
gci 'C:\_DevCode\CustomApps' -include _svn -recurse -force | foreach { del -LiteralPath $_.FullName -recurse -force -whatif }
Three things:
  1. -whatif only simulates the deletion, so remove it when you are ready to run it for real
  2. My svn folders were set up as _svn, while the default is actually .svn, so you may need to change that
  3. -force makes gci find hidden items and lets del remove hidden and read-only files
You can also use the same code to search your machine and get rid of thumbs.db files, for example:
gci 'C:\' -include thumbs.db -recurse -force | foreach { del -LiteralPath $_.FullName -force -whatif }
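If you run this kind of cleanup often, it may be worth wrapping it in a function that supports -WhatIf itself. The dry run then becomes a switch you pass, instead of something you edit out of the one-liner. This is a minimal sketch that assumes PowerShell 3.0 or later. Remove-ItemByName is a hypothetical name. The function sorts matches deepest-first, so a nested match is removed before its parent.

```powershell
# Hypothetical helper: delete every file or folder under $Root whose name equals $Name.
# SupportsShouldProcess gives the function its own -WhatIf and -Confirm switches.
function Remove-ItemByName
{
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(Mandatory = $true)] [string]$Root,
        [Parameter(Mandatory = $true)] [string]$Name
    )

    # -Force includes hidden items such as .svn folders and thumbs.db;
    # longest paths first, so children are deleted before their parents
    $targets = Get-ChildItem -LiteralPath $Root -Recurse -Force |
        Where-Object { $_.Name -eq $Name } |
        Sort-Object { $_.FullName.Length } -Descending

    foreach ($item in $targets) {
        # With -WhatIf, ShouldProcess prints the action and returns $false
        if ($PSCmdlet.ShouldProcess($item.FullName, 'Remove')) {
            Remove-Item -LiteralPath $item.FullName -Recurse -Force
        }
    }
}

# Example usage:
# Remove-ItemByName -Root 'C:\_DevCode\CustomApps' -Name '_svn' -WhatIf
```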

AD Login Test with Hex Error

If you have ever done work with C# and Active Directory, you have probably come across various issues. One of the first usually relates to authentication. In an earlier post, I showed that a Win32 logon is better than an LDAP bind, so I am going to assume that this is the approach you are all taking. Now, let's say you want the error code in hex, and you also want to try different logon types such as LOGON32_LOGON_INTERACTIVE or LOGON32_LOGON_NETWORK. This little utility lets you switch between the different options, test them easily, and get the hex error code.

Here is a snippet of the code, which you can find here:
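// Needs: using System; using System.Runtime.InteropServices;
[DllImport("advapi32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
private static extern bool LogonUser(string lpszUsername, string lpszDomain, string lpszPassword,
    int dwLogonType, int dwLogonProvider, ref IntPtr phToken);

[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool CloseHandle(IntPtr hObject);

private const int LOGON32_PROVIDER_DEFAULT = 0;

private void btnLogonUser_Click(object sender, EventArgs e)
{
    IntPtr token = IntPtr.Zero;
    int logonType = (int)((ComboItem)cbLogonType.Items[cbLogonType.SelectedIndex]).Value;
    bool isOk = LogonUser(tbUsername.Text, tbDomain.Text, tbPassword.Text,
                          logonType, LOGON32_PROVIDER_DEFAULT, ref token);
    // Read the error immediately; reliable here because LogonUser is declared with SetLastError = true
    int err = Marshal.GetLastWin32Error();
    if (isOk) {
        tbStatus.Text = "Logon succeeded";
        CloseHandle(token); // release the token handle we no longer need
    }
    else {
        tbStatus.Text = String.Format("Error: 0x{0:X} ({0})", err);
    }
}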
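The utility shows the raw code. If you also want the system's description of that code, .NET's System.ComponentModel.Win32Exception can look up the message for a Win32 error code. The block below is a minimal PowerShell sketch. Format-Win32Error is a hypothetical name, and it mirrors the 0x{0:X} ({0}) format used in the snippet. The message text varies with the OS and its language.

```powershell
# Hypothetical helper: hex code, decimal code, and the system's message for a Win32 error
function Format-Win32Error([int]$Code)
{
    # Win32Exception(int) looks up the system message for that error code
    $message = (New-Object System.ComponentModel.Win32Exception -ArgumentList $Code).Message
    '0x{0:X} ({0}): {1}' -f $Code, $message
}

# 1326 is ERROR_LOGON_FAILURE, the usual "unknown user name or bad password" code
Format-Win32Error 1326
```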

June 17, 2009

DotLucene and Accented Characters

During my projects working with Lucene, I had to index data from a database and make it searchable. One of the issues I came across was that first and last names with accents did not play nicely when searching for them. Once again, a Java filter existed for this, but there was nothing in C#. You can find my ISOLatin1AccentFilter conversion here. It is a filter that replaces accented characters in the ISO Latin 1 character set with their unaccented equivalents (the case is not altered).
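To show the idea behind accent folding, here is a sketch in PowerShell. Note that it uses a different technique from ISOLatin1AccentFilter. Rather than a lookup table of Latin-1 characters, it decomposes the text into base letters plus combining marks (Unicode normalization form D) and drops the marks. Remove-Accent is a hypothetical name. One difference to be aware of: characters with no decomposition, such as ß or Æ, pass through unchanged, whereas a lookup table is free to map them to something like ss or AE.

```powershell
# Hypothetical helper: strip combining accent marks, keeping the case of each letter
function Remove-Accent([string]$Text)
{
    # Form D splits an accented letter into its base letter plus combining mark(s)
    $decomposed = $Text.Normalize([System.Text.NormalizationForm]::FormD)
    $sb = New-Object System.Text.StringBuilder
    foreach ($ch in $decomposed.ToCharArray()) {
        $category = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
        # Keep everything except the combining (non-spacing) marks
        if ($category -ne [System.Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($ch)
        }
    }
    # Recompose whatever is left back to the usual composed form
    $sb.ToString().Normalize([System.Text.NormalizationForm]::FormC)
}

# Example: "Renée" written with a char code so the script stays ASCII-only
Remove-Accent "Ren$([char]0x00E9)e"
```

If you folded names this way, you would apply the same function both when indexing and when processing the query, so that both sides end up in the same form.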

June 16, 2009

KStemmer Port for DotLucene

I worked with Lucene a few years ago and since then I have not really played with it. Yesterday, I got an email asking me if I still had code for my KStemmer port.

Lucene is a text search engine that was originally written in Java. It performs full-text search over files and data that you have indexed with it. If you wanted to search your network for files, or to index data from your database and search it with a "Google"-style search, then this was the tool for you.

I was first introduced to Lucene by a client who wanted to index data from a database and search against it quickly. He had worked with these types of engines before and asked me to build one out for him. He wanted to index the data, add some boosting for certain fields (basically a way to give certain fields a higher priority), and stem the data using the KStemmer and the PorterStemmer. Stemming is a way of removing common endings from words, in English or any other language, to normalize them. For example, it strips suffixes such as 'ing', 'ed', or 'ly', so that a search for the word 'fishing' really becomes a search for the root word 'fish'. The PorterStemmer was known as the de facto standard algorithm for stemming English, and it is also included as part of Lucene.
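To make the suffix-stripping idea concrete, here is a deliberately naive sketch in PowerShell. It is not KStemmer and not the Porter algorithm. It only strips the three example endings, and only when at least three letters of stem would remain. Get-NaiveStem is a hypothetical name.

```powershell
# Toy suffix stripper for illustration only: NOT KStemmer and NOT the Porter algorithm
function Get-NaiveStem([string]$Word)
{
    $w = $Word.ToLowerInvariant()
    foreach ($suffix in @('ing', 'ed', 'ly')) {
        # Strip only if at least three letters of stem remain, so 'red' stays 'red'
        if ($w.EndsWith($suffix, [System.StringComparison]::Ordinal) -and
            ($w.Length - $suffix.Length) -ge 3) {
            return $w.Substring(0, $w.Length - $suffix.Length)
        }
    }
    $w
}

# fishing -> fish, fished -> fish, quickly -> quick, red -> red, running -> runn
'fishing', 'fished', 'quickly', 'red', 'running' | ForEach-Object { Get-NaiveStem $_ }
```

The last result shows why real stemmers exist. 'running' becomes 'runn', because handling doubled consonants and irregular forms takes far more rules, and a dictionary-backed stemmer such as KStemmer exists partly to avoid over-stripping.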

My client also wanted the KStemmer, which was not part of the Lucene package. I had to do some research, but I basically found the KStemmer to be a bit less aggressive than the PorterStemmer, and faster. My client felt that, for what he needed, the KStemmer was actually better.

This project needed to be in C#, and the only KStemmer version was in Java. I converted the code a few years ago, and now that the DotLucene site seems to be down, my code is hard to find. I dug around for it, and you can now find it here.

Perhaps I'll post some of the other tools and code from my time using Lucene.

TinyUrl and Powershell

TinyUrl is something that I assume most of you have used, or at least heard of. Basically, it creates a short version of your long URL. It is free for anyone to use and works best for URLs with long query strings that would probably break in an email. I figured I would show how to create one from PowerShell. An example is below:
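#Create the function
function Gen-TinyUrl($url){
    # Escape the long URL so its own ?, & and = characters are sent as one query value
    $encoded = [uri]::EscapeDataString($url)
    (new-object net.webclient).downloadString("http://tinyurl.com/api-create.php?url=$encoded")
}

#Call the function
$gen = Gen-TinyUrl -url http://joelangley.blogspot.com
$gen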
The output is: http://tinyurl.com/lmusyc

How easy was that!

June 15, 2009

Copy files across the network

I needed to copy a large number of files across the network and was trying to figure out the fastest approach. The assumption was a straight mirror, with no timestamp, CRC, or MD5 checks. I came up with the following options:
  1. Copy from Explorer: I just wanted to mention it, but I did not consider it :)
  2. XCopy: The obvious one, but slow.
  3. XXCopy: The free extension to XCopy, but I don't believe it is much faster.
  4. PowerShell Copy-Item cmdlet: OK, I love PowerShell, but is it really that much faster?
  5. SyncBack: The free version turned out to be slower than I would have liked.
  6. eseutil: Mentioned to me by my Exchange buddies, but it turns out to copy only one file at a time.
  7. RoboCopy: This was where I wanted to start...it seemed to be the fastest so far!
  8. FastCopy: Free and seems to be even faster than RoboCopy...I think I have a winner.
I thought I had it with FastCopy, and then I came across a tool from Microsoft called RichCopy. In my tests, RichCopy turned out to be faster than all of the above options, and I don't understand why it is not as well known as RoboCopy. Take a look at RichCopy if you ever need to do some bulk copying of files across network shares.
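If you want to run a comparison like this on your own network, a small harness can time each method against the same source and count what arrived. The block below is a minimal sketch that assumes PowerShell 3.0 or later. Measure-CopyMethod is a hypothetical name, and the method labels are arbitrary. Each method is a script block that takes a source and a destination, so any command line tool can be plugged in.

```powershell
# Hypothetical harness: time each copy method and count the files it produced
function Measure-CopyMethod
{
    param(
        [Parameter(Mandatory = $true)] [string]$Source,
        [Parameter(Mandatory = $true)] [string]$DestinationRoot,
        # Label -> script block taking (source, destination); use [ordered] to keep run order
        [Parameter(Mandatory = $true)] [System.Collections.IDictionary]$Methods
    )

    foreach ($label in $Methods.Keys) {
        # Each method gets its own destination folder under $DestinationRoot
        $dest = Join-Path $DestinationRoot $label
        $copy = $Methods[$label]
        $elapsed = Measure-Command { & $copy $Source $dest }
        [pscustomobject]@{
            Method  = $label
            Seconds = [math]::Round($elapsed.TotalSeconds, 2)
            Files   = @(Get-ChildItem -LiteralPath $dest -Recurse -File -ErrorAction SilentlyContinue).Count
        }
    }
}

# Example usage (paths are placeholders):
# $methods = [ordered]@{
#     'CopyItem' = { param($src, $dst) Copy-Item -LiteralPath $src -Destination $dst -Recurse }
#     'RoboCopy' = { param($src, $dst) robocopy $src $dst /MIR | Out-Null }
# }
# Measure-CopyMethod -Source '\\server\share\data' -DestinationRoot 'D:\copytest' -Methods $methods
```

Caching on either machine can make later runs look faster than earlier ones. For a fair comparison, repeat the runs and vary their order.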

Bing API and Powershell

Microsoft released the API for Bing, and you can find it here. It supports multiple protocols such as SOAP, XML, and JSON. It also has the idea of SourceTypes, which are the data sources you can search in. I think this is really neat. I didn't think a PowerShell library existed until I searched for one (with Google, not Bing) and came across PoshBing, which can also be found on CodePlex! The blog can be found here. The PoshBing library has a ton of features for the amount of time it has been out, and it seems really simple to use. Thanks for this great library!

June 14, 2009

LogParser and Powershell

Here is a great library that can be used to call LogParser from PowerShell. The cmdlets are already written for you, and they are very easy to follow.