The American National Science Foundation (NSF) has recently announced new requirements for handling and sharing data from projects funded by the agency.
"Science is becoming data-intensive and collaborative," noted Ed Seidel, acting assistant director for NSF's Mathematical and Physical Sciences directorate. "Researchers from numerous disciplines need to work together to attack complex problems; openly sharing data will pave the way for researchers to communicate and collaborate more effectively."
"This is the first step in what will be a more comprehensive approach to data policy," added Cora Marrett, NSF acting deputy director. "It will address the need for data from publicly-funded research to be made public."
The new requirements, although publicly stated to encourage better sharing of data within the science community, are almost certainly related to the recent Climategate fiasco. It should be seen as a reminder to us all to follow proper procedures when writing software, especially in the area of source code control.
For more information on Climategate see the articles in Wikipedia, Wallstreet Journal, the original files related to the scandal, and an analysis of the emails.
For more information on the subjects that are more directly computer related see the articles in Wikipedia for LIMS, data management, and source code control.
Monday, May 24, 2010
New Requirements by NSF for Data Management
Friday, May 21, 2010
Roman Numeral to Decimal Conversion - Algorithm 2
A second method for converting Roman numerals to decimal numbers has been posted on the Computer Science 101 website.
According to the article on that site, the algorithm boils down to this:
- Set the character pointer to point at the right-most character.
- Set the value of the summing variable to be the value of the current character.
- Move the character pointer to the left one step.
- Compare the value of the current character to the previous character:
- If the current character's value is the same or larger than the previous character, ADD its value to the sum.
- Otherwise, if the current character's value is smaller than the previous character, SUBTRACT its value from the sum.
- Repeat steps 3-6 until the left-most character has been evaluated.
Thursday, May 13, 2010
Decimal Number to Roman Numeral Conversion Version 3
Two previous posts showed how to convert a decimal value to a Roman numeral. The first method was very clunky and elementary, the second method is considerably more efficient.
It ocurred to me, after designing and implementing the second algorithm, that if you're willing to sacrifice a bit more memory space for a table, then there's a faster way to convert from decimal values to Roman numeral notation. The following code shows an implementation of the algorithm in VB .NET. I call it the MCXI algorithm due to the values of the four rows in the table. There are some funny things going on between the 0 based indexing used in accessing array data and 1 based indexing used in the MID function used for parsing the input string that you may need to address this if you use a different programming language.
Private Sub btnConvert_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnConvert.Click
' Algorithm MCXI
' Written by: Erik Oosterwal
' Written on: 1-21-2008
'
' Routine to convert Decimal values to Roman Numeral notation. This algorithm uses a table to store
' all the standard Roman Numeral notation for each digit in each Base-10 digit position (ones, tens,
' hundreds, thousands). Each Decimal character is then evaluated to find its magnitude and extract
' the corresponding Roman Numeral notation for that value and prepend it to the output string.
'
' This routine does not check for proper usage - the input is not checked for valid numeral characters
' and it does not check to verify that the values are positive values between 1 and 4000. To make
' function robust, you check put checks in place for these conditions.
'Store the Roman codes for each Decimal digit.
Dim strRoman(,) As String = {{"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"}, _
{"", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"}, _
{"", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"}, _
{"", "M", "MM", "MMM", "MMMM", "", "", "", "", ""}}
Dim strOutput As String = Nothing ' Initialize the output string to blank.
Dim intLoopI As Integer = Nothing ' Counter used for the Loop
Dim intStrLength As Integer = Nothing ' The length of the input string.
intStrLength = Len(Me.txtInput.Text) ' Find out how many digits are in the input number (magnitude).
For intLoopI = intStrLength To 1 Step -1 ' MID function uses 1 based indexing for the string pointer.
strOutput = strRoman(intStrLength - intLoopI, Val(Mid$(Me.txtInput.Text, intLoopI, 1))) & strOutput
Next
Me.txtOutput.Text = strOutput
End Sub
As you can see, algorithm number three (MCXI Algorithm) executes the loop a maximum of 4 times and uses no division or multiplication. This should speed up the run time a great deal over the previous two listings.
Tuesday, May 11, 2010
Fizz Buzz Busted!?
On the site Computer Science 101, Erik Oosterwal presents two different solutions to the FizzBuzz problem and demonstrates why it is important for developers on larger software teams to write code that is developer-friendly even when the developer-friendly code is less efficient.
Here's a preview of one of the two solutions:
void FizzBuzz2(void)
{
int i = 0;
char output[10];
for(i = 1; i<=100; i++)
{
strcpy(output, "");
if(i%3==0)
{
strcpy(output, "Fizz");
}
if(i%5==0)
{
strncat(output, "Buzz", 4);
}
if(strcmp(output, "")==0)
{
printf("%d\n", i);
}
else
{
printf("%s\n", output);
}
}
}
Monday, May 10, 2010
Roman Numeral to Decimal Conversion
You can find algorithms for creating Roman Numerals from decimal values here and here.
In this post we'll take a look at the first of two algorithms for accepting a Roman Numeral and converting it back into a decimal value.
The basic rules for this algorithms are:
Start at the left-most character in the Roman Numeral string and work your way to the last character one character at a time. Add the value of the current character to the total. If the current character has a larger value than the previous character, then subtract 2x the value of the previous character from the total.
For instance the Roman Numeral XIX has a decimal value of 19. We start at the first character, 'X', and add 10 to the running total; our total is now 10. We evaluate the second character, 'I' and add 1 to our running total, which now becomes 11. We evaluate the third character, 'X', and add 10 to our total; our total is now 21, but... Because the current character has a larger value than the previous character we must subtract 2x the value of the previous character, 1, from the total bringing the new total to 19. There are no more characters so we stop.
Function Roman2Decimal(strInput as string) as integer
Dim intValues() as integer ' Reserve space for an array.
Dim intLoopI as integer = nothing ' Initialize the loop counter to 0.
Dim intTotal as integer = nothing ' Clear the total.
Dim intPreviousPointer as integer = 99999 ' Set the value of the previous pointer to some MAX value.
Dim intCurrentPointer as integer = nothing ' This will hold the value of the current Roman Numeral character.
intValues(M) = 1000 ' This array addressing method
intValues(D) = 500 ' assumes that the language you
intValues(C) = 100 ' are using allows you to use
intValues(L) = 50 ' values other than integer values
intValues(X) = 10 ' to access data in an array.
intValues(V) = 5
intValues(I) = 1
For intLoopI = 0 to (len(strInput)-1) ' Assuming 0 indexing...
intCurrentPointer = intValues(mid$(strInput,intLoopI,1)) ' Get the value of the current Roman Numeral character
intTotal = intTotal + intCurrentPointer ' Add the value to the total.
If intCurrentPointer > intPreviousPointer then ' If the current value is larger than the previous, then
intTotal = intTotal - (2 * intPreviousPointer) ' subtract twice the value of the previous character from the total.
EndIf
intPreviousPointer = intCurrentPointer ' Move the value of the current character to be the
' value of the previous character and get then next one.
Next
Roman2Decimal = intTotal
End Function
Spam
Due to the high number of spam comments recently, the ability to leave anonymous comments has been turned off. You can still leave comments by signing in with your Google or OpenID passwords.
Thanks for your patience in this matter.
Thursday, May 6, 2010
Infix to RPN Notation
Here is a limited algorithm for converting infix notation to RPN notation. The algorithm has no way of handling unary minuses (indicating negative numbers) nor does it deal with functions such as SIN(), COS(), SQR(), etc. I describe one way of handling unary minus in the comments of the program listing below and will alter the code to handle those as well as sines, cosines, square roots, etc. at a later date. In the mean time if you need an infix to RPN converter, this should handle most of your needs.
Parsing from left to right, any numeric value we encounter or any alphanumeric string that could be interpreted as a variable is sent directly to the output, including decimal points.
When we encounter anything other than numbers or letters (or a decimal point), we do the following:
Append a token separator to the output string (in our case a comma.)
If we encounter an open parenthesis, that parenthesis is pushed to a FILO stack.
If we encounter a closing parenthesis, we discard it and pop operators from the stack to the end of the output string until we come to an open parenthesis. The open parenthesis is then also discarded.
If we encounter a low priority operator (+ or -) then we must first pop all operators (+, -, *, /, ^) from the stack to the end of the output string until the stack is empty or we encounter an open parenthesis and then push the low priority operator on to the stack.
If we encounter a medium priority operator (* or /) then we must first pop any medium or high priority operators (*, /, ^) from the stack to the end of the output string until the stack is empty or we encounter a low priority operator (+ or -) or we encounter an open parenthesis, then push the medium priority operator on to the stack.
If we encounter a high priority operator (^) we pop all other high priority operators from the stack to the end of the output string then push the new high priority operator to the stack.
Once all the input character have been processed, we pop operators off the stack until it's empty.
The listing below shows a Visual Basic implementation of the algorithm:
'
' Infix to RPN - program to convert infix notation to
' RPN notation. Operands are separated by commas, to
' use a different separation character change the
' line where txtSeparator is initialized. The operator
' stack has been set to an upper limit of 100 operators,
' if you expect to have input strings with more than 100
' operators (heaven forbid...) increase the size of
' txtStack in the Dim statement.
'
' This routine does not handle unary minus signs. To allow
' unary minus signs two additional changes must be made. In the
' Low Operator case (+, -) we must check to see if the previous
' input character was an alphanumeric. If it IS alphanumeric
' we treat the minus as a normal binary operator. If it is NOT
' alphanumeric, then we append a zero and token separator to the
' output string, and push a special unary minus character to the
' stack, which has an operator value equal to ^. You can use the
' tilde "~" as the unary minus character since it's not being used
' for anything else. I'll make this changes to the routine at a
' later date.
'
' Written by - Erik Oosterwal
' Started on - Nov. 14, 2005
' Completed on - Nov. 15, 2005
Dim txtSeparator, txtStack(100), txtCharacter As String ' Declare all variables
Dim intPointer, intStackPointer As Integer ' used in the routine
txtSeparator = "," ' Set the token separator to a comma
intPointer = 1 ' Start by looking at the first character in the input string
intStackPointer = 1 ' Start the operator stack at 1
txtOutput.Text = "" ' Clear the output sting
While intPointer <= Len(txtInput.Text) ' As long as there's characters in the string
txtCharacter = Mid$(txtInput.Text, intPointer, 1) ' get the next character.
Select Case txtCharacter
Case "0" To "9", ".", "A" To "Z", "a" To "z" ' Numbers and variables get appended
txtOutput.Text = txtOutput.Text + txtCharacter ' directly to the output string.
Case "("
txtStack(intStackPointer) = txtCharacter ' Open parentheses get pushed on the stack
intStackPointer = intStackPointer + 1
Case ")" ' If we find a closing parenthesis
While intStackPointer > 1 And _
txtStack(intStackPointer - 1) <> "(" ' Clear the stack until we
txtOutput.Text = txtOutput.Text + txtSeparator + _
txtStack(intStackPointer - 1) ' get to the opening parenthesis.
intStackPointer = intStackPointer - 1
Wend
intStackPointer = intStackPointer - 1 ' Decrease the stack pointer to overwrite the opening parenthesis.
Case "+", "-" ' Lowest operators
txtOutput.Text = txtOutput.Text + txtSeparator ' Append a token separator to the output string.
' If there are any operators on the stack AND they're higher or equal valued than + and -, then...
' pop them off the stack and append them to the output string.
While intStackPointer > 1 And _
txtStack(intStackPointer - 1) <> "("
txtOutput.Text = txtOutput.Text + _
txtStack(intStackPointer - 1) + txtSeparator
intStackPointer = intStackPointer - 1
Wend
txtStack(intStackPointer) = txtCharacter ' Push the low operator on the stack.
intStackPointer = intStackPointer + 1
Case "*", "/" ' Medium operators
txtOutput.Text = txtOutput.Text + txtSeparator ' Append a token separator to the output string.
' If there are any operators on the stack and they're higher or equal valued than * and /, then
' pop them off the stack and append them to the output string.
While intStackPointer > 1 And _
(txtStack(intStackPointer - 1) = "*" Or _
txtStack(intStackPointer - 1) = "/" Or _
txtStack(intStackPointer - 1) = "^")
txtOutput.Text = txtOutput.Text + _
txtStack(intStackPointer - 1) + txtSeparator
intStackPointer = intStackPointer - 1
Wend
txtStack(intStackPointer) = txtCharacter ' Push the medium operator on the stack.
intStackPointer = intStackPointer + 1
Case "^" ' High operators
txtOutput.Text = txtOutput.Text + txtSeparator ' Append a token separator to the output string.
While intStackPointer > 1 And _
txtStack(intStackPointer - 1) = "^"
txtOutput.Text = txtOutput.Text + _
txtStack(intStackPointer - 1) + txtSeparator
intStackPointer = intStackPointer - 1
Wend
txtStack(intStackPointer) = txtCharacter ' Push the high operator on the stack.
intStackPointer = intStackPointer + 1
' There is no default case included in this routine. Any characters not explicitly taken care
' of by the cases will be ignored as whitespace. Some characters, such as x, #, and others
' that are used to denote binary, hex, octal, etc. numbers could be taken care of in the first
' case where numeric and alpha tokens are handled. It's ok to ignore spaces, tabs and other
' white space.
End Select
intPointer = intPointer + 1 ' Set the pointer to look at the next character
Wend
' All the input characters have been taken care of, now it's time to clear the stack.
'
While intStackPointer > 1 ' As long as there's still operators on the stack
txtOutput.Text = txtOutput.Text + txtSeparator + _
txtStack(intStackPointer - 1) ' Take one off and append it to the output string
intStackPointer = intStackPointer - 1 ' look at the previous operator
Wend
End Sub
Wednesday, May 5, 2010
A Replacement for the Bubble Sort
For reference, here's a listing of the Bubble Sort algorithm in Visual Basic:
' Bubble Sort
Dim I, J, IntArray(), Temp as Integer ' Declare Variables
For I = 1 To Length - 1 ' Supposed to be the smaller value
For J = I To Length ' Supposed to be the larger value
If IntArray(I) > IntArray(J) Then ' Compare Cells
Temp = IntArray(I) ' \
IntArray(I) = IntArray(J) ' | Swap Cells
IntArray(J) = Temp ' /
End If
Next
Next
There's not much good to say about the Bubble Sort except that it's small and compact size-wise. Performance-wise it's a whole other animal.
So what do you do when you need a sorting algorithm that's comparable to a Quick Sort in terms of performance but still only takes up about as much space as the Bubble Sort? Consider implementing the O-Sort. Here's a listing of the O-Sort in Visual Basic:
' O-Sort (Oosterwal Sort) Version 3
SngPhi = 0.78 ' Define phi value
SngFib = Length * SngPhi ' Set initial real step size
Step = Int(SngFib) ' Set initial integer step size
While (Step > 0)
For I = 1 To Length - Step ' Range of lower cell
If IntArray(I) > IntArray(I + Step) Then ' Compare cells
Temp = IntArray(I) ' \
IntArray(I) = IntArray(I + Step) ' | Swap Cells
IntArray(I + Step) = Temp ' /
End If
Next
SngFib = SngFib * SngPhi ' Decrease the Real step size
Step = Int(SngFib) ' Set the integer step value
Wend
Here's how it works.
The O-Sort is a type of Comb Sort algorithm. Comb Sort algorithms use a step size or interval between the two values being compared that starts off large, compared to the number of values being sorted, and becomes smaller for each subsequent pass of the data. The idea behind the algorithm is to get chunks of the larger values to the end of the list and chunks of the smaller values to the beginning of the list quickly, then with each subsequent pass of the data look at values that are closer together and moving smaller chunks of data shorter distances until during the last pass neighboring data points are compared.
In practice, with 10,000 elements of completely randomized data the O-Sort takes about 1.6 times longer to sort the data than the Quick Sort. For comparison, the Shell Sort takes 36 times as long and the Bubble Sort takes over 200 times as long.
When you have data that is less randomized, such as what you would get when appending lists of sorted data, the Quick Sort is less quick. When 10% of the data is randomized the O-Sort requires 0.96 as much time to completely sort the data as the Quick Sort. When only 5% of the data is randomized, the O-Sort takes only 0.61 as much time to completely sort the data as the Quick Sort.
When you need a fast in-place sorting algorithm that doesn't require much code space consider using the O-Sort routine.