Perl Upload Script and NodeWorx BW Calculations

Not sure if this should be in the NodeWorx forum, but I think this is a general question about how InterWorx analyzes the log files.

My question has to do with a Perl script for uploading files via a form POST. The entire file system is programmed in PHP, but I use a Perl script so that I can provide an upload progress bar (I need access to the raw POST data as it's being uploaded). I have uploaded several files just to test it, and I know the amount of BW shown in NodeWorx is much less than it should be.
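The basic trick, for anyone curious, is to count the bytes off STDIN yourself and write the running total somewhere a second "monitor" CGI can poll. A minimal sketch of the idea (file names and the 4 KB chunk size are placeholders; the actual MegaUpload code posted further down does the bookkeeping a bit differently):

#!/usr/bin/perl -w
# Sketch: spool the raw POST body to disk, recording progress as we go.
use strict;
use IO::Handle;

my $len  = $ENV{'CONTENT_LENGTH'} || 0;   # total bytes the client will send
my $read = 0;

open(my $spool, '>', '/tmp/post_data')    or die "spool: $!";     # placeholder paths
open(my $mon,   '>', '/tmp/progress.txt') or die "monitor: $!";
$mon->autoflush(1);
binmode STDIN;
binmode $spool;

while ($read < $len && read(STDIN, my $chunk, 4096)) {
    $read += length $chunk;
    print {$spool} $chunk;

    # a separate CGI, polled by the browser, reads "bytes_so_far/total"
    # from this file and draws the progress bar
    seek($mon, 0, 0);
    print {$mon} "$read/$len\n";
}
close $spool;
close $mon;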

One thing to note is that I removed the SuexecUserGroup from the Apache config file, which I think may be the problem :rolleyes: (although I don't know if having it set would fix the problem).

I removed the suexec setting because my download script is written in PHP, and PHP runs as apache, so the files need to be readable by the apache user. I was unsuccessful in trying to change the permissions of the file in the Perl upload script [either changing the group to apache or chmod'ing it world-readable] to allow PHP to read the file for download.

Basically, if someone can tell me that the SuexecUserGroup is the problem with the BW calculations, then I will either work on finding a way to do the chmod or write a Perl download script. I would like to have the files created under the actual user's name, as this would provide an accurate measure of disk space usage for the client website through NodeWorx (since that is based on the owner of the file).

Also, while on this topic, would this problem affect PHP-based file upload scripts, or does the method PHP uses allow NodeWorx to calculate the correct BW for the user even though PHP runs as apache (or nobody)?

Thanks for the info,

Justin

Update

I figured out what I was doing wrong with my chmod in the Perl script and have put the SuexecUserGroup back; it has no effect on the BW calculations.
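For anyone who hits the same thing, the classic mistakes with Perl's chmod are passing the mode as a string instead of an octal number, and building the wrong path (File::Temp's tempfile() already hands back the full path, so prepending the directory again points chmod at a file that doesn't exist). A minimal sketch of correct usage (the paths, group name, and 0644 mode are placeholders):

use strict;
use warnings;
use File::Temp qw(tempfile);

# tempfile() returns an open handle AND the full path -- don't glue
# the directory on again (assumes /phptemp exists and is writable)
my ($tmp_fh, $tmp_filename) = tempfile(DIR => '/phptemp');

# the mode must be a number (octal literal), not the string "0777"
chmod 0644, $tmp_filename
    or warn "chmod failed: $!";

# or, to make it group-readable by apache instead of world-readable
# (only works if the script's user is a member of that group, or root):
my $gid = getgrnam('apache');
chown -1, $gid, $tmp_filename if defined $gid;   # -1 leaves the owner alone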

Also, while on the subject of suexec and running scripts as different users: I found a module under development that lets Apache run child processes for each virtual host as a particular user, which would allow PHP (or any other script) to run as a particular user instead of the Apache user set in the main config file (http://httpd.apache.org/docs-2.0/mod/perchild.html).
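I haven't tried it, but the intended shape looks roughly like this (a hedged sketch from memory of the experimental 2.0 docs; perchild is chosen at build time, the directive arguments may be off, and the names and IP are placeholders):

NumServers 5                       # number of persistent child processes

<VirtualHost 1.2.3.4:80>
    ServerName domain.com
    AssignUserID user1 group1      # requests for this vhost run as user1
</VirtualHost>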

Not sure if anyone has more information on this module :confused:

Hi Justin,

You are correct that the Suexec settings have no effect on the bandwidth tracking.
InterWorx uses mod_watch for real-time bandwidth measuring; both the real-time graphs in SiteWorx and the calculated totals use this method. As far as I know, large POSTs should be included in this measurement, but based on your testing it sounds like they are not, correct?
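If you want to watch the counters yourself, mod_watch exposes the per-vhost totals through its watch-info handler. Here is a quick sketch (it assumes the handler is mapped at /watch-info in the vhost, which is the conventional setup, and the hostname is a placeholder): fetch the counters before and after a test upload, and the difference should be roughly the size of the file.

#!/usr/bin/perl -w
# Dump mod_watch's counters for a vhost; run before and after an upload.
use strict;
use LWP::Simple qw(get);

# assumes the vhost config contains something like:
#   <Location /watch-info>
#       SetHandler watch-info
#   </Location>
my $host = shift || 'www.domain.com';   # placeholder vhost
my $info = get("http://$host/watch-info");
print defined $info ? $info : "no response -- is the handler mapped?\n";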

Perhaps you could set up a test SiteWorx account on your server, with your upload script installed, that we can access and run tests on. You can open a support ticket to give us access.

Paul

I would have liked to do that right away, but I have had a little accident with my server (root shell access), which I think will most likely require a reinstall of the OS, so I am going to put this problem on hold until I figure out a solution to the new problem.

As soon as I get everything figured out, I will post a ticket with the SiteWorx info.

Here is the general CGI script I used. I have modified it slightly to fit my needs, but the core code is the same; it is from the MegaUpload project (http://www.raditha.com/megaupload/).


#!/usr/bin/perl -w

# PHP File Uploader with progress bar Version 1.43
# Copyright (C) Raditha Dissanayake 2003
# http://www.raditha.com

# Licence:
# The contents of this file are subject to the Mozilla Public
# License Version 1.1 (the "License"); you may not use this file
# except in compliance with the License. You may obtain a copy of
# the License at http://www.mozilla.org/MPL/
# 
# Software distributed under this License is distributed on an "AS
# IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or
# implied. See the License for the specific language governing
# rights and limitations under the License.
# 
# The Initial Developer of the Original Code is Raditha Dissanayake.
# Portions created by Raditha are Copyright (C) 2003
# Raditha Dissanayake. All Rights Reserved.
# 

# CHANGES:
# As of version 1.00 cookies were abolished!
# as of version 1.02 stdin is no longer set to non blocking.
# 1.40 - POST is no longer required and processing is more efficient.
#	Please refer to the online docs for details.
# 1.42 - The temporary locations were changed, to make it easier to
#	clean up afterwards.	

use CGI;
use Fcntl qw(:DEFAULT :flock);
use File::Temp qw/ tempfile tempdir /;
#use Carp;


# pull the session id off the query string by hand; CGI.pm can't be
# used yet because we need the raw POST data on STDIN for ourselves
@qstring=split(/&/,$ENV{'QUERY_STRING'});
@p1 = split(/=/,$qstring[0]);
$sessionid = $p1[1];
$sessionid =~ s/[^a-zA-Z0-9]//g;  # sanitized as suggested by Terrence Johnson.


# header.cgi (not shown) sets up the per-install variables used below:
# $max_upload, $post_data_file, $monitor_file, $php_uploader,
# $signal_file and $jobid
require("./header.cgi");


#carp "$post_data_file and $monitor_file";


$content_type = $ENV{'CONTENT_TYPE'};
$len = $ENV{'CONTENT_LENGTH'};
$bRead=0;
$|=1;

sub bye_bye {
	my $mes = shift;
	print "Content-type: text/html\n\n";
	print "<br>$mes<br>\n";
	exit;
}


# see if we are within the allowed limit.

if($len > $max_upload)
{
	close (STDIN);
	bye_bye("The maximum upload size has been exceeded");
}


#
# The thing to watch out for is file locking. Only
# one thread may open a file for writing at any given time.
# 

if (-e "$post_data_file") {
	unlink("$post_data_file");
}

if (-e "$monitor_file") {
	unlink("$monitor_file");
}


# record the expected total size in the monitor file; the progress-bar
# CGI reads this to work out the percentage done
sysopen(FH, $monitor_file, O_RDWR | O_CREAT)
	or die "can't open monitor file: $!";

# autoflush FH
$ofh = select(FH); $| = 1; select ($ofh);
flock(FH, LOCK_EX)
	or die "can't write-lock monitor file: $!";
seek(FH, 0, 0)
	or die "can't rewind monitor file: $!";
print FH $len;
close(FH);
	
sleep(1);


open(TMP, ">", $post_data_file) or bye_bye("can't open temp file");

#
# read and store the raw post data on a temporary file so that we can
# pass it through to a CGI instance later on.
#



my $i=0;

# autoflush TMP
$ofh = select(TMP); $| = 1; select ($ofh);

# spool the raw POST body to disk in 4 KB chunks, counting bytes as we go
while (read (STDIN, $LINE, 4096) && $bRead < $len)
{
	$bRead += length $LINE;

	#select(undef, undef, undef, 0.35);	# sleep for 0.35 of a second.

	# Many thanx to Patrick Knoell who came up with the optimized value for
	# the duration of the sleep

	$i++;
	print TMP $LINE;
}

close (TMP);


#
# We don't want to decode the post data ourselves. That's like
# reinventing the wheel. If we handle the post data with the perl
# CGI module that means the PHP script does not get access to the
# files, but there is a way around this.
#
# We can ask the CGI module to save the files, then we can pass
# these filenames to the PHP script. In other words instead of
# giving the raw post data (which contains the 'bodies' of the
# files), we just send a list of file names.
#

# re-point STDIN at the spooled body so CGI.pm can parse it normally
open(STDIN, "<", $post_data_file) or die "can't open temp file";

my $cg = CGI->new();
my $qstring = "?";
my %vars = $cg->Vars;
my $j = 0;

while(($key,$value) = each %vars)
{
	if(defined $value && $value ne '')
	{
		my $fh = $cg->upload($key);
		if(defined $fh)
		{
			# an uploaded file: save it under a temp name PHP can read.
			# tempfile() returns an open handle AND the full path.
			($tmp_fh, $tmp_filename) = tempfile(DIR => "/phptemp/");

			while(<$fh>) {
				print $tmp_fh $_;
			}

			close($tmp_fh);

			$fsize = (-s $fh);

			# $tmp_filename is already the full path; prepending the
			# directory again was the chmod mistake mentioned above
			$cnt = chmod 0777, $tmp_filename;

			# the upload handle stringifies to the client-side file name;
			# url-encode both names before adding them to the query string
			$fh =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;
			$tmp_filename =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;

			$qstring .= "file[name][$j]=$fh&file[size][$j]=$fsize&";
			$qstring .= "tmp_name[name][$j]=$tmp_filename&cnt[$j]=$cnt&";
			$j++;
		}
		else
		{
			# an ordinary form field: pass it through url-encoded
			$value =~ s/([^a-zA-Z0-9_\-.])/uc sprintf("%%%02x",ord($1))/eg;
			$qstring .= "$key=$value&" ;
		}
	}
}



my $url = $php_uploader . $qstring . "jobid=" . $jobid;	# $qstring already ends in '?' or '&'


# touch the signal file to tell the monitor the upload is complete
open (SIGNAL, ">", $signal_file);
print SIGNAL "\n";
close (SIGNAL);

# hand off to the PHP script, passing the saved file names along
print "Location: $url\n\n";



Test Account Created

Hi Paul,

I have sent in a support ticket with the login information for the test SiteWorx account.

Thanks,
Justin

Something I just thought of

I tried uploading on the new test account as well as another SiteWorx account, and it seems to be working normally. So I stopped and thought about what was different.

  1. The other two SiteWorx accounts I was using were non-SSL, just regular HTTP.
  2. The other two were just www.domain.com or domain.com, while the account I'm having problems with uses the subdomain "secure" (i.e. secure.domain.com).

My guess would be a problem with the SSL rather than the subdomain, but I need to read up on how mod_watch works.

While I was typing this post I had another idea… :cool:

I went to the SiteWorx account in question and looked at the Webalizer stats for October. The monthly total for "KBytes" was 588,532.
588532 KB / 1024 / 1024 ≈ 0.56 GB

…which is a lot more than the 0.08 GB showing in NodeWorx. The way I see it, these numbers should be exactly the same (or very, very close).

Hope this helps.

[EDIT]

I forgot to mention that CGI was turned off when the account was created in NodeWorx. There was no option in NodeWorx to turn it back on, so I manually edited the config file for the site and added CGI access.

No rush, guys

Just wanted to let you know I did a test on the SiteWorx account in question using regular HTTP, and it counted the BW correctly, so it's looking like a problem with mod_watch and SSL.

Another Test

I tested a file upload via a PHP script, and both HTTPS and HTTP registered the BW usage normally.

So to sum up everything:
PHP - HTTPS and HTTP normal
Perl - HTTP normal / HTTPS fails

:confused:

This is on my list to check out, Justin, and I will get to it today.

Sorry (again) for the delay.

Chris

No problem, I'm not in a rush. I just wanted to make sure I got as much information as I could on the topic to help you guys, and as I get more info I'm posting it here.

I just wanted to post a few comments and answer some general questions about the bandwidth system.

  1. SuexecUserGroup - has no effect on the bandwidth calculation. mod_watch is tied to each domain at the vhost level, not the unix user level. Any bandwidth uploaded/downloaded through a given vhost should be counted.

  2. Webalizer (or any log-based stats program) will almost always show less usage than the mod_watch-calculated bandwidth (you had the exact opposite in your example, which I'll address in #3). The reason is that log-based bandwidth calculators don't include HTTP headers in the calculation, and thus "miss" part of the transfer. mod_watch-based stats are more accurate, but even they miss TCP/IP overhead (which is minuscule).

  3. SSL / mod_watch - this combo should function the same as non-SSL / mod_watch, but I'm looking into our implementation.

Justec, can you re-test your PHP vs. Perl SSL-based uploads? The method shouldn't matter, as mod_watch just sees it all as HTTP traffic, but I want to double-check with you that your test is reproducible.

Chris

Understood. My guess is this is just due to the fact that mod_watch is not applying that BW usage to the virtual host.

Tests using the Perl upload script (I posted more detailed test results to my ticket):
HTTP - Passed
HTTPS - Passed
HTTPS with a subdomain - Failed

When uploading to http://domain.com or https://domain.com it works fine, but when uploading on https://subname.domain.com it doesn’t calculate the BW.

I have one new idea, and it has to do with the ServerName in the vhost file: would that have anything to do with this?
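For reference, this is roughly the shape of the config I mean (the domain names and IP are placeholders). mod_watch seems to file bytes under whichever vhost a request matches, so if the SSL vhost for the subdomain doesn't carry its own ServerName, I'm guessing the traffic could be attributed to the wrong vhost or dropped:

NameVirtualHost *:80

<VirtualHost *:80>
    ServerName domain.com
    ServerAlias www.domain.com
</VirtualHost>

# SSL vhosts are matched by IP:port rather than the Host header, but
# mod_watch presumably still keys its counters off ServerName --
# this is the line I'm wondering about:
<VirtualHost 1.2.3.4:443>
    ServerName secure.domain.com
    SSLEngine on
</VirtualHost>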

Any word on this issue?

Has anyone else experienced any issues with the mod_watch, SSL, subdomain combination?

:confused:

Look at these two images: the BW spike is at the same time, but SiteWorx only shows about 50 Kbps upload while NodeWorx shows close to 150 Kbps.

http://justechnology.com/BW/Nodeworx.png
http://justechnology.com/BW/Siteworx.png