Life as Clay

Posts Tagged ‘Code

Processing class for finding normal vectors

leave a comment »

Processing uses the PVector class to store vector data. The .normalize() method will normalize any PVector to a unit vector. What I needed to do was to find the normal of a vector — a vector perpendicular to the original vector. This is easy enough to do with some simple math. However, I wanted to encapsulate the logic into a class so that I could store multiple perpendicular vectors (to an original vector) and different locations along the original vector, and at different magnitudes. That is what this class does:

 

class PerpVectorPack {
  // Store the original data
  PVector originalVector;

  // Keep track of values used to calculate points
  ArrayList<Float>   percentOfOriginalVector;
  ArrayList<Float>   magnitudeOfPerpendicularVector;
  ArrayList<Boolean> onRightSide;

  // Storage of calculated perpendicular line points
  ArrayList pointsOnLine;
  ArrayList perpVectorPoints;


  // Constructors
  PerpVectorPack(PVector origVector) {
    initializeLists();
    originalVector = origVector.get();
  }

  PerpVectorPack(PVector origVector, float perpVectDist, float perpVectMag, boolean rightSide) {    
    initializeLists();
    originalVector = origVector.get();
    addPerpVector(perpVectDist, perpVectMag, rightSide);
  }

  private void initializeLists() {
    pointsOnLine                    = new ArrayList();
    perpVectorPoints                = new ArrayList();

    percentOfOriginalVector         = new ArrayList<Float>();
    magnitudeOfPerpendicularVector  = new ArrayList<Float>();
    onRightSide                     = new ArrayList<Boolean>();
  }


  // Methods
  public void addPerpVector(float perpVectDist, float perpVectMag, boolean rightSide) {

    // Keep track of the values used to create the perpendicular points
    percentOfOriginalVector.add(perpVectDist);
    magnitudeOfPerpendicularVector.add(perpVectMag);
    onRightSide.add(rightSide);

    PVector full    = originalVector.get();
    float   fullMag = full.mag();

    // Determine where along the original vector the perpendicular vector should lie (lay?)
    full.normalize();
    full.mult(perpVectDist * fullMag);

    // Set the first local variable
    pointsOnLine.add(full.get());

    // Get the normal of the original full vector and normalize it
    PVector perpVector  = new PVector(-originalVector.y, originalVector.x);
    perpVector.normalize();

    // Determine if it should be on the right or left side of the original vector
    int dir = 1;
    if (!rightSide) dir = -1;

    // Make it the proper length
    perpVector.mult(fullMag * perpVectMag * dir);

    // Offset it from the origin
    perpVector.add(full);

    // Set the second local variable
    perpVectorPoints.add(perpVector.get());
  }

  public PVector ptOnOriginalLine() { 
    return ptOnOriginalLine(0);
  };
  
  public PVector ptOnOriginalLine(int idx) {
    PVector thePoint = (PVector)pointsOnLine.get(idx);
    return thePoint.get();
  }

  public PVector perpVectorPt() { 
    return perpVectorPt(0);
  };
  
  public PVector perpVectorPt(int idx) {
    PVector thePoint = (PVector)perpVectorPoints.get(idx);
    return thePoint.get();
  }
}

Written by Clay

February 21, 2013 at 22:44

Posted in Code, Processing, Resource

Tagged with ,

Paperclip, S3 & Delayed Job in Rails on Heroku – Jan 2012

with 2 comments

Edit: I forgot to mention here that with Paperclip 2.4.5 you have to use the ‘paperclip-aws’ gem in order for Paperclip to work with Amazon’s newer ‘aws-sdk’ gem. That is no longer true with Paperclip 2.5.

I followed this good tutorial on how to push paperclip image processing to the background with delayed_job. My Rails app is deployed to Heroku on the cedar stack. I’ve had problems with the NoMethod errors using delayed_job 3.0.0, so I downgraded to 2.1.4. Also using paperclip version 2.4.5. In the end, I found that I could ditch the struct presented in the aforementioned tutorial and just call handle_asynchronously on my reprocessing method. This is what the codes looks like:


class Image < ActiveRecord::Base
  attr_accessible :profile_pic,
                  :caption,
                  :note,
                  :pic_file_name,
                  :pic_content_type,
                  :pic_file_size,
                  :pic_updated_at,
                  :pic,
                  :pic_attributes,
                  :processing
  
  belongs_to :imageable, :polymorphic => true
  
  # Added for paperclip-aws
  def self.s3_config
      @@s3_config ||= YAML.load(ERB.new(File.read("#{Rails.root}/config/s3.yml")).result)[Rails.env]    
  end
  
  has_attached_file :pic,
                    :styles => { 
                      :large => "500x500>",
                      :thumb => "100x100>", 
                      :tiny => "50>x50>", 
                      :smallest => "24>x24>" },   
                      :default_url => '/:attachment/:style/missing.png',               
                      
                      # Added for paperclip-aws
                      :storage => :aws,
                      :s3_permissions => :authenticated_read,
                      :path => "images/:id/:style/:filename",
                      :s3_credentials => {
                        :access_key_id => self.s3_config['access_key_id'],
                        :secret_access_key => self.s3_config['secret_access_key']
                      },
                      :bucket => self.s3_config['bucket'],
                      :s3_protocol => "https"    
                      
  validates_attachment_content_type :pic, :content_type => [ /^image\/(?:jpeg|gif|png)$/, nil ]
  
  # How to implement on Heroku with processing in the background
  # http://madeofcode.com/posts/42-paperclip-s3-delayed-job-in-rails
  
  # cancel post-processing now, and set flag...
     before_pic_post_process do |image|
       if !image.processing && image.pic_changed?
         image.processing = true
         false # halts processing
       end
     end
     
     # call method from after_save that will be processed in the background
     after_save do |image| 
       if image.processing
         processImageJob(image)
       end
     end
     
     def processImageJob(image)
         image.regenerate_styles!
     end
     handle_asynchronously :processImageJob
     
     # generate styles (downloads original first)
     def regenerate_styles!
       self.pic.reprocess! 
       self.processing = false   
       self.save(:validations => false)
     end
     
     # detect if our source file has changed
     def pic_changed?
       self.pic_file_size_changed? || 
       self.pic_file_name_changed? ||
       self.pic_content_type_changed? || 
       self.pic_updated_at_changed?
     end
                      
end

Written by Clay

January 12, 2012 at 14:29

Posted in Code, Rails, Ruby

Tagged with , , , , ,

Javascript snippet to display elapsed time in words

leave a comment »

Send in two strings in the format indicated and return a string that represents the time elapsed between the two inputs, in words.

function findDuration(startTime, endTime) {
	
	// Accepts input of two time stamps in this format:
	// 2011-11-22T12:40:00-05:00
	
	var year_start    	= parseInt(startTime.substring(0,4), 10);
	var month_start   	= parseInt(startTime.substring(5,7), 10);
	var day_start     	= parseInt(startTime.substring(8,10), 10);
	var hours_start   	= parseInt(startTime.substring(11,13), 10);
	var minutes_start 	= parseInt(startTime.substring(14,16), 10);
	
	var year_end    	= parseInt(endTime.substring(0,4), 10);
	var month_end   	= parseInt(endTime.substring(5,7), 10);
	var day_end     	= parseInt(endTime.substring(8,10), 10);
	var hours_end   	= parseInt(endTime.substring(11,13), 10);
	var minutes_end 	= parseInt(endTime.substring(14,16), 10);
	
    var years_total    	= 0;
    var months_total   	= 0;
    var days_total     	= 0;
    var hours_total    	= 0;
    var minutes_total  	= 0;
    
    var earlier_date = new Date();
    earlier_date.setFullYear(year_start);
    earlier_date.setMonth(month_start - 1);
    earlier_date.setDate(day_start);
    earlier_date.setHours(hours_start);
    earlier_date.setMinutes(minutes_start);
    
    var later_date = new Date();
    later_date.setFullYear(year_end);
    later_date.setMonth(month_end - 1);
    later_date.setDate(day_end);
    later_date.setHours(hours_end);
    later_date.setMinutes(minutes_end);
    
    var nTotalDiff 	= later_date.getTime() - earlier_date.getTime();
    var oDiff 		= new Object();
    
    oDiff.days 		= Math.floor(nTotalDiff/1000/60/60/24);
    nTotalDiff 	   -= oDiff.days*1000*60*60*24;
    
    oDiff.hours 	= Math.floor(nTotalDiff/1000/60/60);
    nTotalDiff 	   -= oDiff.hours*1000*60*60;
    
    oDiff.minutes 	= Math.floor(nTotalDiff/1000/60);
    nTotalDiff 	   -= oDiff.minutes*1000*60;
    
    oDiff.seconds 	= Math.floor(nTotalDiff/1000);
    
    oDiff.years 	= 0;
    if (oDiff.days > 364) {
        oDiff.years = Math.floor(oDiff.days / 365);
        oDiff.days 	= oDiff.days - (oDiff.years * 365) - 1;
    }
    
    var diff_string = '';
    if (oDiff.years > 0) {
        if (oDiff.years == 1) {
            diff_string += oDiff.years + " year ";
        } else {
            diff_string += oDiff.years + " years ";
        }
    }
    if (oDiff.days > 0) {
        if (oDiff.days == 1) {
            diff_string += oDiff.days + " day ";
        } else {
            diff_string += oDiff.days + " days ";
        }
    }
    if (oDiff.hours > 0) {
        if (oDiff.hours == 1) {
            diff_string += oDiff.hours + " hour ";
        } else {
            diff_string += oDiff.hours + " hours ";
        }
    }
    if (oDiff.minutes > 0) {
        if (oDiff.minutes == 1) {
            diff_string += oDiff.minutes + " minute ";
        } else {
            diff_string += oDiff.minutes + " minutes";
        }
    }

    // Add support for seconds here, if you want it.
    
    return diff_string;
}

Written by Clay

November 25, 2011 at 09:58

Posted in Code, javascript

Tagged with , ,

Parsing US Census data with Ruby

leave a comment »

I am a fan of population pyramids as a visualization of age demographics. They are simple and effective. Sometimes getting the data to generate them isn’t….

I’m building a simple Rails app for a client who needs county-level population data. It doesn’t have to be precise. I went with the 2009 estimates from the Census bureau, located here:
http://www.census.gov/popest/counties/asrh/CC-EST2009-alldata.html

There were a few issues:
1. Each state’s county-level data is in a different file. You can order a CD that has all of the data in a single file, but I didn’t have time to do that.
2. There’s a ton of data that I didn’t need in each file. All I needed were the county totals and the totals for male/female for each age group.
3. The data is in rows — a different row for each age group. I needed the data in columns, with one row for each county.

What I did:
1. Downloaded all 51 of the files (DC has its own file) and put them in a folder on my desktop.
2. Changed the extension on them to .csv (because it changed to .txt when I downloaded, for some reason).
3. Wrote the Ruby script below, which extracts the data that I need and puts it into a single .csv file.
4. Saved the script to the same folder as the Census files as script.rb.
5. Opened terminal and ran ruby script.rb
4. Reconciled the data with my Excel sheet that I was using to collect county-level data so that I could seed my Rails app’s counties table.

I comment my code pretty heavily when I write it so that I can look back and remember what I did. I’ve been burned in the past by not commenting thoroughly.

Here’s the script:

# Script to pull population data out of census files for each county
# Original data files are here: http://www.census.gov/popest/counties/asrh/CC-EST2009-alldata.html

require 'CSV'

# These are the FIPS codes for each state which are used in file names.
state_fips_codes = ["01", "02", "04", "05", "06", "08", "09", "10", "11", "12", "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", "56"]

add_header = true

# This is the file that all of the data will go into...
File.open("county_age_data.csv", "w") do |output_file|
  
  # counter for iterating through the files
  n = 0
  
  state_fips_codes.length.times do  # This loop goes through all of the files
    
    # Read in the relevant file
    state_data = CSV.parse(File.read("cc-est2009-alldata-" + state_fips_codes[n] + ".csv"))
    puts "Processing data for " + state_data[1][3]

    # Add the header to the file
    if add_header == true
      output_file.puts "state_fips,county_fips,state_name,county_name,:census_total_pop,:census_total_male,:census_grp01_male,:census_grp02_male,:census_grp03_male,:census_grp04_male,:census_grp05_male,:census_grp06_male,:census_grp07_male,:census_grp08_male,:census_grp09_male,:census_grp10_male,:census_grp11_male,:census_grp12_male,:census_grp13_male,:census_grp14_male,:census_grp15_male,:census_grp16_male,:census_grp17_male,:census_grp18_male,:census_total_female,:census_grp01_female,:census_grp02_female,:census_grp03_female,:census_grp04_female,:census_grp05_female,:census_grp06_female,:census_grp07_female,:census_grp08_female,:census_grp09_female,:census_grp10_female,:census_grp11_female,:census_grp12_female,:census_grp13_female,:census_grp14_female,:census_grp15_female,:census_grp16_female,:census_grp17_female,:census_grp18_female"
      add_header = false
    end

    # Iterate through the loaded file.
    row = 1
    while row < state_data.length
      if state_data[row][5].to_i == 12 # Only take data from year coded 12 (2009 estimates)

        ########## MAIN LOGIC #############
        if state_data[row][6].to_i == 0 # It's a new county because this is the group 0 row with totals

          # Reset the data arrays
          row_data    = []
          male_data   = []
          female_data = []

          row_data.push state_data[row][1].to_s     # state_fips
          row_data.push state_data[row][2].to_s     # county_fips
          row_data.push state_data[row][3].to_s     # state_name
          row_data.push state_data[row][4].to_s     # county_name
          row_data.push state_data[row][7].to_s     # census_total_pop

          male_data.push state_data[row][8].to_s    # census_total_male
          female_data.push state_data[row][9].to_s  # census_total_female

        elsif state_data[row][6].to_i == 18 # It's the last row -- push the data into the array

          # push in the final data points
          male_data.push state_data[row][8].to_s    # census_grp18_male
          female_data.push state_data[row][9].to_s  # census_grp18_female

          # append all of the data to the row_data array
          row_data += male_data
          row_data += female_data

          # write the data to the file
          output_file.puts row_data.join(",")

        else # It's not the first or last row

          male_data.push state_data[row][8].to_s    # census_grpXX_male
          female_data.push state_data[row][9].to_s  # census_grpXX_female

        end # End of the if statement for pushing data into arrays
        ################ END OF MAIN LOGIC ####################
        
      end       # End of the if statement to check if it is year 10
      
      row += 1  # Look at the next row
    end

    n += 1      # Increment the counter
  end           # End of loop for going through each file
end

Written by Clay

July 15, 2010 at 21:17

Posted in Code, Ruby

Tagged with , , ,

Resource: Rails country seeds with ISO codes

leave a comment »

In case you need to seed your Rails app with countries. You can use the ISO codes to display flag icons, like these linked ones from famfamfam.com: link.

 countries = Country.create([
                               {:name => 'Åland Islands', :iso_code => 'AX'},
                               {:name => 'Albania', :iso_code => 'AL'},
                               {:name => 'Algeria', :iso_code => 'DZ'},
                               {:name => 'American Samoa', :iso_code => 'AS'},
                               {:name => 'Andorra', :iso_code => 'AD'},
                               {:name => 'Angola', :iso_code => 'AO'},
                               {:name => 'Anguilla', :iso_code => 'AI'},
                               {:name => 'Antarctica', :iso_code => 'AQ'},
                               {:name => 'Antigua and Barbuda', :iso_code => 'AG'},
                               {:name => 'Argentina', :iso_code => 'AR'},
                               {:name => 'Armenia', :iso_code => 'AM'},
                               {:name => 'Aruba', :iso_code => 'AW'},
                               {:name => 'Australia', :iso_code => 'AU'},
                               {:name => 'Austria', :iso_code => 'AT'},
                               {:name => 'Azerbaijan', :iso_code => 'AZ'},
                               {:name => 'Bahamas', :iso_code => 'BS'},
                               {:name => 'Bahrain', :iso_code => 'BH'},
                               {:name => 'Bangladesh', :iso_code => 'BD'},
                               {:name => 'Barbados', :iso_code => 'BB'},
                               {:name => 'Belarus', :iso_code => 'BY'},
                               {:name => 'Belgium', :iso_code => 'BE'},
                               {:name => 'Belize', :iso_code => 'BZ'},
                               {:name => 'Benin', :iso_code => 'BJ'},
                               {:name => 'Bermuda', :iso_code => 'BM'},
                               {:name => 'Bhutan', :iso_code => 'BT'},
                               {:name => 'Bolivia', :iso_code => 'BO'},
                               {:name => 'Bosnia and Herzegovina', :iso_code => 'BA'},
                               {:name => 'Botswana', :iso_code => 'BW'},
                               {:name => 'Bouvet Island', :iso_code => 'BV'},
                               {:name => 'Brazil', :iso_code => 'BR'},
                               {:name => 'British Indian Ocean Territory', :iso_code => 'IO'},
                               {:name => 'Brunei Darussalam', :iso_code => 'BN'},
                               {:name => 'Bulgaria', :iso_code => 'BG'},
                               {:name => 'Burkina Faso', :iso_code => 'BF'},
                               {:name => 'Burundi', :iso_code => 'BI'},
                               {:name => 'Cambodia', :iso_code => 'KH'},
                               {:name => 'Cameroon', :iso_code => 'CM'},
                               {:name => 'Canada', :iso_code => 'CA'},
                               {:name => 'Cape Verde', :iso_code => 'CV'},
                               {:name => 'Cayman Islands', :iso_code => 'KY'},
                               {:name => 'Central African Republic', :iso_code => 'CF'},
                               {:name => 'Chad', :iso_code => 'TD'},
                               {:name => 'Chile', :iso_code => 'CL'},
                               {:name => 'China', :iso_code => 'CN'},
                               {:name => 'Christmas Island', :iso_code => 'CX'},
                               {:name => 'Cocos (Keeling) Islands', :iso_code => 'CC'},
                               {:name => 'Colombia', :iso_code => 'CO'},
                               {:name => 'Comoros', :iso_code => 'KM'},
                               {:name => 'Congo', :iso_code => 'CG'},
                               {:name => 'Congo, The Democratic Republic of The', :iso_code => 'CD'},
                               {:name => 'Cook Islands', :iso_code => 'CK'},
                               {:name => 'Costa Rica', :iso_code => 'CR'},
                               {:name => 'Côte D\'Ivoire', :iso_code => 'CI'},
                               {:name => 'Croatia', :iso_code => 'HR'},
                               {:name => 'Cuba', :iso_code => 'CU'},
                               {:name => 'Cyprus', :iso_code => 'CY'},
                               {:name => 'Czech Republic', :iso_code => 'CZ'},
                               {:name => 'Denmark', :iso_code => 'DK'},
                               {:name => 'Djibouti', :iso_code => 'DJ'},
                               {:name => 'Dominica', :iso_code => 'DM'},
                               {:name => 'Dominican Republic', :iso_code => 'DO'},
                               {:name => 'Ecuador', :iso_code => 'EC'},
                               {:name => 'Egypt', :iso_code => 'EG'},
                               {:name => 'El Salvador', :iso_code => 'SV'},
                               {:name => 'Equatorial Guinea', :iso_code => 'GQ'},
                               {:name => 'Eritrea', :iso_code => 'ER'},
                               {:name => 'Estonia', :iso_code => 'EE'},
                               {:name => 'Ethiopia', :iso_code => 'ET'},
                               {:name => 'Falkland Islands (Malvinas)', :iso_code => 'FK'},
                               {:name => 'Faroe Islands', :iso_code => 'FO'},
                               {:name => 'Fiji', :iso_code => 'FJ'},
                               {:name => 'Finland', :iso_code => 'FI'},
                               {:name => 'France', :iso_code => 'FR'},
                               {:name => 'French Guiana', :iso_code => 'GF'},
                               {:name => 'French Polynesia', :iso_code => 'PF'},
                               {:name => 'French Southern Territories', :iso_code => 'TF'},
                               {:name => 'Gabon', :iso_code => 'GA'},
                               {:name => 'Gambia', :iso_code => 'GM'},
                               {:name => 'Georgia', :iso_code => 'GE'},
                               {:name => 'Germany', :iso_code => 'DE'},
                               {:name => 'Ghana', :iso_code => 'GH'},
                               {:name => 'Gibraltar', :iso_code => 'GI'},
                               {:name => 'Greece', :iso_code => 'GR'},
                               {:name => 'Greenland', :iso_code => 'GL'},
                               {:name => 'Grenada', :iso_code => 'GD'},
                               {:name => 'Guadeloupe', :iso_code => 'GP'},
                               {:name => 'Guam', :iso_code => 'GU'},
                               {:name => 'Guatemala', :iso_code => 'GT'},
                               {:name => 'Guernsey', :iso_code => 'GG'},
                               {:name => 'Guinea', :iso_code => 'GN'},
                               {:name => 'Guinea-Bissau', :iso_code => 'GW'},
                               {:name => 'Guyana', :iso_code => 'GY'},
                               {:name => 'Haiti', :iso_code => 'HT'},
                               {:name => 'Heard Island and Mcdonald Islands', :iso_code => 'HM'},
                               {:name => 'Holy See (Vatican City State)', :iso_code => 'VA'},
                               {:name => 'Honduras', :iso_code => 'HN'},
                               {:name => 'Hong Kong', :iso_code => 'HK'},
                               {:name => 'Hungary', :iso_code => 'HU'},
                               {:name => 'Iceland', :iso_code => 'IS'},
                               {:name => 'India', :iso_code => 'IN'},
                               {:name => 'Indonesia', :iso_code => 'ID'},
                               {:name => 'Iran, Islamic Republic of', :iso_code => 'IR'},
                               {:name => 'Iraq', :iso_code => 'IQ'},
                               {:name => 'Ireland', :iso_code => 'IE'},
                               {:name => 'Isle of Man', :iso_code => 'IM'},
                               {:name => 'Israel', :iso_code => 'IL'},
                               {:name => 'Italy', :iso_code => 'IT'},
                               {:name => 'Jamaica', :iso_code => 'JM'},
                               {:name => 'Japan', :iso_code => 'JP'},
                               {:name => 'Jersey', :iso_code => 'JE'},
                               {:name => 'Jordan', :iso_code => 'JO'},
                               {:name => 'Kazakhstan', :iso_code => 'KZ'},
                               {:name => 'Kenya', :iso_code => 'KE'},
                               {:name => 'Kiribati', :iso_code => 'KI'},
                               {:name => 'Korea, Democratic People\'s Republic of', :iso_code => 'KP'},
                               {:name => 'Korea, Republic of', :iso_code => 'KR'},
                               {:name => 'Kuwait', :iso_code => 'KW'},
                               {:name => 'Kyrgyzstan', :iso_code => 'KG'},
                               {:name => 'Lao People\'s Democratic Republic', :iso_code => 'LA'},
                               {:name => 'Latvia', :iso_code => 'LV'},
                               {:name => 'Lebanon', :iso_code => 'LB'},
                               {:name => 'Lesotho', :iso_code => 'LS'},
                               {:name => 'Liberia', :iso_code => 'LR'},
                               {:name => 'Libyan Arab Jamahiriya', :iso_code => 'LY'},
                               {:name => 'Liechtenstein', :iso_code => 'LI'},
                               {:name => 'Lithuania', :iso_code => 'LT'},
                               {:name => 'Luxembourg', :iso_code => 'LU'},
                               {:name => 'Macao', :iso_code => 'MO'},
                               {:name => 'Macedonia, The Former Yugoslav Republic of', :iso_code => 'MK'},
                               {:name => 'Madagascar', :iso_code => 'MG'},
                               {:name => 'Malawi', :iso_code => 'MW'},
                               {:name => 'Malaysia', :iso_code => 'MY'},
                               {:name => 'Maldives', :iso_code => 'MV'},
                               {:name => 'Mali', :iso_code => 'ML'},
                               {:name => 'Malta', :iso_code => 'MT'},
                               {:name => 'Marshall Islands', :iso_code => 'MH'},
                               {:name => 'Martinique', :iso_code => 'MQ'},
                               {:name => 'Mauritania', :iso_code => 'MR'},
                               {:name => 'Mauritius', :iso_code => 'MU'},
                               {:name => 'Mayotte', :iso_code => 'YT'},
                               {:name => 'Mexico', :iso_code => 'MX'},
                               {:name => 'Micronesia, Federated States of', :iso_code => 'FM'},
                               {:name => 'Moldova, Republic of', :iso_code => 'MD'},
                               {:name => 'Monaco', :iso_code => 'MC'},
                               {:name => 'Mongolia', :iso_code => 'MN'},
                               {:name => 'Montenegro', :iso_code => 'ME'},
                               {:name => 'Montserrat', :iso_code => 'MS'},
                               {:name => 'Morocco', :iso_code => 'MA'},
                               {:name => 'Mozambique', :iso_code => 'MZ'},
                               {:name => 'Myanmar', :iso_code => 'MM'},
                               {:name => 'Namibia', :iso_code => 'NA'},
                               {:name => 'Nauru', :iso_code => 'NR'},
                               {:name => 'Nepal', :iso_code => 'NP'},
                               {:name => 'Netherlands', :iso_code => 'NL'},
                               {:name => 'Netherlands Antilles', :iso_code => 'AN'},
                               {:name => 'New Caledonia', :iso_code => 'NC'},
                               {:name => 'New Zealand', :iso_code => 'NZ'},
                               {:name => 'Nicaragua', :iso_code => 'NI'},
                               {:name => 'Niger', :iso_code => 'NE'},
                               {:name => 'Nigeria', :iso_code => 'NG'},
                               {:name => 'Niue', :iso_code => 'NU'},
                               {:name => 'Norfolk Island', :iso_code => 'NF'},
                               {:name => 'Northern Mariana Islands', :iso_code => 'MP'},
                               {:name => 'Norway', :iso_code => 'NO'},
                               {:name => 'Oman', :iso_code => 'OM'},
                               {:name => 'Pakistan', :iso_code => 'PK'},
                               {:name => 'Palau', :iso_code => 'PW'},
                               {:name => 'Palestinian Territory, Occupied', :iso_code => 'PS'},
                               {:name => 'Panama', :iso_code => 'PA'},
                               {:name => 'Papua New Guinea', :iso_code => 'PG'},
                               {:name => 'Paraguay', :iso_code => 'PY'},
                               {:name => 'Peru', :iso_code => 'PE'},
                               {:name => 'Philippines', :iso_code => 'PH'},
                               {:name => 'Pitcairn', :iso_code => 'PN'},
                               {:name => 'Poland', :iso_code => 'PL'},
                               {:name => 'Portugal', :iso_code => 'PT'},
                               {:name => 'Puerto Rico', :iso_code => 'PR'},
                               {:name => 'Qatar', :iso_code => 'QA'},
                               {:name => 'Réunion', :iso_code => 'RE'},
                               {:name => 'Romania', :iso_code => 'RO'},
                               {:name => 'Russian Federation', :iso_code => 'RU'},
                               {:name => 'Rwanda', :iso_code => 'RW'},
                               {:name => 'Saint Barthélemy', :iso_code => 'BL'},
                               {:name => 'Saint Helena, Ascension and Tristan Da Cunha', :iso_code => 'SH'},
                               {:name => 'Saint Kitts and Nevis', :iso_code => 'KN'},
                               {:name => 'Saint Lucia', :iso_code => 'LC'},
                               {:name => 'Saint Martin', :iso_code => 'MF'},
                               {:name => 'Saint Pierre and Miquelon', :iso_code => 'PM'},
                               {:name => 'Saint Vincent and The Grenadines', :iso_code => 'VC'},
                               {:name => 'Samoa', :iso_code => 'WS'},
                               {:name => 'San Marino', :iso_code => 'SM'},
                               {:name => 'Sao Tome and Principe', :iso_code => 'ST'},
                               {:name => 'Saudi Arabia', :iso_code => 'SA'},
                               {:name => 'Senegal', :iso_code => 'SN'},
                               {:name => 'Serbia', :iso_code => 'RS'},
                               {:name => 'Seychelles', :iso_code => 'SC'},
                               {:name => 'Sierra Leone', :iso_code => 'SL'},
                               {:name => 'Singapore', :iso_code => 'SG'},
                               {:name => 'Slovakia', :iso_code => 'SK'},
                               {:name => 'Slovenia', :iso_code => 'SI'},
                               {:name => 'Solomon Islands', :iso_code => 'SB'},
                               {:name => 'Somalia', :iso_code => 'SO'},
                               {:name => 'South Africa', :iso_code => 'ZA'},
                               {:name => 'South Georgia and The South Sandwich Islands', :iso_code => 'GS'},
                               {:name => 'Spain', :iso_code => 'ES'},
                               {:name => 'Sri Lanka', :iso_code => 'LK'},
                               {:name => 'Sudan', :iso_code => 'SD'},
                               {:name => 'Suriname', :iso_code => 'SR'},
                               {:name => 'Svalbard and Jan Mayen', :iso_code => 'SJ'},
                               {:name => 'Swaziland', :iso_code => 'SZ'},
                               {:name => 'Sweden', :iso_code => 'SE'},
                               {:name => 'Switzerland', :iso_code => 'CH'},
                               {:name => 'Syrian Arab Republic', :iso_code => 'SY'},
                               {:name => 'Taiwan, Province of China', :iso_code => 'TW'},
                               {:name => 'Tajikistan', :iso_code => 'TJ'},
                               {:name => 'Tanzania, United Republic of', :iso_code => 'TZ'},
                               {:name => 'Thailand', :iso_code => 'TH'},
                               {:name => 'Timor-Leste', :iso_code => 'TL'},
                               {:name => 'Togo', :iso_code => 'TG'},
                               {:name => 'Tokelau', :iso_code => 'TK'},
                               {:name => 'Tonga', :iso_code => 'TO'},
                               {:name => 'Trinidad and Tobago', :iso_code => 'TT'},
                               {:name => 'Tunisia', :iso_code => 'TN'},
                               {:name => 'Turkey', :iso_code => 'TR'},
                               {:name => 'Turkmenistan', :iso_code => 'TM'},
                               {:name => 'Turks and Caicos Islands', :iso_code => 'TC'},
                               {:name => 'Tuvalu', :iso_code => 'TV'},
                               {:name => 'Uganda', :iso_code => 'UG'},
                               {:name => 'Ukraine', :iso_code => 'UA'},
                               {:name => 'United Arab Emirates', :iso_code => 'AE'},
                               {:name => 'United Kingdom', :iso_code => 'GB'},
                               {:name => 'United States', :iso_code => 'US'},
                               {:name => 'United States Minor Outlying Islands', :iso_code => 'UM'},
                               {:name => 'Uruguay', :iso_code => 'UY'},
                               {:name => 'Uzbekistan', :iso_code => 'UZ'},
                               {:name => 'Vanuatu', :iso_code => 'VU'},
                               {:name => 'Venezuela, Bolivarian Republic of', :iso_code => 'VE'},
                               {:name => 'Viet Nam', :iso_code => 'VN'},
                               {:name => 'Virgin Islands, British', :iso_code => 'VG'},
                               {:name => 'Virgin Islands, U.S.', :iso_code => 'VI'},
                               {:name => 'Wallis and Futuna', :iso_code => 'WF'},
                               {:name => 'Western Sahara', :iso_code => 'EH'},
                               {:name => 'Yemen', :iso_code => 'YE'},
                               {:name => 'Zambia', :iso_code => 'ZM'},
                               {:name => 'Zimbabwe', :iso_code => 'ZW'}
 ])

Written by Clay

July 13, 2010 at 17:54

Posted in Code, Resource, Ruby

Tagged with , , ,

Resource: Rails database seeds for US States, including ANSI codes

with one comment

In case you need to add US States to your Rails app. The data are from the US Census website. I’m not sure what the purpose of the “Geographic Names Information System Identifier (GNISID)” is (the number after the State name), but it’s included here in case you need to tie back to the census system.

states = State.create([
                        { :name => 'Alabama', :abbr => 'AL', :ansi_code => '1', :statens => '1779775'},
                        { :name => 'Alaska', :abbr => 'AK', :ansi_code => '2', :statens => '1785533'},
                        { :name => 'Arizona', :abbr => 'AZ', :ansi_code => '4', :statens => '1779777'},
                        { :name => 'Arkansas', :abbr => 'AR', :ansi_code => '5', :statens => '68085'},
                        { :name => 'California', :abbr => 'CA', :ansi_code => '6', :statens => '1779778'},
                        { :name => 'Colorado', :abbr => 'CO', :ansi_code => '8', :statens => '1779779'},
                        { :name => 'Connecticut', :abbr => 'CT', :ansi_code => '9', :statens => '1779780'},
                        { :name => 'Delaware', :abbr => 'DE', :ansi_code => '10', :statens => '1779781'},
                        { :name => 'District of Columbia', :abbr => 'DC', :ansi_code => '11', :statens => '1702382'},
                        { :name => 'Florida', :abbr => 'FL', :ansi_code => '12', :statens => '294478'},
                        { :name => 'Georgia', :abbr => 'GA', :ansi_code => '13', :statens => '1705317'},
                        { :name => 'Hawaii', :abbr => 'HI', :ansi_code => '15', :statens => '1779782'},
                        { :name => 'Idaho', :abbr => 'ID', :ansi_code => '16', :statens => '1779783'},
                        { :name => 'Illinois', :abbr => 'IL', :ansi_code => '17', :statens => '1779784'},
                        { :name => 'Indiana', :abbr => 'IN', :ansi_code => '18', :statens => '448508'},
                        { :name => 'Iowa', :abbr => 'IA', :ansi_code => '19', :statens => '1779785'},
                        { :name => 'Kansas', :abbr => 'KS', :ansi_code => '20', :statens => '481813'},
                        { :name => 'Kentucky', :abbr => 'KY', :ansi_code => '21', :statens => '1779786'},
                        { :name => 'Louisiana', :abbr => 'LA', :ansi_code => '22', :statens => '1629543'},
                        { :name => 'Maine', :abbr => 'ME', :ansi_code => '23', :statens => '1779787'},
                        { :name => 'Maryland', :abbr => 'MD', :ansi_code => '24', :statens => '1714934'},
                        { :name => 'Massachusetts', :abbr => 'MA', :ansi_code => '25', :statens => '606926'},
                        { :name => 'Michigan', :abbr => 'MI', :ansi_code => '26', :statens => '1779789'},
                        { :name => 'Minnesota', :abbr => 'MN', :ansi_code => '27', :statens => '662849'},
                        { :name => 'Mississippi', :abbr => 'MS', :ansi_code => '28', :statens => '1779790'},
                        { :name => 'Missouri', :abbr => 'MO', :ansi_code => '29', :statens => '1779791'},
                        { :name => 'Montana', :abbr => 'MT', :ansi_code => '30', :statens => '767982'},
                        { :name => 'Nebraska', :abbr => 'NE', :ansi_code => '31', :statens => '1779792'},
                        { :name => 'Nevada', :abbr => 'NV', :ansi_code => '32', :statens => '1779793'},
                        { :name => 'New Hampshire', :abbr => 'NH', :ansi_code => '33', :statens => '1779794'},
                        { :name => 'New Jersey', :abbr => 'NJ', :ansi_code => '34', :statens => '1779795'},
                        { :name => 'New Mexico', :abbr => 'NM', :ansi_code => '35', :statens => '897535'},
                        { :name => 'New York', :abbr => 'NY', :ansi_code => '36', :statens => '1779796'},
                        { :name => 'North Carolina', :abbr => 'NC', :ansi_code => '37', :statens => '1027616'},
                        { :name => 'North Dakota', :abbr => 'ND', :ansi_code => '38', :statens => '1779797'},
                        { :name => 'Ohio', :abbr => 'OH', :ansi_code => '39', :statens => '1085497'},
                        { :name => 'Oklahoma', :abbr => 'OK', :ansi_code => '40', :statens => '1102857'},
                        { :name => 'Oregon', :abbr => 'OR', :ansi_code => '41', :statens => '1155107'},
                        { :name => 'Pennsylvania', :abbr => 'PA', :ansi_code => '42', :statens => '1779798'},
                        { :name => 'Rhode Island', :abbr => 'RI', :ansi_code => '44', :statens => '1219835'},
                        { :name => 'South Carolina', :abbr => 'SC', :ansi_code => '45', :statens => '1779799'},
                        { :name => 'South Dakota', :abbr => 'SD', :ansi_code => '46', :statens => '1785534'},
                        { :name => 'Tennessee', :abbr => 'TN', :ansi_code => '47', :statens => '1325873'},
                        { :name => 'Texas', :abbr => 'TX', :ansi_code => '48', :statens => '1779801'},
                        { :name => 'Utah', :abbr => 'UT', :ansi_code => '49', :statens => '1455989'},
                        { :name => 'Vermont', :abbr => 'VT', :ansi_code => '50', :statens => '1779802'},
                        { :name => 'Virginia', :abbr => 'VA', :ansi_code => '51', :statens => '1779803'},
                        { :name => 'Washington', :abbr => 'WA', :ansi_code => '53', :statens => '1779804'},
                        { :name => 'West Virginia', :abbr => 'WV', :ansi_code => '54', :statens => '1779805'},
                        { :name => 'Wisconsin', :abbr => 'WI', :ansi_code => '55', :statens => '1779806'},
                        { :name => 'Wyoming', :abbr => 'WY', :ansi_code => '56', :statens => '1779807'}
])

Written by Clay

July 13, 2010 at 17:49

Posted in Code, Resource, Ruby

Tagged with , , ,

Using Nokogiri and Ruby on Rails to grab citations from PubMed

with 12 comments

Update: I put a working example up on github (Rails 3.0.4 and Nokogiri 1.4.4): https://github.com/ccheaton/pubnoko

##############################

Following on the last post… I needed to provide to a client a custom database of PubMed citations, links, datasets, and other information related to a product area I was researching. The hardest part about this was understanding PubMed’s eutils and their .xml formats.

I did not want to manually enter the citation information. I wanted to avoid Zotero and other similar services because I wanted to integrate the data across domains and use the database that I was developing for them as the basis for a paper I wrote. For the PubMed citations, what I wanted to be able to do was to go to PubMed, perform a search, find an interesting citation, copy the PubMed URL, paste it into my Rails application, which would use the URL to query PubMed in XML format and populate my app’s database with that entry.

I have two models that play into this interaction: Article and Journal. I should note that I’m using Nokogiri to parse the XML and thoughtbot’s Paperclip gem to attach articles when they are available freely. This was done using Rails 2.3.5 and Ruby 1.8.7.

Here’s how the model files look:

journal.rb

class Journal < ActiveRecord::Base   attr_accessible :name, :description, :url, :short_name      has_many :articles      validates_uniqueness_of :name   validates_presence_of :short_name   validates_presence_of :name      before_save :make_short_name      def best_name     if !self.short_name.nil? && self.short_name.length > 0
      return self.short_name
    else
      return self.name
    end
  end

  private

  # Makes sure that a new journal has a short name in case one isn't entered.
  def make_short_name
    if self.short_name.nil? || self.short_name.length == 0
      self.short_name = self.name
    end
  end
end

The Article model is a little more complicated. The logic that I’m using the parse the XML isn’t great, but I’ve found that the structure changes from article to article. I’m still refining it and will update it when I’m finished, after I’ve found the 90% of article that auto-import.

article.rb

class Article < ActiveRecord::Base   require 'open-uri'   attr_accessible :pubmedlink,                    :article_title,                    :abstract,                    :authors,                    :affiliations,                    :pubdate,                    :pubmedid,                    :journal_id,                    :journal_volume,                    :journal_issue,                    :journal_pages,                   :fetched,                   :thearticle_file_name,                   :thearticle_content_type,                   :thearticle_file_size,                   :thearticle_updated_at,                   :thearticle                      has_attached_file :thearticle,                     :path => ":rails_root/public/system/uploads/:class/:id/:basename.:extension",
                    :url => "/system/uploads/:class/:id/:basename.:extension"
  belongs_to :journal
  before_save :make_pubmedid
  validates_uniqueness_of :article_title

  def citation
    cit = ""
    if self.journal_id != nil && self.journal_id > 0
      cit += self.journal.best_name + ". " + self.pubdate.year.to_s + " " + self.pubdate.strftime("%b")
    end
    if self.journal_volume != nil && self.journal_volume.length > 0
      cit += "; " + self.journal_volume
    end
    if self.journal_issue != nil && self.journal_issue.length > 0
      cit += "(" + self.journal_issue + ")"
    end
    if self.journal_pages != nil && self.journal_pages.length > 0
      cit += ": " + self.journal_pages
    end
      return cit
  end

  private

  def make_pubmedid
    if self.pubmedlink != nil && self.pubmedlink.length > 0 && self.fetched == false
      regex = Regexp.new(/[0-9]{4,10}/)
      matchdata = regex.match(self.pubmedlink)
      self.pubmedid = matchdata[0]
      pull_pubmed_data(matchdata[0])
      self.fetched = true
    end
  end

  def pull_pubmed_data(theID)
    @doc = Nokogiri::XML(open("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=" + theID.to_s + "&retmode=xml").read)

    # The title and abstract
    self.article_title = @doc.xpath("//ArticleTitle").collect(&:text).to_s
    self.abstract = @doc.xpath("//AbstractText").collect(&:text).to_s

    # Setting up the authors
    first_names = @doc.xpath("//ForeName").collect(&:text)
    last_names = @doc.xpath("//LastName").collect(&:text)
    full_names = []

    i = 0
    last_names.length.times do
      the_name = first_names[i].to_s + " " + last_names[i].to_s
      full_names.push the_name
      i += 1
    end
    self.authors = full_names.join(", ")

    # Affiliations
    self.affiliations = @doc.xpath("//Affiliation").collect(&:text).to_s

    # Publication Date - Check if the complete date is at the top. If not use the pub med date.
    theyear = @doc.xpath("//PubDate/Year").collect(&:text)
    theyear = theyear[0]
    themonth = @doc.xpath("//PubDate/Month").collect(&:text)
    themonth = themonth[0]
    theday = @doc.xpath("//PubDate/Day").collect(&:text)

    if theyear.nil? || themonth.nil? || theday.nil? || theyear.length == 0 || themonth.length == 0 || theday.length == 0

      theyear = @doc.xpath("//PubMedPubDate[@PubStatus='pubmed']/Year").collect(&:text)
      theyear = theyear[0]

      themonth = @doc.xpath("//PubMedPubDate[@PubStatus='pubmed']/Month").collect(&:text)
      themonth = themonth[0].to_i
        month_done = 1
      if themonth < 10
        themonth = "0" + themonth.to_s
        month_done = 1
      end

      theday = @doc.xpath("//PubMedPubDate[@PubStatus='pubmed']/Day").collect(&:text)
    end

    if month_done != 1

      themonth = case themonth || "01"
        when "Jan"  then "01"
        when "Feb"  then "02"
        when "Mar"  then "03"
        when "Apr"  then "04"
        when "May"  then "05"
        when "Jun"  then "06"
        when "Jul"  then "07"
        when "Aug"  then "08"
        when "Sep"  then "09"
        when "Oct"  then "10"
        when "Nov"  then "11"
        when "Dec"  then "12"
      end

    end

    if theday.length == 0
      theday = "01"
    elsif theday[0].to_i < 10       theday = "0" + theday[0].to_s     else       theday = theday[0].to_s     end          thedate = theyear.to_s + "-" + themonth.to_s + "-" + theday.to_s     puts "thedate: " + thedate.to_s     self.pubdate = Date.new(theyear.to_i, themonth.to_i, theday.to_i)          # Either referencing the proper journal or creating a new one     thejournal = @doc.xpath("//Journal/Title").collect(&:text)     thejournal = thejournal[0]          theshortname = @doc.xpath("//MedlineTA").collect(&:text)     if theshortname.length == 0       theshortname = ""     else       theshortname = theshortname[0]     end          thejournalid = Journal.find(:first, :conditions => ['lower(name) = ?', thejournal.downcase])

    if !thejournalid.nil?
      self.journal_id = thejournalid.id
    else
      @journal = Journal.new(:name => thejournal.to_s, :short_name => theshortname.to_s)
      @journal.save
      thenewjournal = Journal.find(:first, :order => 'created_at DESC')
      self.journal_id = thenewjournal.id
    end

    # Save the volume, issue, and pages
    thevolume = @doc.xpath("//JournalIssue/Volume").collect(&:text)
    thevolume = thevolume[0].to_s if thevolume.length > 0
    self.journal_volume = thevolume if thevolume.length > 0

    theissue = @doc.xpath("//JournalIssue/Issue").collect(&:text)
    theissue = theissue[0].to_s if theissue.length > 0
    self.journal_issue = theissue if theissue.length > 0

    thepag = @doc.xpath("//Pagination/MedlinePgn").collect(&:text)
    thepag = thepag[0].to_s if thepag.length > 0
    self.journal_pages = thepag if thepag.length > 0

  end

end

The journals_controller.rb file is pretty standard. So is the articles_controller.rb file. Here it is as an example:

articles_controller.rb

class ArticlesController < ApplicationController   def index     @articles = Article.all(:order => "pubdate DESC")
  end

  def show
    @article = Article.find(params[:id])
  end

  def new
    @article = Article.new
    @journals = Journal.all(:order => :name)
  end

  def create
    @article = Article.new(params[:article])
    if @article.save
      flash[:notice] = "Successfully created article."
      redirect_to @article
    else
      render :action => 'new'
    end
  end

  def edit
    @article = Article.find(params[:id])
    @journals = Journal.all(:order => :name)
  end

  def update
    @article = Article.find(params[:id])
    if @article.update_attributes(params[:article])
      flash[:notice] = "Successfully updated article."
      redirect_to @article
    else
      render :action => 'edit'
    end
  end

  def destroy
    @article = Article.find(params[:id])
    @article.destroy
    flash[:notice] = "Successfully destroyed article."
    redirect_to articles_url
  end
end

Here are the view files for articles:

app/views/articles/index.html.erb

<div class="toolbar">
<ul></ul>
</div>
<table>
<tbody>
<tr>
<th></th>
<th>Article Title</th>
<th></th>
</tr>
<tr class="<%= cycle('rowA', 'rowB') %>">
<td></td>
<td><strong> </strong>

		  <em> </em></td>
<td class="deletelink"></td>
</tr>
</tbody>
</table>

app/views/articles/show.html.erb

<div class="toolbar">
<ul></ul>
</div>
	  <strong>PubMed Link:</strong>







	  <strong>View the Article:</strong>





  <strong>Article Title:</strong>



  <strong>Abstract:</strong>



  <strong>Authors:</strong>



  <strong>Affiliations:</strong>



  <strong>Pubdate:</strong>



  <strong>Pubmedid:</strong>



  <strong>Journal:</strong>



  <strong>Journal Volume:</strong>



  <strong>Journal Issue:</strong>



  <strong>Journal Pages:</strong>



app/views/articles/_form.html.erb



<h3>Enter a link to an abstract on PubMed and click Submit or fill out the information below</h3>











		There currently is no file attached.


<hr />












































The Paperclip gem requires :html => { :multipart => true } at the top of the _form file. It’s a very simple gem to use and very powerful. There are Google Groups for both Paperclip and Nokogiri, should you need additional assistance setting them up.

Leave me a comment if this helps you with either Nokogiri or with extracting data from PubMed!

Written by Clay

July 9, 2010 at 18:18

Posted in Code, Ruby, Technology

Tagged with , , , ,