Life as Clay

Posts Tagged ‘Ruby

Paperclip, S3 & Delayed Job in Rails on Heroku – Jan 2012

with 2 comments

Edit: I forgot to mention here that with Paperclip 2.4.5 you have to use the ‘paperclip-aws’ gem in order for Paperclip to work with Amazon’s newer ‘aws-sdk’ gem. That is no longer true with Paperclip 2.5.

I followed this good tutorial on how to push paperclip image processing to the background with delayed_job. My Rails app is deployed to Heroku on the cedar stack. I’ve had problems with the NoMethod errors using delayed_job 3.0.0, so I downgraded to 2.1.4. Also using paperclip version 2.4.5. In the end, I found that I could ditch the struct presented in the aforementioned tutorial and just call handle_asynchronously on my reprocessing method. This is what the codes looks like:


class Image < ActiveRecord::Base
  attr_accessible :profile_pic,
                  :caption,
                  :note,
                  :pic_file_name,
                  :pic_content_type,
                  :pic_file_size,
                  :pic_updated_at,
                  :pic,
                  :pic_attributes,
                  :processing
  
  belongs_to :imageable, :polymorphic => true
  
  # Added for paperclip-aws
  def self.s3_config
      @@s3_config ||= YAML.load(ERB.new(File.read("#{Rails.root}/config/s3.yml")).result)[Rails.env]    
  end
  
  has_attached_file :pic,
                    :styles => { 
                      :large => "500x500>",
                      :thumb => "100x100>", 
                      :tiny => "50>x50>", 
                      :smallest => "24>x24>" },   
                      :default_url => '/:attachment/:style/missing.png',               
                      
                      # Added for paperclip-aws
                      :storage => :aws,
                      :s3_permissions => :authenticated_read,
                      :path => "images/:id/:style/:filename",
                      :s3_credentials => {
                        :access_key_id => self.s3_config['access_key_id'],
                        :secret_access_key => self.s3_config['secret_access_key']
                      },
                      :bucket => self.s3_config['bucket'],
                      :s3_protocol => "https"    
                      
  validates_attachment_content_type :pic, :content_type => [ /^image\/(?:jpeg|gif|png)$/, nil ]
  
  # How to implement on Heroku with processing in the background
  # http://madeofcode.com/posts/42-paperclip-s3-delayed-job-in-rails
  
  # cancel post-processing now, and set flag...
     before_pic_post_process do |image|
       if !image.processing && image.pic_changed?
         image.processing = true
         false # halts processing
       end
     end
     
     # call method from after_save that will be processed in the background
     after_save do |image| 
       if image.processing
         processImageJob(image)
       end
     end
     
     def processImageJob(image)
         image.regenerate_styles!
     end
     handle_asynchronously :processImageJob
     
     # generate styles (downloads original first)
     def regenerate_styles!
       self.pic.reprocess! 
       self.processing = false   
       self.save(:validations => false)
     end
     
     # detect if our source file has changed
     def pic_changed?
       self.pic_file_size_changed? || 
       self.pic_file_name_changed? ||
       self.pic_content_type_changed? || 
       self.pic_updated_at_changed?
     end
                      
end

Written by Clay

January 12, 2012 at 14:29

Posted in Code, Rails, Ruby

Tagged with , , , , ,

Ruby snipped to return elapsed time as text

leave a comment »

I sometimes put code on the blog here that i want to be able to locate later. Here’s a bit of that. I use this (mainly with the “text” parameter) to return in text the time elapsed between a start_time DateTime stamp and an end_time DateTime stamp. It’s not perfect, but works well enough for my needs:

 

def duration(display)
    if start_time.nil? || end_time.nil? || start_time.blank? || end_time.blank?
      return ""
    end
    
    seconds = end_time - start_time
    minutes = seconds.to_i / 60
    hours = minutes.to_i / 60
    days = hours.to_i / 24
    
    new_hours = hours - (days * 24)
    new_minutes = minutes - (hours * 60)
    
    case display
      when "minutes"
        return new_minutes
      when "minutes_total"
        return minutes
      when "hours"
        return new_hours
      when "hours_total"
        return hours
      when "days"
        return days
      when "text"
        the_string = ""
    
        if days == 1
          the_string += "1 day "
        elsif days > 0
          the_string += days.to_s + " days "
        end
    
        if new_hours == 1
          the_string += "1 hour "
        elsif new_hours > 0
          the_string += new_hours.to_s + " hours "
        end 
    
        if new_minutes == 1
          the_string += "1 minute "
        else
          the_string += new_minutes.to_s + " minutes"
        end
    
        return the_string
      when "text_short"
        the_string = ""
    
        if days == 1
          the_string += "1 d "
        elsif days > 0
          the_string += days.to_s + " d "
        end
    
        if new_hours == 1
          the_string += "1 h "
        elsif new_hours > 0
          the_string += new_hours.to_s + " h "
        end 
    
        if new_minutes == 1
          the_string += "1 m "
        else
          the_string += new_minutes.to_s + " m"
        end
    end
    
  end

Written by Clay

November 16, 2011 at 12:32

Posted in Code, Ruby

Tagged with

Parsing US Census data with Ruby

leave a comment »

I am a fan of population pyramids as a visualization of age demographics. They are simple and effective. Sometimes getting the data to generate them isn’t….

I’m building a simple Rails app for a client who needs county-level population data. It doesn’t have to be precise. I went with the 2009 estimates from the Census bureau, located here:
http://www.census.gov/popest/counties/asrh/CC-EST2009-alldata.html

There were a few issues:
1. Each state’s county-level data is in a different file. You can order a CD that has all of the data in a single file, but I didn’t have time to do that.
2. There’s a ton of data that I didn’t need in each file. All I needed were the county totals and the totals for male/female for each age group.
3. The data is in rows — a different row for each age group. I needed the data in columns, with one row for each county.

What I did:
1. Downloaded all 51 of the files (DC has its own file) and put them in a folder on my desktop.
2. Changed the extension on them to .csv (because it changed to .txt when I downloaded, for some reason).
3. Wrote the Ruby script below, which extracts the data that I need and puts it into a single .csv file.
4. Saved the script to the same folder as the Census files as script.rb.
5. Opened terminal and ran ruby script.rb
4. Reconciled the data with my Excel sheet that I was using to collect county-level data so that I could seed my Rails app’s counties table.

I comment my code pretty heavily when I write it so that I can look back and remember what I did. I’ve been burned in the past by not commenting thoroughly.

Here’s the script:

# Script to pull population data out of census files for each county
# Original data files are here: http://www.census.gov/popest/counties/asrh/CC-EST2009-alldata.html

require 'CSV'

# These are the FIPS codes for each state which are used in file names.
state_fips_codes = ["01", "02", "04", "05", "06", "08", "09", "10", "11", "12", "13", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "44", "45", "46", "47", "48", "49", "50", "51", "53", "54", "55", "56"]

add_header = true

# This is the file that all of the data will go into...
File.open("county_age_data.csv", "w") do |output_file|
  
  # counter for iterating through the files
  n = 0
  
  state_fips_codes.length.times do  # This loop goes through all of the files
    
    # Read in the relevant file
    state_data = CSV.parse(File.read("cc-est2009-alldata-" + state_fips_codes[n] + ".csv"))
    puts "Processing data for " + state_data[1][3]

    # Add the header to the file
    if add_header == true
      output_file.puts "state_fips,county_fips,state_name,county_name,:census_total_pop,:census_total_male,:census_grp01_male,:census_grp02_male,:census_grp03_male,:census_grp04_male,:census_grp05_male,:census_grp06_male,:census_grp07_male,:census_grp08_male,:census_grp09_male,:census_grp10_male,:census_grp11_male,:census_grp12_male,:census_grp13_male,:census_grp14_male,:census_grp15_male,:census_grp16_male,:census_grp17_male,:census_grp18_male,:census_total_female,:census_grp01_female,:census_grp02_female,:census_grp03_female,:census_grp04_female,:census_grp05_female,:census_grp06_female,:census_grp07_female,:census_grp08_female,:census_grp09_female,:census_grp10_female,:census_grp11_female,:census_grp12_female,:census_grp13_female,:census_grp14_female,:census_grp15_female,:census_grp16_female,:census_grp17_female,:census_grp18_female"
      add_header = false
    end

    # Iterate through the loaded file.
    row = 1
    while row < state_data.length
      if state_data[row][5].to_i == 12 # Only take data from year coded 12 (2009 estimates)

        ########## MAIN LOGIC #############
        if state_data[row][6].to_i == 0 # It's a new county because this is the group 0 row with totals

          # Reset the data arrays
          row_data    = []
          male_data   = []
          female_data = []

          row_data.push state_data[row][1].to_s     # state_fips
          row_data.push state_data[row][2].to_s     # county_fips
          row_data.push state_data[row][3].to_s     # state_name
          row_data.push state_data[row][4].to_s     # county_name
          row_data.push state_data[row][7].to_s     # census_total_pop

          male_data.push state_data[row][8].to_s    # census_total_male
          female_data.push state_data[row][9].to_s  # census_total_female

        elsif state_data[row][6].to_i == 18 # It's the last row -- push the data into the array

          # push in the final data points
          male_data.push state_data[row][8].to_s    # census_grp18_male
          female_data.push state_data[row][9].to_s  # census_grp18_female

          # append all of the data to the row_data array
          row_data += male_data
          row_data += female_data

          # write the data to the file
          output_file.puts row_data.join(",")

        else # It's not the first or last row

          male_data.push state_data[row][8].to_s    # census_grpXX_male
          female_data.push state_data[row][9].to_s  # census_grpXX_female

        end # End of the if statement for pushing data into arrays
        ################ END OF MAIN LOGIC ####################
        
      end       # End of the if statement to check if it is year 10
      
      row += 1  # Look at the next row
    end

    n += 1      # Increment the counter
  end           # End of loop for going through each file
end

Written by Clay

July 15, 2010 at 21:17

Posted in Code, Ruby

Tagged with , , ,

Resource: Rails country seeds with ISO codes

leave a comment »

In case you need to seed your Rails app with countries. You can use the ISO codes to display flag icons, like these linked ones from famfamfam.com: link.

 countries = Country.create([
                               {:name => 'Åland Islands', :iso_code => 'AX'},
                               {:name => 'Albania', :iso_code => 'AL'},
                               {:name => 'Algeria', :iso_code => 'DZ'},
                               {:name => 'American Samoa', :iso_code => 'AS'},
                               {:name => 'Andorra', :iso_code => 'AD'},
                               {:name => 'Angola', :iso_code => 'AO'},
                               {:name => 'Anguilla', :iso_code => 'AI'},
                               {:name => 'Antarctica', :iso_code => 'AQ'},
                               {:name => 'Antigua and Barbuda', :iso_code => 'AG'},
                               {:name => 'Argentina', :iso_code => 'AR'},
                               {:name => 'Armenia', :iso_code => 'AM'},
                               {:name => 'Aruba', :iso_code => 'AW'},
                               {:name => 'Australia', :iso_code => 'AU'},
                               {:name => 'Austria', :iso_code => 'AT'},
                               {:name => 'Azerbaijan', :iso_code => 'AZ'},
                               {:name => 'Bahamas', :iso_code => 'BS'},
                               {:name => 'Bahrain', :iso_code => 'BH'},
                               {:name => 'Bangladesh', :iso_code => 'BD'},
                               {:name => 'Barbados', :iso_code => 'BB'},
                               {:name => 'Belarus', :iso_code => 'BY'},
                               {:name => 'Belgium', :iso_code => 'BE'},
                               {:name => 'Belize', :iso_code => 'BZ'},
                               {:name => 'Benin', :iso_code => 'BJ'},
                               {:name => 'Bermuda', :iso_code => 'BM'},
                               {:name => 'Bhutan', :iso_code => 'BT'},
                               {:name => 'Bolivia', :iso_code => 'BO'},
                               {:name => 'Bosnia and Herzegovina', :iso_code => 'BA'},
                               {:name => 'Botswana', :iso_code => 'BW'},
                               {:name => 'Bouvet Island', :iso_code => 'BV'},
                               {:name => 'Brazil', :iso_code => 'BR'},
                               {:name => 'British Indian Ocean Territory', :iso_code => 'IO'},
                               {:name => 'Brunei Darussalam', :iso_code => 'BN'},
                               {:name => 'Bulgaria', :iso_code => 'BG'},
                               {:name => 'Burkina Faso', :iso_code => 'BF'},
                               {:name => 'Burundi', :iso_code => 'BI'},
                               {:name => 'Cambodia', :iso_code => 'KH'},
                               {:name => 'Cameroon', :iso_code => 'CM'},
                               {:name => 'Canada', :iso_code => 'CA'},
                               {:name => 'Cape Verde', :iso_code => 'CV'},
                               {:name => 'Cayman Islands', :iso_code => 'KY'},
                               {:name => 'Central African Republic', :iso_code => 'CF'},
                               {:name => 'Chad', :iso_code => 'TD'},
                               {:name => 'Chile', :iso_code => 'CL'},
                               {:name => 'China', :iso_code => 'CN'},
                               {:name => 'Christmas Island', :iso_code => 'CX'},
                               {:name => 'Cocos (Keeling) Islands', :iso_code => 'CC'},
                               {:name => 'Colombia', :iso_code => 'CO'},
                               {:name => 'Comoros', :iso_code => 'KM'},
                               {:name => 'Congo', :iso_code => 'CG'},
                               {:name => 'Congo, The Democratic Republic of The', :iso_code => 'CD'},
                               {:name => 'Cook Islands', :iso_code => 'CK'},
                               {:name => 'Costa Rica', :iso_code => 'CR'},
                               {:name => 'Côte D\'Ivoire', :iso_code => 'CI'},
                               {:name => 'Croatia', :iso_code => 'HR'},
                               {:name => 'Cuba', :iso_code => 'CU'},
                               {:name => 'Cyprus', :iso_code => 'CY'},
                               {:name => 'Czech Republic', :iso_code => 'CZ'},
                               {:name => 'Denmark', :iso_code => 'DK'},
                               {:name => 'Djibouti', :iso_code => 'DJ'},
                               {:name => 'Dominica', :iso_code => 'DM'},
                               {:name => 'Dominican Republic', :iso_code => 'DO'},
                               {:name => 'Ecuador', :iso_code => 'EC'},
                               {:name => 'Egypt', :iso_code => 'EG'},
                               {:name => 'El Salvador', :iso_code => 'SV'},
                               {:name => 'Equatorial Guinea', :iso_code => 'GQ'},
                               {:name => 'Eritrea', :iso_code => 'ER'},
                               {:name => 'Estonia', :iso_code => 'EE'},
                               {:name => 'Ethiopia', :iso_code => 'ET'},
                               {:name => 'Falkland Islands (Malvinas)', :iso_code => 'FK'},
                               {:name => 'Faroe Islands', :iso_code => 'FO'},
                               {:name => 'Fiji', :iso_code => 'FJ'},
                               {:name => 'Finland', :iso_code => 'FI'},
                               {:name => 'France', :iso_code => 'FR'},
                               {:name => 'French Guiana', :iso_code => 'GF'},
                               {:name => 'French Polynesia', :iso_code => 'PF'},
                               {:name => 'French Southern Territories', :iso_code => 'TF'},
                               {:name => 'Gabon', :iso_code => 'GA'},
                               {:name => 'Gambia', :iso_code => 'GM'},
                               {:name => 'Georgia', :iso_code => 'GE'},
                               {:name => 'Germany', :iso_code => 'DE'},
                               {:name => 'Ghana', :iso_code => 'GH'},
                               {:name => 'Gibraltar', :iso_code => 'GI'},
                               {:name => 'Greece', :iso_code => 'GR'},
                               {:name => 'Greenland', :iso_code => 'GL'},
                               {:name => 'Grenada', :iso_code => 'GD'},
                               {:name => 'Guadeloupe', :iso_code => 'GP'},
                               {:name => 'Guam', :iso_code => 'GU'},
                               {:name => 'Guatemala', :iso_code => 'GT'},
                               {:name => 'Guernsey', :iso_code => 'GG'},
                               {:name => 'Guinea', :iso_code => 'GN'},
                               {:name => 'Guinea-Bissau', :iso_code => 'GW'},
                               {:name => 'Guyana', :iso_code => 'GY'},
                               {:name => 'Haiti', :iso_code => 'HT'},
                               {:name => 'Heard Island and Mcdonald Islands', :iso_code => 'HM'},
                               {:name => 'Holy See (Vatican City State)', :iso_code => 'VA'},
                               {:name => 'Honduras', :iso_code => 'HN'},
                               {:name => 'Hong Kong', :iso_code => 'HK'},
                               {:name => 'Hungary', :iso_code => 'HU'},
                               {:name => 'Iceland', :iso_code => 'IS'},
                               {:name => 'India', :iso_code => 'IN'},
                               {:name => 'Indonesia', :iso_code => 'ID'},
                               {:name => 'Iran, Islamic Republic of', :iso_code => 'IR'},
                               {:name => 'Iraq', :iso_code => 'IQ'},
                               {:name => 'Ireland', :iso_code => 'IE'},
                               {:name => 'Isle of Man', :iso_code => 'IM'},
                               {:name => 'Israel', :iso_code => 'IL'},
                               {:name => 'Italy', :iso_code => 'IT'},
                               {:name => 'Jamaica', :iso_code => 'JM'},
                               {:name => 'Japan', :iso_code => 'JP'},
                               {:name => 'Jersey', :iso_code => 'JE'},
                               {:name => 'Jordan', :iso_code => 'JO'},
                               {:name => 'Kazakhstan', :iso_code => 'KZ'},
                               {:name => 'Kenya', :iso_code => 'KE'},
                               {:name => 'Kiribati', :iso_code => 'KI'},
                               {:name => 'Korea, Democratic People\'s Republic of', :iso_code => 'KP'},
                               {:name => 'Korea, Republic of', :iso_code => 'KR'},
                               {:name => 'Kuwait', :iso_code => 'KW'},
                               {:name => 'Kyrgyzstan', :iso_code => 'KG'},
                               {:name => 'Lao People\'s Democratic Republic', :iso_code => 'LA'},
                               {:name => 'Latvia', :iso_code => 'LV'},
                               {:name => 'Lebanon', :iso_code => 'LB'},
                               {:name => 'Lesotho', :iso_code => 'LS'},
                               {:name => 'Liberia', :iso_code => 'LR'},
                               {:name => 'Libyan Arab Jamahiriya', :iso_code => 'LY'},
                               {:name => 'Liechtenstein', :iso_code => 'LI'},
                               {:name => 'Lithuania', :iso_code => 'LT'},
                               {:name => 'Luxembourg', :iso_code => 'LU'},
                               {:name => 'Macao', :iso_code => 'MO'},
                               {:name => 'Macedonia, The Former Yugoslav Republic of', :iso_code => 'MK'},
                               {:name => 'Madagascar', :iso_code => 'MG'},
                               {:name => 'Malawi', :iso_code => 'MW'},
                               {:name => 'Malaysia', :iso_code => 'MY'},
                               {:name => 'Maldives', :iso_code => 'MV'},
                               {:name => 'Mali', :iso_code => 'ML'},
                               {:name => 'Malta', :iso_code => 'MT'},
                               {:name => 'Marshall Islands', :iso_code => 'MH'},
                               {:name => 'Martinique', :iso_code => 'MQ'},
                               {:name => 'Mauritania', :iso_code => 'MR'},
                               {:name => 'Mauritius', :iso_code => 'MU'},
                               {:name => 'Mayotte', :iso_code => 'YT'},
                               {:name => 'Mexico', :iso_code => 'MX'},
                               {:name => 'Micronesia, Federated States of', :iso_code => 'FM'},
                               {:name => 'Moldova, Republic of', :iso_code => 'MD'},
                               {:name => 'Monaco', :iso_code => 'MC'},
                               {:name => 'Mongolia', :iso_code => 'MN'},
                               {:name => 'Montenegro', :iso_code => 'ME'},
                               {:name => 'Montserrat', :iso_code => 'MS'},
                               {:name => 'Morocco', :iso_code => 'MA'},
                               {:name => 'Mozambique', :iso_code => 'MZ'},
                               {:name => 'Myanmar', :iso_code => 'MM'},
                               {:name => 'Namibia', :iso_code => 'NA'},
                               {:name => 'Nauru', :iso_code => 'NR'},
                               {:name => 'Nepal', :iso_code => 'NP'},
                               {:name => 'Netherlands', :iso_code => 'NL'},
                               {:name => 'Netherlands Antilles', :iso_code => 'AN'},
                               {:name => 'New Caledonia', :iso_code => 'NC'},
                               {:name => 'New Zealand', :iso_code => 'NZ'},
                               {:name => 'Nicaragua', :iso_code => 'NI'},
                               {:name => 'Niger', :iso_code => 'NE'},
                               {:name => 'Nigeria', :iso_code => 'NG'},
                               {:name => 'Niue', :iso_code => 'NU'},
                               {:name => 'Norfolk Island', :iso_code => 'NF'},
                               {:name => 'Northern Mariana Islands', :iso_code => 'MP'},
                               {:name => 'Norway', :iso_code => 'NO'},
                               {:name => 'Oman', :iso_code => 'OM'},
                               {:name => 'Pakistan', :iso_code => 'PK'},
                               {:name => 'Palau', :iso_code => 'PW'},
                               {:name => 'Palestinian Territory, Occupied', :iso_code => 'PS'},
                               {:name => 'Panama', :iso_code => 'PA'},
                               {:name => 'Papua New Guinea', :iso_code => 'PG'},
                               {:name => 'Paraguay', :iso_code => 'PY'},
                               {:name => 'Peru', :iso_code => 'PE'},
                               {:name => 'Philippines', :iso_code => 'PH'},
                               {:name => 'Pitcairn', :iso_code => 'PN'},
                               {:name => 'Poland', :iso_code => 'PL'},
                               {:name => 'Portugal', :iso_code => 'PT'},
                               {:name => 'Puerto Rico', :iso_code => 'PR'},
                               {:name => 'Qatar', :iso_code => 'QA'},
                               {:name => 'Réunion', :iso_code => 'RE'},
                               {:name => 'Romania', :iso_code => 'RO'},
                               {:name => 'Russian Federation', :iso_code => 'RU'},
                               {:name => 'Rwanda', :iso_code => 'RW'},
                               {:name => 'Saint Barthélemy', :iso_code => 'BL'},
                               {:name => 'Saint Helena, Ascension and Tristan Da Cunha', :iso_code => 'SH'},
                               {:name => 'Saint Kitts and Nevis', :iso_code => 'KN'},
                               {:name => 'Saint Lucia', :iso_code => 'LC'},
                               {:name => 'Saint Martin', :iso_code => 'MF'},
                               {:name => 'Saint Pierre and Miquelon', :iso_code => 'PM'},
                               {:name => 'Saint Vincent and The Grenadines', :iso_code => 'VC'},
                               {:name => 'Samoa', :iso_code => 'WS'},
                               {:name => 'San Marino', :iso_code => 'SM'},
                               {:name => 'Sao Tome and Principe', :iso_code => 'ST'},
                               {:name => 'Saudi Arabia', :iso_code => 'SA'},
                               {:name => 'Senegal', :iso_code => 'SN'},
                               {:name => 'Serbia', :iso_code => 'RS'},
                               {:name => 'Seychelles', :iso_code => 'SC'},
                               {:name => 'Sierra Leone', :iso_code => 'SL'},
                               {:name => 'Singapore', :iso_code => 'SG'},
                               {:name => 'Slovakia', :iso_code => 'SK'},
                               {:name => 'Slovenia', :iso_code => 'SI'},
                               {:name => 'Solomon Islands', :iso_code => 'SB'},
                               {:name => 'Somalia', :iso_code => 'SO'},
                               {:name => 'South Africa', :iso_code => 'ZA'},
                               {:name => 'South Georgia and The South Sandwich Islands', :iso_code => 'GS'},
                               {:name => 'Spain', :iso_code => 'ES'},
                               {:name => 'Sri Lanka', :iso_code => 'LK'},
                               {:name => 'Sudan', :iso_code => 'SD'},
                               {:name => 'Suriname', :iso_code => 'SR'},
                               {:name => 'Svalbard and Jan Mayen', :iso_code => 'SJ'},
                               {:name => 'Swaziland', :iso_code => 'SZ'},
                               {:name => 'Sweden', :iso_code => 'SE'},
                               {:name => 'Switzerland', :iso_code => 'CH'},
                               {:name => 'Syrian Arab Republic', :iso_code => 'SY'},
                               {:name => 'Taiwan, Province of China', :iso_code => 'TW'},
                               {:name => 'Tajikistan', :iso_code => 'TJ'},
                               {:name => 'Tanzania, United Republic of', :iso_code => 'TZ'},
                               {:name => 'Thailand', :iso_code => 'TH'},
                               {:name => 'Timor-Leste', :iso_code => 'TL'},
                               {:name => 'Togo', :iso_code => 'TG'},
                               {:name => 'Tokelau', :iso_code => 'TK'},
                               {:name => 'Tonga', :iso_code => 'TO'},
                               {:name => 'Trinidad and Tobago', :iso_code => 'TT'},
                               {:name => 'Tunisia', :iso_code => 'TN'},
                               {:name => 'Turkey', :iso_code => 'TR'},
                               {:name => 'Turkmenistan', :iso_code => 'TM'},
                               {:name => 'Turks and Caicos Islands', :iso_code => 'TC'},
                               {:name => 'Tuvalu', :iso_code => 'TV'},
                               {:name => 'Uganda', :iso_code => 'UG'},
                               {:name => 'Ukraine', :iso_code => 'UA'},
                               {:name => 'United Arab Emirates', :iso_code => 'AE'},
                               {:name => 'United Kingdom', :iso_code => 'GB'},
                               {:name => 'United States', :iso_code => 'US'},
                               {:name => 'United States Minor Outlying Islands', :iso_code => 'UM'},
                               {:name => 'Uruguay', :iso_code => 'UY'},
                               {:name => 'Uzbekistan', :iso_code => 'UZ'},
                               {:name => 'Vanuatu', :iso_code => 'VU'},
                               {:name => 'Venezuela, Bolivarian Republic of', :iso_code => 'VE'},
                               {:name => 'Viet Nam', :iso_code => 'VN'},
                               {:name => 'Virgin Islands, British', :iso_code => 'VG'},
                               {:name => 'Virgin Islands, U.S.', :iso_code => 'VI'},
                               {:name => 'Wallis and Futuna', :iso_code => 'WF'},
                               {:name => 'Western Sahara', :iso_code => 'EH'},
                               {:name => 'Yemen', :iso_code => 'YE'},
                               {:name => 'Zambia', :iso_code => 'ZM'},
                               {:name => 'Zimbabwe', :iso_code => 'ZW'}
 ])

Written by Clay

July 13, 2010 at 17:54

Posted in Code, Resource, Ruby

Tagged with , , ,

Resource: Rails database seeds for US States, including ANSI codes

with one comment

In case you need to add US States to your Rails app. The data are from the US Census website. I’m not sure what the purpose of the “Geographic Names Information System Identifier (GNISID)” is (the number after the State name), but it’s included here in case you need to tie back to the census system.

states = State.create([
                        { :name => 'Alabama', :abbr => 'AL', :ansi_code => '1', :statens => '1779775'},
                        { :name => 'Alaska', :abbr => 'AK', :ansi_code => '2', :statens => '1785533'},
                        { :name => 'Arizona', :abbr => 'AZ', :ansi_code => '4', :statens => '1779777'},
                        { :name => 'Arkansas', :abbr => 'AR', :ansi_code => '5', :statens => '68085'},
                        { :name => 'California', :abbr => 'CA', :ansi_code => '6', :statens => '1779778'},
                        { :name => 'Colorado', :abbr => 'CO', :ansi_code => '8', :statens => '1779779'},
                        { :name => 'Connecticut', :abbr => 'CT', :ansi_code => '9', :statens => '1779780'},
                        { :name => 'Delaware', :abbr => 'DE', :ansi_code => '10', :statens => '1779781'},
                        { :name => 'District of Columbia', :abbr => 'DC', :ansi_code => '11', :statens => '1702382'},
                        { :name => 'Florida', :abbr => 'FL', :ansi_code => '12', :statens => '294478'},
                        { :name => 'Georgia', :abbr => 'GA', :ansi_code => '13', :statens => '1705317'},
                        { :name => 'Hawaii', :abbr => 'HI', :ansi_code => '15', :statens => '1779782'},
                        { :name => 'Idaho', :abbr => 'ID', :ansi_code => '16', :statens => '1779783'},
                        { :name => 'Illinois', :abbr => 'IL', :ansi_code => '17', :statens => '1779784'},
                        { :name => 'Indiana', :abbr => 'IN', :ansi_code => '18', :statens => '448508'},
                        { :name => 'Iowa', :abbr => 'IA', :ansi_code => '19', :statens => '1779785'},
                        { :name => 'Kansas', :abbr => 'KS', :ansi_code => '20', :statens => '481813'},
                        { :name => 'Kentucky', :abbr => 'KY', :ansi_code => '21', :statens => '1779786'},
                        { :name => 'Louisiana', :abbr => 'LA', :ansi_code => '22', :statens => '1629543'},
                        { :name => 'Maine', :abbr => 'ME', :ansi_code => '23', :statens => '1779787'},
                        { :name => 'Maryland', :abbr => 'MD', :ansi_code => '24', :statens => '1714934'},
                        { :name => 'Massachusetts', :abbr => 'MA', :ansi_code => '25', :statens => '606926'},
                        { :name => 'Michigan', :abbr => 'MI', :ansi_code => '26', :statens => '1779789'},
                        { :name => 'Minnesota', :abbr => 'MN', :ansi_code => '27', :statens => '662849'},
                        { :name => 'Mississippi', :abbr => 'MS', :ansi_code => '28', :statens => '1779790'},
                        { :name => 'Missouri', :abbr => 'MO', :ansi_code => '29', :statens => '1779791'},
                        { :name => 'Montana', :abbr => 'MT', :ansi_code => '30', :statens => '767982'},
                        { :name => 'Nebraska', :abbr => 'NE', :ansi_code => '31', :statens => '1779792'},
                        { :name => 'Nevada', :abbr => 'NV', :ansi_code => '32', :statens => '1779793'},
                        { :name => 'New Hampshire', :abbr => 'NH', :ansi_code => '33', :statens => '1779794'},
                        { :name => 'New Jersey', :abbr => 'NJ', :ansi_code => '34', :statens => '1779795'},
                        { :name => 'New Mexico', :abbr => 'NM', :ansi_code => '35', :statens => '897535'},
                        { :name => 'New York', :abbr => 'NY', :ansi_code => '36', :statens => '1779796'},
                        { :name => 'North Carolina', :abbr => 'NC', :ansi_code => '37', :statens => '1027616'},
                        { :name => 'North Dakota', :abbr => 'ND', :ansi_code => '38', :statens => '1779797'},
                        { :name => 'Ohio', :abbr => 'OH', :ansi_code => '39', :statens => '1085497'},
                        { :name => 'Oklahoma', :abbr => 'OK', :ansi_code => '40', :statens => '1102857'},
                        { :name => 'Oregon', :abbr => 'OR', :ansi_code => '41', :statens => '1155107'},
                        { :name => 'Pennsylvania', :abbr => 'PA', :ansi_code => '42', :statens => '1779798'},
                        { :name => 'Rhode Island', :abbr => 'RI', :ansi_code => '44', :statens => '1219835'},
                        { :name => 'South Carolina', :abbr => 'SC', :ansi_code => '45', :statens => '1779799'},
                        { :name => 'South Dakota', :abbr => 'SD', :ansi_code => '46', :statens => '1785534'},
                        { :name => 'Tennessee', :abbr => 'TN', :ansi_code => '47', :statens => '1325873'},
                        { :name => 'Texas', :abbr => 'TX', :ansi_code => '48', :statens => '1779801'},
                        { :name => 'Utah', :abbr => 'UT', :ansi_code => '49', :statens => '1455989'},
                        { :name => 'Vermont', :abbr => 'VT', :ansi_code => '50', :statens => '1779802'},
                        { :name => 'Virginia', :abbr => 'VA', :ansi_code => '51', :statens => '1779803'},
                        { :name => 'Washington', :abbr => 'WA', :ansi_code => '53', :statens => '1779804'},
                        { :name => 'West Virginia', :abbr => 'WV', :ansi_code => '54', :statens => '1779805'},
                        { :name => 'Wisconsin', :abbr => 'WI', :ansi_code => '55', :statens => '1779806'},
                        { :name => 'Wyoming', :abbr => 'WY', :ansi_code => '56', :statens => '1779807'}
])

Written by Clay

July 13, 2010 at 17:49

Posted in Code, Resource, Ruby

Tagged with , , ,

Using Nokogiri and Ruby on Rails to grab citations from PubMed

with 12 comments

Update: I put a working example up on github (Rails 3.0.4 and Nokogiri 1.4.4): https://github.com/ccheaton/pubnoko

##############################

Following on the last post… I needed to provide to a client a custom database of PubMed citations, links, datasets, and other information related to a product area I was researching. The hardest part about this was understanding PubMed’s eutils and their .xml formats.

I did not want to manually enter the citation information. I wanted to avoid Zotero and other similar services because I wanted to integrate the data across domains and use the database that I was developing for them as the basis for a paper I wrote. For the PubMed citations, what I wanted to be able to do was to go to PubMed, perform a search, find an interesting citation, copy the PubMed URL, paste it into my Rails application, which would use the URL to query PubMed in XML format and populate my app’s database with that entry.

I have two models that play into this interaction: Article and Journal. I should note that I’m using Nokogiri to parse the XML and thoughtbot’s Paperclip gem to attach articles when they are available freely. This was done using Rails 2.3.5 and Ruby 1.8.7.

Here’s how the model files look:

journal.rb

class Journal < ActiveRecord::Base   attr_accessible :name, :description, :url, :short_name      has_many :articles      validates_uniqueness_of :name   validates_presence_of :short_name   validates_presence_of :name      before_save :make_short_name      def best_name     if !self.short_name.nil? && self.short_name.length > 0
      return self.short_name
    else
      return self.name
    end
  end

  private

  # Makes sure that a new journal has a short name in case one isn't entered.
  def make_short_name
    if self.short_name.nil? || self.short_name.length == 0
      self.short_name = self.name
    end
  end
end

The Article model is a little more complicated. The logic that I’m using the parse the XML isn’t great, but I’ve found that the structure changes from article to article. I’m still refining it and will update it when I’m finished, after I’ve found the 90% of article that auto-import.

article.rb

class Article < ActiveRecord::Base   require 'open-uri'   attr_accessible :pubmedlink,                    :article_title,                    :abstract,                    :authors,                    :affiliations,                    :pubdate,                    :pubmedid,                    :journal_id,                    :journal_volume,                    :journal_issue,                    :journal_pages,                   :fetched,                   :thearticle_file_name,                   :thearticle_content_type,                   :thearticle_file_size,                   :thearticle_updated_at,                   :thearticle                      has_attached_file :thearticle,                     :path => ":rails_root/public/system/uploads/:class/:id/:basename.:extension",
                    :url => "/system/uploads/:class/:id/:basename.:extension"
  belongs_to :journal
  before_save :make_pubmedid
  validates_uniqueness_of :article_title

  def citation
    cit = ""
    if self.journal_id != nil && self.journal_id > 0
      cit += self.journal.best_name + ". " + self.pubdate.year.to_s + " " + self.pubdate.strftime("%b")
    end
    if self.journal_volume != nil && self.journal_volume.length > 0
      cit += "; " + self.journal_volume
    end
    if self.journal_issue != nil && self.journal_issue.length > 0
      cit += "(" + self.journal_issue + ")"
    end
    if self.journal_pages != nil && self.journal_pages.length > 0
      cit += ": " + self.journal_pages
    end
      return cit
  end

  private

  def make_pubmedid
    if self.pubmedlink != nil && self.pubmedlink.length > 0 && self.fetched == false
      regex = Regexp.new(/[0-9]{4,10}/)
      matchdata = regex.match(self.pubmedlink)
      self.pubmedid = matchdata[0]
      pull_pubmed_data(matchdata[0])
      self.fetched = true
    end
  end

  def pull_pubmed_data(theID)
    @doc = Nokogiri::XML(open("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=" + theID.to_s + "&retmode=xml").read)

    # The title and abstract
    self.article_title = @doc.xpath("//ArticleTitle").collect(&:text).to_s
    self.abstract = @doc.xpath("//AbstractText").collect(&:text).to_s

    # Setting up the authors
    first_names = @doc.xpath("//ForeName").collect(&:text)
    last_names = @doc.xpath("//LastName").collect(&:text)
    full_names = []

    i = 0
    last_names.length.times do
      the_name = first_names[i].to_s + " " + last_names[i].to_s
      full_names.push the_name
      i += 1
    end
    self.authors = full_names.join(", ")

    # Affiliations
    self.affiliations = @doc.xpath("//Affiliation").collect(&:text).to_s

    # Publication Date - Check if the complete date is at the top. If not use the pub med date.
    theyear = @doc.xpath("//PubDate/Year").collect(&:text)
    theyear = theyear[0]
    themonth = @doc.xpath("//PubDate/Month").collect(&:text)
    themonth = themonth[0]
    theday = @doc.xpath("//PubDate/Day").collect(&:text)

    if theyear.nil? || themonth.nil? || theday.nil? || theyear.length == 0 || themonth.length == 0 || theday.length == 0

      theyear = @doc.xpath("//PubMedPubDate[@PubStatus='pubmed']/Year").collect(&:text)
      theyear = theyear[0]

      themonth = @doc.xpath("//PubMedPubDate[@PubStatus='pubmed']/Month").collect(&:text)
      themonth = themonth[0].to_i
        month_done = 1
      if themonth < 10
        themonth = "0" + themonth.to_s
        month_done = 1
      end

      theday = @doc.xpath("//PubMedPubDate[@PubStatus='pubmed']/Day").collect(&:text)
    end

    if month_done != 1

      themonth = case themonth || "01"
        when "Jan"  then "01"
        when "Feb"  then "02"
        when "Mar"  then "03"
        when "Apr"  then "04"
        when "May"  then "05"
        when "Jun"  then "06"
        when "Jul"  then "07"
        when "Aug"  then "08"
        when "Sep"  then "09"
        when "Oct"  then "10"
        when "Nov"  then "11"
        when "Dec"  then "12"
      end

    end

    if theday.length == 0
      theday = "01"
    elsif theday[0].to_i < 10       theday = "0" + theday[0].to_s     else       theday = theday[0].to_s     end          thedate = theyear.to_s + "-" + themonth.to_s + "-" + theday.to_s     puts "thedate: " + thedate.to_s     self.pubdate = Date.new(theyear.to_i, themonth.to_i, theday.to_i)          # Either referencing the proper journal or creating a new one     thejournal = @doc.xpath("//Journal/Title").collect(&:text)     thejournal = thejournal[0]          theshortname = @doc.xpath("//MedlineTA").collect(&:text)     if theshortname.length == 0       theshortname = ""     else       theshortname = theshortname[0]     end          thejournalid = Journal.find(:first, :conditions => ['lower(name) = ?', thejournal.downcase])

    if !thejournalid.nil?
      self.journal_id = thejournalid.id
    else
      @journal = Journal.new(:name => thejournal.to_s, :short_name => theshortname.to_s)
      @journal.save
      thenewjournal = Journal.find(:first, :order => 'created_at DESC')
      self.journal_id = thenewjournal.id
    end

    # Save the volume, issue, and pages
    thevolume = @doc.xpath("//JournalIssue/Volume").collect(&:text)
    thevolume = thevolume[0].to_s if thevolume.length > 0
    self.journal_volume = thevolume if thevolume.length > 0

    theissue = @doc.xpath("//JournalIssue/Issue").collect(&:text)
    theissue = theissue[0].to_s if theissue.length > 0
    self.journal_issue = theissue if theissue.length > 0

    thepag = @doc.xpath("//Pagination/MedlinePgn").collect(&:text)
    thepag = thepag[0].to_s if thepag.length > 0
    self.journal_pages = thepag if thepag.length > 0

  end

end

The journals_controller.rb file is pretty standard. So is the articles_controller.rb file. Here it is as an example:

articles_controller.rb

class ArticlesController < ApplicationController   def index     @articles = Article.all(:order => "pubdate DESC")
  end

  def show
    @article = Article.find(params[:id])
  end

  def new
    @article = Article.new
    @journals = Journal.all(:order => :name)
  end

  def create
    @article = Article.new(params[:article])
    if @article.save
      flash[:notice] = "Successfully created article."
      redirect_to @article
    else
      render :action => 'new'
    end
  end

  def edit
    @article = Article.find(params[:id])
    @journals = Journal.all(:order => :name)
  end

  def update
    @article = Article.find(params[:id])
    if @article.update_attributes(params[:article])
      flash[:notice] = "Successfully updated article."
      redirect_to @article
    else
      render :action => 'edit'
    end
  end

  def destroy
    @article = Article.find(params[:id])
    @article.destroy
    flash[:notice] = "Successfully destroyed article."
    redirect_to articles_url
  end
end

Here are the view files for articles:

app/views/articles/index.html.erb

<div class="toolbar">
<ul></ul>
</div>
<table>
<tbody>
<tr>
<th></th>
<th>Article Title</th>
<th></th>
</tr>
<tr class="<%= cycle('rowA', 'rowB') %>">
<td></td>
<td><strong> </strong>

		  <em> </em></td>
<td class="deletelink"></td>
</tr>
</tbody>
</table>

app/views/articles/show.html.erb

<div class="toolbar">
<ul></ul>
</div>
	  <strong>PubMed Link:</strong>







	  <strong>View the Article:</strong>





  <strong>Article Title:</strong>



  <strong>Abstract:</strong>



  <strong>Authors:</strong>



  <strong>Affiliations:</strong>



  <strong>Pubdate:</strong>



  <strong>Pubmedid:</strong>



  <strong>Journal:</strong>



  <strong>Journal Volume:</strong>



  <strong>Journal Issue:</strong>



  <strong>Journal Pages:</strong>



app/views/articles/_form.html.erb



<h3>Enter a link to an abstract on PubMed and click Submit or fill out the information below</h3>











		There currently is no file attached.


<hr />












































The Paperclip gem requires :html => { :multipart => true } at the top of the _form file. It’s a very simple gem to use and very powerful. There are Google Groups for both Paperclip and Nokogiri, should you need additional assistance setting them up.

Leave me a comment if this helps you with either Nokogiri or with extracting data from PubMed!

Written by Clay

July 9, 2010 at 18:18

Posted in Code, Ruby, Technology

Tagged with , , , ,

Learn to Program: English numbers meet the beer song

leave a comment »

This one is quite simple and only took me a few minutes to put together, since it combines the previous entry with the 99 Bottles of Beer on the Wall song. That said, I learned a lesson while doing it, which I guess is the important part.

If you look at the code you’ll see that I use a ‘while’ clause to iterate through the verses. The number decrements at the end of each verse. It works great.

The first time I implemented it, I did it recursively, calling the beersong method on the decremented number at the end of each verse… Wrong! The recursion creates nested method calls and ends up causing a stack overflow due to unnecessary memory usage.

The moral of the story is to only use recursion when necessary. In this case, a simple loop was the best way to tackle the problem.

def english_number number
  ones  = ['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
  teens = ['ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen', 'seventeen', 'eighteen', 'ninteen']
  tens  = ['ten', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety']
  
  readout = ''
  working = number.to_i
  
  # The idea here... create the variable m_count to check if the input number is over def over 
  # one million. If it is, recurse m_count (which will be less than 1000) through the method
  # to generate the number of millions. Then, tack " million " onto the end of it and change 
  # the working number to what it originally was, minus the number of millions. Then create
  # the variable t_count to check the number of thousands in the remainder and repeat the process.
  # Once the remainder is under 1000, we do not recurse again. You can easily expand this for billions
  # and higher numbers by copying the case for millions, creating a new 'billions' variable, then
  # pulling the same trick and tacking " billion " onto the end of the number returned.
  
  if working > 999999999
    puts 'Enter a number under 1 billion.'
    return
  end
  
  # Case for millions
  m_count = working / 1000000
  if m_count >0
    millions = english_number m_count
    readout = readout + millions + ' million'
    working = working - (m_count * 1000000)
    
    if working > 0
      readout = readout + ' '
    end
    
  end
  
  # Case for thousands
  t_count = working / 1000
  if t_count >0
    thousands = english_number t_count
    readout = readout + thousands + ' thousand'
    working = working - (t_count * 1000)
    
    if working > 0
      readout = readout + ' '
    end
    
  end
  
  # Case for hundreds
  h_count = working / 100
  if h_count > 0
    hundreds = ones[h_count - 1]
    readout = readout + hundreds + ' hundred'
    working = working - (h_count * 100)
    
    if working > 0
      readout = readout + ' '
    end
    
  end
  
  # Case for over 20 and less than 100
  if working < 100 and working >= 20
    ten_count = working / 10
    working = working - (ten_count * 10)
    readout = readout + tens[ten_count -1]
    
    if working > 0
      readout = readout + '-'
    end
    
  end
  
  # Case for teens
  if working >= 10 and working < 20
    working = working - 10
    readout = readout + teens[working]
    working = 0
  end
  
  # Case for singles
  if working > 0 and working < 10
    readout = readout + ones[working -1]
  end
  
  if readout == '' and working == 0
    readout = 'zero'
  end
  
  readout
  
end

def beersong number
  
  start = number
  
  while start > 0
  
    sing = english_number(start)
    decrement = english_number(start - 1)
  
    puts
  
    if start == 1
      puts sing + ' bottle of beer on the wall,'
      puts sing + ' bottle of beer...'
    else
      puts sing + ' bottles of beer on the wall,'
      puts sing + ' bottles of beer...'
    end
  
    puts 'take one down,'
    puts 'pass it around,'
  
    if (start - 1) == 1 
      puts decrement + ' bottle of beer on the wall!'
    else
      puts decrement + ' bottles of beer on the wall!'
    end
  
    puts
  
    start -= 1 # Here I originally called beersong recursively and that was wrong. 
  end
  
  return
  
end

puts 'Enter a number: '

number = gets.chomp.to_i

beersong number

Written by Clay

October 14, 2009 at 01:46

Posted in Code, Ruby

Tagged with , ,