Life as Clay

Parsing .xml files with Objective-C


Well, this was a good lesson for me to learn some Objective-C while working. The problem was that I was faced with a lot of .xml files, each of which had thousands of values concatenated into a single string in several List elements. This is what the .xml files looked like:

<?xml version="1.0" encoding="UTF-8"?>  
<ExportCenter filename="filename.xml">  
  <Section name="File">  
    <Item type="String" name="Creator">A Creator</Item>  
    <Item type="String" name="EvaluationScheme">A_scheme.xml</Item>  
    <Item type="DateTime" name="CreationTime">2007-01-01T00:00:00</Item>  
  </Section>  
  <Section name="Study">  
    <Item type="String" name="Id">Identity A</Item>  
    <Item type="String" name="Site">Location A</Item>  
  </Section>  
  <Section name="Subject">  
    <Item type="String" name="Id">123456</Item>  
    <Item type="Enum" name="PersonType">0</Item>  
    <Item type="String" name="DateOfBirth">1999-09-09</Item>  
    <Item type="Integer" name="Height" unit="cm">150</Item>  
    <Item type="Integer" name="Weight" unit="kg">75</Item>  
  </Section>  
  <Section name="References">  
    <Item type="String" name="Type">Site A</Item>  
    <Item type="String" name="Method">DEFG</Item>  
    <List type="DateTime" name="DateTime" separator=";">2;2005-08-28T13:30:00;2005-08-28T14:31:00; ... + 1000s of values</List>  
    <List type="Float" name="Reading" unit="mg" separator=";">74;104;119;89;73; ... + 1000s of values</List>  
  </Section>  
  <Section name="Experiment">  
    <Item type="String" name="Id">Identity B</Item>  
    <Item type="DateTime" name="Start">2005-08-24T18:37:00</Item>  
    <Item type="DateTime" name="Stop">2005-08-28T20:41:00</Item>  
    <Item type="Integer" name="Interval" unit="s">60</Item>  
    <List type="Float" name="Reading" unit="mg" separator=";">18.1449;26.1845;... + 1000s of values</List>  
    <List type="Boolean" name="Validity" separator=";">0;0;0;0;0;... + 1000s of values</List>  
  </Section>  
</ExportCenter>

And this is the script that I wrote to grab the values from the List elements and turn them into a .csv file that I can import into the Access database that I’m working with:

#import <Foundation/Foundation.h>  
  
int main (int argc, const char * argv[]) {  
      
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];  
      
    // Variable declarations  
    NSString *path = @"/path/to/my/directory/with/.xml/files/";  
    NSString *extensionToParse = @".xml";  
    NSString *extensionToCreate = @".csv";  
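    NSString *currentFile, *newFileName, *lastFourChars;
    // Initialized to nil so that a file with a missing section never reads an uninitialized pointer
    NSString *subjectID = nil;
    NSArray  *referencesDateTimeList = nil, *referencesReadingList = nil, *experimentReadingList = nil, *experimentValidityList = nil;
    NSString *refDateTime, *refReading, *expReading, *expValidity;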
      
    int largestArraySize;  
      
    // Print the path to the log  
    NSLog(@"Path set to %@", path);  
      
    // Load the contents of the path into the array  
    NSArray *content = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:path error:NULL];  
      
    // Counts the number of files (will use for # of times we iterate the loop)  
    int n = [content count];  
    NSLog(@"Number of files: %i", n);  
      
    // Loop for iterating through the files  
    for (int i=0; i<n; i++){  
        currentFile = [content objectAtIndex:i];  
          
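        // Get the last four characters of the file name (empty if the name is shorter than that,
        // because asking for an out-of-bounds range raises an exception)
        lastFourChars = ([currentFile length] >= 4) ? [currentFile substringFromIndex:[currentFile length] - 4] : @"";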
          
        if ([lastFourChars isEqualToString:extensionToParse]) {  
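            // These files match the extension to parse (.xml) and will be parsed.
            // Reset the per-file results so that values from a previous file are never reused.
            referencesDateTimeList = referencesReadingList = experimentReadingList = experimentValidityList = nil;
            subjectID = nil;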
              
            newFileName = [[currentFile substringToIndex:[currentFile length] - 4] stringByAppendingString:extensionToCreate];  
              
            // Load the current file as loadedFile  
            NSFileHandle *loadedFile = [NSFileHandle fileHandleForReadingAtPath:[path stringByAppendingString:currentFile]];  
              
            // Create fileData and load the data from the current file  
            NSData *fileData = [loadedFile readDataToEndOfFile];  
              
            // Loaded file pushed into data object, so close the file  
            [loadedFile closeFile];  
              
            // Process the .xml  
            // Create an NSXMLDocument object from the data read from the file  
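            // (autoreleased so that the document is freed when the pool drains instead of leaking)
            NSXMLDocument *xmlDoc = [[[NSXMLDocument alloc] initWithData:fileData options:NSXMLDocumentTidyXML error:NULL] autorelease];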
              
            // Set the root element of the NSXMLDocument so that we know how to traverse it  
            NSXMLElement  *rootElement = [xmlDoc rootElement];   
              
            NSArray *sections = [rootElement elementsForName:@"Section"];    
            for (int i=0; i<[sections count]; i++)    
            {    
                NSXMLElement *element = [sections objectAtIndex:i];    
                NSString *sectionName = [[element attributeForName:@"name"] stringValue];    
                if ([sectionName isEqualToString:@"References"])    
                {    
                    NSArray *lists = [element elementsForName:@"List"];    
                    for (int x=0; x<[lists count]; x++)    
                    {    
                        element = [lists objectAtIndex:x];    
                        NSString *separator = [[element attributeForName:@"separator"] stringValue];    
                        if ([[[element attributeForName:@"name"] stringValue] isEqualToString:@"DateTime"])    
                        {                                
                            referencesDateTimeList = [[element stringValue] componentsSeparatedByString:separator];    
                        }    
                        else if ([[[element attributeForName:@"name"] stringValue] isEqualToString:@"Reading"])    
                        {    
                            referencesReadingList = [[element stringValue] componentsSeparatedByString:separator];    
                        }    
                    }    
                }    
                else if ([sectionName isEqualToString:@"Experiment"])    
                {    
                    NSArray *lists = [element elementsForName:@"List"];    
                    for (int x=0; x<[lists count]; x++)    
                    {    
                        element = [lists objectAtIndex:x];    
                        NSString *separator = [[element attributeForName:@"separator"] stringValue];    
                        if ([[[element attributeForName:@"name"] stringValue] isEqualToString:@"Reading"])    
                        {    
                            experimentReadingList = [[element stringValue] componentsSeparatedByString:separator];    
                        }    
                        else if ([[[element attributeForName:@"name"] stringValue] isEqualToString:@"Validity"])    
                        {    
                            experimentValidityList = [[element stringValue] componentsSeparatedByString:separator];    
                        }    
                    }    
                }  
                else if ([sectionName isEqualToString:@"Subject"])  
                {  
                    NSArray *items = [element elementsForName:@"Item"];  
                    for (int x = 0; x<[items count]; x++)  
                    {  
                        element = [items objectAtIndex:x];  
                        if ([[[element attributeForName:@"name"] stringValue] isEqualToString:@"Id"]) {  
                            subjectID = [@"Subject ID: " stringByAppendingString:[element stringValue]];  
                        }  
                    }  
                }  
            }   
              
            // Create the new file  
            [[NSFileManager defaultManager] createFileAtPath:[path stringByAppendingString:newFileName] contents:nil attributes:nil];  
              
            // Set which file to write the data to -- the one that we just created  
            NSFileHandle *fileToWriteTo = [NSFileHandle fileHandleForWritingAtPath:[path stringByAppendingString:newFileName]];  
              
            // Create the strings from the data arrays  
              
            // Write the identifier at the top  
            NSString *identifiers = [NSString stringWithFormat:@"%@\n", subjectID];  
            [fileToWriteTo writeData:[identifiers dataUsingEncoding:NSUTF8StringEncoding]];  
              
            // Compare the length of the arrays b/c we don't know which is largest  
            largestArraySize = [referencesDateTimeList count];  
            if ([referencesReadingList count] > largestArraySize) {  
                largestArraySize = [referencesReadingList count];  
            }  
            if ([experimentReadingList count] > largestArraySize) {
                largestArraySize = [experimentReadingList count];
            }
            if ([experimentValidityList count] > largestArraySize) {  
                largestArraySize = [experimentValidityList count];  
            }  
              
            for (int i=0; i<largestArraySize; i++)    
            {    
                // Set the appropriate values for the variables, based on the length of the largest array  
                if (i >= [referencesDateTimeList count]) {  
                    refDateTime = @"";  
                }  
                else {  
                    // Replace the "T" between the date and the time with a space
                    // (e.g. 2005-08-28T13:30:00 becomes 2005-08-28 13:30:00)
                    refDateTime = [[referencesDateTimeList objectAtIndex:i] stringByReplacingOccurrencesOfString:@"T" withString:@" "];
                }  
                  
                if (i >= [referencesReadingList count]) {  
                    refReading = @"";  
                }  
                else {  
                    refReading = [referencesReadingList objectAtIndex:i];  
                }  
                  
                if (i >= [experimentReadingList count]) {  
                    expReading = @"";  
                }  
                else {  
                    expReading = [experimentReadingList objectAtIndex:i];  
                }  
                  
                if (i >= [experimentValidityList count]) {  
                    expValidity = @"";  
                }  
                else {  
                    expValidity = [experimentValidityList objectAtIndex:i];  
                }  
                  
                NSString *csv = [NSString stringWithFormat:@"%@,%@,%@,%@\n",    
                                 refDateTime,    
                                 refReading,    
                                 expReading,    
                                 expValidity];  
                  
                // Write this to the file  
                [fileToWriteTo writeData:[csv dataUsingEncoding:NSUTF8StringEncoding]];     
            }    

            // Close the file that we just wrote to
            [fileToWriteTo closeFile];
            NSLog(@"Completed writing %@", newFileName);
        }  
        else {  
            // Ignore the files that are not .xml files  
            NSLog(@"%@ is NOT an .xml file.", currentFile);  
        }  
    }  
  
    [pool drain];  
    return 0;  
}  

It’s unwieldy, but it works. If anybody has suggestions about how to do this more efficiently in Obj-C, please let me know. It would probably be a lot easier in Python or something, but right now I’m focused on learning Obj-C.
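
One possible way to shorten the nested loops is to let an XPath query find each List element directly. The following is only a sketch under some assumptions: ListValues is a hypothetical helper name that does not appear in the script, the fallback to ";" is a guess based on the sample file, and the inline XML is a cut-down stand-in for a real export. It uses manual reference counting to match the script; under ARC you would use @autoreleasepool and drop the autorelease call.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original script): returns the values of
// the List element named listName inside the Section named sectionName, split
// on that element's own separator attribute. Returns an empty array when no
// such List element exists.
static NSArray *ListValues(NSXMLDocument *doc, NSString *sectionName, NSString *listName) {
    NSString *xpath = [NSString stringWithFormat:
                       @"/ExportCenter/Section[@name='%@']/List[@name='%@']",
                       sectionName, listName];
    NSArray *nodes = [doc nodesForXPath:xpath error:NULL];
    if ([nodes count] == 0) {
        return [NSArray array];
    }

    NSXMLElement *list = [nodes objectAtIndex:0];
    NSString *separator = [[list attributeForName:@"separator"] stringValue];
    if ([separator length] == 0) {
        separator = @";"; // assumption: fall back to the separator seen in the sample file
    }
    return [[list stringValue] componentsSeparatedByString:separator];
}

int main(int argc, const char *argv[]) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

    // A tiny inline document with the same shape as the sample file
    NSString *xml =
        @"<ExportCenter filename=\"filename.xml\">"
        @"<Section name=\"References\">"
        @"<List type=\"Float\" name=\"Reading\" unit=\"mg\" separator=\";\">74;104;119</List>"
        @"</Section>"
        @"<Section name=\"Experiment\">"
        @"<List type=\"Float\" name=\"Reading\" unit=\"mg\" separator=\";\">18.1449;26.1845</List>"
        @"</Section>"
        @"</ExportCenter>";

    NSXMLDocument *doc = [[[NSXMLDocument alloc] initWithXMLString:xml options:0 error:NULL] autorelease];

    NSLog(@"References/Reading: %@", ListValues(doc, @"References", @"Reading"));
    NSLog(@"Experiment/Reading: %@", ListValues(doc, @"Experiment", @"Reading"));

    [pool drain];
    return 0;
}
```

With a helper like this, each of the four arrays in the script could be filled with a single call, and the section and list names would sit next to each other in one place instead of being spread over several if/else branches.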


Written by Clay

December 13, 2009 at 23:27

2 Responses


  1. Hi Clay,

    I developed an xsd2code generator some time ago, and I believe it can be used to at least take the burden of parsing the XML off your hands. Do you have an XSD? The data looks like a mix of XML and XSD (e.g. 2007-01-01T00:00:00).
    Anyway, if you have an XSD, I can run it through the generator and see what happens.

    cheers

    Chris

    lukassen

    January 19, 2010 at 19:59

  2. Hey Chris. All of the files that were trouble for me matched the above structural example, so the code here took care of them. Thankfully, I won’t have any more incoming, so I think that I’m all set.

    Thanks for the offer, though!

    Clay

    January 19, 2010 at 20:07

