Structure

This post is an extension of the previous one “Optimizing Project Planning and Material transportation”. The previous post describes some techniques how for optimizing the schedule, where the data is prepared manually. So, the current article will show the method of obtaining the data from Primavera P6 project files (*.xer).

Looking for libraries in Python programming language, I have found two:

Both look to have the same features at first glance. The first one is chosen for this case.

The content below is based on the ipython notebook, so there is an option to check it out on nbviewer.org.

Intro

This notebook demonstrates an example of how to use the Primavera P6 xer files reader library for data preparation.

The XER file contains the same data presented in the notebook.

By obtaining the file via URL link, the required data—task names, durations, precedences, and release dates (intervals between each house’s earliest starting date)—is transformed into Python objects.

Installing xerparser

import importlib

# Check if xerparser is already installed
try:
    importlib.import_module('xerparser')
    print("xerparser is already installed.")
except ImportError:
    try:
        # Attempt installation
        %pip install xerparser
    except Exception as e:
        print("An error occurred while installing xerparser:", e)

Getting the XER file

import requests

url = "https://raw.githubusercontent.com/inigmat/exupery/main/files/schedule/MDL4D.xer"

try:
    response = requests.get(url)
    if response.status_code == 200:
        print("Request successful!")
    else:
        print(f"Failed to retrieve data. Status code: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Read the downloaded file with checking for errors

from xerparser import Xer, CorruptXerFile

try:
    xer = Xer(response.text)
except CorruptXerFile as e:
    for error in e.errors:
        print(error)

Getting the project data

project = list(xer.projects.values())[0]

We have to get the following data

NbWorkers = 3
NbHouses  = 5

TaskNames = ("masonry","carpentry","plumbing",
             "ceiling","roofing","painting",
             "windows","facade","garden","moving")

Duration =  [35, 15, 40, 15, 5, 10, 5, 10, 5, 5]

ReleaseDate = [31, 0, 90, 120, 90]

Precedences = [("masonry", "carpentry"), ("masonry", "plumbing"), ("masonry", "ceiling"),
               ("carpentry", "roofing"), ("ceiling", "painting"), ("roofing", "windows"),
               ("roofing", "facade"), ("plumbing", "facade"), ("roofing", "garden"),
               ("plumbing", "garden"), ("windows", "moving"), ("facade", "moving"),
               ("garden", "moving"), ("painting", "moving")]

Start with getting the WBS (house numbers)

We skip the first two levels using WBS_LVL variable

WBS_LVL = 2
houses = [obj.name for obj in project.wbs_nodes[WBS_LVL:]]
NbHouses= len(houses)

Obtain the labor resources of the project to get the number of workers

workers = {}
for res in project.resources:
  if res.rsrc_type == "RT_Labor":
    workers[res.rsrc_id] = res.resource.name
NbWorkers = len(workers)

Prepare pandas dataframe to check out the tasks data such as ‘TASK ID’,’Name’,’Type’, ‘WBS ID’, ‘Duration’, ‘Successors (ID, Link, Lag)’

import pandas as pd

tasks_df = pd.DataFrame(columns=['TASK ID','Name','Type', 'WBS ID', 'Duration', 'Successors (ID, Link, Lag)'])

for task in project.tasks:
    task_sucs = []
    for pred_link in task.successors:
        task_suc = pred_link.task.uid
        task_link = pred_link.link
        task_lag = pred_link.lag
        task_sucs.append((task_suc, task_link, task_lag))

    tasks_df = pd.concat([tasks_df, pd.DataFrame(
        {'TASK ID': [task.uid],
         'Name': [task.name],
         'Type': [task.type],
         'WBS ID': [task.wbs_id],
         'Duration': [task.duration],
         'Successors (ID, Link, Lag)': [task_sucs],
         })], ignore_index=True)
tasks_df.head(15)

Output:

TASK ID Name Type WBS ID Duration Successors (ID, Link, Lag)
0 104836 House 1 Start TaskType.TT_Mile 26152 0 []
1 104837 Masonry TaskType.TT_Task 26154 35 [(104838, FS, 0), (104839, FS, 0), (104840, FS...
2 104838 Carpentry TaskType.TT_Task 26154 15 [(104841, FS, 0)]
3 104839 Plumbing TaskType.TT_Task 26154 40 [(104844, FS, 0), (104845, FS, 0)]
4 104840 Ceiling TaskType.TT_Task 26154 15 [(104842, FS, 0)]
5 104841 Roofing TaskType.TT_Task 26154 5 [(104843, FS, 0), (104844, FS, 0), (104845, FS...
6 104842 Painting TaskType.TT_Task 26154 10 [(104846, FS, 0)]
7 104843 Windows TaskType.TT_Task 26154 5 [(104846, FS, 0)]
8 104844 Facade TaskType.TT_Task 26154 10 [(104846, FS, 0)]
9 104845 Garden TaskType.TT_Task 26154 5 [(104846, FS, 0)]
10 104846 Moving TaskType.TT_Rsrc 26154 5 [(104883, FF, 0)]
11 104847 House 2 Start TaskType.TT_Mile 26152 0 []
12 104848 Masonry TaskType.TT_Task 26155 35 [(104849, FS, 0), (104850, FS, 0), (104851, FS...
13 104849 Carpentry TaskType.TT_Task 26155 15 [(104852, FS, 0)]
14 104850 Plumbing TaskType.TT_Task 26155 40 [(104855, FS, 0), (104856, FS, 0)]

Get the list of task names. Filter by wbs_id due to the repeating structure of activities on each house

def get_task_dict_by_wbs_id(df, wbs_id):
    filtered_df = df[df['WBS ID'] == wbs_id]
    task_dict = dict(zip(filtered_df['TASK ID'], filtered_df['Name']))
    return task_dict

wbs_id = '26154' # wbs_id of activites on House 1 (taken from the dataframe above)
task_dict = get_task_dict_by_wbs_id(tasks_df, wbs_id)
TaskNames = tuple(task_dict.values())

Get tasks duration

durations = {task_id: tasks_df.loc[tasks_df['TASK ID'] == task_id, 'Duration'].values[0] for task_id in task_dict.keys()}
Duration =  list(durations.values())

Get the precedences of the tasks

prec = []
for task in project.tasks:
  if task.uid in task_dict:
    for pred_link in task.successors:
      suc_uid = pred_link.task.uid
      if suc_uid in task_dict:
        suc_name = pred_link.task.name
        prec.append((task.name,suc_name))
Precedences = prec

Get the release dates. Dates ordered by the house number

lags = {}
for rel in project.relationships:
    if str(rel.predecessor.type) == 'TaskType.TT_Mile':
      lags[rel.successor.wbs.name] = rel.lag
ReleaseDate_dict = {key: lags.get(key) for key in sorted(lags)}
ReleaseDate = list(ReleaseDate_dict.values())

Finally obtain the project data

print("NbWorkers =", NbWorkers, "\n")
print("NbHouses =", NbHouses, "\n")
print("TaskNames =", TaskNames, "\n")
print("Duration =", Duration, "\n")
print("ReleaseDate =", ReleaseDate, "\n")
print("Precedences =", Precedences, "\n")

Output:

NbWorkers = 3 

NbHouses = 5 

TaskNames = ('Masonry', 'Carpentry', 'Plumbing', 'Ceiling', 'Roofing', 'Painting', 'Windows', 'Facade', 'Garden', 'Moving') 

Duration = [35, 15, 40, 15, 5, 10, 5, 10, 5, 5] 

ReleaseDate = [31, 0, 90, 120, 90] 

Precedences = [('Masonry', 'Carpentry'), ('Masonry', 'Plumbing'), ('Masonry', 'Ceiling'), ('Carpentry', 'Roofing'), ('Plumbing', 'Facade'), ('Plumbing', 'Garden'), ('Ceiling', 'Painting'), ('Roofing', 'Windows'), ('Roofing', 'Facade'), ('Roofing', 'Garden'), ('Painting', 'Moving'), ('Windows', 'Moving'), ('Facade', 'Moving'), ('Garden', 'Moving')]