Structure

This post is an extension of the previous one “Optimizing Project Planning and Material transportation”. The previous post describes some techniques how for optimizing the schedule, where the data is prepared manually. So, the current article will show the method of obtaining the data from Primavera P6 project files (*.xer).

Looking for libraries in Python programming language, I have found two:

xerparser
PyP6Xer.

Both look to have the same features at first glance. The first one is chosen for this case.

The content below is based on the ipython notebook, so there is an option to check it out on nbviewer.org.

Intro

This notebook demonstrates an example of how to use the Primavera P6 xer files reader library for data preparation.

The XER file contains the same data presented in the notebook.

By obtaining the file via URL link, the required data—task names, durations, precedences, and release dates (intervals between each house’s earliest starting date)—is transformed into Python objects.

Installing xerparser

import importlib

# Check if xerparser is already installed
try:
    importlib.import_module('xerparser')
    print("xerparser is already installed.")
except ImportError:
    try:
        # Attempt installation
        %pip install xerparser
    except Exception as e:
        print("An error occurred while installing xerparser:", e)

Getting the XER file

import requests

url = "https://raw.githubusercontent.com/inigmat/exupery/main/files/schedule/MDL4D.xer"

try:
    response = requests.get(url)
    if response.status_code == 200:
        print("Request successful!")
    else:
        print(f"Failed to retrieve data. Status code: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Read the downloaded file with checking for errors

from xerparser import Xer, CorruptXerFile

try:
    xer = Xer(response.text)
except CorruptXerFile as e:
    for error in e.errors:
        print(error)

Getting the project data

project = list(xer.projects.values())[0]

We have to get the following data

NbWorkers = 3
NbHouses  = 5

TaskNames = ("masonry","carpentry","plumbing",
             "ceiling","roofing","painting",
             "windows","facade","garden","moving")

Duration =  [35, 15, 40, 15, 5, 10, 5, 10, 5, 5]

ReleaseDate = [31, 0, 90, 120, 90]

Precedences = [("masonry", "carpentry"), ("masonry", "plumbing"), ("masonry", "ceiling"),
               ("carpentry", "roofing"), ("ceiling", "painting"), ("roofing", "windows"),
               ("roofing", "facade"), ("plumbing", "facade"), ("roofing", "garden"),
               ("plumbing", "garden"), ("windows", "moving"), ("facade", "moving"),
               ("garden", "moving"), ("painting", "moving")]

Start with getting the WBS (house numbers)

We skip the first two levels using WBS_LVL variable

WBS_LVL = 2
houses = [obj.name for obj in project.wbs_nodes[WBS_LVL:]]
NbHouses= len(houses)

Obtain the labor resources of the project to get the number of workers

workers = {}
for res in project.resources:
  if res.rsrc_type == "RT_Labor":
    workers[res.rsrc_id] = res.resource.name
NbWorkers = len(workers)

Prepare pandas dataframe to check out the tasks data such as ‘TASK ID’,’Name’,’Type’, ‘WBS ID’, ‘Duration’, ‘Successors (ID, Link, Lag)’

import pandas as pd

tasks_df = pd.DataFrame(columns=['TASK ID','Name','Type', 'WBS ID', 'Duration', 'Successors (ID, Link, Lag)'])

for task in project.tasks:
    task_sucs = []
    for pred_link in task.successors:
        task_suc = pred_link.task.uid
        task_link = pred_link.link
        task_lag = pred_link.lag
        task_sucs.append((task_suc, task_link, task_lag))

    tasks_df = pd.concat([tasks_df, pd.DataFrame(
        {'TASK ID': [task.uid],
         'Name': [task.name],
         'Type': [task.type],
         'WBS ID': [task.wbs_id],
         'Duration': [task.duration],
         'Successors (ID, Link, Lag)': [task_sucs],
         })], ignore_index=True)

tasks_df.head(15)

Output:

	TASK ID	Name	Type	WBS ID	Duration	Successors (ID, Link, Lag)
0	104836	House 1 Start	TaskType.TT_Mile	26152	0	[]
1	104837	Masonry	TaskType.TT_Task	26154	35	[(104838, FS, 0), (104839, FS, 0), (104840, FS...
2	104838	Carpentry	TaskType.TT_Task	26154	15	[(104841, FS, 0)]
3	104839	Plumbing	TaskType.TT_Task	26154	40	[(104844, FS, 0), (104845, FS, 0)]
4	104840	Ceiling	TaskType.TT_Task	26154	15	[(104842, FS, 0)]
5	104841	Roofing	TaskType.TT_Task	26154	5	[(104843, FS, 0), (104844, FS, 0), (104845, FS...
6	104842	Painting	TaskType.TT_Task	26154	10	[(104846, FS, 0)]
7	104843	Windows	TaskType.TT_Task	26154	5	[(104846, FS, 0)]
8	104844	Facade	TaskType.TT_Task	26154	10	[(104846, FS, 0)]
9	104845	Garden	TaskType.TT_Task	26154	5	[(104846, FS, 0)]
10	104846	Moving	TaskType.TT_Rsrc	26154	5	[(104883, FF, 0)]
11	104847	House 2 Start	TaskType.TT_Mile	26152	0	[]
12	104848	Masonry	TaskType.TT_Task	26155	35	[(104849, FS, 0), (104850, FS, 0), (104851, FS...
13	104849	Carpentry	TaskType.TT_Task	26155	15	[(104852, FS, 0)]
14	104850	Plumbing	TaskType.TT_Task	26155	40	[(104855, FS, 0), (104856, FS, 0)]

Get the list of task names. Filter by wbs_id due to the repeating structure of activities on each house

def get_task_dict_by_wbs_id(df, wbs_id):
    filtered_df = df[df['WBS ID'] == wbs_id]
    task_dict = dict(zip(filtered_df['TASK ID'], filtered_df['Name']))
    return task_dict

wbs_id = '26154' # wbs_id of activites on House 1 (taken from the dataframe above)
task_dict = get_task_dict_by_wbs_id(tasks_df, wbs_id)
TaskNames = tuple(task_dict.values())

Get tasks duration

durations = {task_id: tasks_df.loc[tasks_df['TASK ID'] == task_id, 'Duration'].values[0] for task_id in task_dict.keys()}
Duration =  list(durations.values())

Get the precedences of the tasks

prec = []
for task in project.tasks:
  if task.uid in task_dict:
    for pred_link in task.successors:
      suc_uid = pred_link.task.uid
      if suc_uid in task_dict:
        suc_name = pred_link.task.name
        prec.append((task.name,suc_name))
Precedences = prec

Get the release dates. Dates ordered by the house number

lags = {}
for rel in project.relationships:
    if str(rel.predecessor.type) == 'TaskType.TT_Mile':
      lags[rel.successor.wbs.name] = rel.lag

ReleaseDate_dict = {key: lags.get(key) for key in sorted(lags)}
ReleaseDate = list(ReleaseDate_dict.values())

Finally obtain the project data

print("NbWorkers =", NbWorkers, "\n")
print("NbHouses =", NbHouses, "\n")
print("TaskNames =", TaskNames, "\n")
print("Duration =", Duration, "\n")
print("ReleaseDate =", ReleaseDate, "\n")
print("Precedences =", Precedences, "\n")

Output:

NbWorkers = 3 

NbHouses = 5 

TaskNames = ('Masonry', 'Carpentry', 'Plumbing', 'Ceiling', 'Roofing', 'Painting', 'Windows', 'Facade', 'Garden', 'Moving') 

Duration = [35, 15, 40, 15, 5, 10, 5, 10, 5, 5] 

ReleaseDate = [31, 0, 90, 120, 90] 

Precedences = [('Masonry', 'Carpentry'), ('Masonry', 'Plumbing'), ('Masonry', 'Ceiling'), ('Carpentry', 'Roofing'), ('Plumbing', 'Facade'), ('Plumbing', 'Garden'), ('Ceiling', 'Painting'), ('Roofing', 'Windows'), ('Roofing', 'Facade'), ('Roofing', 'Garden'), ('Painting', 'Moving'), ('Windows', 'Moving'), ('Facade', 'Moving'), ('Garden', 'Moving')]