Tools

MACS2

class mg_process_macs2.tool.macs2.Macs2(configuration=None)[source]

Tool for peak calling for ChIP-seq data

static get_macs2_params(params)[source]

Function to handle the extraction of command-line parameters and format them for use with MACS2

Parameters: params (dict) –
Returns: list – List of lists, where each inner list holds a parameter and its matching value
Return type: list

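
The return shape described above (a list of lists, each holding a parameter and its matching value) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function name format_params, the FLAG_MAP dictionary and its keys are hypothetical, and only the flags themselves (--nomodel, --extsize, -q) are real options of macs2 callpeak.

```python
# Hypothetical mapping from configuration keys to macs2 callpeak flags.
# The keys are assumptions made for this sketch; the flags exist in MACS2.
FLAG_MAP = {
    "nomodel": "--nomodel",
    "extsize": "--extsize",
    "qvalue": "-q",
}


def format_params(params, flag_map=FLAG_MAP):
    """Return a list of lists, one inner list per recognised parameter."""
    formatted = []
    for key, flag in flag_map.items():
        value = params.get(key)
        if value is None or value is False:
            # Parameter not supplied, or a switch that is turned off.
            continue
        if value is True:
            # Boolean switch: the flag is emitted without a value.
            formatted.append([flag])
        else:
            formatted.append([flag, str(value)])
    return formatted


# Keys that are not in the mapping are ignored.
example = format_params({"extsize": 200, "nomodel": True, "unrelated": "x"})
```

Keeping each flag and its value together in an inner list makes it easy to drop or reorder individual options before the list is flattened into a command line.
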
macs2_peak_calling(**kwargs)[source]

Function to run MACS2 for peak calling on aligned sequence files, normalised against a provided background set of alignments.

Parameters:
  • name (str) – Name to be used to identify the files
  • bam_file (str) – Location of the bam file containing the aligned reads
  • bai_file (str) – Location of the bam index file
  • bam_file_bgd (str) – Location of the bam file containing the aligned reads that represent background values for the cell
  • bai_file_bgd (str) – Location of the background bam index file
  • narrowpeak (str) – Location of the output narrowpeak file
  • summits_bed (str) – Location of the output summits bed file
  • broadpeak (str) – Location of the output broadpeak file
  • gappedpeak (str) – Location of the output gappedpeak file
  • chromosome (str) – Name of the chromosome to analyse when the tool is to be run over a single chromosome. If None, the whole bam file is analysed
Returns:

  • narrowPeak (file) – BED6+4 file - ideal for transcription factor binding site identification
  • summitPeak (file) – BED4+1 file - Contains the peak summit locations for every peak
  • broadPeak (file) – BED6+3 file - ideal for histone binding site identification
  • gappedPeak (file) – BED12+3 file - Contains a merged set of the broad and narrow peak files

Definitions for each of these files come from the MACS2 documentation at https://github.com/taoliu/MACS
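
As a rough illustration of how the parameters above could map onto a MACS2 invocation, the sketch below assembles a macs2 callpeak argument list with an optional background file. It is an assumption about the general shape of such a call, not the command this tool actually runs: the function name build_callpeak_command and the extra_params argument are hypothetical, while -t, -c and -n are real macs2 callpeak options for the treatment file, control file and output name.

```python
def build_callpeak_command(name, bam_file, bam_file_bgd=None, extra_params=None):
    """Assemble (but do not run) a macs2 callpeak command as a list of tokens."""
    command = ["macs2", "callpeak", "-t", bam_file, "-n", name]

    # The control (background) alignment is optional; leave -c out without one.
    if bam_file_bgd is not None:
        command.extend(["-c", bam_file_bgd])

    # extra_params is assumed to be a list of lists, e.g. [["--extsize", "200"]].
    for param in extra_params or []:
        command.extend(param)

    return command


with_background = build_callpeak_command("sample", "sample.bam", "input.bam")
without_background = build_callpeak_command(
    "sample", "sample.bam", extra_params=[["--nomodel"], ["--extsize", "200"]]
)
```

The resulting list can be passed to subprocess.run without shell quoting; the file names used here are placeholders.
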

macs2_peak_calling_nobgd(**kwargs)[source]

Function to run MACS2 for peak calling on aligned sequence files without a background dataset for normalisation.

Parameters:
  • name (str) – Name to be used to identify the files
  • bam_file (str) – Location of the bam file containing the aligned reads
  • bai_file (str) – Location of the bam index file
  • narrowpeak (str) – Location of the output narrowpeak file
  • summits_bed (str) – Location of the output summits bed file
  • broadpeak (str) – Location of the output broadpeak file
  • gappedpeak (str) – Location of the output gappedpeak file
  • chromosome (str) – Name of the chromosome to analyse when the tool is to be run over a single chromosome. If None, the whole bam file is analysed
Returns:

  • narrowPeak (file) – BED6+4 file - ideal for transcription factor binding site identification
  • summitPeak (file) – BED4+1 file - Contains the peak summit locations for every peak
  • broadPeak (file) – BED6+3 file - ideal for histone binding site identification
  • gappedPeak (file) – BED12+3 file - Contains a merged set of the broad and narrow peak files

Definitions for each of these files come from the MACS2 documentation at https://github.com/taoliu/MACS
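
To make the BED6+4 layout of the narrowPeak output concrete, the sketch below parses one tab-separated line into named fields. The column order follows the narrowPeak format as written by MACS2 (six BED columns followed by signal value, -log10 p-value, -log10 q-value and the summit offset from the peak start). The names NarrowPeak and parse_narrowpeak_line, and the example line itself, are hypothetical and serve only as illustration.

```python
from collections import namedtuple

# Field names are chosen for this sketch; the column order is the narrowPeak one.
NarrowPeak = namedtuple(
    "NarrowPeak",
    [
        "chrom", "start", "end", "name", "score", "strand",
        "signal_value", "neg_log10_pvalue", "neg_log10_qvalue", "summit_offset",
    ],
)


def parse_narrowpeak_line(line):
    """Parse a single tab-separated narrowPeak (BED6+4) line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected 10 columns, found %d" % len(fields))
    return NarrowPeak(
        chrom=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        score=int(fields[4]),
        strand=fields[5],
        signal_value=float(fields[6]),
        neg_log10_pvalue=float(fields[7]),
        neg_log10_qvalue=float(fields[8]),
        summit_offset=int(fields[9]),
    )


# Made-up example line, not real output.
peak = parse_narrowpeak_line("chr1\t100\t200\tpeak_1\t150\t.\t5.5\t12.3\t10.1\t40\n")
# The absolute summit position is the peak start plus the summit offset.
summit_position = peak.start + peak.summit_offset
```
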

run(input_files, input_metadata, output_files)[source]

The main function to run MACS2 for peak calling over a given BAM file and matching background BAM file.

Parameters:
  • input_files (list) – List of input bam file locations, where index 0 is the bam data file and index 1 is the matching background bam file
  • metadata (dict) –
Returns:

  • output_files (list) – List of locations for the output files.
  • output_metadata (list) – List of matching metadata dict objects