  1. DMC - Development
  2. DMC-832

Segmentation Fault in gridftp plugin with thread locking

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: gfal2-python 1.8.4
    • Fix Version/s: gfal2-python 1.8.5
    • Component/s: gfal2 python
    • Security Level: Public Data (This ticket is visible to anyone on the internet and will be indexed by search engines)
    • Labels:
      None

      Description

      Dear Alejandro,

      we have observed some strange behavior when using context.listdir and the gridftp plugin (2.11.1 and 2.12.0, el6, x86_64, tested on SL6.5, GLOBUS_THREAD_MODEL=none). It took us quite some time to write a minimal test because the error is not always reproducible, which suggests a race condition:

      # -*- coding: utf-8 -*-
      
      import gfal2
      
      import threading
      from time import sleep
      from multiprocessing.pool import ThreadPool
      
      
      lock = threading.Lock()
      
      n = 15
      pool = ThreadPool(n)
      
      def run(*args, **kwargs):
          while True:
              with lock:
                  print "in lock"
                  sleep(2)
              print "out of lock"
      
      pool.map_async(run, range(n))
      
      ctx = gfal2.creat_context()
      def func():
          for elem in ["data_2015D_e", "data_2015D_e", "ttJets_powheg", "QCD_Ht500To700"]:
              print len(ctx.listdir("gsiftp://dcache-door-cms16.desy.de:2811/pnfs/desy.de/cms/tier2/store/user/mrieger/analyses/ttH_bb_semi/CreatePxlioFiles/RunIIFall15MiniAODv2_13TeV_25bx_76X/%s/nominal/prod1/allEvents" % elem))
      
      for i in range(30):
          func()
      

      You might need to run the script a few times to make the segfault appear. This is an example output:

      in lock
      8472
      5750
      out of lock
       in lock
      Segmentation fault
      

      Sometimes we even get fatal Python GC errors:

      in lock
      8472
      5750
      out of lock
       in lock
      Fatal Python error: GC object already tracked
      Aborted
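
Because the crash is intermittent, a small helper that runs the script repeatedly and records how each run ended may save some manual retries. The following is only a sketch under assumptions: it is meant to be run with Python 3 on a POSIX system, separately from the reproducer itself, and the names `describe_exit` and `run_repeatedly` as well as the file name `reproducer.py` are hypothetical. It relies on the fact that `subprocess` reports a child killed by a signal as a negative return code, so a segmentation fault shows up as SIGSEGV and the "Aborted" case as SIGABRT.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (e.g. -11 for SIGSEGV).
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return "signal %d" % -returncode
    return "exit %d" % returncode


def run_repeatedly(cmd, attempts=20):
    """Run cmd `attempts` times and count how each run ended."""
    outcomes = {}
    for _ in range(attempts):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        label = describe_exit(proc.returncode)
        outcomes[label] = outcomes.get(label, 0) + 1
    return outcomes


# Hypothetical usage, assuming the script above was saved as reproducer.py:
#   print(run_repeatedly(["python", "reproducer.py"]))
```

The returned dictionary maps each outcome label to the number of runs that ended that way, which gives a rough idea of how often the race is hit on a given machine.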
      

      We observed that the error occurs only when a thread has acquired the lock. Can you reproduce the error? Maybe https://gitlab.cern.ch/dmc/gfal2/blob/develop/src/plugins/gridftp/gridftpwrapper.cpp#L681 is a candidate, but we're really not sure about this.

      Cheers,
      Marcel

            People

            • Assignee: Alejandro Alvarez Ayllon (aalvarez)
            • Reporter: Marcel Rieger (mrieger)