Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Affects Version/s: gfal2-python 1.8.4
- Fix Version/s: gfal2-python 1.8.5
- Component/s: gfal2 python
- Security Level: Public Data (This ticket is visible to anyone on the internet and will be indexed by search engines)
- Labels: None
Description
Dear Alejandro,
we have observed some strange behavior when using context.listdir with the gridftp plugin (2.11.1 and 2.12.0, el6, x86_64, tested on SL6.5, GLOBUS_THREAD_MODEL=none). It took us quite some time to write a minimal test because the error is not always reproducible, which suggests a race condition:
    # -*- coding: utf-8 -*-

    import gfal2
    import threading
    from time import sleep
    from multiprocessing.pool import ThreadPool

    lock = threading.Lock()
    n = 15
    pool = ThreadPool(n)

    def run(*args, **kwargs):
        while True:
            with lock:
                print "in lock"
                sleep(2)
                print "out of lock"

    pool.map_async(run, range(n))

    ctx = gfal2.creat_context()

    def func():
        for elem in ["data_2015D_e", "data_2015D_e", "ttJets_powheg", "QCD_Ht500To700"]:
            print len(ctx.listdir("gsiftp://dcache-door-cms16.desy.de:2811/pnfs/desy.de/cms/tier2/store/user/mrieger/analyses/ttH_bb_semi/CreatePxlioFiles/RunIIFall15MiniAODv2_13TeV_25bx_76X/%s/nominal/prod1/allEvents" % elem))

    for i in range(30):
        func()
You might need to run the script a few times to make the segfault appear. This is an example output:
    in lock
    8472
    5750
    out of lock
    in lock
    Segmentation fault
Sometimes we even get fatal Python GC errors:
    in lock
    8472
    5750
    out of lock
    in lock
    Fatal Python error: GC object already tracked
    Aborted
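Because the crash does not show up on every run, it may help to rerun the test script automatically until it dies. The following is only a sketch, not part of the original test: the helper name run_until_crash and its parameters are made up for illustration. It relies on the POSIX behavior of Python's subprocess module, where a negative return code -N means the child process was killed by signal N (11 for a segmentation fault, 6 for an abort).

```python
import subprocess
import sys


def run_until_crash(cmd, max_runs=20):
    """Run cmd repeatedly (hypothetical helper, not part of gfal2).

    Returns (run_index, signal_number) for the first run that was killed
    by a signal, or None if all max_runs runs exited on their own.
    """
    for i in range(1, max_runs + 1):
        rc = subprocess.call(cmd)
        # On POSIX, a negative return code -N means "killed by signal N".
        if rc < 0:
            sys.stderr.write("run %d: killed by signal %d\n" % (i, -rc))
            return i, -rc
    return None
```

Assuming the test script above is saved under a file name such as reproducer.py (a placeholder), it could be driven with run_until_crash([sys.executable, "reproducer.py"]).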
We observed that the error occurs only when a thread acquired the lock. Can you reproduce the error? Maybe https://gitlab.cern.ch/dmc/gfal2/blob/develop/src/plugins/gridftp/gridftpwrapper.cpp#L681 is a candidate, but we're really not sure about this.
Cheers,
Marcel