Tohban OVRO-LWA Imaging Tutorial
The OVRO-LWA has three solar modes that can operate concurrently. These are (1) the beamformer, which creates a high-resolution spectrogram of the solar activity each day, (2) a slow visibility mode that records data in CASA ms format for all 352 antennas and all 3072 frequencies at 10-s cadence, and (3) a fast visibility mode that records data for a 48-antenna subset (generally the outer antennas) and 768 frequencies at 1-s cadence. The recorders for the three modes are activated separately, so it is not guaranteed that data from all three modes are available at any one time. Also, because of the vast data volume, most of the recorded data are not saved but are overwritten after a day or so; any data that are wanted must be explicitly saved by copying them to another location. Because of the same large volume, such copying is slow (at least at present), so we can generally save only about an hour of data per day.
Note: This tutorial only describes how to work with the slow visibility data at the moment.
The imaging pipeline is written in Python 3, so in order to use it one must set up a Python 3 environment. These instructions assume you are working in your own home directory on the Pipeline machine at OVRO. First enter the bash shell if you are not already in it. Type
echo $0
to see what shell you are in. If that returns something other than -bash, type
bash
to enter the shell. Next check if you have the line
alias loadpyenv3.8='source /home/user/.setenv_pyenv38'
in your ~/.bash_aliases file. If not, add it using your favorite editor, then activate it with
source ~/.bash_aliases
From there, you can type
loadpyenv3.8
to enter the Python 3.8 environment. Finally, from your home folder, type
git clone https://github.com/binchensun/ovro-lwa-solar
to install the OVRO-LWA code. To test your Python environment, log out and log in again fresh, then type
$> loadpyenv3.8
$> ipython --pylab
import sys
sys.path.append('/home/dgary/ovro-lwa-solar')   # Replace with your own home directory
import solar_pipeline
If that succeeds, you should be ready to proceed.
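If the import fails, it can help to check whether Python can locate the module at all before looking for deeper problems. The following is a minimal sketch, not part of the pipeline: the helper name module_available is made up for illustration, and the repository path is the example used above. Note that it only checks that the module can be found; it does not import it, so missing dependencies would not be detected.

```python
import importlib.util
import sys


def module_available(name, extra_path=None):
    """Return True if module `name` can be located (hypothetical helper).

    If extra_path is given, it is appended to sys.path first, mirroring
    the sys.path.append() call used in this tutorial.
    """
    if extra_path and extra_path not in sys.path:
        sys.path.append(extra_path)
    return importlib.util.find_spec(name) is not None


# Replace with the directory where you cloned ovro-lwa-solar
repo = '/home/dgary/ovro-lwa-solar'
if not module_available('solar_pipeline', repo):
    print('solar_pipeline not found; check the path', repo)
```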
Where to Find Data
The next step is to find the data you want to work with. You will need some calibration data as well as the solar data for your target date. As of this writing, the existing solar data on Pipeline are in two separate places: /nas5/ovro-lwa-data (data up to 2023-09-03) and /nas6/ovro-lwa-data (data from 2023-09-18 and later). All of the existing beamformed data (spectrograms) are in /nas5/ovro-lwa-data/beam/beam-data.
This tutorial uses the example of the type II burst on 2023-07-28.
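The choice between the two locations can be written down as a small lookup. This is only a sketch: slow_data_dir is a hypothetical helper, the date boundaries are the ones quoted above, and the per-date slow/ subfolder layout is copied from the example paths used in this tutorial. That /nas6 follows the same layout is an assumption.

```python
from datetime import date

# Date boundaries quoted in the text above
NAS5_LAST = date(2023, 9, 3)
NAS6_FIRST = date(2023, 9, 18)


def slow_data_dir(day):
    """Guess the slow-visibility folder for a date (hypothetical helper).

    Assumes <root>/<YYYYMMDD>/slow/, as in this tutorial's /nas5 example;
    the same layout on /nas6 is an assumption.
    """
    if day <= NAS5_LAST:
        root = '/nas5/ovro-lwa-data'
    elif day >= NAS6_FIRST:
        root = '/nas6/ovro-lwa-data'
    else:
        raise ValueError('No data location is listed for ' + day.isoformat())
    return root + '/' + day.strftime('%Y%m%d') + '/slow/'


print(slow_data_dir(date(2023, 7, 28)))  # /nas5/ovro-lwa-data/20230728/slow/
```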
Examining the Spectrogram for Your Date
It is good practice to examine the spectrogram for your date/time, to guide your selection of frequencies and times to use for imaging. You can check the folders and subfolders in /nas5/ovro-lwa-data/beam/beam-data to see what files exist. Note that the filenames have the Modified Julian Date (MJD) followed by hours, minutes, seconds in the format <mjdday>_<hh><mm><ss>?????????? where the ? indicate more digits of the fraction of a second. The type II burst we are interested in started around 15:43 UT on 2023 July 28, which is MJD 060153, so the file we want is
/nas5/ovro-lwa-data/beam/beam-data/202307/beam20230728/060153_152717110834334d2be, which starts at 15:27:17 UT. Generally these files contain 30 min of data. The type II continues into the next file, which is
To read and display this file, in iPython type
import sys   # If not already loaded
sys.path.append('/nas5/ovro-lwa-data/beam/software/')
from lwa import lwa_read, lwa_plot
datadir = '/nas5/ovro-lwa-data/beam/beam-data/202307/beam20230728/'
data = lwa_read(datadir+'060153_152717110834334d2be', stokes='IV', timebin=1, freqbin=4)
lwa_plot(data, vmax=15000,vmin=10)
which defaults to log-scaled amplitudes and the viridis color table for Stokes I, and linear-scaled amplitudes and grayscale for Stokes V, as shown at left. You can examine lwa_plot? for more options.
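The MJD part of a beam filename can be worked out from the calendar date, since MJD 0 corresponds to 1858 November 17. The sketch below uses a hypothetical helper, beam_file_prefix, which builds only the leading part of the name (zero-padded MJD, underscore, start time); the trailing characters of the real filenames still have to be found by listing the folder.

```python
from datetime import date

MJD_EPOCH = date(1858, 11, 17)  # MJD 0


def beam_file_prefix(day, hhmmss):
    """Build the leading part of a beam filename (hypothetical helper).

    Returns the 6-digit zero-padded MJD, an underscore, and the HHMMSS
    string, matching the example filename in this tutorial.
    """
    mjd = (day - MJD_EPOCH).days
    return '{:06d}_{}'.format(mjd, hhmmss)


print(beam_file_prefix(date(2023, 7, 28), '152717'))  # 060153_152717
```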
Calibration and Imaging Script
The script below assumes some previous setup. First, a "home" directory needs to be created and the script must be run from that directory. Because of the large amount of disk space required, create your "home" directory on /data1. Mine is /data1/dgary/OVRO-LWA/20230728_workdir. Before running the script, you'll need to change the 7 lines indicated with the ***Change comments.
- The first such line is the list of frequency bands you want to image. In this case I have all 13 useful bands. Frequencies below 27 MHz rarely image well and in many cases we did not save the data for those frequencies anyway.
- The second is a string representing the date of the event, including an underscore (this is part of a filename).
- The third line is a list of solar times. These times have to exactly match existing filenames, so you'll have to do a listing of the data directory to check them. Warning: Doing a listing of the entire data directory is time consuming and not useful, since there are many thousands of files there. Instead, use something like:
ls /nas5/ovro-lwa-data/20230728/slow/20230728_1553*
to limit the number of files returned.
- The fourth line is the date string of the calibration data. This will almost always be the same as the date string of the data, but it is possible to use a calibration from a different date if not too far apart.
- The fifth line is the time of the calibration data. Again, this must exist. Usually the calibration is done at night so the time will be quite different, e.g. 0500 UT, and a command like
ls /nas5/ovro-lwa-data/20230728/slow | head -20
will list the first 20 files in the folder, which are likely the calibration files. Unfortunately, no nighttime calibration exists for this date, so I had to use a daytime time, 15:40 UT.
- The sixth line is the path to the data.
- The seventh line is the path to the calibration data, again usually the same as that for the data.
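Because the solar times must exactly match existing filenames, it can be convenient to collect them from the data directory rather than typing them by hand. The sketch below assumes the naming pattern that the script itself constructs (date string, HHMMSS, underscore, band, "MHz.ms"); available_times is a hypothetical helper, not part of the pipeline. Like the ls example above, it restricts the listing to a time prefix so the whole directory is not scanned.

```python
import glob
import os
import re


def available_times(datapath, datstr, hhmm, freq=None):
    """Return sorted HHMMSS strings of files starting with datstr+hhmm.

    Hypothetical helper; assumes names like 20230728_155306_27MHz.ms.
    If freq is given, only files for that band are considered.
    """
    pattern = os.path.join(datapath, datstr + hhmm + '*')
    regex = re.compile(re.escape(datstr) + r'(\d{6})_(\d+)MHz\.ms')
    times = set()
    for path in glob.glob(pattern):
        match = regex.match(os.path.basename(path))
        if match and (freq is None or int(match.group(2)) == freq):
            times.add(match.group(1))
    return sorted(times)


# Example call; returns an empty list if the directory is not reachable
solar_times = available_times('/nas5/ovro-lwa-data/20230728/slow/', '20230728_', '1553')
print(solar_times)
```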
import os, glob
import utils
from time import time
import solar_pipeline

freqs=[27,32,36,41,46,50,55,59,64,69,73,78,82]  # ***Change to the bands you want to image
datstr = '20230728_'                            # ***Change to the date of your event
solar_times = ['155306','155316','155326']      # ***Change to the times to use for solar imaging -- these times must exist!
caldatstr = '20230728_'                         # ***Change to the date of your cal data
cal_time = '154003'                             # ***Change to the time for your calibration
datapath = '/nas5/ovro-lwa-data/20230728/slow/' # ***Change to path to your data
calpath = '/nas5/ovro-lwa-data/20230728/slow/'  # ***Change to path to your calibration data
home=os.getcwd()
for solar_time in solar_times:
    for freq in freqs:
        calib_ms=caldatstr+cal_time+'_'+str(freq)+"MHz.ms"   # Will be copied from calpath
        solar_ms=datstr+solar_time+'_'+str(freq)+"MHz.ms"    # Will be copied from datapath
        bcal='caltables/'+calib_ms.replace('ms','bcal')      # Will be created if it doesn't already exist
        imagename=datstr+solar_time+'_'+str(freq)+"MHz"
        image_fold = 'images/'
        # Create frequency folder, if it doesn't exist
        freq_fold=str(freq)+"MHz"
        if not os.path.isdir(freq_fold):
            os.mkdir(freq_fold)
        # Copy the solar data for this time (will be deleted later)
        print('Copying solar data to frequency folder')
        os.system("cp -r "+os.path.join(datapath,solar_ms)+"* "+freq_fold+"/")
        # Copy the calibration data (will be deleted later)
        print('Copying calibration data to frequency folder')
        os.system("cp -r "+os.path.join(calpath,calib_ms)+"* "+freq_fold+"/")
        os.chdir(freq_fold)
        if not os.path.isdir(image_fold):
            os.mkdir(image_fold)
        if not os.path.isfile(bcal):
            bcal = None
        if not os.path.isdir('caltables'):
            os.mkdir('caltables')
        if not os.path.isdir('final_ms'):
            os.mkdir('final_ms')
        try:
            solar_pipeline.image_ms(solar_ms=solar_ms,calib_ms=calib_ms,bcal=bcal,
                        imagename=imagename,do_final_imaging=False,logfile='analysis_'+str(freq)+'.log')
            msname = datstr+solar_time+'_'+str(freq)+'MHz_final.ms'
            os.system("mv *calibrated_selfcalibrated_sun_only_sun_selfcalibrated_sun_only.ms final_ms/"+msname)
            os.system("rm -rf *.ms* *.fits *.gcal *.cl *.badants")
            # Make 10 images for this band (integrates over 19 or 20 subchannels, bandwidth ~0.4545 MHz)
            os.system('wsclean -no-dirty -size 1024 1024 -scale 1arcmin -weight uniform -minuv-l 10 -name '+imagename+' -niter 10000 -mgain 0.8 -beam-fitting-size 1 -pol I -join-channels -channels-out 10 final_ms/'+msname)
            # Convert images to heliocentric, move them to the final image folder, and delete all fits files
            files = glob.glob('*-image.fits')
            for imgfile in files:
                utils.correct_primary_beam('final_ms/'+msname, imgfile.split('-image.fits'))
                helio_image = utils.convert_to_heliocentric_coords('final_ms/'+msname, imgfile)
                os.system('mv '+helio_image+' '+image_fold)
            os.system('rm *.fits')
        except:
            pass
        os.chdir(home)
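Note that the bare except in the script silently skips any band that fails, so a failure shows up only as missing images. One possible variation, sketched below, records the traceback in a log file before moving on. The helper run_logged and the log filename are hypothetical and are not part of the pipeline.

```python
import traceback


def run_logged(label, func, logfile='failures.log'):
    """Call func(); on error append the traceback to logfile.

    Hypothetical helper. Returns True on success and False on failure,
    so the calling loop can continue with the next band or time.
    """
    try:
        func()
        return True
    except Exception:
        with open(logfile, 'a') as log:
            log.write('FAILED: ' + label + '\n')
            log.write(traceback.format_exc() + '\n')
        return False
```

To use it, the statements inside the try block would be moved into a function, and that function passed to run_logged together with a label such as the time and band. Because the script changes into the frequency folder, an absolute logfile path (for example os.path.join(home, 'failures.log')) keeps all failures in one place.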
What Happens When You Run the Script
One way to run this script is to cut-and-paste into a file, say process.py, and then in an iPython session type
import sys
sys.path.append('/home/dgary/ovro-lwa-solar')   # Change to your path where you cloned the git repository
run 'process.py'
If all goes well, after many hours you will have all of your images. If you examine the script, you will see that there are two loops, an inner one over frequency and an outer one over time. The inner loop creates a subdirectory for the frequency it is working on (the first will be named 27MHz), then does the calibration for that frequency and creates a subfolder caltables with a .bcal file in it. Luckily, this only has to be done once; the .bcal file is reused for subsequent times, so its creation is skipped. Other files with a .gcal extension are created for the first data time, and are also reused for subsequent times up to one hour later. When a new .gcal file is needed, the pipeline creates it automatically. Creating the gain files takes about 10 min for each frequency, but again this is done only once for an hour of data. After the calibration is complete,
wsclean is used to create images (10 sub-band images for each 4.5 MHz band, plus an MFS image integrated over the whole band). They are converted to heliocentric coordinates and you will find them in 27MHz/images when done. This takes another 10 minutes or so.
When all of that is done for the first frequency, the whole process starts again for the next, and so on until all images for the first time are done. In this example, then, it will take about 20 min per frequency * 13 frequencies = 260 minutes (> 4 hours!) to make all 143 images for the first time (13 bands, each with 10 sub-band images + 1 MFS image). For subsequent times, though, the calibration step is skipped, so each subsequent time will take 10 min * 13 frequencies = 130 minutes (around 2 hours). That means the entire script will run for about 520 minutes (between 8 and 9 hours) and produce 429 images.
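The same estimate can be redone for a different number of bands or times. The sketch below just repeats the arithmetic; the 10-minute figures are the rough timings quoted in this section, not measured constants, and the variable names are made up for illustration.

```python
n_freqs = 13          # bands in the example freqs list
n_times = 3           # entries in the example solar_times list
calib_min = 10        # approx. gain calibration time per band (first time only)
image_min = 10        # approx. imaging time per band and per time
images_per_band = 11  # 10 sub-band images + 1 MFS image

first_time = (calib_min + image_min) * n_freqs   # calibration + imaging
later_time = image_min * n_freqs                 # imaging only
total_min = first_time + (n_times - 1) * later_time
n_images = images_per_band * n_freqs * n_times

print(first_time, later_time, total_min, n_images)  # 260 130 520 429
```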