Trouble Shooting Guide

From EOVSA Wiki
Revision as of 15:03, 1 November 2017 by Ychai (talk | contribs)

This is a troubleshooting guide for tohbans monitoring EOVSA remotely using MobaXterm and VNC Viewer.

<General checklist for solar observation>

1. Check Antenna Status Page to see if any antenna is under work.

2. In Schedule window, click "Today", "File", choose "Save" (overwrite if prompted), and "Go".

Since Feb. 2017, the schedule setup is slightly different. Do the following:

2.1. Load 'solar.scd' and hit Today. Save it (overwrite if prompted).

2.2. Open 'solar_plus3c84_Feb2017.scd' (in the ~/Dropbox/PythonCode/Current folder) in a text editor.

2.3. Update the sunrise and sunset times to match the solar.scd file that you just updated.

2.4. Update the PHASECAL (and refcal, which is a 1-hr PHASECAL, if necessary) times by subtracting 4 minutes from each scan (to account for the day-to-day sidereal time shift of each calibrator source). Shift the times of the previous and next lines (usually ACQUIRE and SUN) accordingly.

2.5. Update the DATE as well, then save the updated 'solar_plus3c84_Feb2017.scd'.

2.6. Load the updated 'solar_plus3c84_Feb2017.scd' and hit Go.

3. Antenna Tracking - are all antennas tracking (shown in white)?

4. Frequency Tuning - LO1A Sweep Status = "Sweeping", FSeqFile = FSEQ-FILE on the schedule, ErrorMsg = "No error"

5. Phase Tracking - "ON"

6. Power and Attenuation - Are all dBm values on both the H and V channels between the second and third numbers shown in "AGC" in the Schedule window? You can also check this in the SaveList hpol and vpol plots.

7. Temps - no fluctuation?

8. CryoRX - this is for the antenna 14 control system. If it is down (e.g., FEMA Outlets and Receiver Voltage/Current values are zero and Status is OFF), issue the command 'ctlgo' in the terminal.

9. Make sure that the EOVSA Observing Status Page is being updated and that data is being recorded. You can also check that data is being recorded by typing "ls /data1/IDB |tail" in the DPP terminal.

10. STOW antennas at the end of the observation, if needed (see possible problem with Ant 10)

11. Check the PHASECAL plots on the PHASECAL plot page. If you notice any unusually noisy data on ants 9, 10, 11 or 13, it generally means the antenna did not stow properly on a previous occasion, so you should issue the following commands (for example, with ant 13):

step 1: stop ant13

step 2: stow ant13 (wait for it to completely stow--repeat steps 1 and 2 if it seems like it is not stowing after 5 minutes or so)

step 3: track ant13

New! 12. After the day's observation is over, review the results of all PHASECAL scans on the PHASECAL plot page. Note any scan that went badly for reasons other than WINDSCRAM (data taken under WINDSCRAM appear in red). Record your comments on these scans in the tohban log on the EOVSA tohban log page, and log any other activities during your duty.

New! 13. Do the reference gain calibration analysis by following the procedures explained in Reference Gain Calibration by 1 pm on the next day.
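Step 2.4 of the checklist is simple time arithmetic, and the sketch below shows it for a single schedule line. It assumes each line of the .scd file begins with a "YYYY-MM-DD HH:MM:SS" timestamp followed by the command. That layout and the sample line are assumptions for illustration, so check them against the real file before relying on it.

```python
from datetime import datetime, timedelta

# Assumption: a schedule line starts with "YYYY-MM-DD HH:MM:SS" followed by
# the command text. The real .scd layout may differ.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"
STAMP_LEN = 19  # length of "YYYY-MM-DD HH:MM:SS"


def shift_line(line, minutes=-4, days=0):
    """Return the line with its leading timestamp moved by the given offset."""
    stamp = datetime.strptime(line[:STAMP_LEN], TIME_FORMAT)
    shifted = stamp + timedelta(days=days, minutes=minutes)
    return shifted.strftime(TIME_FORMAT) + line[STAMP_LEN:]


# Hypothetical example line: move yesterday's scan to today, 4 minutes earlier.
print(shift_line("2017-11-01 13:30:00 PHASECAL 3c84", minutes=-4, days=1))
```

The same function can be applied to the neighbouring ACQUIRE and SUN lines so that they stay consistent with the shifted PHASECAL scan.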


Schedule window

I accidentally closed the schedule

1. Click "Schedule" (on the left task bar) just once.

2. Click "Today".

“Error: Could not write stateframe to SQL”

1. hit STOP on the schedule

2. type $scan-stop in Raw Command window (to stop the data recording)

3. close the schedule (exit out of it)

4. restart the program (by clicking on the icon at the left)

5. hit GO to start the observation again

Stateframe

Stateframe is frozen

1. Close the old one by running 'kill #' in the sched@helios terminal, where # is the number that follows "My PID" in the frozen Stateframe

2. Open a new Stateframe from the menu on the left ('sf_display')

3. Check the log box of the new stateframe

“ACC down?”

1. Open pdudigital.solar.pvt in a web browser

2. Go to “Actions”

3. Go to “Loads” (on the left)

4. Click item 14 (ACC)

5. Hit “Cycle” and “Ok” when prompted

After rebooting, if the Stateframe hangs and does not respond, open a new Stateframe and issue "kill ##" (## = the “My PID” value in the upper right corner of the frozen Stateframe) on the sched@helios.solar.pvt server.

CryoRX tab - Status are OFF, all values are zeroes (Checklist #8 fails)

In this case, the FEMA Outlets and Receiver Voltages/Currents are all zero, and the Status values are all OFF (except for Noise Diode, and RFSwitch when using the low-frequency receiver). This means that the control system for the receiver has died. The antennas will still track and data will still be recorded, and it does not mean that these data are "wrong" or "unusable". However, the values must be ON and non-zero whenever you want to change the receiver setting or modify the attenuation setting, which sometimes happens during an observation.

To reboot, execute "starburstControl start" on the antctl@feanta server (connect by ssh from helios, if disconnected).

<12/18/2016> To reboot, just type "ctlgo" in a terminal window on helios in VNC Viewer (you may have to stop the schedule). If for any reason you want to stop the control system, type "ctlstop".

Antenna(s) down

Don’t forget to check the Antenna Status page before trying to “fix” any of the antennas!!

Symptoms: Not tracking, showing ‘AT STOW’ or other unwanted coordinates, both AZ and EL permits ON or only EL permit ON, or Axis Lock is ON

1. Ants 9, 10, 11 and 13 can be in this state early in the schedule because they cannot move to the commanded position (it is outside the declination limit). In this case, you just have to wait a while (possibly a few hours).

2. On a cold morning, a large spike in the current may cause a large position error in the AT STOW state.

3. Proceed if neither #1 nor #2 is the case. If only the AZ permit is ON (the first column), try a reboot, for example "reboot 1 ant2" for ant2.

4. If both AZ and EL permits are ON (the second column) or only the EL permit is ON, give the command "$pcycle ant2" to reset antenna 2. This switches the power to the antenna OFF for 15 seconds and then back ON. In the Communication tab, the Ant 2 line will go red; wait until it turns white. If it does not turn white, try "sync ant2". If the cRIO does not respond to this, it may be in “safe mode”, in which case you can type "$pcycle crio ant2" (on ants 1-8 or 12 only) to cycle the power on the cRIO. Note that the cRIO takes at least 2 minutes to reboot and come back online. If this sequence does not work, you may try $pcycle again, but use this command only when it is really needed, to save wear and tear on the components.

5. Give "tracktable [the current tracktable ***.radec] ant2" and "track ant2" to initiate tracking. If this does not work and the temperature is low, wait for the temperature to rise.
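The choice between steps 3 and 4 depends only on which permits are ON, so it can be written down as a small lookup. The sketch below is illustrative only: the function names are made up for this page, nothing here talks to the real system, and it does not replace checking the Antenna Status page or the conditions in steps 1 and 2.

```python
def first_recovery_command(az_permit_on, el_permit_on, ant):
    """Suggest the first raw command for an antenna that is not tracking.

    Encodes steps 3 and 4 only. It does not check the Antenna Status page,
    declination limits, or cold-morning effects (steps 1 and 2).
    """
    name = "ant%d" % ant
    if az_permit_on and not el_permit_on:
        return "reboot 1 " + name   # step 3: only the AZ permit is ON
    if el_permit_on:
        return "$pcycle " + name    # step 4: both ON, or only EL ON
    return None                     # case not covered by the steps above


def resume_tracking_commands(tracktable, ant):
    """Commands from step 5; tracktable is the current *.radec file name."""
    name = "ant%d" % ant
    return ["tracktable %s %s" % (tracktable, name), "track " + name]


print(first_recovery_command(True, False, 2))
print(first_recovery_command(True, True, 2))
```

A return value of None means the symptom (for example, Axis Lock ON with no permits) is outside what steps 3 and 4 describe, and should be handled by hand.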

Ant14's cRIO's "Ant" value (last column) is showing negative value

You may see ant14's cRIO "Ant" value (the very last column) showing a negative value (not necessarily an extremely large value like those you see for some antennas that are down, but some random number with a negative sign). When you observe this, open "ant14.solar.pvt" in a web browser and check whether it says "Slot1 - Maths error" in red on the left side. This is believed to occur when the controller is interpolating coordinates for the last-entered track table and the calculation blows up (i.e. the pcal_tab.radec file had a day change in it when it was not supposed to).

This should not happen after 12/18/2016, but if you observe it after this date, report it to Dr. Gary and do the following:

1. In "ant14.solar.pvt", go to "Log-in" and log-in (if you need ID/PW, ask Dr. Gary or Natsuha.)

2. Click "Parameters", then select "#10 - Status And Trips".

3. Choose "#1 - 10.00".

4. Enter "1070" to "Update values", and hit "change".

5. Go to "Parameters" again, and choose "#38 - User Trip".

6. Enter "100" to "Update values", and hit "change".

This is supposed to reset the controller. Watch for the cRIO's Ant value to change to a positive value. Note the time at which you did this procedure, and report it to Dr. Gary.

BRIGHTSCRAM

Find out which antenna is experiencing this by looking at the FITS image files. BRIGHTSCRAM appears as data-gap-like features in the dynamic spectra. If more than two antennas have BRIGHTSCRAM, then ALL antennas show BRIGHTSCRAM.

Wait for a while (~10 min) to see if it automatically goes away. After it goes away, give "tracktable [the current tracktable ***.radec] ant#" and "track ant#" to initiate the tracking.

Frequency Tuning's Sweep Status is “stopped” or "Queue overflow"

1. Try "Stop" and then "Go" on the schedule.

2. If #1 does not work, try "lo1a-reboot" in Raw command window

3. If #2 does not work, try the following raw commands:

fseq-off
fseq-init
fseq-file [the current frequency receiver setting ***.fsq] (should be in the right side of the schedule window, like solarhi.fsq)
fseq-on
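Because the order of these four commands matters, it can help to keep them as one list. The sketch below only builds the command strings for a given .fsq file name; the helper name is made up for this page, and it does not send anything to the Raw command window.

```python
def fseq_restart_commands(fsq_file):
    """Return the raw commands from step 3, in the order they should be sent."""
    return [
        "fseq-off",
        "fseq-init",
        "fseq-file " + fsq_file,  # file name as shown in the schedule window
        "fseq-on",
    ]


for command in fseq_restart_commands("solarhi.fsq"):
    print(command)
```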

Temperature is fluctuating too much

Try rebooting the temperature controller by typing "tec$bc ant2" for ant2, for example (tec => Thermo-Electric Controller).

nd-on is on (Attenuation)

Send "nd-off ant#" raw command to turn off the local noise diode.

Figure 1: Example of the oscillation from unbalanced attenuation for ant 12 (orange).
Figure 2: Example of the oscillation from unbalanced attenuation for ant 2 (red) and 5 (cyan).

hpol/vpol plot (Savelist) is showing unusual oscillating behavior

What you will see is the dBm values of the antenna fluctuating very violently, as in Figures 1 and 2. Notice that the amplitude of the fluctuation is ~3 dB, which was one FEMATTN step (at this date). This happens when the hattn/vattn settings of the antenna get changed somehow and the two polarizations become very unbalanced. As a result, the automatic gain control cannot find a level that suits both at the same time, and goes into oscillation. To calm it down, first issue the commands:

femauto-off ant#
hattn 0 0 ant#
vattn 0 0 ant#

which turns off the automatic gain control. If the antenna is on the Sun, temporarily move it off the Sun using

radecoff 0 10 ant#

With the antenna off the Sun, set the hattn and vattn settings until both power levels are around 3 dB, i.e.:

hattn 0 12 ant#
vattn 0 11 ant#

where the attenuations (12 and 11 in this example) are chosen to set the power level close to 3 dB. Finally, turn the gain control back on, with

femauto-on ant#

If you issued the radecoff command, be sure to remove it with

radecoff 0 0 ant#

If the fluctuation is within one FEMATTN step (2 dB as of 12/12/16, check Schedule Command - FEMATTN level), the cause might be just interference. In this case, leave it for a while and see if the oscillation goes away.
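The re-balancing procedure above is easy to get out of order, especially the optional radecoff pair, so here is a sketch that assembles the full sequence for one antenna. The function name is made up for this page, and the attenuation values still have to be found by trial as described above; the code only formats the commands.

```python
def rebalance_commands(ant, hattn, vattn, on_sun=False):
    """Build the raw-command sequence for re-balancing one antenna.

    hattn and vattn are the second arguments of the hattn/vattn commands
    (12 and 11 in the example above). This sketch does not choose them.
    """
    name = "ant%d" % ant
    commands = [
        "femauto-off " + name,  # turn off the automatic gain control
        "hattn 0 0 " + name,
        "vattn 0 0 " + name,
    ]
    if on_sun:
        commands.append("radecoff 0 10 " + name)  # move off the Sun
    commands += [
        "hattn 0 %d %s" % (hattn, name),
        "vattn 0 %d %s" % (vattn, name),
        "femauto-on " + name,  # turn the gain control back on
    ]
    if on_sun:
        commands.append("radecoff 0 0 " + name)  # remove the offset again
    return commands


for command in rebalance_commands(12, 12, 11, on_sun=True):
    print(command)
```

Tying both radecoff commands to the same on_sun flag makes it harder to forget the final "radecoff 0 0" step.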

Antenna does not stow (Ant 10)

This mostly seems to happen on Ant 10 (as of ~July 2017). The symptom is that Ant 10 stays in "TO STOW" status while all the other (old) antennas are already AT STOW at the end of the observation. You might have tried the command "stow ant10", but it did not change the status. If this continues for more than a minute or so, it is likely that the antenna is running into a limit and cannot be stowed properly with the "stow ant10" command alone (and even if it does reach STOW by itself much later, that stow cannot be trusted). To properly stow the antenna, issue "stop ant10" first, then "stow ant10". You may need to do this multiple times. If you don't properly stow the antenna this way, it may not start tracking automatically the next morning, and you will miss the data from this antenna.
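The stop-then-stow procedure, repeated until it works, can be sketched as a small loop. The send and get_status arguments are hypothetical callables that the caller must supply (one to issue a raw command, one to read the antenna status); no such interface is described on this page, and the default wait time and number of tries are guesses.

```python
import time


def stow_with_retry(send, get_status, ant, max_tries=3, wait_s=60):
    """Issue stop + stow until the antenna reports AT STOW or tries run out.

    send and get_status are hypothetical callables supplied by the caller:
    send(cmd) issues one raw command, get_status(ant) returns the status text.
    """
    name = "ant%d" % ant
    for _ in range(max_tries):
        send("stop " + name)
        send("stow " + name)
        time.sleep(wait_s)  # give the antenna time to move
        if get_status(ant) == "AT STOW":
            return True
    return False
```

A False result means the antenna still had not reached AT STOW after the last try, which is worth noting in the tohban log.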

Antenna tab is blank and an attempt to switch to it causes the Stateframe to freeze

This occurred around early June of 2017. The cause turned out to be a change in numpy behavior: Dr. Gary updated numpy at some point, and a subtle difference caused the problem. This is a reminder that a software upgrade can sometimes be the cause of a malfunction in our system.

Data recording (DPP)

Data recording has stopped (ls /data1/IDB |tail does not return the most recent file)

You need to delete the dpplock.txt file. Follow these steps:

1. Enter "top" on the user@dpp.solar.pvt command line (if user@dpp.solar.pvt is not there, open a new terminal/terminal tab in VNC Viewer and type "ssh -X user@dpp.solar.pvt").

2. Look for "dppxmp" under the "command" column. If it is there, do NOT delete dpplock.txt. If it is not there, quit top by hitting "q" and proceed.

3. Type "rmlock" in the DPP terminal. Check whether data recording has recovered by sending "ls /data1/IDB |tail".
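The two checks in this procedure (is the newest file recent, and is dppxmp still running) can be sketched in Python as below. This is only an illustration: the function names are made up, the code assumes "dppxmp" appears as its own word in the pasted top output, and what counts as "recent enough" is left to the tohban because this page does not give a threshold.

```python
import os
import time


def newest_file_age_s(directory, now=None):
    """Return the age in seconds of the most recently modified entry, or None."""
    now = time.time() if now is None else now
    with os.scandir(directory) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries]
    return now - max(mtimes) if mtimes else None


def dppxmp_listed(top_output):
    """True if "dppxmp" appears as a separate word on any line of the text."""
    return any("dppxmp" in line.split() for line in top_output.splitlines())
```

On the DPP machine, newest_file_age_s("/data1/IDB") plays the same role as "ls /data1/IDB |tail", and dppxmp_listed should return False before "rmlock" is used.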

Network

Cannot open VNC Viewer, or VNC Viewer's response is too slow

Open the “local” raw command window and Stateframe window by following these steps:

1. Type "cd /common/python/current" in helios.solar.pvt terminal of MobaXterm

2. Type "./sched_commands.py" for raw command window

3. Type "./sf_display.py" for the Stateframe window (add " &" at the end if you want to keep typing commands in the same helios window) -- note that this Stateframe window may take a while (~5 min or more) to load.

Others

Figure 3: Geosynchronous satellite signals seen in flare monitor.

Strong interference in flare monitor

Twice per year (for 1-2 weeks centered around Mar. 5 and Oct. 5), the Sun enters the geosynchronous satellite belt. During these periods, we see strong signals on the flare monitor, as in Figure 3 (blue line). These are radio signals from man-made satellites, which will not harm the system and cannot be avoided, so don't be alarmed.
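A quick way to tell whether today falls in one of these periods is to compare the date with the two centre dates. The sketch below reads "1-2 weeks centered around" as up to 7 days either side, which is an assumption, and the centre dates themselves are only approximate.

```python
from datetime import date

# Approximate centre dates taken from the text above. The half-width is an
# assumption: "1-2 weeks centered around" is read as up to 7 days either side.
SEASON_CENTRES = [(3, 5), (10, 5)]
HALF_WIDTH_DAYS = 7


def in_satellite_season(day):
    """True if the date falls within the assumed window of either season."""
    for month, day_of_month in SEASON_CENTRES:
        centre = date(day.year, month, day_of_month)
        if abs((day - centre).days) <= HALF_WIDTH_DAYS:
            return True
    return False


print(in_satellite_season(date(2017, 10, 8)))
```

If strong flare-monitor signals appear outside these windows, they are less likely to be satellites and are worth noting in the tohban log.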

The “streak” in the lowest frequency of the dynamic spectrum

If you see this at the beginning or at the end of the day, it is the Sun! See Figure 4 and Figure 5 for sample images. When the baseline is foreshortened (as it is near sunrise or sunset), the response to the solar disk is quite strong. As the Sun rises, the intensity goes down because the baselines start to get longer. You will see the reverse trend in the afternoon, although the RFI is often stronger then, so the color scale is more blue than in the morning.

Figure 4: The solar signal at the beginning of 2015-12-02 observation period. Notice that the low frequency intensity is decreasing as the Sun rises.
Figure 5: The solar signal at the end of 2015-12-02 observation period. Notice the reverse effect compared to Figure 4.