damiano123
9.07.2012, 10:38:59
Hello,
For our e-commerce site we have a separate dedicated server for the SQL database: an HP 1U server with Crucial M4 128 GB SSDs in software RAID 1.
The server runs 64-bit Debian.
Since yesterday there has been a sudden problem: it keeps printing something like "Can't read from swap" followed by thousands of line numbers.
What could this be?
When I have the server reset at the data center, it works for a moment and then fails again.
The hang cuts off SSH access.
I ran the log-clearing commands.
Has some cache filled up, or is this a hardware fault?
Please help!
In return I can give a strong link on this site, on about 2 million subpages.
redeemer
9.07.2012, 10:59:10
A detailed description of the error would help. Check the disk; if it is OK, check the RAM. You can also always try the swapoff, mkswap and swapon trick.
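One rough way to start the "check the disk" step is to count I/O errors per device in the kernel log. This is only a sketch: the function name io_errors_by_dev is made up for illustration, and it assumes the kernel reports errors in the usual "I/O error, dev <name>" form.

```shell
# Count kernel "I/O error, dev <name>" lines per device.
# Reads kernel log text on stdin and prints "<device> <count>" lines.
# io_errors_by_dev is a hypothetical helper name, not a standard tool.
io_errors_by_dev() {
    awk '
        {
            line = tolower($0)
            # "i/o error, dev " is 15 characters long; the name follows it.
            if (match(line, /i\/o error, dev [a-z0-9]+/)) {
                dev = substr(line, RSTART + 15, RLENGTH - 15)
                count[dev]++
            }
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}
```

Typical use would be `dmesg | io_errors_by_dev`. For the disk itself, smartctl from smartmontools (for example `smartctl -H /dev/sdb`) reports the drive's own health verdict, and RAM is usually tested by booting memtest86+; both are outside this sketch.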
damiano123
9.07.2012, 11:08:19
Quote(redeemer)
A detailed description of the error would help. Check the disk; if it is OK, check the RAM. You can also always try the swapoff, mkswap and swapon trick.
Kind friend, I am a bit green; I type commands letter by letter from a sheet of paper.
How can I check the RAM and the disks? Do I pull them out, or is there a command for it?
We have 2 or 4 GB of RAM in there. The server is new:
http://www.powerserver.pl/d7189_196421061777188.html
Two Crucial M4 128 GB disks in software RAID 1 + a 250 GB SATA disk.
How do I do that swapoff, mkswap and swapon "trick"?
Are there commands for it? I will have to drive there, because when I have it reset, it hangs again before I get back.
redeemer
9.07.2012, 11:24:08
It would be best to turn to someone who knows the system reasonably well (maybe whoever installed it for you). I would not poke around on my own if you do not feel at home with the subject.
Code
cat /etc/fstab
will show you where your swap partition is (a line such as "/dev/md0 none swap sw 0 0" means swap is on /dev/md0). To start with I would suggest this trick:
Code
swapoff /dev/md0      # stop using the device as swap
mke2fs -c /dev/md0    # -c scans the device for bad blocks before creating the filesystem
You should not get any errors at this point; if there are none, then:
Code
mkswap /dev/md0       # write a fresh swap signature
swapon /dev/md0       # enable swap on the device again
You do this at your own risk, though (nothing should go wrong, unless you mix up the devices and /dev/md0 turns out to be not the swap partition but, say, the root filesystem).
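Since the main risk here is pointing the commands at the wrong device, a small guard can confirm that the device really is in use as swap before swapoff is run. A minimal sketch, assuming the standard /proc/swaps layout (one header line, then one device per line); is_active_swap is a hypothetical helper name.

```shell
# Succeed (exit 0) only if DEVICE is listed as active swap.
# Usage: is_active_swap DEVICE [SWAPS_FILE]
# SWAPS_FILE defaults to /proc/swaps; the second argument exists so the
# check can be tried against a saved copy.
is_active_swap() {
    dev=$1
    swaps_file=${2:-/proc/swaps}
    # NR > 1 skips the "Filename Type Size Used Priority" header line.
    awk -v dev="$dev" '
        NR > 1 && $1 == dev { found = 1 }
        END { exit !found }
    ' "$swaps_file"
}
```

It could be chained in front of the first command, for example `is_active_swap /dev/md0 && swapoff /dev/md0`, so that nothing runs if the name does not match an active swap device.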
damiano123
10.07.2012, 16:47:25
The server runs fine and then suddenly this happens:
End request I/O Error, dev sdb, Sector
Read error on SWAP-DEVICE (8:16:7393280)
Kernel Panic - not syncing : Attempted to kill init
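The numbers in "Read error on SWAP-DEVICE (8:16:7393280)" appear to be major:minor:sector. Assuming the conventional Linux numbering for SCSI/SATA disks (major 8, 16 minors per disk), the pair can be turned back into a device name. sd_name_from_devnum is an illustrative name, and the sketch only covers major 8.

```shell
# Map a major/minor pair to an sd device name.
# Assumption: classic sd numbering, major 8, 16 minors per disk
# (sda = 0, sdb = 16, sdc = 32, ...); minor % 16 is the partition number.
sd_name_from_devnum() {
    major=$1
    minor=$2
    if [ "$major" -ne 8 ]; then
        echo "unsupported major $major" >&2
        return 1
    fi
    disk_index=$((minor / 16))
    part=$((minor % 16))
    # Pick the disk letter: index 0 -> a, 1 -> b, and so on up to p.
    letter=$(printf 'abcdefghijklmnop' | cut -c $((disk_index + 1)))
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}
```

With the values from the message, `sd_name_from_devnum 8 16` gives sdb, which agrees with the "dev sdb" in the I/O error line.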
dmesg output (partial), part 1 of 2
[ 4.539252] igb 0000:05:00.0: eth0: (PCIe:2.5Gb/s:Width x4) 78:e3:b5:19:2d:18
[ 4.539330] igb 0000:05:00.0: eth0: PBA No: FFFFFF-0FF
[ 4.539333] igb 0000:05:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 4.539356] alloc irq_desc for 42 on node -1
[ 4.539358] alloc kstat_irqs on node -1
[ 4.539366] igb 0000:05:00.1: PCI INT B -> GSI 42 (level, low) -> IRQ 42
[ 4.539387] igb 0000:05:00.1: setting latency timer to 64
[ 4.539651] alloc irq_desc for 65 on node -1
[ 4.539652] alloc kstat_irqs on node -1
[ 4.539657] igb 0000:05:00.1: irq 65 for MSI/MSI-X
[ 4.539659] alloc irq_desc for 66 on node -1
[ 4.539660] alloc kstat_irqs on node -1
[ 4.539663] igb 0000:05:00.1: irq 66 for MSI/MSI-X
[ 4.539665] alloc irq_desc for 67 on node -1
[ 4.539666] alloc kstat_irqs on node -1
[ 4.539669] igb 0000:05:00.1: irq 67 for MSI/MSI-X
[ 4.539671] alloc irq_desc for 68 on node -1
[ 4.539672] alloc kstat_irqs on node -1
[ 4.539675] igb 0000:05:00.1: irq 68 for MSI/MSI-X
[ 4.539677] alloc irq_desc for 69 on node -1
[ 4.539678] alloc kstat_irqs on node -1
[ 4.539681] igb 0000:05:00.1: irq 69 for MSI/MSI-X
[ 4.539682] alloc irq_desc for 70 on node -1
[ 4.539684] alloc kstat_irqs on node -1
[ 4.539687] igb 0000:05:00.1: irq 70 for MSI/MSI-X
[ 4.539688] alloc irq_desc for 71 on node -1
[ 4.539690] alloc kstat_irqs on node -1
[ 4.539692] igb 0000:05:00.1: irq 71 for MSI/MSI-X
[ 4.539694] alloc irq_desc for 72 on node -1
[ 4.539695] alloc kstat_irqs on node -1
[ 4.539698] igb 0000:05:00.1: irq 72 for MSI/MSI-X
[ 4.539700] alloc irq_desc for 73 on node -1
[ 4.539701] alloc kstat_irqs on node -1
[ 4.539704] igb 0000:05:00.1: irq 73 for MSI/MSI-X
[ 4.539725] igb 0000:05:00.1: 0 vfs allocated
[ 4.635671] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 4.636208] ata1.00: failed to enable AA(error_mask=0x1)
[ 4.636298] ata1.00: ATA-8: VB0250EAVER, HPG0, max UDMA/100
[ 4.636301] ata1.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 4.637023] ata1.00: failed to enable AA(error_mask=0x1)
[ 4.637127] ata1.00: configured for UDMA/100
[ 4.647569] igb 0000:05:00.1: Intel(R) Gigabit Ethernet Network Connection
[ 4.647572] igb 0000:05:00.1: eth1: (PCIe:2.5Gb/s:Width x4) 78:e3:b5:19:2d:19
[ 4.647650] igb 0000:05:00.1: eth1: PBA No: FFFFFF-0FF
[ 4.647652] igb 0000:05:00.1: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[ 4.651048] scsi 0:0:0:0: Direct-Access ATA VB0250EAVER HPG0 PQ: 0 ANSI: 5
[ 4.930462] usb 4-1: new low speed USB device using uhci_hcd and address 2
[ 5.104978] usb 4-1: New USB device found, idVendor=413c, idProduct=2003
[ 5.104981] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 5.104984] usb 4-1: Product: Dell USB Keyboard
[ 5.104986] usb 4-1: Manufacturer: Dell
[ 5.105049] usb 4-1: configuration #1 chosen from 1 choice
[ 5.373520] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 5.373844] ata2.00: ATA-9: M4-CT128M4SSD2, 0009, max UDMA/100
[ 5.373848] ata2.00: 250069680 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 5.374249] ata2.00: configured for UDMA/100
[ 5.389572] scsi 1:0:0:0: Direct-Access ATA M4-CT128M4SSD2 0009 PQ: 0 ANSI: 5
[ 6.112072] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 6.112423] ata3.00: ATA-9: M4-CT128M4SSD2, 0009, max UDMA/100
[ 6.112427] ata3.00: 250069680 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 6.112871] ata3.00: configured for UDMA/100
[ 6.128126] scsi 2:0:0:0: Direct-Access ATA M4-CT128M4SSD2 0009 PQ: 0 ANSI: 5
[ 6.447496] ata4: SATA link down (SStatus 0 SControl 300)
[ 7.186049] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 7.187647] ata5.00: ATAPI: hp DVD RAM UJ892, 1.23, max UDMA/100
[ 7.190155] ata5.00: configured for UDMA/100
[ 7.212668] scsi 4:0:0:0: CD-ROM hp DVD RAM UJ892 1.23 PQ: 0 ANSI: 5
[ 7.529539] ata6: SATA link down (SStatus 0 SControl 300)
[ 7.548509] usbcore: registered new interface driver hiddev
[ 7.551465] sd 0:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/232 GiB)
[ 7.551478] sd 1:0:0:0: [sdb] 250069680 512-byte logical blocks: (128 GB/119 GiB)
[ 7.551526] sd 1:0:0:0: [sdb] Write Protect is off
[ 7.551528] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 7.551536] sd 0:0:0:0: [sda] Write Protect is off
[ 7.551539] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 7.551546] sd 1:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 7.551557] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 7.551721] sd 2:0:0:0: [sdc] 250069680 512-byte logical blocks: (128 GB/119 GiB)
[ 7.551729] sda:
[ 7.551758] sd 2:0:0:0: [sdc] Write Protect is off
[ 7.551760] sd 2:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 7.551777] sd 2:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 7.551787] sdb:
[ 7.551868] sdc: sdb1 sdb2 sdb3 sdb4 < sdc1 sdc2 sdc3 sdc4 < sdb5 sdc5 sdb6 sdc6 sdb7 sdc7 sdb8 >
[ 7.552672] sdc8 >
[ 7.553238] sd 2:0:0:0: [sdc] Attached SCSI disk
[ 7.553255] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 7.558568] sda1
[ 7.558799] sd 0:0:0:0: [sda] Attached SCSI disk
[ 7.563123] input: Dell Dell USB Keyboard as /devices/pci0000:00/0000:00:1d.0/usb4/4-1/4-1:1.0/input/input1
[ 7.563167] generic-usb 0003:413C:2003.0001: input,hidraw0: USB HID v1.10 Keyboard [Dell Dell USB Keyboard] on usb-0000:00:1d.0-1/input0
[ 7.563185] usbcore: registered new interface driver usbhid
[ 7.563187] usbhid: v2.6:USB HID core driver
[ 7.567262] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[ 7.567266] Uniform CD-ROM driver Revision: 3.20
Part 2 of 2
[ 7.567350] sr 4:0:0:0: Attached scsi CD-ROM sr0
[ 7.570212] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 7.570242] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 7.570270] sd 2:0:0:0: Attached scsi generic sg2 type 0
[ 7.570296] sr 4:0:0:0: Attached scsi generic sg3 type 5
[ 7.694808] md: raid1 personality registered for level 1
[ 7.697907] mdadm: sending ioctl 1261 to a partition!
[ 7.710944] md: md0 stopped.
[ 7.711808] md: bind<sdc1>
[ 7.711979] md: bind<sdb1>
[ 7.712829] raid1: raid set md0 active with 2 out of 2 mirrors
[ 7.712846] md0: detected capacity change from 0 to 199217152
[ 7.713472] md0: unknown partition table
[ 7.918222] md: md1 stopped.
[ 7.918939] md: bind<sdc3>
[ 7.919096] md: bind<sdb3>
[ 7.920502] raid1: raid set md1 active with 2 out of 2 mirrors
[ 7.920521] md1: detected capacity change from 0 to 1998573568
[ 7.921111] md1: unknown partition table
[ 8.125797] md: md2 stopped.
[ 8.126511] md: bind<sdc5>
[ 8.126683] md: bind<sdb5>
[ 8.128056] raid1: raid set md2 active with 2 out of 2 mirrors
[ 8.128074] md2: detected capacity change from 0 to 19997319168
[ 8.128661] md2: unknown partition table
[ 8.333233] md: md3 stopped.
[ 8.333950] md: bind<sdc6>
[ 8.334123] md: bind<sdb6>
[ 8.335464] raid1: md3 is not clean -- starting background reconstruction
[ 8.335467] raid1: raid set md3 active with 2 out of 2 mirrors
[ 8.335487] md3: detected capacity change from 0 to 79997886464
[ 8.336083] md3: unknown partition table
[ 8.540620] md: md4 stopped.
[ 8.541331] md: bind<sdc7>
[ 8.541489] md: bind<sdb7>
[ 8.542659] raid1: raid set md4 active with 2 out of 2 mirrors
[ 8.542676] md4: detected capacity change from 0 to 14997708800
[ 8.543270] md4: unknown partition table
[ 8.747749] md: md5 stopped.
[ 8.748468] md: bind<sdc8>
[ 8.748624] md: bind<sdb8>
[ 8.749452] raid1: raid set md5 active with 2 out of 2 mirrors
[ 8.749471] md5: detected capacity change from 0 to 6833557504
[ 8.750061] md5: unknown partition table
[ 8.984001] PM: Starting manual resume from disk
[ 8.984003] PM: Resume from partition 8:18
[ 8.984005] PM: Checking hibernation image.
[ 8.984183] PM: Error -22 checking image file
[ 8.984186] PM: Resume from disk failed.
[ 8.998911] EXT4-fs (md1): INFO: recovery required on readonly filesystem
[ 8.998916] EXT4-fs (md1): write access will be enabled during recovery
[ 9.009065] EXT4-fs (md1): recovery complete
[ 9.011690] EXT4-fs (md1): mounted filesystem with ordered data mode
[ 9.136488] udev[564]: starting version 164
[ 9.153308] ACPI: SSDT 00000000bf77e1d0 008F0 (v01 DpgPmm P001Ist 00000011 INTL 20051117)
[ 9.153729] ACPI: SSDT 00000000bf77eac0 004D5 (v01 PmRef P001Cst 00003001 INTL 20051117)
[ 9.167661] Monitor-Mwait will be used to enter C-1 state
[ 9.170136] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input2
[ 9.170142] ACPI: Power Button [PWRB]
[ 9.182551] input: PC Speaker as /devices/platform/pcspkr/input/input3
[ 9.186302] Monitor-Mwait will be used to enter C-3 state
[ 9.200536] Monitor-Mwait will be used to enter C-3 state
[ 9.200647] processor LNXCPU:00: registered as cooling_device0
[ 9.200753] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input4
[ 9.200812] ACPI: Power Button [PWRF]
[ 9.219787] processor LNXCPU:01: registered as cooling_device1
[ 9.221659] processor LNXCPU:02: registered as cooling_device2
[ 9.223820] processor LNXCPU:03: registered as cooling_device3
[ 9.233331] Error: Driver 'pcspkr' is already registered, aborting...
[ 9.412506] Adding 3906552k swap on /dev/sdc2. Priority:-1 extents:1 across:3906552k SS
[ 9.476958] loop: module loaded
[ 10.363486] md: resync of RAID array md3
[ 10.363489] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 10.363491] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 10.363497] md: using 128k window, over a total of 78122936 blocks.
[ 10.514297] EXT4-fs (md3): mounted filesystem with ordered data mode
[ 10.609414] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 10.688164] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 12.184736] igb: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[ 12.185646] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 12.967105] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 12.968064] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
[ 22.651652] eth0: no IPv6 routers present
[ 23.581802] eth1: no IPv6 routers present
[ 112.942576] usb 4-1: USB disconnect, address 2
[ 514.414722] md: md3: resync done.
[ 514.428232] RAID1 conf printout:
[ 514.428235] --- wd:2 rd:2
[ 514.428238] disk 0, wo:0, o:1, dev:sdb6
[ 514.428241] disk 1, wo:0, o:1, dev:sdc6
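The log shows md3 starting a resync after an unclean shutdown. To see whether any array has actually lost a member, /proc/mdstat can be scanned for a status such as [U_]. A sketch, assuming the usual mdstat layout in which each array's name line is followed by a line carrying a [UU]-style status; degraded_md_arrays is a hypothetical name.

```shell
# Print the names of md arrays whose member status contains "_"
# (a missing or failed member), e.g. [U_] or [_U].
# Usage: degraded_md_arrays [MDSTAT_FILE]   (defaults to /proc/mdstat)
degraded_md_arrays() {
    awk '
        # Array header line, e.g. "md3 : active raid1 sdb6[0] sdc6[1]"
        /^md[0-9]+ :/ { name = $1; next }
        # First following line with a [UU]-style status field
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result would mean every array reports all of its members as up; it says nothing about the health of partitions outside the arrays, such as a plain swap partition.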
Output of cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point> <type> <options> <dump> <pass>
proc /proc proc defaults 0 0
# / was on /dev/md1 during installation
UUID=11562caa-7099-4d83-950d-3abae113d811 / ext4 errors=remount-ro 0 1
# /backup was on /dev/sda1 during installation
UUID=9377e4c7-8a49-4223-b9ce-a8a710278447 /backup ext4 defaults,discard 0 2
# /boot was on /dev/md0 during installation
UUID=04ee570a-3f0c-4177-97c7-db04b6317663 /boot ext4 defaults,discard 0 2
# /home was on /dev/md4 during installation
UUID=0fe0084c-a018-46fd-874a-ed513d489775 /home ext4 defaults,discard 0 2
# /tmp was on /dev/md5 during installation
UUID=5a6ab4bb-248c-4fec-816b-8a46425551e5 /tmp ext4 defaults,discard 0 2
# /usr was on /dev/md2 during installation
UUID=f42cb660-ed97-4090-872e-1adf31524fac /usr ext4 defaults,discard 0 2
# /var was on /dev/md3 during installation
UUID=ba98305f-6ba9-4ca6-9d1c-4386df9c7afe /var ext4 defaults,discard 0 2
# swap was on /dev/sdb2 during installation
#UUID=56d52264-87a8-4106-aeeb-fbcad676862a none swap sw 0 0
# swap was on /dev/sdc2 during installation
UUID=e28a319f-3a9d-4905-9db1-1aa0b727bebd none swap sw 0 0
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto 0 0
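To read off which swap entries are actually active in a file like the one above, the commented-out lines have to be skipped. A small sketch; fstab_swap_entries is a made-up helper, and it assumes whitespace-separated fields with the filesystem type in the third column, as described in fstab(5).

```shell
# Print the first field (device, LABEL= or UUID=) of every active,
# i.e. not commented-out, swap line in an fstab-style file.
# Usage: fstab_swap_entries [FSTAB_FILE]   (defaults to /etc/fstab)
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}
```

For the fstab above this prints only UUID=e28a319f-3a9d-4905-9db1-1aa0b727bebd. Running `blkid -U e28a319f-3a9d-4905-9db1-1aa0b727bebd` would then show which partition currently carries that UUID, and `swapon -s` shows what the kernel is using right now.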
JohnnyB
10.07.2012, 21:41:07
Quote(damiano123 @ 10.07.2012, 17:47:25)
The server runs fine and then suddenly this happens:
End request I/O Error, dev sdb, Sector
Read error on SWAP-DEVICE (8:16:7393280)
You have answered your own question: the sdb disk has died.
Quote(damiano123 @ 10.07.2012, 17:47:25)
# swap was on /dev/sdb2 during installation
#UUID=56d52264-87a8-4106-aeeb-fbcad676862a none swap sw 0 0
# swap was on /dev/sdc2 during installation
UUID=e28a319f-3a9d-4905-9db1-1aa0b727bebd none swap sw 0 0
/dev/scd0 /media/cdrom0 udf,iso9660 user,noauto 0 0
According to fstab, swap on sdb is currently disabled, so it should work (run swapon -s to check). In future, put swap on the RAID array and this will not be a problem, not to mention adding more RAM.
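Putting swap on a RAID 1 array, as suggested, could look roughly like the following. The sketch only prints the commands instead of running them, because mdadm --create and mkswap destroy whatever is on the partitions. plan_swap_on_raid1 is a hypothetical helper, and the array name passed to it is up to the admin.

```shell
# Print (do not run) the commands for moving swap onto a new RAID 1 array.
# Usage: plan_swap_on_raid1 MD_DEVICE PARTITION_A PARTITION_B
# All three names are supplied by the caller; nothing is detected or
# executed here, so the output can be reviewed before anything is typed.
plan_swap_on_raid1() {
    md_dev=$1
    part_a=$2
    part_b=$3
    echo "swapoff -a"
    echo "mdadm --create $md_dev --level=1 --raid-devices=2 $part_a $part_b"
    echo "mkswap $md_dev"
    echo "swapon $md_dev"
    echo "# /etc/fstab line: $md_dev none swap sw 0 0"
}
```

For this machine a call might be `plan_swap_on_raid1 /dev/md6 /dev/sdb2 /dev/sdc2`, where /dev/md6 is an assumption (the log only shows md0 to md5 in use) and a failing disk would have to be replaced first.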
damiano123
14.07.2012, 13:14:14
Hmm, we found this gem about Crucial M4 disks: they run for only about 5000 hours, and after that time, if the disk goes over 15K in operation, it resets itself. A small bug in the Crucial M4 controller firmware, affecting them worldwide. Ours has been running for roughly that long, about six months.
After updating the firmware to 000F the server sees the disks, but at power-on, after the F10 - F12 selection screen, it prints "Checking RAM" and then nothing: a black screen with a cursor blinking in the top-left corner.
The firmware update was not supposed to erase data; it was done with the Windows 7 updater:
http://www.crucial.com/support/firmware.aspx
What could this be?
It's OK now; it was the Crucial M4 5000-hour bug, and that was all.
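Whether a drive is near the figure mentioned here can be read from SMART attribute 9. A sketch, assuming smartctl -A prints the attribute under its default name Power_On_Hours with the raw value in the tenth column; the function names are illustrative, and the 5000-hour threshold in the usage line is taken from the post, not from Crucial's documentation.

```shell
# Extract the raw Power_On_Hours value from "smartctl -A" output on stdin.
# Exits non-zero if the attribute is not present.
power_on_hours() {
    awk '
        $2 == "Power_On_Hours" {
            v = $10
            sub(/[^0-9].*$/, "", v)   # keep only the leading digits
            print v
            found = 1
        }
        END { exit !found }
    '
}

# Succeed if HOURS is greater than or equal to THRESHOLD.
# Usage: hours_at_least HOURS THRESHOLD
hours_at_least() {
    [ "$1" -ge "$2" ]
}
```

A possible use is `h=$(smartctl -A /dev/sdb | power_on_hours) && hours_at_least "$h" 5000 && echo "check firmware"`; `smartctl -i /dev/sdb` additionally shows the firmware version the drive is running.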