- 1.97×-3.39×: overall speedup achieved by CodedTeraSort
- 33%: share of time spent on data shuffling in a Facebook Hadoop cluster
- 70%: share of time spent on shuffling in a self-join application on Amazon EC2
1. Introduction
Distributed computing frameworks such as MapReduce and Spark have transformed large-scale data processing, but they face a fundamental bottleneck: the communication load of the data shuffling phase. This paper addresses a central question: how can extra computing power be traded optimally for a lower communication load in a distributed computing system?
The study shows that computation load and communication load are inversely proportional to each other, which establishes a fundamental tradeoff. The proposed Coded Distributed Computing (CDC) scheme shows that increasing the computation load by a factor of r creates coding opportunities that reduce the communication load by the same factor.
2. Fundamental Tradeoff Framework
2.1 System Model
The distributed computing system consists of K computing nodes that process input data through Map and Reduce functions. Each node processes a subset of the input files and produces intermediate values, which are then exchanged during the shuffle phase to compute the final outputs.
2.2 Computation and Communication Load
The computation load r is defined as the total number of Map function executions across all nodes, normalized by the number of input files. For example, r = 2 means that each input file is mapped at two nodes on average. The communication load L is defined as the total amount of data (in bits) exchanged during the shuffle, normalized by the total size of all intermediate values.
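The two definitions can be made concrete with a small calculation. The sketch below is illustrative only: the class name `LoadDefinitions`, the toy placement of 6 files on 3 nodes, the 64-bit value size, and the assumption of one intermediate value per output function per file are all hypothetical choices, not taken from the paper's implementation.

```java
import java.util.List;
import java.util.Set;

class LoadDefinitions {
    // Computation load r: total number of Map executions across all nodes,
    // normalized by the number of input files.
    static double computationLoad(List<Set<Integer>> filesMappedPerNode, int numFiles) {
        int totalMapExecutions = 0;
        for (Set<Integer> files : filesMappedPerNode) {
            totalMapExecutions += files.size();
        }
        return (double) totalMapExecutions / numFiles;
    }

    // Communication load L: bits exchanged in the shuffle, normalized by the total
    // size of all intermediate values (assumed: one value per output function per file).
    static double communicationLoad(long bitsShuffled, int numOutputs, int numFiles, long bitsPerValue) {
        long totalIntermediateBits = (long) numOutputs * numFiles * bitsPerValue;
        return (double) bitsShuffled / totalIntermediateBits;
    }

    public static void main(String[] args) {
        // Hypothetical toy setup: K = 3 nodes, 6 files, each file mapped at 2 nodes.
        List<Set<Integer>> placement = List.of(
                Set.of(1, 2, 3, 4), Set.of(3, 4, 5, 6), Set.of(5, 6, 1, 2));
        double r = computationLoad(placement, 6);
        // Assumed: 3 output functions, 64-bit values, 6 values sent uncoded in the shuffle.
        double load = communicationLoad(6 * 64L, 3, 6, 64L);
        System.out.println("r = " + r + ", L = " + load);
    }
}
```

In this toy setup the 12 Map executions over 6 files give r = 2, and sending 6 of the 18 intermediate values uncoded gives L = 1/3.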
3. Coded Distributed Computing (CDC)
3.1 CDC Algorithm Design
The CDC scheme carefully designs the data placement and function assignment to create coded multicast opportunities. By evaluating each Map function at r carefully chosen nodes, the scheme lets nodes compute coded messages that are simultaneously useful to multiple receivers.
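A small example may help show how one coded message can serve two receivers. The sketch below assumes a toy setup with K = 3 nodes, 6 files, r = 2, and node k reducing output function k. The class `CodedShuffleExample`, the `value` function standing in for real intermediate values, and the choice of XOR as the code are illustrative assumptions, not the paper's implementation.

```java
import java.util.List;
import java.util.Set;

class CodedShuffleExample {
    // Files mapped at nodes 1..3 (index 0 is unused); every file is mapped at r = 2 nodes.
    static final List<Set<Integer>> FILES_AT_NODE = List.of(
            Set.<Integer>of(), Set.of(1, 2, 3, 4), Set.of(3, 4, 5, 6), Set.of(5, 6, 1, 2));

    // Hypothetical stand-in for the intermediate value of output function q on file n.
    static int value(int q, int n) {
        return 1000 * q + 31 * n + 7;
    }

    // A node may only use values for files it mapped itself.
    static int localValue(int node, int q, int n) {
        if (!FILES_AT_NODE.get(node).contains(n)) {
            throw new IllegalStateException("node " + node + " did not map file " + n);
        }
        return value(q, n);
    }

    // The sender XORs two locally computed values into one multicast message.
    static int encode(int sender, int q1, int n1, int q2, int n2) {
        return localValue(sender, q1, n1) ^ localValue(sender, q2, n2);
    }

    // A receiver cancels the value it computed locally to recover the one it is missing.
    static int decode(int receiver, int message, int knownQ, int knownN) {
        return message ^ localValue(receiver, knownQ, knownN);
    }

    public static void main(String[] args) {
        int fromNode1 = encode(1, 2, 1, 3, 3); // useful to nodes 2 and 3
        int fromNode2 = encode(2, 1, 5, 3, 4); // useful to nodes 1 and 3
        int fromNode3 = encode(3, 1, 6, 2, 2); // useful to nodes 1 and 2

        // Each node is missing the values of its own function on the two files it did not map.
        boolean node1Ok = decode(1, fromNode2, 3, 4) == value(1, 5)
                && decode(1, fromNode3, 2, 2) == value(1, 6);
        boolean node2Ok = decode(2, fromNode1, 3, 3) == value(2, 1)
                && decode(2, fromNode3, 1, 6) == value(2, 2);
        boolean node3Ok = decode(3, fromNode1, 2, 1) == value(3, 3)
                && decode(3, fromNode2, 1, 5) == value(3, 4);
        System.out.println("all nodes decoded: " + (node1Ok && node2Ok && node3Ok));
    }
}
```

Three coded messages deliver the six missing values, so in this toy case the shuffle carries half the data of an uncoded exchange.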
3.2 Mathematical Formulation
The key insight is that with computation load r, the communication load can be reduced to:
$$L(r) = \frac{1}{r} \left(1 - \frac{r}{K}\right)$$
This is an inverse relationship: when r is small compared with K, the factor 1 - r/K is close to 1, so increasing r by a given factor reduces L by about the same factor. This tradeoff is optimal.
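To get a feel for the formula above, it can be evaluated next to an uncoded baseline. The sketch below assumes the usual uncoded baseline of 1 - r/K (each node is missing the fraction of values it did not compute), which this text does not state explicitly. The class name `TradeoffCurve` and the cluster size K = 10 are hypothetical.

```java
class TradeoffCurve {
    // Assumed uncoded baseline: each node still needs the fraction 1 - r/K
    // of the intermediate values, because it did not compute them itself.
    static double uncodedLoad(int r, int k) {
        return 1.0 - (double) r / k;
    }

    // Coded shuffle, following the formula above: L(r) = (1/r) * (1 - r/K).
    static double codedLoad(int r, int k) {
        return uncodedLoad(r, k) / r;
    }

    public static void main(String[] args) {
        int k = 10; // hypothetical cluster size
        for (int r = 1; r <= k; r++) {
            System.out.printf("r=%d uncoded=%.3f coded=%.3f%n",
                    r, uncodedLoad(r, k), codedLoad(r, k));
        }
    }
}
```

At every r the coded load is the uncoded load divided by r, and both reach zero at r = K, where every node has mapped every file.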
4. Theoretical Analysis
4.1 Information-Theoretic Lower Bound
The paper establishes an information-theoretic lower bound on the communication load:
$$L^*(r) \geq \frac{1}{r} \left(1 - \frac{r}{K}\right)$$
This bound is derived using cut-set arguments and information inequality techniques.
4.2 Proof of Optimality
The CDC scheme achieves this lower bound exactly, which proves its optimality. The proof shows that any scheme with computation load r must incur a communication load of at least L*(r), and that CDC attains exactly this value.
5. Experimental Results
5.1 CodedTeraSort Implementation
The coding techniques were applied to the Hadoop TeraSort benchmark to develop CodedTeraSort. This implementation keeps the same API as standard TeraSort while incorporating the CDC principles.
5.2 Performance Evaluation
Experimental results show that CodedTeraSort speeds up overall job execution by 1.97× to 3.39× for typical settings of interest. The performance gain scales with the computation load parameter r.
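One rough way to reason about how the gain depends on r is a back-of-envelope time model in which Map time grows by a factor of r and shuffle time shrinks by a factor of r. This model is an assumption made here for illustration: it ignores the 1 - r/K factor and any encoding or decoding cost. The class `SpeedupModel` and the timings in `main` are hypothetical, not measurements from the experiments.

```java
class SpeedupModel {
    // Simplified model: Map time grows by r, shuffle time shrinks by r,
    // and Reduce time is unchanged.
    static double codedTime(double mapTime, double shuffleTime, double reduceTime, int r) {
        return r * mapTime + shuffleTime / r + reduceTime;
    }

    // Speedup of the coded run over an uncoded run with r = 1.
    static double speedup(double mapTime, double shuffleTime, double reduceTime, int r) {
        double uncoded = mapTime + shuffleTime + reduceTime;
        return uncoded / codedTime(mapTime, shuffleTime, reduceTime, r);
    }

    // Picks the r in [1, maxR] with the smallest modeled total time.
    static int bestR(double mapTime, double shuffleTime, double reduceTime, int maxR) {
        int best = 1;
        for (int r = 2; r <= maxR; r++) {
            if (codedTime(mapTime, shuffleTime, reduceTime, r)
                    < codedTime(mapTime, shuffleTime, reduceTime, best)) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical timings in seconds, chosen so that the shuffle dominates.
        double map = 10, shuffle = 160, reduce = 10;
        int r = bestR(map, shuffle, reduce, 10);
        System.out.printf("best r = %d, modeled speedup = %.2f%n", r, speedup(map, shuffle, reduce, r));
    }
}
```

Under this model the gain is largest when the shuffle dominates the job, and pushing r too far eventually costs more in Map time than it saves in shuffle time.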
Key Insights
- Fundamental tradeoff: computation load and communication load are inversely proportional
- Coding opportunities: extra computation creates new coding opportunities that reduce communication
- Optimal scheme: CDC achieves the information-theoretic lower bound
- Practical impact: 1.97×-3.39× speedup in a real-world sorting application
6. Code Implementation
Pseudocode for CodedTeraSort
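class CodedTeraSort {
    // Pseudocode: the types and helper methods used below are placeholders.
    int r;                                        // computation load
    List<IntermediateValue> intermediateValues = new ArrayList<>();

    // Map phase with computation load r
    void map(InputSplit split) {
        for (int i = 0; i < r; i++) {
            // Process a data portion with coding in mind; accumulate the results
            // instead of overwriting those of earlier iterations
            intermediateValues.addAll(processWithCoding(split, i));
        }
    }

    // Shuffle phase with coded communication
    void shuffle() {
        // Generate coded messages instead of sending raw data
        CodedMessage[] codedMessages = generateCodedMessages(intermediateValues);
        broadcast(codedMessages);
    }

    // Reduce phase with decoding
    void reduce(CodedMessage[] messages) {
        // Decode to recover the required intermediate values, using the
        // locally computed values to cancel the parts already known
        List<IntermediateValue> decodedValues = decode(messages, intermediateValues);
        // Perform the reduction
        output = performReduction(decodedValues);
    }
}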
7. Future Applications
The CDC scheme has important implications for several areas of distributed computing:
- Machine learning: distributed training of large neural networks with reduced communication overhead
- Edge computing: efficient computation in bandwidth-constrained environments
- Federated learning: privacy-preserving distributed model training
- Stream processing: real-time data processing with efficient resource usage
8. References
- Li, S., Maddah-Ali, M. A., Yu, Q., & Avestimehr, A. S. (2017). A Fundamental Tradeoff between Computation and Communication in Distributed Computing. IEEE Transactions on Information Theory.
- Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM.
- Zaharia, M., et al. (2016). Apache Spark: A unified engine for big data processing. Communications of the ACM.
- Isard, M., et al. (2007). Dryad: distributed data-parallel programs from sequential building blocks. ACM SIGOPS.
- Apache Hadoop. (2023). Hadoop TeraSort Benchmark Documentation.
Expert Analysis: The Computation-Communication Tradeoff Revolution
Cutting to the chase: This paper challenges conventional wisdom in distributed systems. It shows that treating computation and communication as independent optimization problems leaves substantial performance gains on the table. The 1.97×-3.39× speedup is more than an incremental improvement; it is evidence of a fundamental architectural inefficiency in current distributed frameworks.
Logical chain: The study establishes a clean mathematical relationship: computation load (r) and communication load (L) are inversely proportional ($L(r) = \frac{1}{r}(1-\frac{r}{K})$). This is not only theory; it is achievable through careful code design. The chain is clear: more local computation → creates coding opportunities → enables multicast gains → lowers communication overhead → speeds up overall execution. This parallels principles from the network coding literature, applied here to computation frameworks.
Highlights and shortcomings: The elegance lies in achieving the information-theoretic lower bound: when a scheme meets the theoretical optimum, the problem as posed is fully solved. The CodedTeraSort implementation shows real-world impact, not just theoretical elegance. However, the paper understates the implementation complexity: integrating CDC into existing frameworks such as Spark would require major architectural changes. The memory overhead of storing the many redundantly computed values is not negligible. The paper's Facebook and Amazon EC2 examples (33-70% of time spent shuffling) indicate that current systems are inefficient.
Actionable insights: Distributed system architects should reassess the computation-communication balance in their designs. A potential speedup of up to 3.39× means that operators of large-scale data processing could reach the same results with smaller clusters or faster turnaround. This matters especially for machine learning training, where communication bottlenecks are well documented. The study suggests designing systems that deliberately over-compute locally to save globally, an approach that is counterintuitive but mathematically sound.
Compared with traditional approaches such as DryadLINQ or Spark's built-in optimizations, CDC represents a paradigm shift rather than an incremental improvement. As distributed systems continue to scale, this work could prove as influential as the original MapReduce paper, changing how we think about resource tradeoffs in distributed computing.